Impact of different conditions on accuracy of five rules for principal components retention

Debate about the criteria for identifying nontrivial principal components is still present in the literature. Many papers have found that the most frequently used criterion, Guttman-Kaiser's, performs very poorly. In the last three years some new criteria have been proposed. In this Monte Carlo experiment we investigated the impact that sample size, number of analyzed variables, number of supposed factors and proportion of error variance have on the accuracy of the analyzed criteria for principal components retention. We compared the following criteria: Bartlett's χ² test, Horn's Parallel Analysis, Guttman-Kaiser's eigenvalue over one, Velicer's MAP, and CHull, originally proposed by Ceulemans & Kiers. These factors were systematically combined, resulting in 690 different combinations. A total of 138,000 simulations were performed. The novelty of this research is the systematic variation of the error variance. The simulations showed that, in favorable research conditions, all analyzed criteria work properly. Bartlett's and Horn's criteria were robust in most of the analyzed situations. Velicer's MAP had the best accuracy in situations with a small number of subjects and a high number of variables. The results confirm earlier findings that Guttman-Kaiser's criterion has the worst performance.

Corresponding author: azoric@gmail.com
Exploratory factor analysis (EFA) is a de facto psychological method, not just because of its origin, but because it is among the most popular methods in psychology. The idea of identifying the structures underlying measured variables is very close to everyday psychological problems, in which the phenomena of interest cannot be measured directly but have to be derived from direct measures of behavior. Principal components analysis (PCA), in a broader sense one of the EFA techniques for factor extraction, is the most frequently used one. Reviews of its usage in psychological journals (Conway & Huffcutt, 2003; Fabrigar, Wegener, MacCallum, & Strahan, 1999; Ford, MacCallum, & Tait, 1986) show that the popularity of EFA, and of PCA in particular, still holds. After the misconception that exploratory analysis is subordinate to confirmatory analysis was rejected (for example Tukey, 1980; Velicer & Jackson, 1990), the main critique has centered on insufficient precision and objectivity, as the results are highly influenced by the researcher's choices. One of the key questions in EFA is how many factors or components are needed to successfully represent the data in a space with a smaller number of dimensions. Some authors call the rules for determining the number of components stopping rules. Factor extraction can be seen as an iterative procedure of component extraction, as originally proposed by Hotelling (see Mulaik, 1971), which is stopped when the optimal number of factors has been extracted. As there are many stopping rules, it is usually up to the researcher to select the one whose result is closest to the researcher's theoretical model and the expectations based on it, as well as to the interpretability of the resulting factor solution. Determining the correct, i.e. appropriate, number of factors has strong theoretical implications in psychology, especially in the fields of personality and cognitive abilities.
In classical factor analysis, true scores are postulated to be uncorrelated with unique scores. Moreover, the unique scores are not correlated among themselves. In other words, true scores are responsible for the correlations between variables, while unique scores are responsible only for the variance of the variables that remains after the true scores are partialled out. These classical postulates do not lead to a unique solution for the scores, implying that true and unique scores can only be estimated.
In order to make the model solvable in terms of linear algebra, Guttman relaxed the classical postulates in his image theory, allowing: a) nonzero correlations between the true scores of one variable (image scores) and the unique scores (anti-image scores) of the other measured variables, and b) nonzero correlations among the unique scores of different variables (Guttman, 1953). With this relaxation the model leads to a unique solution for the two parts of each variable. A less desirable consequence of the model is that all random error variables correlate in finite samples.

METHODS FOR DETECTION OF NONTRIVIAL COMPONENTS
In the more than hundred-year history of PCA as an EFA technique, a long list of different stopping rules has been suggested. Only the most popular rules are presented here, in chronological order.

Bartlett's χ² test
One of the first stopping rules, from 1950, was the statistical test developed by Bartlett (Bartlett, 1950). The idea is to detect the point at which the variances of the remaining components no longer differ statistically. After all true components are extracted, only error components remain, and their variances fluctuate only by chance. To detect this situation, Bartlett essentially computed the ratio of the geometric and arithmetic means of the variances, i.e. eigenvalues, of the remaining components. If all eigenvalues are the same, these means are equal, the ratio is close to one, and its logarithm is close to zero (Horn & Engstrom, 1979). If at least one eigenvalue is substantially greater, it has a greater impact on the arithmetic mean than on the geometric mean, and the logarithm moves sharply away from zero. Multiplying this logarithm by a constant defined by the number of observations, the number of variables and the number of components already declared significant results in a χ²-distributed statistic.
Using the notation presented in Mulaik (1971), let m be the number of directly measured variables, n the sample size, λ_i the eigenvalues obtained from the correlation matrix R of the m measured variables, sorted in non-ascending order, and p (p = 0, ..., m) the number of components already declared significantly different from the others. Then

$$\chi^2 = -\left[ n - 1 - \frac{2m + 5}{6} - \frac{2p}{3} \right] \ln R_{m-p},$$

and

$$R_{m-p} = \frac{|R|}{\left( \prod_{i=1}^{p} \lambda_i \right) \left( \dfrac{m - \sum_{i=1}^{p} \lambda_i}{m - p} \right)^{m-p}}.$$

The resulting χ² statistic has (m - p - 1)(m - p + 2)/2 degrees of freedom. In the first iteration (p = 0) the tested hypothesis is in fact that all population eigenvalues are equal. After rejecting this hypothesis one would, in the next iteration, test the remaining m - 1 eigenvalues. The first test with a nonsignificant outcome means that all of the remaining eigenvalues are just error variances and that the optimal number of components has been detected. According to some authors, Bartlett's χ² test often overestimates the number of nontrivial components, especially at the conventional significance levels of 0.05 or 0.01 (Gorsuch, 1973; Horn & Engstrom, 1979; Hubbard & Allen, 1987). In addition, Gorsuch (1973) noticed that this tendency increases with sample size, as smaller differences become significant in larger samples. It is important to note that this test can detect structures with no factors at all, namely when the test for p = 0 is nonsignificant.
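The test statistic described above can be sketched in a few lines of Python. This is only an illustration, assuming the usual form of Bartlett's statistic (the constant n - 1 - (2m + 5)/6 - 2p/3 multiplied by the log ratio of the geometric to the arithmetic mean of the last m - p eigenvalues); the function name `bartlett_statistic` is hypothetical and is not taken from the authors' R code.

```python
import math

def bartlett_statistic(eigenvalues, n, p):
    """Chi-square statistic and df for H0: the last m - p eigenvalues are equal.

    Assumes positive eigenvalues of a correlation matrix (they sum to m).
    """
    lam = sorted(eigenvalues, reverse=True)
    m = len(lam)
    rest = lam[p:]                 # eigenvalues still under test
    q = len(rest)                  # q = m - p
    # log of (geometric mean / arithmetic mean) ** q; zero when all are equal
    log_ratio = sum(math.log(v) for v in rest) - q * math.log(sum(rest) / q)
    constant = n - 1 - (2 * m + 5) / 6 - 2 * p / 3
    chi2 = -constant * log_ratio
    df = (q - 1) * (q + 2) // 2
    return chi2, df
```

In a sequential application one would call the function for p = 0, 1, 2, ... and compare each statistic with the χ² critical value for the returned degrees of freedom at the chosen significance level.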

Guttman-Kaiser's rule
Describing the conditions necessary for common-factor analysis, Guttman noted that after the removal of unique variances, in such a way that the resulting matrix is still Gramian, its minimal rank must be the number of eigenvalues greater than one. At the same time, this number is also the minimum number of common factors that should be postulated as truly existing (Guttman, 1954). Later, Kaiser noted that in order to have positive reliability a component must have an eigenvalue greater than one. Another interpretation of this criterion is that it makes little sense to declare something a component if it carries less information than an original (standardized) variable. Almost everyone will agree with this conclusion, but the converse, proclaiming all components with eigenvalues greater than one to be important, is questionable. It is well known that this rule overestimates the number of components (see for example Lorenzo-Seva, Timmerman, & Kiers, 2011), which is probably a consequence of treating the correlation matrix obtained from the sample as a population parameter and not as its estimate. Nevertheless, this rule is perhaps the most frequently used one (Fabrigar et al., 1999; Conway & Huffcutt, 2003).
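As a minimal illustration, the rule reduces to counting the eigenvalues of the correlation matrix that exceed one. The function name `guttman_kaiser` and the example eigenvalues are hypothetical.

```python
def guttman_kaiser(eigenvalues):
    """Number of components whose eigenvalue is strictly greater than one."""
    return sum(1 for value in eigenvalues if value > 1.0)

# Hypothetical eigenvalues of a 5-variable correlation matrix (they sum to 5).
example = [2.90, 1.30, 1.01, 0.50, 0.29]
retained = guttman_kaiser(example)
```

The third eigenvalue in the example exceeds the threshold only marginally, which illustrates why sampling fluctuation around one can inflate the count.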

Horn's Parallel Analysis Test
This test is based on a comparison of the eigenvalues obtained from the analyzed matrix with eigenvalues obtained from randomly generated data. All eigenvalues that are greater than the corresponding eigenvalues obtained from random data are designated as important and retained (Horn, 1965). To apply this criterion, eigenvalues have to be calculated from the correlation matrix of a random data matrix of the same size as the original data matrix. In the original paper Horn simulated just one random matrix, but over time the method evolved into the calculation of a larger set of random matrices and their eigenvalues. From the resulting distribution of each k-th eigenvalue, a reference value is calculated for comparison with the tested eigenvalue.
The finding of Zwick & Velicer (1986) that PA overestimates the number of components raised the question of which critical value obtained from random data should be chosen.
In order to make the criterion more conservative, Buja & Eyuboglu (1992) suggested raising the threshold to the 95th percentile. Peres-Neto et al. (2005) found that in some cases the originally proposed limit performs better, but the popularity of the 95th percentile continued (Lorenzo-Seva et al., 2011). Finally, some findings (Raîche, Walls, Magis, Riopel, & Blais, 2013) suggested that it is not an increase but a decrease of the original threshold, to the 5th percentile, that improves the performance of this criterion.
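The procedure can be sketched as follows, assuming NumPy is available. The function name `parallel_analysis`, its defaults, and the convention of stopping at the first eigenvalue that fails the comparison are illustrative assumptions, not the authors' implementation; the `percentile` argument corresponds to the threshold choices discussed above (50, 95 or 5).

```python
import numpy as np

def parallel_analysis(data, n_random=200, percentile=95, seed=0):
    """Number of leading eigenvalues that exceed the random-data reference."""
    rng = np.random.default_rng(seed)
    n, m = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_random, m))
    for r in range(n_random):
        noise = rng.standard_normal((n, m))      # same size as the data
        corr = np.corrcoef(noise, rowvar=False)
        random_eigs[r] = np.linalg.eigvalsh(corr)[::-1]
    # Reference value for each k-th eigenvalue across the random matrices.
    reference = np.percentile(random_eigs, percentile, axis=0)
    above = observed > reference
    # Assumption: retain components up to the first one that fails.
    return m if above.all() else int(np.argmin(above))
```

Changing `percentile` is the only difference between the variants compared in the literature, so one function covers all three thresholds.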

Cattell's Scree Test
This test is based on a graph, i.e. the scree plot, with eigenvalues on the ordinate and their ordinal numbers on the abscissa (Cattell, 1966). The idea is that the last eigenvalues, which will be discarded, are just fluctuations of error variance and therefore form a linear trend. In a fashion similar to Horn's algorithm, this trend can be used as a threshold line to detect the first eigenvalue that lies above it. The major critique of this test concerns its subjectivity, given that the elbow where the curve diverges from the linear trend is not always easily detectable (Horn & Engstrom, 1979). Horn and Engstrom (1979) noted the resemblance between Bartlett's test of whether all remaining eigenvalues are the same and Cattell's search for the point that diverges from the linear trend formed by the error eigenvalues, concluding that these two tests are based on a similar idea. At the same time, this is also the idea underlying Horn's criterion. The difference is that Horn builds the reference values from random data (which also form a linear trend), i.e. by modeling the data with error variance only, while Cattell constructs this trend from the error variances found in the data itself.

Velicer's Minimum Average Partial Test
The intercorrelation matrix R can be decomposed as

$$R = \sum_{i=1}^{m} \lambda_i x_i x_i^{T},$$

where λ_i is an eigenvalue and x_i the corresponding eigenvector of R. The estimate R_p of this matrix based on the last m - p (p = 0, ..., m) eigenvalues and eigenvectors is then

$$R_p = D_p^{-1/2} C_p D_p^{-1/2},$$

where

$$C_p = \sum_{i=p+1}^{m} \lambda_i x_i x_i^{T}$$

and

$$D_p = \mathrm{diag}(C_p).$$

In other words, R_p is the matrix of partial correlations among the variables when the effect of the first p components is partialled out. In this notation Velicer's criterion can be formulated as the number p for which the average of the squared off-diagonal elements of R_p is minimal (Velicer, 1976). The rationale is that after all true components are partialled out, the intercorrelations between the variables are explained, and the resulting matrix of partial correlations, which in that case represents just the intercorrelations of error variables, is in a statistical sense the identity matrix. Continuing to partial out the remaining error components would only increase the partial correlations, as the off-diagonal values are now sums of a smaller number of error (random) components. Like Bartlett's test, this test is also capable of detecting structures with no important factors.
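A minimal NumPy sketch of this criterion is given below. It assumes the usual formulation in which the residual matrix left after removing the first p components is rescaled to unit diagonal to give partial correlations. The function name `velicer_map` and the choice to evaluate p only up to m - 2 are assumptions made for this illustration.

```python
import numpy as np

def velicer_map(R):
    """Number of components minimizing the average squared partial correlation."""
    m = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    off_diagonal = ~np.eye(m, dtype=bool)
    # p = 0: nothing is partialled out, so use the raw correlations.
    averages = [np.mean(R[off_diagonal] ** 2)]
    for p in range(1, m - 1):
        A = loadings[:, :p]
        residual = R - A @ A.T                        # what the last m - p components explain
        scale = np.sqrt(np.diag(residual))
        partial = residual / np.outer(scale, scale)   # rescale to unit diagonal
        averages.append(np.mean(partial[off_diagonal] ** 2))
    return int(np.argmin(averages))
```

A return value of 0 corresponds to the case mentioned above in which no important component is detected.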

CHull method
In essence, this method is a numerical operationalization of Cattell's scree test. Originally it was suggested for detecting the optimal number of components in three-way data matrices (Ceulemans & Kiers, 2006), but lately it has been popularized for use in common-factor analysis (Lorenzo-Seva et al., 2011) and in principal components analysis (Wilderjans, Ceulemans, & Meers, 2013). The basic idea is the same as in Cattell's scree test: to identify the point on the graph where the curve forms an elbow.
The graph in general represents the relation between the number of free parameters (fp_i) of an evaluated model and some measure of goodness of fit (f_i) for that model. The algorithm can be described through the following steps:
- sort the points by the number of free parameters (fp_i);
- exclude all points (fp_i, f_i) for which there is a point with fp_j < fp_i and, at the same time, f_j > f_i, i.e. points whose goodness of fit (f_i) is not in the same order (ascending/descending) as the number of free parameters (fp_i);
- check all triplets of adjacent points and exclude every middle point that lies on or below the line connecting its neighboring points, i.e. if

$$f_i \le f_{i-1} + (fp_i - fp_{i-1}) \frac{f_{i+1} - f_{i-1}}{fp_{i+1} - fp_{i-1}};$$

- among the remaining points, find the point where the function

$$st_i = \frac{(f_i - f_{i-1}) / (fp_i - fp_{i-1})}{(f_{i+1} - f_i) / (fp_{i+1} - fp_i)}$$

reaches the maximum.
At that point the elbow of the curve is located and the optimal model is detected.
The problem with this criterion emerges for the first and the last eigenvalue, the former naturally being of much greater importance. As the function is not defined at the endpoints, these points can never be designated as optimal. In common-factor analysis the null model can be used for this purpose (Lorenzo-Seva et al., 2011); as it is simpler, it allows the evaluation of the model with one common factor. For PCA, Wilderjans et al. (2013) suggested that the fit function could be the proportion of variance explained by the extracted components, i.e. the cumulative sum of eigenvalues. With this application of CHull, the criterion simplifies to a search for the maximum of the ratios of successive eigenvalues, from the second eigenvalue onward.
In order to make a solution with one component possible, the same authors proposed defining the null value as a simple zero point, representing the case in which no components are extracted and therefore none of the total variance is explained. The authors also noted that this is not an ideal solution, as the addition of this "virtual" zero point overestimates the function for the first component. A rationale for this could be that error variance has a smaller impact on the size of the first eigenvalue than on the second. The same logic should hold for all successive pairs of eigenvalues, which would ultimately result in a tendency of this criterion to underestimate the number of true components.
We would like to note that this criterion, in this formulation, is similar to the Rnd-Ratio criterion (Peres-Neto et al., 2005), except that in their version the ratio was bootstrapped and its significance was estimated.
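The simplified PCA form of CHull described above, the maximum ratio of successive eigenvalues, can be sketched as follows. The function name `chull_pca` and its arguments are hypothetical; the sketch assumes positive eigenvalues and, as a default, an upper limit of half the number of variables.

```python
def chull_pca(eigenvalues, include_zero_point=False, max_components=None):
    """Number of components q that maximizes the ratio lambda_q / lambda_(q+1)."""
    lam = sorted(eigenvalues, reverse=True)
    m = len(lam)
    upper = m // 2 if max_components is None else max_components
    # Without the zero point the first eigenvalue is an endpoint of the curve,
    # so the search can only start at the second eigenvalue.
    first = 1 if include_zero_point else 2
    if upper < first or upper >= m:
        raise ValueError("no admissible number of components to compare")
    ratios = {q: lam[q - 1] / lam[q] for q in range(first, upper + 1)}
    return max(ratios, key=ratios.get)
```

With `include_zero_point=False` the function can never return 1, which is the endpoint problem discussed above; with `include_zero_point=True` a dominant first eigenvalue yields a one-component solution.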

AIM OF THE STUDY
The aim of this paper is to explore the impact of different research conditions on the accuracy of five rules for component retention. The criteria compared are: Guttman-Kaiser's rule of eigenvalue over one, Bartlett's test, Horn's parallel analysis, Velicer's MAP and the CHull method. The goal was to compare the accuracy of these five rules for different numbers of postulated components, different sample sizes and numbers of variables, and different proportions of error variance.
* Although a convex function is usually defined as a function for which the condition f(t x1 + (1 - t) x2) ≤ t f(x1) + (1 - t) f(x2) holds for any t ∈ [0, 1], convexity here is defined as in the cited paper.

SIMULATION
In order to evaluate the different stopping rules, simulated data were generated with predefined structures. First, a structure matrix was defined for the given number of important components (k) and the given number of variables (m), allowing only structures in which each factor has high loadings on at least three variables. The actual number of high-loading variables per factor was defined by a random integer from 3 to m/k, with the exception of the last factor, for which high loadings were assigned to all remaining variables. For example, if the number of variables was 12 and the specified number of factors was 3, the distribution of variables among factors could be 4, 4, 4 as well as 3, 3, 6.
All coefficients in the structure matrix (F) were filled with random numbers uniformly distributed from 0 to 0.2, and after that a uniformly distributed random number from 0 to 0.7 was added to each coefficient designated as a high loading.
After this, a matrix of normally distributed random numbers with k columns and n rows (Z_r), representing the matrix of factor scores, was multiplied by the transposed structure matrix in order to obtain the matrix of true scores of the variables:

$$Z_t = Z_r F^{T}$$

These true scores were standardized to have variances of (1 - e), where e is the specified proportion of error in the data. The matrix of error scores (Z_e) was then added to the true-score matrix to obtain the data matrix (Z) used for the evaluation of the different criteria. This n by m error matrix (Z_e) was also filled with random numbers from a normal distribution, and its columns were standardized to have a variance of e:

$$Z = Z_t + Z_e$$

This method of integrating error into the data is not unknown (Josse & Husson, 2012), but we have to note that it is not as popular as Tucker's method with major and minor components and unique error variances (Tucker, Koopman, & Linn, 1969).
After construction of the data matrix, the correlation matrix and its eigenvalues were calculated and the selected stopping criteria were applied.
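The data-generation steps described above can be sketched in NumPy as follows. This is an illustration, not the authors' R code: the function name `simulate_data` is hypothetical, m/k is rounded down when it is not an integer, and standardization uses the sample standard deviation; all three are assumptions.

```python
import numpy as np

def simulate_data(n, m, k, e, rng):
    """Generate an n-by-m data matrix with k components and error proportion e."""
    # High-loading variables per factor: a random integer from 3 to m // k,
    # with the last factor taking all remaining variables (requires m >= 3 * k).
    sizes = [int(rng.integers(3, m // k + 1)) for _ in range(k - 1)]
    sizes.append(m - sum(sizes))
    # Structure matrix F (m x k): low loadings everywhere, high loadings added.
    F = rng.uniform(0.0, 0.2, size=(m, k))
    start = 0
    for factor, size in enumerate(sizes):
        F[start:start + size, factor] += rng.uniform(0.0, 0.7, size=size)
        start += size
    # True scores, rescaled so that each column has variance 1 - e.
    Zr = rng.standard_normal((n, k))
    Zt = Zr @ F.T
    Zt = (Zt - Zt.mean(axis=0)) / Zt.std(axis=0, ddof=1) * np.sqrt(1.0 - e)
    # Error scores, rescaled so that each column has variance e.
    Ze = rng.standard_normal((n, m))
    Ze = (Ze - Ze.mean(axis=0)) / Ze.std(axis=0, ddof=1) * np.sqrt(e)
    return Zt + Ze
```

The eigenvalues that the stopping rules operate on would then be obtained with, for example, `np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))`. Because the true and error scores are generated independently but only in a finite sample, their sample correlations are not exactly zero, which is the property of this design that the text emphasizes.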
The simulation was run in R, version 2.15.2 (R Core Team, 2013), and the source code of the procedure can be found at www.kal.rs/simulation.
The choice of the number of variables, sample size and percentage of error variance was based on several meta-analytic studies of EFA practices in psychology and the social sciences (Fabrigar et al., 1999; Conway & Huffcutt, 2003; Cangelosi & Goriely, 2007; Costello & Osborne, 2005).
The tested numbers of postulated factors (k) were limited to 1, 2, 3, 5, 8 and 10, and from these values and the rule of at least three high-loading variables per factor, the numbers of variables (m) were derived to be 9, 15, 22, 35 and 40. Following the same rule, some numbers of components were inapplicable for some numbers of variables. For example, for 35 and 40 variables all numbers of components were tested, but for 15 variables only situations with 1, 2, 3 and 5 components were applicable. This resulted in a total of 23 different combinations.
Sample size (n) was limited to 50, 100, 200, 300 and 600, where 50 and 100 represent fairly small samples and 600 a large one. These small sample sizes, especially in combination with a large number of variables (35 and 40), are not conditions in which the application of factor analysis is commonly advisable, but in clinical psychology, for example, as well as in other scientific fields, such sample sizes are not uncommon, so these situations were also included in the simulation.
Including error variance makes the model of the data much more realistic, as some amount of error variance is inevitably part of measurement. In similar Monte Carlo studies, the amount of error was almost never directly and systematically controlled as a factor that affects the size of the eigenvalues. The underlying premise, based on the classical measurement model, was that the correlation between error and true scores should be zero, as should the correlations between the errors of different variables. Guttman (1953) demonstrated that the second premise is true only if measurement is done on the universe of variables, but because of his proposed method for decomposing true and error variances, he was forced to keep the first premise. We can easily agree that the correlation between true and error scores should be zero in an infinite sample, but sample sizes in psychological research can hardly be approximated by this model. Cumulatively, these statistically nonsignificant correlations could build up the size of the eigenvalues. After all, Horn's parallel criterion is based on this assumption. The model for integrating error variance into the data that was applied in this paper allows correlations among all true and error scores, as well as intercorrelations among error scores.
In earlier papers (Jackson, 1993; Peres-Neto et al., 2005; Zwick & Velicer, 1986; Zwick & Velicer, 1982) error factors were introduced through minor loadings or by adding unique variance to the diagonal of the reproduced correlation matrix in order to achieve standardized variance. A much more realistic approach was suggested by Tucker et al. (1969). Their model consists of three parts: common factors (major factors), minor factors and unique factors, where the minor and unique factors represent the error in the model. However, that model did not allow correlations between major (true) components and error components. The model suggested by Hong (1999) resolved this problem, which existed in all preceding simulations (Jackson, 1993; Peres-Neto et al., 2005; Zwick & Velicer, 1986; Zwick & Velicer, 1982), by allowing these intercorrelations to be modeled. Hong's algorithm, not being based on score matrices (Z_r, Z_t and Z_e), is insensitive to sample size.
Thus, for the 23 plausible combinations of the number of variables and the number of components, all five sample sizes and all six error levels were tested, which resulted in 690 different combinations. For each combination the data generation process was repeated 200 times. In the end, 138,000 different data matrices were analyzed.
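The design counts can be verified with a short enumeration. The values of k, m and n are those listed in the text; the six error levels are represented only by their count, since their values are not stated in this section.

```python
factors = [1, 2, 3, 5, 8, 10]            # postulated components (k)
variables = [9, 15, 22, 35, 40]          # analyzed variables (m)
sample_sizes = [50, 100, 200, 300, 600]  # sample sizes (n)
n_error_levels = 6                       # only the count is given in the text
replications = 200

# A (m, k) pair is admissible when every factor can have at least
# three high-loading variables, i.e. 3 * k <= m.
pairs = [(m, k) for m in variables for k in factors if 3 * k <= m]

conditions = len(pairs) * len(sample_sizes) * n_error_levels
datasets = conditions * replications
print(len(pairs), conditions, datasets)
```

The enumeration gives 3 admissible values of k for 9 variables, 4 each for 15 and 22 variables, and 6 each for 35 and 40 variables.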
Bartlett's χ² test was applied with two standard significance levels, 0.05 (BAR5) and 0.01 (BAR1). In addition, the test was applied in two similar versions. The first, which is the standard interpretation of the test, stops the extraction after the first nonsignificant component is detected.
The second version aims to find the last significant component, regardless of whether there were one or more nonsignificant outcomes before it. These two versions produced almost the same results, although the latter performed somewhat better, and only its results are presented in the paper.
The standard Guttman-Kaiser rule (GK) of eigenvalue larger than one was the third option tested.
For parallel analysis, different critical values were compared. The tested values were the 50th (PA50) and 95th (PA95) percentiles of the eigenvalues obtained from random data. The 5th percentile (Raîche et al., 2013) was also tested, but as its overall performance was worse than that of PA50, its results are not presented.
The minimum average partial correlation (MAP) test was also included in this comparison.
Finally, two versions of the convex hull criterion were applied: one (CHull.CFI) for the problem of the number of factors in common-factor analysis, where the comparative fit index (Bentler, 1990) of a model is plotted against its degrees of freedom (Lorenzo-Seva et al., 2011), and the other suggested for PCA (Wilderjans et al., 2013), as a plot of the cumulative eigenvalues against their ordinal numbers, with (CHull.PCA0) and without (CHull.PCA1) inclusion of the zero point. For both methods based on PCA, the upper limit of possible solutions was set to m/2, i.e. half the number of analyzed variables.
Classical eigenvalue decomposition was performed for all criteria except CHull based on CFI, where the ML common-factor model was used. In this case the appropriate procedure from the R package "psych" was applied (Revelle, 2013).

RESULTS AND DISCUSSION
As can be seen from Tables 1 and 2 and Figure 1, the obtained results are mostly in line with findings in the literature. As the overall comparison is influenced by the selected sets of parameters, and as the criteria differ under different parameter values, a straightforward estimation of the criteria's performance is not possible. Introducing systematic variation of the amount of error variance also provided useful information. The first conclusion is that all criteria, except CHull PCA, work properly in favorable circumstances, such as a high number of subjects per variable and a low level of error in the system. The inaccuracy of all criteria becomes evident in situations with a small n to m ratio, a high percentage of error variance and a high number of supposed factors. However, some differences between them also exist.
The results on the accuracy of Bartlett's χ² test are not in concordance with those of authors who found that this test overestimates the number of factors, especially in large samples (Gorsuch, 1973; Horn & Engstrom, 1979; Hubbard & Allen, 1987; Henson & Roberts, 2006; Raîche et al., 2013). Velicer, Eaton, & Fava (2000) do not even recommend its use. Our results are much more in line with Ferré (1995), who found that the accuracy of Bartlett's test increases with sample size.
The results in Table 1 suggest that the percentage of correct detections of the number of components for this criterion is the same as that obtained using PA. At the same time, in situations with wrong estimates, the average error for this criterion is smaller than the average error of PA. Our results imply that this method should be considered the method of choice, except in situations with extremely small samples (see Figure 1). The disagreement between our results and the findings of the majority of other authors could be a consequence of the fact that those authors did not use Bartlett's original formula (Bartlett, 1950) but some of its modifications (see Peres-Neto et al., 2005).
Many authors have reported that Guttman-Kaiser's criterion regularly overestimates the number of components (see for example Lorenzo-Seva et al., 2011; Costello & Osborne, 2005; Josse & Husson, 2012; Raîche et al., 2013; Wilderjans et al., 2013). Despite this, it is the most frequently used rule in social science research. The reason probably lies in the fact that this criterion is the default in most statistical packages. In our simulation this was the only criterion that overestimated the number of postulated factors, while all the others underestimated it. However, the evaluation of this criterion is not that straightforward, as it is as accurate as the other criteria in situations where the n to m ratio is over 10 and the error variance is not above 50 percent. In other, unfavorable situations this criterion is the most sensitive one and very quickly loses its accuracy. When EFA is performed on test items, which often have very low reliability, sometimes even lower than .3, the GK criterion should be avoided.
Horn's parallel analysis has been considered the most accurate in many studies (Franklin et al., 1995; Hayton, Allen, & Scarpello, 2004; Zwick & Velicer, 1986; Ledesma & Valero-Mora, 2007; Velicer et al., 2000). That was a good argument for recommending it despite the rather inconvenient procedure for its application. In order to overcome the alleged tendency of this criterion to overestimate the number of components, some authors (Zwick & Velicer, 1986; Buja & Eyuboglu, 1992; Lorenzo-Seva et al., 2011) suggested raising the cutoff value to the 95th percentile of the eigenvalue distribution. Other authors (Peres-Neto et al., 2005) found that using the average eigenvalues obtained from random matrices could be more appropriate than using the 95th percentile. Our findings are in line with this recommendation, as in our simulations the median value outperformed the 95th percentile in the overall comparison.
Some recent papers suggest lowering the threshold to the 5th percentile (Raîche et al., 2013). In our study this low threshold performed better only in situations with large sample sizes and low error levels, which can also be described as situations that make high correlations more likely. This finding is in line with Peres-Neto et al. (2005). In the overall comparison this threshold underperforms PA50. Our findings suggest that PA50 is the most accurate criterion when the number of subjects is small and the percentage of error variance is reasonably high, but the number of variables is rather small.
In situations with a small sample size, large error variance and a large number of variables, the best criterion is Velicer's MAP. In practice the logic of MAP does not involve sample size but only the number of variables and the number of factors, so this result should not be unexpected. The estimate of the average partial correlation, the parameter to be minimized, is much more stable if the system has more variables. So in cases of almost square data matrices, MAP should be the preferred criterion.
The CHull criterion, except in situations with only one component, outperforms the other criteria in the overall comparison. In the case of PCA this criterion can be simplified to locating the maximum of the ratios between all pairs of consecutive eigenvalues (Wilderjans et al., 2013). In that case, at least theoretically, it can correctly detect the existence of only one component, but in reality this is not the case. Inclusion of the zero point leads to underestimation of the number of components, creating a strong affinity toward a solution with only one important component. This could explain why at first glance it looks much worse than the other criteria. In situations with more than one component, the standard variant of the CHull criterion based on cumulative eigenvalues (without the zero point) is one of the most accurate. CHull based on CFI, suggested by Lorenzo-Seva et al. (2011), was not as accurate as the authors claimed in their paper, but it should be noted that this version of CHull does not have a problem with detecting solutions with just one factor.

CONCLUSIONS
Our results partially confirm the results of earlier studies. The accuracy of all criteria decreases as sample size decreases and as the number of variables, the number of supposed factors and the proportion of error variance increase. There is no unambiguous answer as to which of the analyzed criteria has the best performance. In favorable research conditions all criteria have good accuracy. We can recommend Bartlett's χ² test and Horn's Parallel Analysis, which work properly in all conditions except situations with a small number of subjects and a relatively high number of variables. In these situations, with a small sample size and a relatively high number of variables, we can recommend Velicer's MAP criterion. The most frequently used criterion, Guttman-Kaiser's, is the most sensitive to an increase in the number of variables and in the proportion of error variance.
Practitioners who would like to apply the recommended criteria (PA, MAP) can find SPSS and SAS macros in the article by O'Connor (2000). To our knowledge this paper is the first to systematically vary error variance. In addition, nonzero correlations between the error scores of different variables, as well as nonzero correlations between error scores and true scores, were allowed. This makes our simulations much more realistic. The results strongly confirm our hypothesis that the reliability of the analyzed measures (proportion of error variance) has a crucial role in determining the number of components. This effect could be neutralized only by increasing the sample size.
The main limitation of this paper stems from the limited set of values that were varied in the simulations. This limitation relates predominantly to the number of variables and the number of components. We therefore cannot be sure that the trends observed in our analyses generalize to situations with more than 100 or 200 variables, or to situations with more than 10 factors. The second limitation is related to the shape of the distribution. We simulated data only from the normal distribution, but it is not rare in psychological research for authors to apply EFA without establishing that the distributions are normal.
It should be noted that, even though we used one criterion that is applicable to the common-factor model, our findings relate strictly to situations in which PCA is used for factor extraction.