The Effects of Sample Size, Correlation Technique, and Factor Extraction Method on Reliability Coefficients

This study aims to compare reliability coefficients according to sample size (250, 500, 1,000, 2,500, 5,000, and 9,773), EFA factor extraction methods (PCA, PA, ULS, WLS, and MLE), CFA estimation methods (UL, ML, and GL), and correlation matrices (Pearson, phi, and tetrachoric). Therefore, it employs a basic research method. The study was conducted with real data collected from students' answers to the Turkish sub-test of the Test for Transition from Basic Education into Secondary Education administered in 2014. Within the scope of the study, McDonald ω, McDonald ωh, maximal reliability, Armor θ, Heise and Bohrnstedt Ω, Revelle β, and standardized alpha coefficients were compared. It was found that, for a given correlation matrix, sample size did not lead to serious changes in the coefficients. It was also found that McDonald ωh and Revelle β coefficients calculated with a tetrachoric correlation were bigger than 1 in some conditions. It was therefore recommended that those coefficients be calculated through phi correlations for congeneric one-factor structures. Other findings obtained support the literature, and necessary suggestions are made.

Keywords: reliability, McDonald ω, maximal reliability, Armor θ, Heise and Bohrnstedt Ω, Revelle β, standardized alpha


Introduction
Scientists working in the field of social sciences often measure individuals' properties that cannot be directly observed. Those properties can be the ability to play basketball, attitudes towards a course, interest in professions, or any other cognitive variables, as well as mathematical achievement. Those directly unobservable properties are called "constructs" or "latent traits." In measuring those directly unobservable traits, behaviors thought to be associated with them are measured. Total scores obtained by assigning 1 to correct answers and 0 to incorrect answers in a mathematics achievement test could be an example of measuring directly unobservable latent traits through observable behaviors. However, it is not adequate simply to observe individuals' behaviors. The behaviors should be associated with latent traits, and individuals' scores should be consistent. The consistency of individuals' scores is related to reliability, and the relevance between behaviors and the latent variable is related to validity (Meyer, 2010).
Considering reliability within the framework of classical test theory (CTT) and representing the score an individual $i$ can receive from an item $j$ as $V_{ij}$, then $V_{ij}$ is composed of two components:

$$V_{ij} = t_{ij} + e_{ij} \tag{1.1}$$

where $t_{ij}$ stands for the individual's real score, and $e_{ij}$ stands for the error score. It is assumed that the average error score for an item in a sample is 0 and that the error scores are not related to the real score (Green & Yang, 2009). Besides, it can be said that individuals' observed scores slightly deviate from the actual scores according to this equation (Gulliksen, 1950). As the amount of deviation in individuals' observed scores from actual scores increases, reliability decreases; as the amount of deviation decreases, reliability increases. Accordingly, reliability is described as

$$\rho_{VV'} = \frac{\sigma_t^2}{\sigma_V^2} = 1 - \frac{\sigma_e^2}{\sigma_V^2} \tag{1.2}$$

where $\sigma_t^2$, $\sigma_e^2$, and $\sigma_V^2$ are the variances of the real, error, and observed scores, respectively. According to this, it may be said that reliability will increase as the observed score comes closer to the real score, that is to say, as the error decreases. The sources of error can be examined and the necessary precautions can be taken so as to reduce errors. As the sources of error change, the definitions of reliability can also change.
Reliability is defined in various ways. For instance, if the scores taken from a test have high correlations with the actual scores, then the test is said to be reliable (Allen & Yen, 1979). Reliability can also be considered as obtaining the same results when a test is administered again in the same conditions (American Educational

There are a great number of reliability coefficients that have been developed on the basis of CTT and whose methods of calculation differ. Whereas some of them are calculated on the basis of a variance-covariance matrix, others are calculated on the basis of factor analysis. Armor's θ and Heise and Bohrnstedt's Ω coefficients, which are analyzed in this study, are calculated according to exploratory factor analysis (EFA). McDonald ω, McDonald ωh, and maximal reliability can, on the other hand, be calculated according to confirmatory factor analysis (CFA). Revelle β and standardized alpha coefficients are calculated on the basis of correlation matrices.
A review of the studies on reliability coefficients found none that considered all of the coefficients examined in this study together. Moreover, no studies were found analyzing how all these reliability coefficients change with different EFA factor extraction methods and CFA estimation methods. Therefore, it is believed that this study will contribute to the relevant literature.
This study is important in that it compares factor analysis methods and reliability values obtained through different reliability-determining techniques according to factor analysis results. Comparisons of the obtained reliability values that demonstrate the similarities and differences in various conditions are also important for experts in the field to plan and interpret prospective studies. In this way, individuals developing tests and using the results of tests will be able to decide which technique is appropriate for use with which factor analysis method, which sample size, and which correlation technique. In line with the described purpose and importance of the study, answers are sought to the following questions:
• How do reliability coefficients calculated with CFA methods change according to sample size, correlation matrices used, and estimation method?
• How do reliability coefficients calculated with EFA methods change according to sample size, correlation matrices used, and factor extraction method?
• How do reliability coefficients calculated with a correlation matrix change according to sample size and correlation matrices used?

Method
Since this study, which aims to compare reliability coefficients, is concerned with revealing the processes underlying a theory, it is a basic study. Researchers interested in basic studies are not particularly concerned with the results of educational applications. Instead, they develop the processes of a theory or analyze those processes (Fraenkel, Wallen, & Hyun, 2012; Gay, Mills, & Airasian, 2012).

Data
The data set of the research came from the Turkish sub-test of the TEOG exam (Test for Transition from Basic Education to Secondary Education) given in the second semester of the 2013-2014 academic year. A sample of 10,000 students was drawn at random from the students answering the Turkish test. Response patterns that were repetitive or that showed the student had marked the same answer for all items were deleted, which left a data set of 9,773 people. Random samples of 250, 500, 1,000, 2,500, and 5,000 were then drawn from this data set, and the whole data set of 9,773 served as the final condition. The data sets were analyzed in the form of item score matrices scored as 1 or 0. Each sample was drawn once; the data set of 250 people, for instance, was formed once, and all coefficients were calculated by using this data set.

Analysis
In the current study, Armor's θ, Heise and Bohrnstedt's Ω, McDonald's ω, McDonald's ωh, maximal reliability, Revelle's β, and the standardized alpha coefficient were investigated. Because Cronbach's alpha coefficient requires the parallel measurement assumption, Armor's θ coefficient is calculated by using principal components factor analysis as an alternative. Thus, Armor's θ coefficient is stated as

$$\theta = \frac{p}{p-1}\left(1 - \frac{1}{\lambda_1}\right) \tag{1.3}$$

(Armor, 1974). Here, $p$ represents the number of items, whereas $\lambda_1$ represents the biggest eigenvalue.
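As a rough illustration of how Armor's θ follows from a correlation matrix, the sketch below assumes the usual form θ = p/(p − 1) · (1 − 1/λ1), with λ1 the largest eigenvalue of the item correlation matrix. The function names and the toy 4-item matrix are hypothetical and are not taken from the study; the eigenvalue is approximated by power iteration only to keep the example free of external libraries.

```python
import math


def largest_eigenvalue(matrix, iterations=200):
    """Approximate the largest eigenvalue of a symmetric matrix with
    positive entries by power iteration (illustrative, not optimized)."""
    n = len(matrix)
    vector = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
        # Rayleigh quotient v' M v for the current unit-length vector.
        eigenvalue = sum(
            vector[i] * sum(matrix[i][j] * vector[j] for j in range(n))
            for i in range(n)
        )
    return eigenvalue


def armor_theta(corr):
    """Armor's theta from an item correlation matrix (list of lists)."""
    p = len(corr)
    lambda_1 = largest_eigenvalue(corr)
    return (p / (p - 1)) * (1 - 1 / lambda_1)


# Hypothetical 4-item matrix in which every inter-item correlation is 0.5.
toy_corr = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
theta = armor_theta(toy_corr)
print(round(theta, 4))
```

For this toy matrix the largest eigenvalue is 1 + 3(0.5) = 2.5, so θ = (4/3)(1 − 1/2.5) = 0.8.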
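Heise and Bohrnstedt's Ω coefficient is obtained with EFA, as in the case of Armor's θ coefficient. Heise and Bohrnstedt's Ω coefficient is stated as

$$\Omega = 1 - \frac{\sum_{i}\sigma_i^2 - \sum_{i}\sigma_i^2 h_i^2}{\sum_{i}\sum_{j}\sigma_{ij}} \tag{1.4}$$

(Heise & Bohrnstedt, 1970). Here, $h_i^2$ represents the communalities, $\sigma_i^2$ the item variances, and the denominator the sum of all elements of the item covariance matrix. Carmines and Zeller (1979) demonstrated that Heise and Bohrnstedt's Ω coefficient can also be represented as

$$\Omega = 1 - \frac{a - \sum_{i} h_i^2}{a + 2b} \tag{1.5}$$

Here, $a$ shows the number of items and $b$ shows the sum of correlations between items.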
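A minimal sketch of the Carmines and Zeller form of Ω, assuming it reads Ω = 1 − (a − Σh²)/(a + 2b), where b sums each inter-item correlation once (so that a + 2b equals the sum of all elements of the correlation matrix). The function name, the toy matrix, and the communalities are hypothetical; in practice the communalities would come from the chosen EFA extraction method.

```python
def heise_bohrnstedt_omega(corr, communalities):
    """Omega from an item correlation matrix and EFA communalities."""
    a = len(corr)  # number of items
    # b: sum of correlations over distinct item pairs (upper triangle).
    b = sum(corr[i][j] for i in range(a) for j in range(i + 1, a))
    return 1 - (a - sum(communalities)) / (a + 2 * b)


# Hypothetical 3-item example: all correlations 0.5, all communalities 0.5.
toy_corr = [[1.0 if i == j else 0.5 for j in range(3)] for i in range(3)]
toy_communalities = [0.5, 0.5, 0.5]
omega = heise_bohrnstedt_omega(toy_corr, toy_communalities)
print(omega)
```

Here a = 3, b = 1.5, and Σh² = 1.5, so Ω = 1 − 1.5/6 = 0.75.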
McDonald's ω coefficient is obtained with CFA. If each unique variance is supposed to be composed of errors, then ω is equal to the reliability of the total score. McDonald's ω coefficient is stated as

$$\omega = \frac{\left(\sum_{i}\lambda_i\right)^2}{\left(\sum_{i}\lambda_i\right)^2 + \sum_{i}\phi_i} \tag{1.6}$$

(McDonald, 1999). Here, $\lambda_i$ represents the unstandardized factor loadings obtained with CFA, and $\phi_i$ represents the unique variance.
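The calculation of ω from CFA output can be sketched as follows, assuming the common one-factor form ω = (Σλ)² / ((Σλ)² + Σ unique variances). The loadings and unique variances below are hypothetical values chosen for illustration, not estimates from the study.

```python
def mcdonald_omega(loadings, unique_variances):
    """McDonald's omega for a one-factor model.

    loadings: unstandardized factor loadings, one per item.
    unique_variances: unique (error) variances, one per item.
    """
    common = sum(loadings) ** 2
    return common / (common + sum(unique_variances))


# Hypothetical standardized solution: unique variance = 1 - loading^2.
toy_loadings = [0.6, 0.7, 0.8]
toy_uniques = [1 - l ** 2 for l in toy_loadings]
omega = mcdonald_omega(toy_loadings, toy_uniques)
print(round(omega, 4))
```

Because the numerator is part of the denominator and unique variances are non-negative in a proper solution, ω computed this way cannot exceed 1.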
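McDonald's ωh coefficient is calculated from the loadings $\lambda_{g_i}$ of item $i$ on the general factor (Zinbarg, Revelle, Yovel, & Li, 2005). Accordingly, the ωh coefficient is stated as

$$\omega_h = \frac{\left(\sum_{i}\lambda_{g_i}\right)^2}{\sigma_X^2} \tag{1.7}$$

Here, $\lambda_{g_i}$ represents the unstandardized factor loading of item $i$ on the general factor, and $\sigma_X^2$ represents the variance of the total score.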
Maximal reliability is stated as the reliability of a structure in congeneric measurements (Hancock & Mueller, 2001, 2013). Accordingly, maximal reliability is stated as

$$H = \left[1 + \left(\sum_{i}\frac{l_i^2}{1 - l_i^2}\right)^{-1}\right]^{-1} \tag{1.8}$$

Here, $l_i^2$ represents the square of the standardized factor loadings.
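A short sketch of maximal reliability, assuming the Hancock and Mueller form H = 1 / (1 + 1 / Σ[l² / (1 − l²)]) with standardized loadings l. The function name and loadings are hypothetical.

```python
def maximal_reliability(std_loadings):
    """Maximal reliability (coefficient H) from standardized loadings.

    Each loading must satisfy 0 <= |l| < 1, otherwise the ratio
    l^2 / (1 - l^2) is undefined or negative.
    """
    ratio_sum = sum(l ** 2 / (1 - l ** 2) for l in std_loadings)
    return 1 / (1 + 1 / ratio_sum)


toy_loadings = [0.6, 0.7, 0.8]  # hypothetical standardized loadings
h = maximal_reliability(toy_loadings)
print(round(h, 4))
```

With these loadings H is larger than the largest squared loading (0.64), which reflects the property that H is never lower than the reliability of the best single item.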
Revelle's β coefficient is generally defined with the average item covariance between test halves of unequal length (Revelle, 1979). This represents the average covariance of the items located in different halves (Zinbarg et al., 2005). Revelle's β coefficient is stated as

$$\beta = \frac{k^2\,\bar{\sigma}_{ij}}{\sigma_X^2} \tag{1.9}$$

Here, $k$ represents the number of items, $\bar{\sigma}_{ij}$ represents the average covariance between items in different halves, and $\sigma_X^2$ represents the variance of the total score.
For the standardized alpha coefficient, the observed scores of the components are standardized before they are summed. Standardized alpha is not a lower bound of real reliability. It may be lower or higher than real reliability (Osburn, 2000). The standardized alpha coefficient is stated as

$$\alpha_{std} = \frac{n\bar{\rho}}{1 + (n - 1)\bar{\rho}} \tag{1.10}$$

(Carmines & Zeller, 1987; Osburn, 2000). Here, $\bar{\rho}$ stands for the average correlation between items, and $n$ stands for the number of items.
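Standardized alpha depends only on the number of items and the average inter-item correlation, which the following sketch makes explicit. It assumes the form n·r̄ / (1 + (n − 1)·r̄); the function name and the toy matrix are hypothetical.

```python
def standardized_alpha(corr):
    """Standardized alpha from an item correlation matrix (list of lists)."""
    n = len(corr)
    # Average of the off-diagonal correlations (each pair counted once).
    pairs = [corr[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_r = sum(pairs) / len(pairs)
    return n * mean_r / (1 + (n - 1) * mean_r)


# Hypothetical 4-item matrix with every inter-item correlation equal to 0.5.
toy_corr = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
alpha_std = standardized_alpha(toy_corr)
print(alpha_std)
```

For this matrix the result is 4(0.5) / (1 + 3(0.5)) = 0.8. Since the value rises with the average correlation, any matrix type that yields larger correlations for the same data will also yield a larger standardized alpha.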
Factor extraction methods were set as a condition for reliability coefficients calculated through EFA (Armor's θ, Heise and Bohrnstedt's Ω) in this study. The methods of principal components analysis (PCA), principal axis (PA), unweighted least squares (ULS), weighted least squares (WLS), and maximum likelihood (MLE) were used for this purpose.
Estimation methods were set as a condition for reliability coefficients calculated through CFA (McDonald ω, McDonald ωh, maximal reliability). For this purpose, reliability coefficients were calculated separately with the methods of unweighted least squares (UL), maximum likelihood (ML), and generalized least squares (GL).
Pearson, phi, and tetrachoric correlation matrices were created for the reliability coefficients calculated from a covariance/correlation matrix, and the calculations were made with each of them. The type of matrix was also considered as a condition for reliability coefficients calculated through EFA and CFA, and these coefficients were calculated separately with Pearson, phi, and tetrachoric covariance matrices.
This study considers sample size as six conditions (250, 500, 1,000, 2,500, 5,000, and 9,773), factor extraction for EFA as five conditions (PCA, PA, ULS, WLS, and MLE), estimation method for CFA as three conditions (UL, ML, and GL), and correlation/covariance matrix as three conditions (Pearson, phi, and tetrachoric). Accordingly, 6 × 3 × 5 = 90 conditions for each reliability coefficient calculated with EFA, 6 × 3 × 3 = 54 conditions for each reliability coefficient calculated with CFA, and 6 × 3 = 18 conditions for each reliability coefficient calculated with a correlation/covariance matrix were set.
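The condition counts can be checked by enumerating the crossed design. The sketch below is only an illustration of the counting; the variable names are hypothetical and the labels follow the abbreviations used in the text.

```python
from itertools import product

sample_sizes = [250, 500, 1000, 2500, 5000, 9773]
matrices = ["Pearson", "phi", "tetrachoric"]
efa_extraction = ["PCA", "PA", "ULS", "WLS", "MLE"]
cfa_estimation = ["UL", "ML", "GL"]

# Fully crossed conditions for each family of coefficients.
efa_conditions = list(product(sample_sizes, matrices, efa_extraction))
cfa_conditions = list(product(sample_sizes, matrices, cfa_estimation))
matrix_conditions = list(product(sample_sizes, matrices))

print(len(efa_conditions), len(cfa_conditions), len(matrix_conditions))  # 90 54 18
```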
In forming the data sets, the samples were chosen randomly from the whole data set (9,773 students) in R (R Core Team, 2017). For the analyses based on correlation matrices, tetrachoric correlations were calculated with the psych package (Revelle, 2016) in R. The polycor package (Fox, 2016) was used for the Pearson correlation matrix. A function was defined to compute the phi correlation matrix. Once the correlation matrices had been formed, the psych package was used for EFA and the LISREL program was used for CFA. Using the results obtained from these programs, the coefficients were calculated in Excel (Office 2016) with the formulas given in the equations above.
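The study computed its matrices in R, and the authors' own phi function is not reproduced in the text. As a hedged illustration of what such a function might do, the Python sketch below computes the phi coefficient from the 2 × 2 table of two binary items and assembles a phi matrix; all names and the toy data are hypothetical.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equally long lists of 0/1 scores."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when an item has no variance")
    return (n11 * n00 - n10 * n01) / denominator


def phi_matrix(data):
    """Phi matrix for data given as rows of 0/1 item scores (one row per person)."""
    k = len(data[0])
    columns = [[row[j] for row in data] for j in range(k)]
    return [
        [1.0 if i == j else phi_coefficient(columns[i], columns[j]) for j in range(k)]
        for i in range(k)
    ]


# Hypothetical responses of six students to two items.
toy_data = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0]]
print(phi_matrix(toy_data))
```

For the toy data the off-diagonal value is (2·2 − 1·1) / √(3·3·3·3) = 1/3, which is also what the Pearson formula gives when applied directly to the 0/1 scores.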
The necessary restrictions were imposed in CFA so as to find which CTT measurement model the data set fitted. The parallel, tau-equivalent, essentially tau-equivalent, and congeneric measurement models were compared, and the congeneric model was found to fit best.

Findings
Table 1 shows all of the calculated coefficients. Interpretations concerning the sub-problems are also included.

Analysis of Reliability Coefficients Calculated with CFA (McDonald ω, McDonald ωh, and Maximal Reliability)
Table 1 shows reliability coefficients calculated according to sample size, estimation method, and correlation/covariance matrix. A review of the coefficients shows that they can be bigger than 1 when a tetrachoric covariance/correlation matrix is used. Specifically, the ωh and β coefficients took on values bigger than 1 when calculated with a tetrachoric covariance matrix. Besides, the GL method did not converge in the sample of 250 students with a tetrachoric covariance matrix. Taking the coefficient calculated with the whole data set as a reference for analyzing the change of reliability coefficients according to sample size, the coefficients obtained from the other samples begin to differ only after the third decimal place. Accordingly, it may be said that there is not much difference provided that the sample size is at least 250. When reliability coefficients are calculated with a Pearson or phi correlation/covariance matrix, the result is smaller than with a tetrachoric correlation/covariance matrix. A tetrachoric covariance matrix is used and recommended when the condition of normal distribution is satisfied for binary data (Allen & Yen, 1979; Price, 2017; Skrondal & Rabe-Hesketh, 2004). The average skewness in the data set was -1, and the average kurtosis was about 0. Accordingly, the data can be said to have normal distribution (Chou & Bentler, 1995; Curran, West, & Finch, 1996). Even though the data had normal distribution, the ωh and β coefficients calculated with a tetrachoric correlation/covariance matrix were observed to be bigger than 1.
When calculated from the data sets of 250, 2,500, and 5,000 students, the McDonald ω coefficient was higher than the one calculated from the whole data set, and the ω coefficient calculated from the data sets of 500 and 1,000 students was lower than the one calculated with the whole data set. However, the biggest difference was 0.0095, whereas the smallest was 0.0006. Accordingly, it may be said that the sample size in this study did not have serious effects on reliability coefficients. On comparing according to CFA estimation methods, the ω coefficient was found to be the lowest with the ML method and the highest with the GL method. Yet, the differences were small according to estimation methods, as in the case of sample sizes.
Across all covariance matrices and CFA estimation methods, the McDonald ω coefficient took on the smallest value of 0.8657 and the biggest value of 0.9568. While the smallest value was observed in the sample of 500 students with a phi covariance matrix and the ML method of estimation, the biggest value was observed in the sample of 500 students with a tetrachoric covariance matrix and the GL method of estimation.
The effects of sample size were also slight for the McDonald ωh coefficient. On comparing according to CFA methods of estimation, it was found that the lowest estimate was obtained with the ML method and the highest with the GL method. The differences were also small according to estimation methods.
On evaluating the McDonald ωh coefficient along with all sample sizes, covariance matrices, and CFA estimation methods, it was found to take on the smallest value of 0.8640 and the biggest value of 1.4784. It took on the smallest value in the sample of 500 students with a phi covariance matrix and the ML method of estimation, whereas it took on the biggest value in the sample of 500 students with a tetrachoric covariance matrix and the GL method of estimation. The ωh coefficient took on values bigger than 1 in all samples and methods when calculated with a tetrachoric covariance matrix.
A comparison of maximal reliability coefficients according to CFA estimation methods showed the lowest values with the UL method and the highest values with the GL method. However, the differences were small. The biggest difference, which was approximately 0.02, was observed in the sample of 250 students.
On evaluating the maximal reliability coefficient with all sample sizes, covariance matrices, and CFA methods of estimation, it was found to take on the smallest value of 0.8729 and the biggest value of 0.9655. Whereas it took on the smallest value in the sample of 500 students with a phi covariance matrix and the UL method, it took on the biggest value in the sample of 500 students with a tetrachoric covariance matrix and the GL method.

Analysis of Reliability Coefficients Calculated with EFA (Armor's θ and Heise and Bohrnstedt's Ω)
The examination of Armor's θ and Heise and Bohrnstedt's Ω coefficients shown in Table 1 makes it clear that the reliability coefficient is higher when PCA is chosen as the method of factor extraction. In addition, the other methods of factor extraction yielded values that were identical up to the fourth decimal place and began to differ only after it.
The θ coefficient differs slightly according to factor extraction methods in the sample of 250 students when analyzed with a tetrachoric correlation matrix, but it is still the highest with the PCA method. When analyzed with Pearson and phi correlation matrices, the PCA method likewise yielded a higher θ coefficient than the other methods. No big differences were found according to sample size.
On evaluating the θ coefficient along with all sample sizes, correlation matrices, and EFA factor extraction methods, it was found to take on the smallest value of 0.8416 and the biggest value of 0.9452. It took the smallest value in the sample of 500 students with a phi covariance matrix and the MLE method, whereas it took the biggest value in the sample of 250 students with a tetrachoric correlation matrix and the PCA method. The θ coefficient calculated with the MLE method is lower than the ones calculated with other methods. Yet, the difference is observed in the sixth decimal place.
When analyzed with a tetrachoric correlation matrix in a sample of 250 students, the Ω coefficient differs slightly according to factor extraction methods, but the highest values are observed with the PCA method. When analyzed with Pearson and phi correlation matrices, the PCA method likewise yielded a higher Ω coefficient than the other methods. No big differences were found according to sample size.
On evaluating the Ω coefficient along with all sample sizes, correlation matrices, and EFA factor extraction methods, it was found to take on the smallest value of 0.8663 and the biggest value of 0.9471. While it took on the smallest value in the sample of 500 students with a phi covariance matrix and the MLE method of factor extraction, it took on the biggest value in the sample of 250 students with a tetrachoric correlation matrix and the PCA method. The PCA, ULS, and WLS methods differ in the fifth decimal place. The Ω coefficient calculated with the MLE method was lower than with the other methods, but the difference was observed in the sixth decimal place.
On comparing the θ and Ω coefficients, it was found that the θ coefficient was lower than the Ω coefficient in all conditions. The PA factor extraction method yielded a lower θ coefficient than the other methods in analyses performed for Pearson, phi, and tetrachoric correlation matrices. It was found that the difference began in the fifth decimal place.

Analysis of Reliability Coefficients Calculated with a Correlation Matrix (Revelle β and Standardized Alpha)
The examination of Revelle β and standardized alpha coefficients shown in Table 1 makes it clear that the β coefficient calculated with a tetrachoric correlation matrix is bigger than 1. Standardized alpha, on the other hand, did not differ much according to sample size. The standardized alpha coefficient is bigger when calculated with a tetrachoric correlation matrix.
On evaluating the Revelle β coefficient along with all sample sizes and correlation matrices, it was found to take on the smallest value of 0.8631 and the biggest value of 1.4709. Whereas it took on the smallest value in the sample of 500 students with a phi correlation matrix, it took on the biggest value when calculated with a tetrachoric correlation matrix from the whole data set. When the coefficient was calculated with a tetrachoric correlation matrix, the result was bigger than 1.
On evaluating the standardized alpha coefficient along with all sample sizes and correlation matrices, it was found to take on the smallest value of 0.8651 and the biggest value of 0.9435. While it took on the smallest value in the sample of 500 students with a phi correlation matrix, it took on the biggest value in the sample of 250 students with a tetrachoric correlation matrix.

Conclusion and Recommendations
On evaluating all of the reliability coefficients considered in this study, it was found that Armor's θ coefficient was lower than the other reliability coefficients in all conditions. The smallest value across all conditions, 0.8415, was obtained for the θ coefficient estimated with the MLE method. The reliability coefficient found to be the highest (excluding those bigger than 1) was the maximal reliability coefficient. Maximal reliability estimated with the GL method reached the highest value across all conditions, 0.9655.
While maximal reliability calculated with CFA was similar to the McDonald ω and McDonald ωh coefficients in general, the McDonald ωh coefficient yielded values bigger than 1. Zinbarg et al. (2005) point out that the ωh coefficient and the ω coefficient take on equal values for one-dimensional structures with unequal factor loadings. However, the ωh coefficient took on values bigger than 1 when calculated with a tetrachoric correlation matrix. In a similar vein, Revelle's β coefficient also yielded values bigger than 1 when tetrachoric correlations were used. Zinbarg et al. (2005) also state that the β coefficient will be lower than the ω coefficient. Yet, Revelle's β coefficient had values bigger than 1, and thus bigger than ω, when calculated with a tetrachoric correlation matrix.
For a given correlation matrix, it may be said that reliability coefficients do not differ much according to sample size. This finding is consistent with the ones obtained by Ercan, Yazici, Sigirli, Ediz, and Kan (2007), who stated that the alpha coefficient and the θ coefficient calculated with PCA did not differ according to sample size. Yet, the smallest sample size in this study was 250. Therefore, it was considered that the coefficients did not differ according to sample size because even the smallest sample satisfied the sample size condition for factor analysis (Comrey, 1988; Gorsuch, 1974; Guadagnoli & Velicer, 1988; Streiner, 1994).
It may be said that reliability coefficients do not differ considerably by sample size for the same correlation matrix. Therefore, sample size is thought to be an ignorable variable for the reliability coefficients in this study, provided that a sample of at least 250 students is available. Because McDonald ωh and Revelle β coefficients were bigger than 1 for a tetrachoric correlation matrix, which indicates overestimation, it may be said that calculating these coefficients with phi correlations would be more appropriate in congeneric measurements where variables are assumed to have a truly binary structure. It may also be said that Armor's θ and Heise and Bohrnstedt's Ω coefficients are higher only with PCA than with the other methods. Since the other methods do not differ from one another, it may be recommended that these coefficients be calculated with the PCA factor extraction method, which tries to maximize the explained variance and thus raises the eigenvalue. The analyses in this study were performed with one-dimensional binary data and with a real data set fitting a congeneric measurement model. Simulation studies in which the factor structure and measurement model are varied could be recommended to researchers. The reliability coefficients considered in this study could be compared under different conditions such as smaller samples and small average correlations. The research could also be repeated with ordinal and multi-category data types, and thus probable changes could be analyzed.