Comparison of Different Estimation Methods for Categorical and Ordinal Data in Confirmatory Factor Analysis

In confirmatory factor analysis (CFA), which is widely used in scale development and adaptation studies, the selected estimation method affects the results obtained from the data. Depending on the estimation method chosen, the model parameters, their standard errors and the model-data fit values may change substantially. Therefore, the purpose of this research is to compare the performance of different estimation methods for CFA. Maximum likelihood (ML), unweighted least squares (ULS) and diagonally weighted least squares (DWLS) are used as estimation methods in this research. These methods are applied to the data sets, and the regression coefficients, their standard errors, t values, fit indices and numbers of iterations obtained from each method are examined. The results show that the ULS method converges with the fewest iterations, and it appears to be the most accurate method for estimating the parameters.


INTRODUCTION
The social and behavioral sciences focus mainly on latent variables that cannot be observed directly, and decisions about these latent variables are made through observed variables. Structural Equation Modelling (SEM) is widely used to identify the relationships between observed and latent variables (Jöreskog & Sörbom, 1996a; Sammel, Ryan & Legler, 1997). SEM is a set of statistical methods that describes the relationships between one or more continuous or categorical independent variables and one or more continuous or categorical dependent variables (Tabachnick & Fidell, 2007, p. 676). In other words, SEM is a comprehensive statistical approach for testing hypotheses about the relationships between observed and latent variables (Hoyle, 1995).
Confirmatory Factor Analysis (CFA), a special case of SEM, is often used in studies developing and adapting scales. Brown (2006) states that CFA can be used (1) to evaluate measures psychometrically, (2) to validate constructs, (3) to test method effects, and (4) to test the invariance of measurements. To perform a CFA on a data set, a theory or a predetermined factor structure must first exist. A model is then specified on the basis of this information and tested against the observed data (Raykov & Marcoulides, 2000, p. 95). In brief, CFA assesses whether the data support the hypothesized relationships between a set of measured variables and their latent factors.
In the social sciences, data are often collected by scoring multiple-choice items in two categories or ordinal items with three or more categories, rather than on an equal-interval scale (Bandalos, 2014). Because ordinal variables in Likert scales have an ordering of the form a < b < c < d, and data scored in two categories have an ordering a < b, both are referred to as ordered categorical data, whereas there is no order among the codes of non-ordered categorical data (Cai, 2008). Bollen (1989) notes that ordered categorical data are often treated as continuous because of technical limitations of measurement tools. However, treating categorical data this way results in biased parameter estimates and inaccurate standard errors (Babakus, Ferguson & Jöreskog, 1987; DiStefano, 2002; Rigdon & Ferguson, 1991). It is also important that the assumptions underlying the statistical model match the empirical characteristics of the data being analyzed (Flora & Curran, 2004). Therefore, different estimation methods have been developed for different data structures.
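The bias described above can be illustrated with a small simulation. The sketch below is not part of the original study: it assumes two standard-normal variables with an arbitrary correlation of 0.6, recodes them into two or five ordered categories at hypothetical cut points, and computes the Pearson correlation of the category codes. For the special case of a median split, the illustrative helper `tetrachoric_from_median_split` applies Sheppard's formula, sin(πφ/2), which recovers the correlation of the underlying continuous variables.

```python
import numpy as np

def categorized_correlations(rho, cuts, n=200_000, seed=1):
    """Simulate two standard-normal variables with correlation rho, recode both
    into ordered categories at the given cut points, and return the Pearson r
    of the continuous scores and of the category codes."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # np.digitize assigns each value its category index 0, 1, ..., len(cuts)
    y = np.digitize(z, cuts)
    r_continuous = np.corrcoef(z[:, 0], z[:, 1])[0, 1]
    r_categorical = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
    return r_continuous, r_categorical

def tetrachoric_from_median_split(phi):
    """Tetrachoric correlation for the special case in which both variables are
    dichotomized at their medians (Sheppard's formula): rho = sin(pi * phi / 2)."""
    return np.sin(np.pi * phi / 2.0)

# Hypothetical setting: true correlation 0.6, two or five response categories
r_cont, r_two = categorized_correlations(0.6, cuts=[0.0])
_, r_five = categorized_correlations(0.6, cuts=[-1.5, -0.5, 0.5, 1.5])
print(round(r_cont, 3), round(r_two, 3), round(r_five, 3))
print(round(tetrachoric_from_median_split(r_two), 3))
```

The Pearson correlation of the dichotomized codes falls to about 0.41 and that of the five-category codes also stays below 0.6, while the median-split correction returns a value close to 0.6. Estimating polychoric correlations with arbitrary thresholds requires numerical optimization, which this sketch does not attempt.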

Estimation Methods
The choice of estimation method directly influences the results of a study. The most common estimation method in SEM is maximum likelihood (ML), largely because it is the default in many software packages. ML produces consistent and unbiased estimates for correctly specified models with large samples and independent, continuous, multivariate normally distributed data (Kline, 2005). The literature shows that using ML with non-normally distributed data that have few response categories biases factor loadings, standard errors, chi-square test statistics and goodness-of-fit indices (Babakus et al., 1987; Bollen, 1989; Green, Akey, Fleming, Hershberger, & Marquis, 1997; Hutchinson & Olmos, 1998). However, ML does not give substantially biased results if the ordered data have many categories, the sample is large, and the observed items are approximately normally distributed (Mîndrilă, 2010).
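ML estimation minimizes the normal-theory discrepancy function F_ML = ln|Σ(θ)| + tr(SΣ(θ)⁻¹) - ln|S| - p between the sample covariance matrix S and the model-implied matrix Σ(θ) for p observed variables. The minimal NumPy sketch below evaluates this function for a hypothetical one-factor model with standardized items; the matrix S, the loadings and N = 250 are illustrative assumptions, not values from this study.

```python
import numpy as np

def ml_discrepancy(S, Sigma):
    """Normal-theory ML fit function:
    F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

def one_factor_implied(loadings):
    """Model-implied correlation matrix of a one-factor model with standardized
    items: Sigma = lambda lambda' + diag(1 - lambda^2)."""
    lam = np.asarray(loadings, dtype=float)
    return np.outer(lam, lam) + np.diag(1.0 - lam ** 2)

# Illustrative (hypothetical) values, not data from this study
S = np.array([[1.00, 0.45, 0.32],
              [0.45, 1.00, 0.28],
              [0.32, 0.28, 1.00]])
Sigma = one_factor_implied([0.7, 0.6, 0.5])
f = ml_discrepancy(S, Sigma)
chi_square = (250 - 1) * f   # N - 1 multiplier; some programs use N instead
```

F_ML is zero only when Σ(θ) reproduces S exactly; at the minimum, (N - 1)F_ML (or N·F_ML, depending on the program) is the χ² statistic whose validity rests on the multivariate normality assumption discussed above.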
For ordered data in factor analysis, it is assumed that each ordinally measured variable is underlain by a continuous variable ranging from -∞ to +∞, which represents the characteristic underlying the ordered responses (Forero, Maydeu-Olivares & Gallardo-Pujol, 2009; Jöreskog, 1990; Lee, Poon & Bentler, 1990; Muthén, 1984). When categorical and/or ordered data are used in SEM, tetrachoric or polychoric correlations should be used to recover the underlying continuous variables (Muthén, 1984; Muthén & Kaplan, 1985), because estimates of parameters, standard errors and factor loadings based on the Pearson product-moment correlation coefficient (PCC) are highly biased (DiStefano, 2002). To solve this problem, Jöreskog and Sörbom (1996b) recommended polychoric correlations as the most consistent and robust estimator. Using a polychoric rather than a PCC matrix reduces the bias in estimates of parameters, standard errors and factor loadings (Babakus et al., 1987; Rigdon & Ferguson, 1991). Tetrachoric correlations are used for data with two categories, and polychoric correlations for data with more than two categories. Babakus (1985) likewise suggests using polychoric rather than Pearson correlations when performing CFA on ordinal data. Estimation methods developed for ordinal data include weighted least squares (WLS), unweighted least squares (ULS) and diagonally weighted least squares (DWLS). These methods use the asymptotic covariance matrix of the polychoric correlations estimated from the observed categorical variables (Katsikatsou, Moustaki, Yang-Wallentin, & Jöreskog, 2012). They differ in their weight matrices: ULS uses an identity weight matrix (the asymptotic covariance matrix is needed only for its standard errors), DWLS uses only the diagonal elements of the asymptotic covariance matrix, and WLS uses the inverse of the full asymptotic covariance matrix (Yang-Wallentin, Jöreskog, & Luo, 2010).
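The three least-squares estimators above can be written as one fit function, F = (s - σ(θ))' W⁻¹ (s - σ(θ)), where s stacks the sample polychoric correlations and σ(θ) the model-implied ones; only the weight matrix W changes. The NumPy sketch below assumes that s contains only the below-diagonal correlations, ordered as `numpy.tril_indices` returns them, and that the asymptotic covariance matrix `acov` follows the same order; the function names and example numbers are hypothetical.

```python
import numpy as np

def below_diagonal(mat):
    """Stack the correlations below the diagonal into a vector."""
    rows, cols = np.tril_indices(mat.shape[0], k=-1)
    return mat[rows, cols]

def ls_discrepancy(R, Sigma, acov, method):
    """F = (s - sigma)' W^-1 (s - sigma) for ULS, DWLS or WLS.

    R     : sample polychoric correlation matrix
    Sigma : model-implied correlation matrix
    acov  : asymptotic covariance matrix of the below-diagonal correlations,
            in the same order as below_diagonal() returns them
    """
    resid = below_diagonal(R) - below_diagonal(Sigma)
    if method == "ULS":
        w_inv = np.eye(resid.size)               # identity: all residuals weighted equally
    elif method == "DWLS":
        w_inv = np.diag(1.0 / np.diag(acov))     # only the diagonal of acov is used
    elif method == "WLS":
        w_inv = np.linalg.inv(acov)              # full acov, inverted
    else:
        raise ValueError("method must be 'ULS', 'DWLS' or 'WLS'")
    return float(resid @ w_inv @ resid)

R = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
Sigma = np.array([[1.0, 0.45, 0.4], [0.45, 1.0, 0.35], [0.4, 0.35, 1.0]])
acov = np.array([[0.010, 0.004, 0.0], [0.004, 0.020, 0.0], [0.0, 0.0, 0.040]])
for method in ("ULS", "DWLS", "WLS"):
    print(method, round(ls_discrepancy(R, Sigma, acov, method), 4))
```

With a diagonal `acov`, DWLS and WLS coincide; they diverge once the asymptotic covariances between correlations are non-zero, which is also why WLS needs a well-estimated, invertible full matrix and therefore large samples.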
WLS can be an alternative method for ordinal data, particularly data that are not normally distributed, that are highly skewed or kurtotic, or both (Muthén, 1993). However, WLS estimates approach their asymptotic properties very slowly, so WLS performs poorly in small samples (Katsikatsou et al., 2012). ULS, on the other hand, makes no distributional assumptions and estimates all parameters simultaneously, but it requires all observed variables to be measured on the same scale (Kline, 2005, p. 159). Recently, DWLS has become popular for factor analysis of ordinal data. This popularity is attributed to the fact that DWLS can be used to test measurement invariance, as with continuous variables, and to its ability to produce estimates with smaller variances than ULS (Forero et al., 2009). Katsikatsou et al. (2012) point out that DWLS and ULS are preferable to WLS and that both perform similarly in small samples. Likewise, Yang-Wallentin et al. (2010) found in their simulation study that WLS performs poorly compared with ULS, DWLS and ML under symmetric and non-symmetric conditions with 2, 5 and 7 categories, and that ULS performs remarkably well.
Forero et al. (2009) compared ULS and DWLS in their simulation study. They found that ULS estimated parameters more accurately and with less variability, yielded more accurate standard errors, and converged better. Mîndrilă (2010) compared DWLS and ML and found that ML estimated parameters more accurately in continuous, normally distributed data, whereas DWLS estimated parameters more accurately in non-normally distributed data. Many studies in the literature have compared estimation methods and shown that the choice affects the results (DiStefano, 2002; Forero et al., 2009; Hu, Bentler & Kano, 1992; Lei, 2009; Muthén & Kaplan, 1985; Rigdon & Ferguson, 1991; Yang-Wallentin et al., 2010). However, only a limited number of studies have used real data sets (Katsikatsou et al., 2012), so this study focuses on real data sets.
As in all statistical analyses, choosing the correct estimation method in CFA matters, because the method affects the model parameters, their standard errors and the fit indices. ML, which was developed for continuous variables and assumes that the observed variables follow a multivariate normal distribution, is widely used in CFA. However, most variables in the social sciences and psychology are categorical or ordinal rather than continuous (Yang-Wallentin et al., 2010), and applying methods developed for continuous variables regardless of the data structure undermines the accuracy of the results.

Purpose of the Study
The overall objective of this research was to compare the performance of different estimation methods used in CFA. To this end, the following research questions were addressed for ordinal set-1 and set-2 and categorical set-3:
1. What are the regression coefficients obtained from the ML, ULS and DWLS estimation methods, and what are their standard errors?

Data Collection Tools
The ordinal data for this study were collected in the 2007-2008 academic year from first- and fourth-year primary school teaching students at seven universities in seven regions of Turkey, using the Epistemological Belief Scale. The total number of participants was 548. The scale was developed by Schommer (1990), adapted by Deryakulu and Büyüköztürk (2002) and revised by Deryakulu and Büyüköztürk (2005). It contains 34 items in three factors: "Belief that Learning Depends on Effort" (17 items), "Belief that Learning Depends on Skills" (8 items) and "Belief that There Is Only One Truth" (9 items). Responses were given on a five-point Likert scale.
The Beck Hopelessness Scale was used for the categorical data set. Data collected by Dinler-İçöz (2014) from 200 children for a master's thesis were used in this study with the author's permission. The Beck Hopelessness Scale contains 20 items. Items 1, 3, 5, 6, 8, 10, 12, 13, 15 and 19 score 1 point for the response "no", and items 2, 4, 7, 9, 11, 14, 16, 17, 18 and 20 score 1 point for the response "yes". Respondents are instructed to mark "correct" for statements they consider applicable to them and "wrong" for statements they do not. Total scores range from 0 to 20; a high score indicates hopelessness and a low score indicates hope in children (Savaşır & Şahin, 1997).
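The scoring rule above can be expressed as a short Python function. It assumes that a "correct" answer corresponds to "yes" and a "wrong" answer to "no", and that each child's responses are stored as a dictionary from item number to a Boolean; both the data layout and the function name are illustrative assumptions.

```python
# Keyed items as listed above for the form used in this study
NO_KEYED = {1, 3, 5, 6, 8, 10, 12, 13, 15, 19}
YES_KEYED = {2, 4, 7, 9, 11, 14, 16, 17, 18, 20}

def score_hopelessness(responses):
    """responses: dict item number (1-20) -> True for "yes"/"correct",
    False for "no"/"wrong". Returns the total score (0-20)."""
    if set(responses) != NO_KEYED | YES_KEYED:
        raise ValueError("expected answers to all 20 items")
    score = 0
    for item, said_yes in responses.items():
        # One point when the answer matches the item's keyed direction
        if (item in YES_KEYED and said_yes) or (item in NO_KEYED and not said_yes):
            score += 1
    return score

# Hypothetical child who answered "yes" to every item
print(score_hopelessness({item: True for item in range(1, 21)}))
```

Because half of the items are keyed in each direction, answering "yes" (or "no") to everything yields a mid-range score of 10 rather than an extreme one.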

Data Analysis
To compare findings across sample sizes, two data sets of 250 (set-1) and 500 (set-2) participants were randomly drawn from the full sample of 548. Univariate and multivariate normality were tested for each data set, starting with the univariate normality of each item. West, Finch and Curran (1995) stated that skewness values greater than 2 and kurtosis values greater than 7 each indicate a violation of the univariate normality assumption. In the present study, absolute skewness values for set-1 (n = 250) ranged from 0.042 to 6.936 and kurtosis values from 0.123 to 14.489. Skewness exceeded 2 for 26 items and kurtosis exceeded 7 for 9 items, and all chi-square tests of univariate normality were significant at the .01 level. For set-2 (n = 500), absolute skewness values ranged from 0.004 to 10.302 and kurtosis values from 0.388 to 24.470. Skewness exceeded 2 for 28 items and kurtosis exceeded 7 for 2 items, and all chi-square tests were significant at the .001 level. Based on these findings, not all items in either data set were univariately normally distributed. Multivariate normality was then assessed with Mardia's (1970) test. According to Table 2, the ordinal and categorical data sets are over-identified, so CFA can be applied to them.
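The univariate screening rule of West, Finch and Curran (1995) can be sketched as follows. The sketch uses simple moment-based estimates of skewness and excess kurtosis (kurtosis 0 for a normal distribution) and compares their absolute values with the cut-offs of 2 and 7; statistical packages may use small-sample-corrected formulas, so their values can differ slightly. The item data and names are hypothetical.

```python
import statistics

def skew_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis (0 for a normal
    distribution). Constant items (zero variance) are not handled."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def flag_nonnormal_items(items, skew_cut=2.0, kurtosis_cut=7.0):
    """Return item -> True when |skewness| > skew_cut or |kurtosis| > kurtosis_cut."""
    flags = {}
    for name, values in items.items():
        skew, kurt = skew_and_excess_kurtosis(values)
        flags[name] = abs(skew) > skew_cut or abs(kurt) > kurtosis_cut
    return flags

# Hypothetical five-point Likert responses for two items
items = {
    "item_a": [1, 2, 3, 4, 5] * 20,   # symmetric spread over all categories
    "item_b": [5] * 95 + [1] * 5,     # almost everyone chose "5"
}
print(flag_nonnormal_items(items))
```

A strongly piled-up item such as `item_b` exceeds both cut-offs, which mirrors the pattern reported above for many items of the Likert-type scale.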
For the categorical and ordinal data sets, ULS and DWLS were used as estimation methods in CFA. For this purpose, an asymptotic covariance matrix was computed for each data set, and the regression coefficients, their standard errors, t values and fit indices obtained with this matrix were compared. If the fit index values are χ2/df<3, 0<RMSEA<0.05, 0.97≤NFI≤1, 0.97≤CFI≤1, 0.95≤GFI≤1 and 0.95≤AGFI≤1, this indicates a perfect fit; if they are 3<χ2/df<5, 0.05<RMSEA<0.08, 0.95≤NFI≤0.97, 0.95≤CFI≤0.97, 0.90≤GFI≤0.95 and 0.90≤AGFI≤0.95, this indicates an acceptable fit (Kline, 2005; Sümer, 2000). WLS was excluded because the asymptotic covariance matrix was not positive definite and because the large sample size (≥1000) that this method needs to estimate parameters was not available in this study (Hoogland & Boomsma, 1998). In addition, parameters were estimated with ML even though the data sets were not normally distributed.
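The cut-offs above can be applied mechanically, as in the Python sketch below. The dictionary layout and function names are illustrative; the sketch treats any χ2/df from 3 up to 5 as acceptable, and values that fall exactly on a boundary are assigned by the inequalities in the code, an assumption where the published ranges overlap or leave gaps.

```python
# (perfect cut-off, acceptable cut-off, higher_is_better) for each index
CUTOFFS = {
    "chi2/df": (3.0, 5.0, False),
    "RMSEA": (0.05, 0.08, False),
    "NFI": (0.97, 0.95, True),
    "CFI": (0.97, 0.95, True),
    "GFI": (0.95, 0.90, True),
    "AGFI": (0.95, 0.90, True),
}

def fit_band(value, perfect, acceptable, higher_is_better):
    """Label one index value as 'perfect', 'acceptable' or 'poor'."""
    if higher_is_better:
        if value >= perfect:
            return "perfect"
        if value >= acceptable:
            return "acceptable"
    else:
        if value < perfect:
            return "perfect"
        if value < acceptable:
            return "acceptable"
    return "poor"

def classify_fit(indices):
    """indices: dict with the six index names used in CUTOFFS."""
    return {name: fit_band(indices[name], *CUTOFFS[name]) for name in CUTOFFS}

# Hypothetical fit results for one model
print(classify_fit({"chi2/df": 4.2, "RMSEA": 0.07, "NFI": 0.96,
                    "CFI": 0.98, "GFI": 0.92, "AGFI": 0.88}))
```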
For set-2, regression coefficients ranged from 0.43 to 142.29 in ML estimation and from 0.02 to 0.86 in ULS and DWLS estimation. Standard errors ranged from 0.04 to 7.88 for ML, from 0.02 to 0.33 for ULS, and from 0.03 to 0.30 for DWLS. The mean standard error was 0.806 for ML, 0.089 for ULS and 0.084 for DWLS. Thus, DWLS estimated the parameters of set-2 with the smallest error. This finding is similar to that of Mîndrilă (2010).
For set-3, DWLS also estimated parameters with the smallest error. Although all estimation methods yielded similar regression coefficients, ULS and DWLS estimated the parameters with less error than ML.
The t values obtained from the ML, ULS and DWLS estimation methods are shown in Table 5. For set-1, the t values of the regression coefficients ranged from 3.84 to 16.83 for ML, from 0.13 to 28.67 for ULS, and from 0.12 to 28.67 for DWLS; the means were 9.64, 8.89 and 8.88, respectively. For set-2, t values ranged from 5.44 to 22.00 for ML, from 0.43 to 43.00 for ULS, and from 0.50 to 43.00 for DWLS; the means were 13.66, 12.25 and 12.49, respectively.
For set-1 and set-2, ML yielded higher mean t values. Although ULS and DWLS produced smaller standard errors, ML yielded higher t values because ULS and DWLS estimated very small regression coefficients for several items.
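The reasoning above follows from how the t value is formed: each estimate is divided by its standard error. The numbers below are hypothetical and chosen only to show how a smaller standard error can still yield a smaller t value when the coefficient itself is very small.

```python
def t_value(estimate, standard_error):
    """t (z) value reported for a parameter: estimate divided by its standard error."""
    return estimate / standard_error

# Hypothetical numbers for one item, not taken from the study's tables
ml_t = t_value(0.90, 0.10)     # larger standard error, but a much larger coefficient
dwls_t = t_value(0.05, 0.04)   # smaller standard error, but a very small coefficient
print(ml_t, dwls_t)
```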
For set-3, t values ranged from 2.36 to 10.22 for ML, from 2.33 to 23.00 for ULS, and from 2.58 to 23.50 for DWLS; the means were 6.48, 9.54 and 9.56, respectively. For set-3, therefore, ULS and DWLS yielded higher t values than ML.

The fit index results obtained from the ML, ULS and DWLS estimation methods are shown in Table 6. For set-1 and set-2, the data fit best under DWLS. The fit indices obtained with ULS were also better than those obtained with ML, and the ULS and DWLS fit indices were very similar to each other. Katsikatsou et al. (2012) and Yang-Wallentin et al. (2010) reported similar findings. The model estimated with ML fell below the acceptable level of fit.
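RMSEA, one of the indices compared here, can be computed from the χ² statistic, its degrees of freedom and the sample size. The sketch below uses the common point-estimate formula with N - 1 in the denominator; some programs use N instead, and the input values are hypothetical.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Hypothetical model: chi-square 100 on 50 df with N = 251
print(round(rmsea(100.0, 50, 251), 4))
```

When χ² does not exceed its degrees of freedom, the estimate is truncated at zero.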
For set-3, ML yielded better χ2/df and RMSEA values, whereas ULS and DWLS yielded better NFI, GFI and AGFI values. All three estimation methods produced similar CFI values.
The numbers of iterations that ML, ULS and DWLS needed to converge are shown in Table 7. Parameters are estimated iteratively, and iteration stops when the convergence criterion is met, meaning that further iterations would not change the parameter estimates appreciably.
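The iterative logic described above can be illustrated with a simple ULS-type algorithm. The sketch below is not the algorithm LISREL uses, so its iteration counts are not comparable with Table 7: it fits a one-factor model by iterated principal-axis factoring, which alternately minimizes the ULS criterion over the loadings and the communalities, and it stops once no communality changes by more than a tolerance. The correlation matrix is built from hypothetical loadings.

```python
import numpy as np

def uls_one_factor(R, tol=1e-6, max_iter=1000):
    """One-factor ULS solution by iterated principal-axis factoring.

    Returns the loadings and the number of iterations needed until no
    communality changes by more than tol.
    """
    # Starting communalities: squared multiple correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for iteration in range(1, max_iter + 1):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)               # replace 1s by communalities
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = np.sqrt(max(eigvals[-1], 0.0)) * eigvecs[:, -1]
        if loadings.sum() < 0:                      # eigenvector sign is arbitrary
            loadings = -loadings
        new_h2 = loadings ** 2
        if np.max(np.abs(new_h2 - h2)) < tol:      # estimates no longer change
            return loadings, iteration
        h2 = new_h2
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical one-factor population correlation matrix
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)
loadings, n_iter = uls_one_factor(R)
print(np.round(loadings, 3), n_iter)
```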
A small number of iterations suggests that the estimation method matches the sample data well, because different estimation methods make different distributional assumptions (Marsh & Grayson, 1995). ULS estimated parameters with the fewest iterations in all data sets, which is similar to the findings of Forero et al. (2009). ML needed the most iterations for set-1, and DWLS needed the most for set-2 and set-3. These findings indicate that ULS was the estimation method that best matched these data sets.

CONCLUSIONS and DISCUSSION
In the ordinal data sets, ULS and DWLS estimated parameters with smaller standard errors than ML. ULS yielded better results in the smaller sample, whereas DWLS yielded better results in the larger sample. With ULS and DWLS, the fit values of the CFA model were higher and reached acceptable levels, even in the smaller sample. The model estimated with ML fell below an acceptable level. Thus, for ordinal Likert-type responses that are not univariate or multivariate normally distributed, ML did not produce adequate fit index values. As the sample size increased, the ML fit indices improved somewhat, but RMSEA, CFI and GFI still did not reach the expected levels. With ULS and DWLS, fit index values decreased somewhat as the sample size increased, and all fit indices reached the expected values except the AGFI obtained with ULS for set-2.
In the categorical data set, ULS and DWLS estimated parameters with smaller standard errors than ML, and DWLS estimated parameters with the least error. Because AGFI adjusts the fit value for the model's degrees of freedom relative to the number of variables, it is a more suitable index for comparing the fit obtained when different methods are applied to the same data set (Mîndrilă, 2010). AGFI was higher for ULS and DWLS than for ML.
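AGFI adjusts GFI for model complexity with the formula AGFI = 1 - [p(p + 1)/(2df)](1 - GFI), where p is the number of observed variables and df the model's degrees of freedom. The short sketch below, with hypothetical inputs, shows that for the same GFI a model with fewer degrees of freedom receives a lower AGFI.

```python
def agfi(gfi, n_observed, df):
    """AGFI = 1 - [p(p + 1) / (2 df)] * (1 - GFI), with p observed variables."""
    return 1.0 - (n_observed * (n_observed + 1) / (2.0 * df)) * (1.0 - gfi)

# Hypothetical: same GFI and number of variables, different degrees of freedom
print(round(agfi(0.95, 20, 170), 4), round(agfi(0.95, 20, 100), 4))
```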
As the findings show, ULS estimated parameters with the fewest iterations in all data sets, and it appears to be the most accurate method for estimating the parameters of these sample data.
Future research could examine the performance of WLS with larger samples (≥1000). Real data with a larger number of response categories could also be analyzed to examine the ML results, because categorical methodology can outperform continuous methodology with more than five categories (Beauducel & Herzberg, 2006).

Koğar, H., Yılmaz Koğar, E. / Comparison of Different Estimation Methods for Categorical and Ordinal Data in Confirmatory Factor Analysis
Results of Mardia's (1970) multivariate normality test are listed in Table 1.

Table 1. … of Multivariate Statistical Tests

A model is referred to as unidentified if the number of unknown parameters exceeds the available information, as just-identified if the number of unknown parameters equals the available information, and as over-identified if the number of unknown parameters is less than the available information (Brown, 2006). The t-rule (Byrne, 1998) was used to check model identification. The t-rule tests whether there are enough degrees of freedom to calculate and compare fit indices and enough information to estimate the required parameters (Byrne, 1998).
** Degrees of freedom: the difference between the available information and the number of estimated parameters.
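The t-rule compares the number of free parameters t with the available information p(p + 1)/2 for p observed variables; t ≤ p(p + 1)/2 is necessary but not sufficient for identification. The sketch below assumes a simple-structure CFA with factor variances fixed to 1 and counts the parameters for a 34-item, three-factor model and a 20-item, one-factor model resembling the scales described above. Software analyzing polychoric correlations may count the available information differently (for example, because the diagonal is fixed), so these counts are illustrative and need not match Table 2.

```python
def available_information(n_observed):
    """Number of distinct variances and covariances: p(p + 1) / 2."""
    return n_observed * (n_observed + 1) // 2

def simple_structure_params(n_observed, n_factors):
    """Free parameters in a simple-structure CFA with factor variances fixed to 1:
    one loading and one error variance per item plus all factor correlations."""
    return 2 * n_observed + n_factors * (n_factors - 1) // 2

def t_rule(n_observed, n_free):
    """Return the degrees of freedom and the identification status implied by counting."""
    df = available_information(n_observed) - n_free
    if df > 0:
        status = "over-identified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "unidentified"
    return df, status

print(t_rule(34, simple_structure_params(34, 3)))
print(t_rule(20, simple_structure_params(20, 1)))
```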

Table 3. Regression Coefficients and Standard Error Values (Set-1 and Set-2)

ISSN: 1309-6575 Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi / Journal of Measurement and Evaluation in Education and Psychology

Table 6. Fit Index Values