Stability of the Data-Model Fit over Increasing Levels of Factorial Invariance for Different Features of Design in Factor Analysis

The aim of this study is to provide an empirical evaluation of the influence of different aspects of design on model stability in the context of factor analysis. The overall stability of factor solutions was evaluated by testing three levels of Measurement Invariance (MIV) in order, starting with configural invariance (model 0). Model testing was evaluated by the chi-square difference test (∆χ²) between two groups, and by the Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), and Tucker-Lewis Index (TLI). Factorial invariance results revealed that the stability of the models varied across increasing levels of measurement invariance as a function of the Variable-To-Factor (VTF) ratio, the Subject-To-Variable (STV) ratio, and their interactions. Factor loadings and intercepts were invariant among the groups, indicating that measurement invariance was achieved. For VTF ratios of 4:1, 7:1, and 10:1, the models started to show stability across the levels of measurement invariance when the STV ratio was 4:1. Moreover, the frequency of stable models over 1000 replications increased (from 77% to 91%) as the STV ratio increased. The models showed the most stability at or above a 32:1 STV ratio.

Keywords—model stability; factorial invariance; level of measurement invariance; model design

I. INTRODUCTION

Confirmatory Factor Analysis (CFA) is a form of Factor Analysis (FA) that tests hypotheses regarding how well the measured indicator variables represent the number of constructs [1]. CFA is a confirmatory method a researcher can use to examine, evaluate, and/or test a number of hypothesized factors underlying the variances/covariances in a set of measured indicator variables. CFA allows the researcher to test hypothetical and plausible alternative latent variable structures for the observed indicator variances/covariances [2]. More recently, CFA has also been used in exploratory analysis [3-9]. Three major concerns have emerged repeatedly in the literature related to the use and interpretation of FA in social science research: (a) determining an adequate number of indicator variables to describe the latent trait, (b) securing a sufficient sample size to have reasonable confidence in the stability of the model estimate, and (c) establishing minimum communality levels to determine which indicator variables can represent a latent trait, especially in simulation studies [8, 10-13]. FA assumes that the indicator variables are linearly related to one another; otherwise, the number of extracted factors will be the same as the number of original variables [2, 15]. Survey instrument length and the number of variables differ based on discipline, purpose, sample frame, and method of data collection. Recently, the online survey has become an important method of data collection for a variety of reasons (e.g., online surveys are easy to design and conduct, and often they are the only option for data collection). According to SurveyMonkey, the median length of its paid surveys was 10 questions [9]. Industry-specific and market-research surveys tend to have more questions, while event surveys and just-for-fun surveys tend to be shorter (see Figure 1) [9].
If the survey is about 10 questions or fewer, it can yield a higher completion rate and increase the likelihood that people will choose to take more of the researcher's surveys in the future. More recent factor analysis studies do not include the 10:1 VTF ratio in their investigations [5, 12, 14-17], nor do they examine how this ratio relates to the sample size or the communality magnitude when factor analysis is conducted.

A. Observed Variables
In FA, the observed indicator variables can be viewed as representing a sample of potential variables, all of which measure the same construct or factor [1, 14, 16, 17]. The authors in [18] examined the magnitude of the correlation between the observed variables and the factor components by manipulating sample size, number of variables, number of components, and component saturation. They concluded that the VTF ratio was important for factor stability, with more variables per factor yielding a more stable result. The authors in [19] partially confirmed this conclusion: they found that the minimum number of indicator variables needed to attain factor solutions that are adequately stable relative to the population factors depends on several aspects of any given study, including the level of communality and the sample size. Similarly, the authors in [20] found that the factor analysis solution improved as the VTF ratio increased. The issue of variable sampling has been used extensively in conceptual development, but there has been almost no empirical evaluation of studies that sample indicator variables at random from the universe of variables. The assumption of random sampling is useful for minimizing sampling issues and for developing generalizability, rather than as a prescription for applied research procedures. The authors in [21] examined the quality of factor analytic research published between 1999 and 2009 in five leading developmental disabilities journals. They found that 35% of the studies used some form of FA; however, the guidelines for using FA were largely ignored, and the studies failed to account for levels of overdetermination and communalities among measured variables. Furthermore, the authors in [19] found a lack of validity in some common practice rules used in FA. Thus, anything that influences or changes variance may affect the conclusions related to FA.
Researchers should determine the number of indicator variables required to produce a stable and precise model describing the latent trait. The authors in [21] investigated the effects of indicator variables on pattern recovery to determine the number of indicator variables likely to produce patterns that closely approximate the population pattern. They reported that the number of indicator variables can strongly affect the degree to which a sample pattern reproduces the population pattern, and that a minimum of three variables per factor is critical. Information about the number of indicator variables required to produce a stable and precise model can be used in the design of a study and, retrospectively, in the evaluation of an existing study.

B. Adequate Sample Size
Determining sample size requirements for FA is complicated because they depend on other aspects of design, such as the VTF ratio and communality (h²). Previous FA studies proposed several approaches to guidelines for sample size. However, most of these approaches were concerned with identifying either the Subject-To-Variable (STV) ratio or the absolute sample size, regardless of the effect of these rules on WSV. The examples below describe some reported results about sample size in the context of FA. A larger sample size is better than a smaller one because it minimizes misfit and the probability of errors. In many cases, however, increasing the sample size may not be possible; in medical research, for example, it is very difficult to collect a large sample of patients suffering from a certain disease [19-21]. It is therefore necessary to investigate the minimum STV ratio or the smallest absolute sample size that still yields a stable model. Only a very limited number of studies on the role of sample size in FA have investigated real or simulated small samples. The authors in [17] investigated the minimum sample size necessary to obtain reliable factor solutions under various conditions. They concluded that under conditions of high communality, a high number of observed variables, and a small number of factors, FA yields a stable model estimate for sample sizes below 50. Selecting an adequate sample size is an important decision in study design: a researcher must determine how large the sample should be and what the most appropriate sampling frame is. One problem is that the proposed recommendations vary dramatically; this wide range makes them of rather limited value to empirical researchers. Thus, there is a need for studies that systematically examine the stability of model estimates of latent variable variance under different facets of study Experimental Design (ED) and Sampling Design (SD).
Previous research has investigated the stability of factor solutions by examining the chi-square value (χ²) and Overall Model Fit (OMF) indices such as the Goodness-of-Fit Index (GFI), Adjusted GFI (AGFI), Tucker-Lewis Index (TLI), Comparative Fit Index (CFI), Root Mean Square Error of Approximation (RMSEA), and Root Mean Square Residual (RMSR) [8, 10, 22-25]. OMF indices provide global measures of data-model fit. Examinations of Measurement Invariance (MIV) (configural, weak, and strong) were used to evaluate model stability. The effects of communality magnitude in FA have remained largely unclear: studies have revealed a wide range of communality magnitudes and common practice rules [1, 3, 4, 16, 26, 27]. The communality measures the percentage of variance in a given variable that is explained by the factors. If communalities are high, model stability in the sample data is normally very good [18, 26, 28, 29]. The authors in [20] investigated the quality of factor solutions and found that when the communalities were high, sample size tended to have less influence on the quality of the solutions than when the communalities were low. The authors in [30] confirmed that communality magnitudes play an important role in determining an adequate sample size. Moreover, the authors in [21] found that communality magnitudes were most relevant in determining the sufficient sample size and the number of variables per component.
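For reference, the indices named above can be computed directly from the model and baseline (independence-model) chi-square statistics. The sketch below uses the standard definitions; the input values in the example are illustrative, not results from this study:

```python
import math

def fit_indices(chisq_m, df_m, chisq_b, df_b, n):
    """Compute RMSEA, CFI, and TLI from the fitted model (m) and the
    baseline independence model (b) for a sample of size n."""
    # RMSEA: per-degree-of-freedom misfit, scaled by sample size
    rmsea = math.sqrt(max(chisq_m - df_m, 0.0) / (df_m * (n - 1)))
    # CFI: proportional improvement in noncentrality over the baseline
    cfi = 1.0 - max(chisq_m - df_m, 0.0) / max(chisq_b - df_b, chisq_m - df_m, 0.0)
    # TLI: compares chi-square/df ratios, penalizing model complexity
    tli = ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1.0)
    return rmsea, cfi, tli

# Illustrative values: a well-fitting model for n = 200
rmsea, cfi, tli = fit_indices(15.0, 10, 500.0, 45, 200)
```

Against the cutoffs used in the study (RMSEA 0.00-0.05, CFI > 0.95, TLI ≥ 0.96), these hypothetical values would indicate good fit.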

A. Simulation Data
Simulation data are used in social science to answer a particular research question, solve a statistical problem, or improve analysis procedures. Statistical program developers and research designers usually simulate data for several reasons: gathering real data may be difficult, time-consuming, or expensive, and real data sometimes violate distributional assumptions. Simulated data often lead to a greater understanding of an analysis and of the results one can expect from various oddities of real-life data [31]. Simulation may approximate real-world results while requiring less time and effort, and it gives the researcher a chance to experiment with data under various conditions. Data can be simulated by several methods. The Monte Carlo technique is one popular method that has been used in social science since the 1940s [28]. A Monte Carlo simulation is a numerical technique that uses repeated random sampling to generate data for a given mathematical model.
The key step of the simulation model was the development of the population matrices for a 5-factor domain, used as the example factor structure. The statistical software package SAS was used, and the syntax code was written by the researcher.
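Although the study generated its data in SAS, the Monte Carlo step can be sketched in Python as follows. The loading value (0.7), simple-structure pattern, and function names are illustrative assumptions, not the study's actual code:

```python
import numpy as np

def simulate_factor_data(loadings, n, seed=0):
    """Draw n multivariate-normal observations whose population
    covariance follows the common factor model Sigma = L L' + Psi."""
    L = np.asarray(loadings)                  # p x k loading matrix
    communality = (L ** 2).sum(axis=1)        # variance explained per item
    psi = np.diag(1.0 - communality)          # unique variances (unit-variance items)
    sigma = L @ L.T + psi                     # model-implied covariance
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(L.shape[0]), sigma, size=n)

# Example: 5 factors, VTF ratio 4:1 (20 indicators), all loadings 0.7
L = np.kron(np.eye(5), 0.7 * np.ones((4, 1)))  # simple-structure pattern
data = simulate_factor_data(L, n=640)          # STV ratio 32:1
```

Repeating this draw (e.g., 1000 times per condition) and refitting the model each time is what allows the frequency of stable solutions to be tallied.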

B. Estimation Methods
Maximum Likelihood (ML) is the most common method of factor extraction. It estimates population values for FA by calculating the loadings that maximize the probability of sampling the observed correlation matrix from a population [12], and it is often used in CFA. The current study used ML as the method of factor extraction. The authors in [21] concluded that if the data are normally distributed, ML is the best estimation option because it allows the computation of a wide range of goodness-of-fit indexes for the model. The ML estimation method assumes that the data are independently sampled from a multivariate normal distribution with mean µ and a variance-covariance matrix of the form Σ = LL′ + Ψ, where L is the matrix of factor loadings and Ψ is the diagonal matrix of specific variances. The authors in [32] indicated that the ML estimation method is the most precise when the data are continuous and normally distributed, but it does not provide accurate results with ordinal data or when the data violate the assumption of multivariate normality. Table I illustrates the procedure for testing model stability starting with a CFA model relative to a known factor structure for each condition involved in the study separately.
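Under this model, ML estimation minimizes the standard discrepancy F = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p between the sample covariance S and the implied Σ. A minimal sketch (an illustration of the textbook fit function, not the software used in the study):

```python
import numpy as np

def ml_discrepancy(S, L, psi_diag):
    """ML fit function F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p,
    where Sigma = L L' + Psi is the model-implied covariance.
    F = 0 when the implied covariance reproduces S exactly."""
    p = S.shape[0]
    sigma = L @ L.T + np.diag(psi_diag)
    _, logdet_s = np.linalg.slogdet(S)
    _, logdet_m = np.linalg.slogdet(sigma)
    return logdet_m + np.trace(S @ np.linalg.inv(sigma)) - logdet_s - p
```

Minimizing F over L and Ψ yields the ML estimates, and (N − 1)·F at the minimum gives the model chi-square used in the invariance tests below.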

C. Procedure for Testing Stability Across Models
The procedure for testing measurement invariance was performed in order to evaluate variation over increasing levels of measurement invariance among models. There are different approaches that can be used to evaluate measurement invariance among groups. The present study used a Multiple-Group Confirmatory Factor Analysis (MGCFA) model to test invariance among the levels of STV ratios. Table II illustrates the order for testing measurement invariance, starting with configural invariance (model 0). Model testing was evaluated by the chi-square difference test (∆χ²) between two groups [20, 30, 33], and RMSEA, CFI, and TLI were used to evaluate all the model fits. As referenced above, the criterion values suggested in [20, 34] were used in this study: RMSEA of 0.00-0.05 indicates very good fit, CFI > 0.95 good fit, and TLI ≥ 0.96 good fit. Three levels of MIV were tested. Configural invariance (M0) requires that the same pattern of fixed and free factor loadings holds in every group, where λ represents the factor pattern in the g-th group. Configural invariance at best indicates that the group factors are similar but gives no indication that they hold measurement equivalence. ∆χ² was used to judge configural invariance: if the ∆χ² was not significant, the indicator variables loaded on the same factors across the groups; in other words, there were no differences in factor structure between the groups. Weak Measurement Invariance (M1) indicates that the corresponding factor loadings are equivalent across groups: λ_group1 = λ_group2 = ⋯ = λ_groupG, where λ_group1 represents the factor loading of the j-th indicator variable in the first group. Factor loadings represent the direct effect of the latent construct on each indicator variable and are denoted by lambda (λ) [11]. At this level (weak measurement invariance), variables loaded on the same factors across groups and the factor loadings across groups were equal. ∆χ² was used to judge weak invariance: if the ∆χ² was not significant, the factor loadings across groups were equal; in other words, there were no differences between the groups in the loadings of the indicator variables on their constructs. Strong Measurement Invariance (M2) additionally constrains the indicator intercepts to be equal across groups, which is what permits the conclusion of invariant intercepts reported below.
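The ∆χ² decision rule for comparing two nested invariance models can be sketched as a small function. The chi-square values in the example are hypothetical, not statistics from the study:

```python
from scipy.stats import chi2

def chisq_difference_test(chisq_restricted, df_restricted,
                          chisq_free, df_free, alpha=0.05):
    """Likelihood-ratio (Delta chi-square) test between nested models;
    the restricted model adds equality constraints across groups.
    Returns True in the last slot when invariance is NOT rejected."""
    d_chisq = chisq_restricted - chisq_free
    d_df = df_restricted - df_free
    p = chi2.sf(d_chisq, d_df)          # upper-tail p-value
    return d_chisq, d_df, p, p >= alpha

# Hypothetical comparison of a weak-invariance model (more constraints)
# against a configural model
d, ddf, p, holds = chisq_difference_test(110.2, 52, 100.0, 48)
```

A non-significant ∆χ² (p ≥ α) means the added equality constraints do not significantly worsen fit, so that level of invariance is retained.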

IV. RESULTS
Table IV presents a frequency analysis of the chi-square p-values: the proportion of replications with p-values > 0.05 was tested against the null proportion p = 0.05 to determine whether there was a statistically significant number of invariance failures. The factorial invariance results revealed that the stability of the models varied across increasing levels of measurement invariance as a function of VTF, STV, and their interactions.
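The frequency analysis described above amounts to a one-sided binomial test of the observed failure rate against the nominal 5% level. A sketch with illustrative counts (not the study's actual cell values):

```python
from scipy.stats import binomtest

def invariance_failure_test(n_failures, n_replications=1000, null_p=0.05):
    """Test whether the observed proportion of invariance failures
    (chi-square p-values <= .05) exceeds the nominal 5% rate."""
    result = binomtest(n_failures, n_replications, null_p,
                       alternative='greater')
    return n_failures / n_replications, result.pvalue

# Illustrative condition: 230 of 1000 replications failed (77% stable)
prop, pval = invariance_failure_test(230)
```

A significant result indicates that invariance failures occur more often than chance alone would produce, i.e., the condition yields unstable models.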

V. DISCUSSION
The study findings refuted some of the guidelines found in the literature: e.g., the authors in [18] reported that sample size was not an important factor in determining model stability, and the authors in [35] reported that the STV ratio should be no lower than 5.
The results of the current study revealed that sample size did have a strong effect on the stability of the simulated models. For instance, when the VTF ratio was 4:1, the mean values of the data-model fit indices were adequate at STV ratios ≥ 4:1. However, examining the frequency of rejections based on conventional thresholds over the 1000 replications led to a different conclusion. The percentage of stable (invariant) models ranged from 77% at a 4:1 STV ratio to 91% at 32:1, clearly indicating that larger STV ratios are associated with higher levels of model stability. These findings support the results in [16], which reported that the percentage of passed invariance tests varied with the sample size per group. The findings of the current study also agree with [36], where in some models an STV ratio of 30:1 was needed to produce a stable model and minimize the amount of misfit. The study findings contradict some previous research that investigated the effect of the VTF ratio on the stability of factor solutions: the authors in [18] concluded that the VTF ratio was important for factor stability, with more indicator variables per factor yielding a more stable result.
VI. CONCLUSION

In general, this study provided an empirical evaluation of the stability of the data-model fit over increasing levels of factorial invariance for different features of design in FA. The study concluded that the stability of the models varied across increasing levels of measurement invariance as a function of VTF, STV, and their interactions. Factor loadings and intercepts were invariant among the groups, indicating that measurement invariance was achieved. For VTF ratios of 4:1, 7:1, and 10:1, the models started to show stability across the levels of measurement invariance when the STV ratio was 4:1. Moreover, the frequency of stable models over 1000 replications increased (from 77% to 91%) as the STV ratio increased. The models showed the most stability at or above a 32:1 STV ratio.