The Impact of Ignoring a Level of Nesting Structure in Multilevel Mixture Model

Mixture modeling has gained more attention among practitioners and statisticians in recent years. However, when researchers analyze their data using a finite mixture model (FMM), some assume that the units are independent of each other even though this is not always the case. This article used simulation studies to examine the impact of ignoring a higher nesting structure in multilevel mixture models. Results indicate that the misspecification results in lower classification accuracy of individuals, less accurate fixed effect estimates, inflation of lower level variance estimates, and less accurate standard error estimates in each subpopulation; the last of these in turn affects the accuracy of tests of significance for the fixed effects. The magnitude of the intraclass correlation coefficient (ICC) has a substantial impact. The implication for applied researchers is that it is important to model the multilevel data structure in mixture modeling.


Introduction
In recent years, mixture modeling has gained more and more attention among practitioners and statisticians (McLachlan & Peel, 2000). Finite mixture models (FMMs) underpin a number of statistical techniques, one of which is growth mixture modeling (GMM), a technique becoming increasingly popular in longitudinal studies due to its flexible analysis framework combining continuous and categorical latent variables (Bauer & Curran, 2004; B. O. Muthén, 2004; B. O. Muthén & Shedden, 1999). In a recent publication, "Handbook for Advanced Multilevel Analysis," several researchers (B. O. Muthén & Asparouhov, 2011; Vermunt, 2011) have pointed out the importance of combining multilevel modeling with mixture models. Despite "the richness of detail that a multilevel growth mixture model can extract from the data" (B. O. Muthén & Asparouhov, 2011, p. 38), "many issues have not yet been fully resolved" due to the fact that "multilevel mixture modeling is a rather new area of statistical methodology" (Vermunt, 2011, p. 78). This article examines the impact of ignoring the higher level nesting structure in multilevel mixture models (MMMs) and thereby helps to build the body of knowledge in multilevel mixture modeling.
Despite the flexibility provided by FMM, researchers who analyze their data using FMM generally assume that the participants are independent of each other even though this is not always true. For example, in educational settings, the data structure is very likely to contain two or more levels (e.g., students nested within schools). Nevertheless, researchers using FMM have often ignored the higher level nesting structure (i.e., schools) and analyzed the model by assuming that the students were independent of each other (e.g., D'Angiulli, Siegel, & Maggi, 2004). In a literature search we conducted in PsycINFO (from 2000 to 2011) for empirical studies applying mixture modeling in different substantive areas, we found only one recent study using MMM (Van Horn et al., 2008). Some of these studies did not need to use MMM because their data did not have a higher organizational level. However, some studies that should have used MMM instead used single-level mixture modeling, ignoring the highest level of nesting (e.g., the school level) and mistakenly assuming that individuals were independent of each other (reasons for doing so include lack of cluster ID, MMM's model complexity, and/or model convergence issues). In a recent simulation study, Chen, Kwok, Luo, and Willson (2010) found that when modeling latent growth trajectories, ignoring the highest level results in the redistribution of the variance from the ignored level (i.e., the organization/school level) to the adjacent level (i.e., the individual/student level). The effects of ignoring clustering have not been studied in the finite mixture modeling setting. It is important to examine this impact so that applied researchers become more aware of the consequences of not considering the higher organization level and use caution in interpreting statistical results when they have to ignore a higher level.
Author note: Chen, University of North Texas, Denton, USA. SAGE Open, DOI: 10.1177/2158244012442518.

Purpose of the Study
The purpose of this article is to examine the impact of ignoring a higher nesting structure in MMM on the accuracy of classification of individuals, and the accuracy as well as statistical inference (i.e., Type I error rate and statistical power) of the parameters for the model of each subpopulation. Data structure including students nested within schools is considered. Two latent classes with known group memberships were generated and then analyzed for the true (MMM considering the higher level structure) and misspecified (FMM ignoring the higher level structure) models. Two simulation studies were conducted. In Study 1, the two latent classes were balanced in both sizes and variances, whereas in Study 2, the two latent classes were unbalanced in sizes and variances. Results were presented to show how the hit rate and the relative biases (RBs) for group mean estimates and the respective standard errors were influenced by ignoring the higher level nesting structure.

Brief Review of Multilevel Mixture Models (MMMs)
In this section, key concepts related to multilevel finite (normal) mixture models with continuous indicators are presented. The development of MMMs drew upon two lines of research. One component of MMM is finite mixture modeling (FMM), which assumes that the data under analysis are composed of a discrete number of components. FMM can handle situations where a single parametric family is unable to provide a satisfactory model for local variations in the observed data (McLachlan & Peel, 2000). FMM is similar to multiple group analysis; however, an important difference between mixture modeling and standard multiple group analysis is that in mixture modeling, the group membership is not observed, that is, it is latent (B. O. Muthén, 2001; Vermunt & Magidson, 2005). This is why some researchers refer to FMM as Finite Mixture Modeling (FMM), although statisticians often reserve the term FMM for the situation in which all response variables are categorical (Vermunt, 2007). In this article, we will use the term FMM to refer to mixture models with continuous response variables.
FMM has the capacity of modeling the unknown heterogeneous subpopulations and the random variation of the response variables within latent classes. However, FMM does not consider the situation of multilevel data in which individuals are nested within organizations. Hence, FMM cannot handle nonindependence of individuals due to cluster sampling. As an extension to FMM, the MMMs take the nonindependence of individuals into consideration by specifying a model for each level of the multilevel data. The model for each level could be different, depending on whether we assume heterogeneity and/or model the random effects at the individual level and the organizational level. For example, at the individual level, we can specify a mixture model that models individuals' response patterns and classifies individuals into different subpopulations as well; whereas at the organization level, we can specify a model which only models the variance of organizations, but does not classify organizations into different subpopulations. It is also possible to specify a mixture model at the organization level. However, this article only addressed the more common MMM with classification at the individual level (e.g., students being classified into different subgroups within schools; patients being classified into different subtypes within clinics).

Study 1 Method
Data generation. In Study 1, data with two known subpopulations under a two-level model were first generated with equal population sizes and variances. Then, the data were analyzed as a two-level model (i.e., the true model) using a multilevel mixture model (MMM) and as a single-level model (i.e., the misspecified model) using FMM. The two-level model for data generation is shown below:
Level 1:
$$Y_{ij} = \beta_{0j} + \beta_{1j}\,\text{subpopulation}_{ij} + r_{ij}$$
Level 2:
$$\beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{01}$$
where $\text{subpopulation}_{ij}$ was a dichotomous variable with 0 and 1 representing two different subpopulations, $\operatorname{Var}(r_{ij}) = \sigma^2$, and $\operatorname{Var}(u_{0j}) = \tau_{00}$.
Suppose Level 1 is the student level and Level 2 is the school level. There were 40 schools, and within each school there were 20 students. The number of students in each subpopulation was 400, as the mixing proportion was set to be 50% versus 50%. Within each school, the 20 students came from two subpopulations, 10 at-risk versus 10 non-at-risk. Altogether, there were 800 students within each replication for data generation. The number of higher level units was set to be 40 given that the recommended minimum number of higher level units for MMMs is 30 (L. K. Muthén, 2003; B. O. Muthén, 2005).
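The design just described can be sketched in a few lines of Python. This is only an illustration, not the authors' SAS program: the function name and the use of normally distributed random effects are assumptions, and the default parameter values are those reported in this section for the small-effect, ICC = .10 condition (intercept 1, subpopulation difference 0.632, Level 1 variance 0.9, Level 2 variance 0.116).

```python
import random


def generate_dataset(n_schools=40, n_per_class=10, gamma00=1.0, gamma01=0.632,
                     sigma2=0.9, tau00=0.116, seed=None):
    """Simulate one replication: students (Level 1) nested within schools (Level 2).

    Returns a list of (school_id, subpopulation, y) tuples.
    Normal random effects are an assumption made for this sketch.
    """
    rng = random.Random(seed)
    rows = []
    for school in range(n_schools):
        # School-level random intercept, shared by every student in the school
        u0j = rng.gauss(0.0, tau00 ** 0.5)
        for subpop in (0, 1):                 # 0 = Subpopulation A, 1 = Subpopulation B
            for _ in range(n_per_class):      # balanced: 10 students per class per school
                r_ij = rng.gauss(0.0, sigma2 ** 0.5)   # student-level residual
                y = gamma00 + u0j + gamma01 * subpop + r_ij
                rows.append((school, subpop, y))
    return rows


data = generate_dataset(seed=1)
print(len(data))  # 40 schools x 20 students = 800 rows
```

Dropping the `school` column before fitting a mixture model corresponds to the misspecified analysis, whereas keeping it as the cluster identifier corresponds to the true model.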
In this two-level model, a total of four parameters needed to be specified: two fixed effect coefficients (i.e., $\gamma_{00}$ and $\gamma_{01}$) and two variances of the random effects (i.e., $\sigma^2$ and $\tau_{00}$). Before specifying the population parameters in the conditional model, a random intercept model in which there are no subpopulations is presented as follows:
Level 1:
$$Y_{ij} = \beta_{0j} + r_{ij}$$
Level 2:
$$\beta_{0j} = \gamma_{00} + u_{0j}$$
The variance of the random effect at Level 1 was specified following Raudenbush and Liu's (2001) criteria, namely, a Level 1 variance of 1.0. For $\tau_{00}^{*}$, the intraclass correlation (ICC) formula
$$\text{ICC} = \frac{\tau_{00}}{\sigma^2 + \tau_{00}}$$
was used to obtain the values corresponding to small and medium effect sizes. With the ICC fixed at .10 as a small ICC, which is very common for studies in education (Hox, 2002), and at .20 as a medium ICC, the values for a small $\tau_{00}^{*}$ (.111) and a medium $\tau_{00}^{*}$ (.250) were obtained.
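As a worked check of these two values, the ICC formula can be solved for the Level 2 variance. The step below assumes a Level 1 variance of 1.0 in the random intercept model, which is the value implied by the reported .111 and .250.

```latex
\begin{align*}
\text{ICC} = \frac{\tau_{00}}{\sigma^2 + \tau_{00}}
\quad &\Longrightarrow \quad
\tau_{00} = \frac{\text{ICC}}{1 - \text{ICC}}\,\sigma^2, \\
\text{ICC} = .10: \quad \tau_{00}^{*} &= \frac{.10}{.90} \times 1.0 \approx .111, \\
\text{ICC} = .20: \quad \tau_{00}^{*} &= \frac{.20}{.80} \times 1.0 = .250 .
\end{align*}
```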
According to Snijders and Bosker (1999), adding a predictor (i.e., $\text{subpopulation}_{ij}$) at Level 1 only contributes to the variance of the Level 1 random errors but does not contribute to between-level variance. The formulas for calculating the within and between variances when there is multilevel structure in the data are $\sigma^2_{\text{within}} = \sigma^2$ and $\sigma^2_{\text{between}} = \tau_{00} + \sigma^2/n$, where $n$ is the number of students per school.
Using these formulae for calculation, a small (0.161) and a medium (0.300) $\sigma^2_{\text{between}}$ for the random intercept model were obtained. After adding $\text{subpopulation}_{ij}$ as a predictor at Level 1, $\beta_{1j}$ was the difference between the two subpopulations within each school (cluster), and $\gamma_{01}$ was the average difference between these two subpopulations across all clusters. The effect size $R^2$ was used to characterize the difference between the two subpopulations, with small, medium, and large effect sizes being 0.1, 0.3, and 0.5 according to Cohen (1988, 1992). $R^2 = .5$ meant that 50% of the Level 1 variance was explained by group membership. Therefore, the larger the $R^2$, the larger the difference between the two subpopulations. Using the $R^2$ information, small, medium, and large $\beta_{1j}$ values could be calculated and were 0.632, 1.095, and 1.414, respectively. The corresponding $\sigma^2$ values in the conditional model for small, medium, and large group differences were 0.9, 0.7, and 0.5. $\tau_{00}$ for the conditional model could be solved using the between-variance formula, with $\sigma^2_{\text{between}}$ held at its random intercept value, because the ICC magnitude stayed the same across the random intercept and the conditional models. For ICC = .1, $\tau_{00}$ was 0.116, 0.126, and 0.136 for small, medium, and large effect sizes, respectively; for ICC = .2, $\tau_{00}$ was 0.255, 0.265, and 0.275 for small, medium, and large effect sizes, respectively. After fixing $\gamma_{00}$ to 1, the mean for Subpopulation A and the mean for Subpopulation B were calculated using Equation (1a). The mean of Subpopulation A was 1 in all conditions, and the means for Subpopulation B were 1.632, 2.095, and 2.414 at the three levels of $R^2$.
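The arithmetic behind these population values can be retraced with a short script. It is a sketch resting on three assumptions that are not spelled out in the text but that reproduce every reported number: the Level 1 variance of the random intercept model is 1.0, the variance explained by the 50/50 dummy predictor is the squared difference times 0.5 × 0.5, and the between variance is held at its random intercept value when solving for the conditional Level 2 variance. The function and key names are illustrative.

```python
import math

SIGMA2_STAR = 1.0   # assumed Level 1 variance of the random intercept model
N_PER_SCHOOL = 20   # students per school, as in the design


def conditional_parameters(r2, icc, p=0.5, gamma00=1.0):
    """Return the population values for one (R^2, ICC) design cell."""
    tau_star = icc / (1.0 - icc) * SIGMA2_STAR             # from the ICC formula
    var_between = tau_star + SIGMA2_STAR / N_PER_SCHOOL    # tau + sigma^2 / n
    # A dummy with proportion p has variance p(1-p); explained variance = diff^2 * p(1-p)
    diff = math.sqrt(r2 * SIGMA2_STAR / (p * (1.0 - p)))
    sigma2 = (1.0 - r2) * SIGMA2_STAR                      # residual Level 1 variance
    tau00 = var_between - sigma2 / N_PER_SCHOOL            # between variance held fixed
    return {
        "tau_star": tau_star,
        "var_between": var_between,
        "difference": diff,
        "sigma2": sigma2,
        "tau00": tau00,
        "mean_A": gamma00,
        "mean_B": gamma00 + diff,
    }


for icc in (0.1, 0.2):
    for r2 in (0.1, 0.3, 0.5):
        cell = conditional_parameters(r2, icc)
        print(icc, r2, {k: round(v, 3) for k, v in cell.items()})
```

Rounded to three decimals, the printed differences, residual variances, and Level 2 variances can be compared directly against the values listed in the paragraph above.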
The simulation used a 3 (effect size, i.e., amount of variance explained by group membership) × 2 (magnitude of ICC) factorial design to generate the data. A total of 500 replications were generated for each condition using SAS 9.1, yielding a total of 3,000 data sets. Each data set was then analyzed by a true model (MMM considering the higher/cluster level, type = two-level mixture) and a misspecified model (FMM ignoring the higher/cluster level, type = mixture) using the Mplus 4.2 Mixture routine (L. K. Muthén & Muthén, 2006-2007).
Analysis. For each condition, valid replications were selected for data analysis because, among the replications with converged results, some contained latent classes with very few students (i.e., 1 or 2). A valid replication was defined as one in which the smaller of the two estimated classes contained at least 6% of the total sample (i.e., 48 out of 800). This 6% criterion was based on the average percentage of sample size for the smallest class in published studies using FMM found in the PsycINFO database.
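The 6% screening rule is simple enough to state as code. The helper below is a hypothetical illustration (the name and signature are not from the original analysis); it uses integer arithmetic so that the boundary case of exactly 48 out of 800 is handled without floating-point rounding.

```python
def is_valid_replication(class_sizes, min_percent=6):
    """True if the smallest estimated class holds at least min_percent of the sample."""
    total = sum(class_sizes)
    # Compare in integers: smallest / total >= min_percent / 100
    return min(class_sizes) * 100 >= min_percent * total


print(is_valid_replication([48, 752]))   # boundary case: exactly 6% of 800
print(is_valid_replication([2, 798]))    # a class with only 2 students
```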
The accuracy of classification of individuals, and the accuracy as well as the test of significance (i.e., Type I error rate and statistical power) of the parameter estimates of the model for each subpopulation were then evaluated.
Hit rate is the percentage of at-risk/non-at-risk students correctly classified as at-risk/non-at-risk. The true and misspecified models were evaluated by comparing the hit rate difference between the two models.
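A minimal sketch of the hit rate computation follows. Because the class labels produced by a mixture model are arbitrary, the sketch scores both possible label assignments for the two classes and keeps the better one; the text does not say how estimated classes were matched to the true subpopulations, so this matching rule is an assumption, as is the function name.

```python
def hit_rate(true_labels, predicted_labels):
    """Proportion of students assigned to their true subpopulation (two classes).

    Labels are 0/1. Both label mappings are scored and the better one is kept,
    since the numbering of estimated latent classes is arbitrary.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    agree = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return max(agree, n - agree) / n


truth = [0, 0, 0, 1, 1, 1, 1, 0]
estimated = [1, 1, 0, 0, 0, 0, 1, 1]   # labels flipped, two students misclassified
print(hit_rate(truth, estimated))
```

The hit rate difference between the true and misspecified models is then the difference between two such values computed on the same replication.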
The group mean parameter estimates from the true and misspecified models were summarized across the valid replications for each of the six conditions. The RB for each parameter estimate was calculated using the following equation:
$$\text{RB}(\hat{\theta}) = \frac{\hat{\theta}_{\text{est}} - \theta_{\text{pop}}}{\theta_{\text{pop}}}$$
where $\hat{\theta}_{\text{est}}$ is the mean of a parameter estimate across the valid replications and $\theta_{\text{pop}}$ is the true parameter value. RB equal to zero indicates an unbiased estimate of the parameter. A negative RB indicates an underestimation of the parameter (i.e., the estimated value is smaller than the true parameter value), whereas a positive RB indicates an overestimation of the parameter (i.e., the estimated value is larger than the true parameter value). The cutoff value of 0.05 recommended by Hoogland and Boomsma (1998) was used for acceptable RB of parameter estimates. The RB of estimated standard errors was computed using the following equation:
$$\text{RB}(\hat{S}_{\hat{\theta}}) = \frac{\bar{S}_{\hat{\theta}\text{-Misspecified}} - S_{\hat{\theta}\text{-True}}}{S_{\hat{\theta}\text{-True}}}$$
where $\bar{S}_{\hat{\theta}\text{-Misspecified}}$ is the mean of the estimated standard errors of the group mean parameter estimate across the valid replications in the misspecified model, and $S_{\hat{\theta}\text{-True}}$ is the standard deviation of the parameter estimate across the valid replications in the true model within a particular design condition. The standard deviation was obtained after fitting the correctly specified model to the data (i.e., the model considering the higher level nesting structure), and thus represents the "true" sampling variation, or standard error, that would have been achieved had the model been properly specified. Hoogland and Boomsma (1998) recommended a cutoff value of 0.10 for acceptable RB of estimated standard errors. ANOVAs were conducted to determine the contribution of the two design factors (i.e., $R^2$ and ICC) and their interaction effect, with $\eta^2$ (i.e., $\eta^2 = SS_{\text{Effect}} / SS_{\text{Total}}$) as the effect size indicator. $\eta^2$ was used instead of the significance test because, with the large number of records, even a tiny effect could be detected as significant using the F test. Therefore, $\eta^2 \ge .01$ was adopted as the criterion to filter out effects trivial in magnitude and to evaluate the impact of the design factors.
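The three summary statistics described in this paragraph can be sketched as follows: relative bias of a parameter estimate, relative bias of the estimated standard errors measured against the empirical standard deviation under the true model, and eta squared as a ratio of sums of squares. The function names are illustrative, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify the divisor.

```python
from statistics import mean, stdev


def relative_bias(estimates, theta_pop):
    """(mean estimate across replications - true value) / true value."""
    return (mean(estimates) - theta_pop) / theta_pop


def relative_bias_se(se_misspecified, estimates_true_model):
    """(mean estimated SE under the misspecified model - empirical SD of the
    estimates under the true model) / that empirical SD."""
    s_true = stdev(estimates_true_model)   # sample SD (n - 1 divisor), an assumption
    return (mean(se_misspecified) - s_true) / s_true


def eta_squared(ss_effect, ss_total):
    """Proportion of the total sum of squares attributable to an effect."""
    return ss_effect / ss_total


# Toy numbers, for illustration only (not results from the study)
print(relative_bias([1.60, 1.70, 1.65], theta_pop=1.632))
print(relative_bias_se([0.30, 0.32, 0.31], [1.4, 1.6, 1.8, 1.7, 1.5]))
print(eta_squared(ss_effect=12.0, ss_total=400.0))
```

A positive value from `relative_bias_se` corresponds to inflated standard errors under the misspecified model, and values beyond ±0.10 would exceed the Hoogland and Boomsma (1998) cutoff.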

Results
Hit rate. Table 1 presents the number of valid replications in Study 1 and the average hit rate under the true and misspecified models across valid replications. The results show that as group difference increased, the hit rate increased for both the true and misspecified models. In addition, within the same design condition, the hit rate under the true model was always higher than that under the misspecified model. As ICC increased, the difference in hit rate between the true and misspecified models increased.
Relative Bias (RB) of group mean estimates. Table 2 presents the mean RB of group mean estimates across valid replications under the true and misspecified models. The Class 1 mean (the smaller mean) was underestimated and the Class 2 mean (the larger mean) was overestimated under both the true and misspecified models when $R^2 = .1$. When $R^2 = .3$ and .5, the mean RBs under both models were close to zero, except that the Class 1 mean was slightly underestimated when ICC = .2.
Relative Bias (RB) of variance estimates. Table 3 presents the mean RBs of variance estimates of the true and misspecified models. For the true model, the mean RBs of most Level 1 and Level 2 variance estimates were within ±10%, whereas for the misspecified model, there was a trend of overestimation in Level 1 variance estimates.
Relative Bias (RB) of standard errors of group mean estimates. Table 4 presents the mean RBs of standard errors for group mean estimates under the misspecified model. There was an inflation of standard errors for group mean estimates under the misspecified model. ANOVA results show that $R^2$, Fs(2, 2257) = 10.017 and 13.931; ps < .001; $\eta^2$s = .009 and .012 for Class 1 and Class 2, respectively, was the major source of impact when RBs of the standard errors for group mean estimates were the dependent variables.

Study 2 Method
Data generation. To extend the findings from Study 1, which was based on the balanced design (i.e., the two classes had exactly the same number of observations and the same variance across clusters), Study 2 took unbalanced class sizes and variances into account along with the other design factors considered in Study 1. There were two imbalance types, Imbalance Type 1 and Imbalance Type 2. Under Imbalance Type 1, large size was associated with large variance in Group 1 and small size was associated with small variance in Group 2; under Imbalance Type 2, large size was associated with small variance in Group 1 and small size was associated with large variance in Group 2. The group size and variance varied at Level 1 for the two latent classes. A large group is a group of 15 students, whereas a small group is a group of 5 students. The larger variance group has a variance 3 times that of the smaller variance group, so that the variances of the two latent groups were distinguishable. Equation (5) was used to calculate the variances of each individual group based on the size of each group. The value of $S_p^2$, which was the pooled Level 1 variance of the two latent classes, was set to be 0.9, 0.7, and 0.5, respectively, because the variance accounted for by group membership was 0.1, 0.3, and 0.5 in Study 1.
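The way group variances follow from a pooled variance can be illustrated with a short sketch. Equation (5) is not reproduced in this text, so the sketch assumes the standard pooled variance formula with n − 1 weights, $S_p^2 = [(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2] / (n_1 + n_2 - 2)$, and uses the within-school group sizes of 15 and 5; the authors may have used different weights or the total class sizes. The function name and arguments are hypothetical.

```python
def class_variances(pooled_var, n_large, n_small, large_group_has_large_var, ratio=3.0):
    """Split a pooled variance into two group variances with a fixed ratio.

    Assumes the standard pooled variance with (n - 1) weights; this is an
    assumption, not a formula confirmed by the source.
    Returns (variance of the large group, variance of the small group).
    """
    w_large, w_small = n_large - 1, n_small - 1
    if large_group_has_large_var:
        # Imbalance Type 1: large group gets ratio * base, small group gets base
        base = pooled_var * (w_large + w_small) / (ratio * w_large + w_small)
        return ratio * base, base
    # Imbalance Type 2: large group gets base, small group gets ratio * base
    base = pooled_var * (w_large + w_small) / (w_large + ratio * w_small)
    return base, ratio * base


for pooled in (0.9, 0.7, 0.5):
    print(pooled,
          class_variances(pooled, 15, 5, large_group_has_large_var=True),
          class_variances(pooled, 15, 5, large_group_has_large_var=False))
```

Whatever weights are used, the two constraints are the same: the group variances stand in a 3:1 ratio and pool back to the target value of 0.9, 0.7, or 0.5.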
The simulation used a 3 (amount of variance explained by group membership) × 2 (magnitude of ICC) × 2 (imbalance type) factorial design to generate the data. A total of 500 replications were generated for each condition using SAS 9.1, yielding a total of 6,000 data sets. Each data set was then analyzed by a true model (MMM considering the higher/cluster level) and a misspecified model (FMM ignoring the higher/cluster level) using the Mplus 4.2 Mixture routine (L. K. Muthén & Muthén, 2006-2007).
Analysis. Similar to Study 1, valid replications were selected, with hit rates and RBs of parameter estimates under the 12 conditions for both true and misspecified models calculated and examined. ANOVAs were conducted to determine the contribution of the design factors and all possible interactions.

Results
Hit rate. Table 5 presents the number of valid replications for Study 2 and the average hit rate under the true and misspecified models. Similar to the results found in Study 1, as group difference increased, the hit rate increased for both the true and misspecified models. In addition, the hit rate under the true model was always higher than that under the misspecified model within the same condition. As ICC increased, the difference in hit rate between the true and misspecified models increased. Finally, Imbalance Type 2 (i.e., large variance associated with small class) always had higher hit rates than Imbalance Type 1 (i.e., large variance associated with large class) when all other conditions remained the same.
ANOVA results indicated that, when the hit rate was the dependent variable, there was an interaction effect between the magnitude of $R^2$ and imbalance type for both the true model, F(2, 3642) = 1,028.61; p < .001; $\eta^2$ = .15, and the misspecified model, F(2, 3642) = 359.02; p < .001; $\eta^2$ = .08. The hit rate increased for both imbalance types as $R^2$ increased. However, when $R^2$ was low, the difference between the two imbalance types was larger than when $R^2$ was high. The hit rate for Imbalance Type 2 was higher than that for Imbalance Type 1. Under the misspecified model, when other conditions stayed the same, the hit rate was higher when the ICC value was smaller, F(1, 3642) = 79.92; p < .001; $\eta^2$ = .01.
There was an interaction effect between the magnitude of $R^2$ and imbalance type on the hit rate difference between the true and misspecified models, F(2, 3642) = 22.56; p < .001; $\eta^2$ = .01. The estimated mean hit rate difference between the true and misspecified models increased for both imbalance types as $R^2$ increased. The hit rate under the true model was higher than that under the misspecified model. However, at higher levels of $R^2$, the difference in hit rate for Imbalance Type 1 was larger than that for Imbalance Type 2. In addition, when other conditions stayed the same, the difference in hit rate was larger when the ICC value was larger, F(1, 3642) = 110.85; p < .001; $\eta^2$ = .03.
Relative Bias (RB) of group mean estimates. Table 6 presents the mean RBs of group mean estimates under the true and misspecified models. There was bias outside the range of ±10% for both the true and misspecified models. ANOVA results indicated that there was an interaction effect between $R^2$ and ICC, Fs(2, 3642) = 449.637 and 92.023; ps < .001; $\eta^2$s = .15 and .04 for the two classes in the true model; Fs(2, 3642) = 253.900 and 45.950; ps < .001; $\eta^2$s = .09 and .02 for the two classes in the misspecified model, when the RBs of Class 1 and Class 2 were the dependent variables separately. The mean RB decreased for both imbalance types as $R^2$ increased. There was more bias under Imbalance Type 1 than under Imbalance Type 2. There tended to be more bias in the mean estimate for Class 1 (smaller mean) than in that for Class 2 (larger mean).
Relative Bias (RB) of variance estimates. Table 7 presents the mean RBs of variance estimates of the true and misspecified models. Because the Level 1 variances for the two groups were estimated separately in the true and the misspecified models, there were two $\sigma^2$s for each model. For the true model, the mean RBs for Level 2 variance estimates were within or close to ±10%, and no effect reached $\eta^2 \ge .01$ when the RB of $\tau_{00}$ was the dependent variable. For Level 1 variance, there was underestimation for $\sigma^2_1$ and overestimation for $\sigma^2_2$ under Imbalance Type 1, whereas there was less bias under Imbalance Type 2. ANOVA results indicated that there was an interaction effect between $R^2$ and imbalance type, Fs(2, 3642) = 68.793 and 293.125; ps < .001; $\eta^2$s = .035 and .126, respectively. For the misspecified model, there was a trend of overestimation in $\sigma^2_2$ under both imbalance types, whereas there was both underestimation and overestimation of $\sigma^2_1$ only under Imbalance Type 1. ANOVA results indicated that there was an interaction effect between $R^2$ and imbalance type, Fs(2, 3642) = 57.494 and 34.857; ps < .001; $\eta^2$s = .027 and .012, respectively. In addition, ICC had a substantial impact on $\sigma^2_2$ overestimation, F(1, 3642) = 367.945; p < .001; $\eta^2$ = .065.
Relative Bias (RB) of standard errors of group mean estimates. Because the Level 1 variances were estimated separately, there were two RBs of standard errors under each model. RBs of SE1 are for the larger variance group and RBs of SE2 are for the smaller variance group under both imbalance types. Again, as shown in Table 8, there was a tendency toward inflation of standard errors under the misspecified model in most conditions. ANOVA results indicated that imbalance type, Fs(1, 3642) = 99.10 and 651.57; ps < .001; $\eta^2$s = .03 and .13, and ICC, Fs(1, 3642) = 82.85 and 60.81; ps < .001; $\eta^2$s = .02 and .01, were the two major contributing factors, although there was a slight interaction effect between them for SE1. The misspecified model had more inflation of standard errors under Imbalance Type 1 than under Imbalance Type 2. In addition, within the same imbalance type, bias was higher at the higher level of ICC.

Study 1
When a higher level structure in cross-sectional data is ignored, the variance at the higher level is redistributed to the lower level, thus affecting the hit rate and group mean and standard error estimates.
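The redistribution can be illustrated with a short worked equation. It is a sketch that assumes class membership is known and that the school random intercept $u_{0j}$ and the student residual $r_{ij}$ are independent; in a fitted mixture model, classification error would add further distortion. The numerical values are the Study 1 population values for the small-effect, ICC = .10 condition.

```latex
\begin{align*}
\operatorname{Var}(Y_{ij} \mid \text{class})
  &= \operatorname{Var}(u_{0j}) + \operatorname{Var}(r_{ij})
   = \tau_{00} + \sigma^2, \\
\sigma^2 = 0.9,\ \tau_{00} = 0.116: \quad
\tau_{00} + \sigma^2 &= 1.016,
\qquad \frac{1.016 - 0.9}{0.9} \approx 0.129 .
\end{align*}
```

A single-level model has only one variance term per class, so it absorbs $\tau_{00}$ into the Level 1 variance; in this example, the Level 1 variance would be overstated by roughly 13%.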
Hit rate. $R^2$ is an important factor influencing hit rate. For both the true and misspecified models, the hit rate increases when $R^2$ increases, which means that as the group difference becomes larger, classification under both models becomes more accurate, as would be expected.
The difference between the true and misspecified models is that for the true model, ICC magnitude does not affect hit rate much within the same design, whereas for the misspecified model, ICC magnitude affects the hit rate, and the hit rate is higher when ICC is smaller. Under the misspecified model, the Level 2 variance is ignored in model estimation, and more variance is ignored at higher ICC. Ignoring variance at Level 2 decreases classification accuracy, and the more variance is ignored, the less accurate the classification.
Relative Bias (RB) in group mean estimates. The differences in RB for group mean estimates between the true and misspecified models are all within ±5%, which indicates that the two models do not differ substantially in their estimates of the group means.
Relative Bias (RB) in standard error estimates. There is an inflation of standard errors for group mean estimates when a higher level nesting structure is ignored. This inflation of standard errors under the misspecified model is due to the redistribution of Level 2 variance to Level 1. When ICC is larger, misspecified model has more inflation of standard errors when all other conditions stay the same.

Study 2
After adding one more design factor (imbalance type), the findings in Study 2 related to $R^2$ and ICC remain consistent with the findings in Study 1. Therefore, the following discussion focuses on the influence of imbalance type.
Hit rate. When all other conditions stay the same, the hit rate under Imbalance Type 2 is higher than that under Imbalance Type 1. In addition, the difference in hit rate between the true and misspecified models is smaller for Imbalance Type 2, in which large group size is associated with smaller variance and small group size is associated with larger variance. This means that the misspecified model performs relatively better under Imbalance Type 2 than under Imbalance Type 1. This result is not surprising because when a group has smaller variance, it is easier to identify its members as coming from the same group. Under Imbalance Type 2, where large size is associated with smaller variance, the participants within the large group have a higher chance of being classified into the same group. Under Imbalance Type 1, where smaller group size is associated with smaller variance, the participants within the small group also have a higher chance of being classified into the same group, but they make up a smaller percentage of all participants than the corresponding group under Imbalance Type 2. This is why, in general, Imbalance Type 2 has higher hit rates than Imbalance Type 1.
Relative Bias (RB) in group mean estimates. In general, the RBs under Imbalance Type 2 are smaller than those under Imbalance Type 1. For the reason given above, under Imbalance Type 2 it is easier for both the true and misspecified models to classify the participants into the correct group, which results in more accurate estimates of the group means, whereas under Imbalance Type 1 there is more bias at the different levels of $R^2$, most likely resulting from the classification of participants into the wrong groups.
Relative Bias (RB) in standard error estimates. When a higher level nesting structure is ignored, the standard errors of the fixed effects (i.e., the means of the two latent classes) tend to be inflated under Imbalance Type 1 but have less bias or underestimation under Imbalance Type 2. This may result from either the misclassification of participants, or the inflation of Level 1 variance, or both.

Summary of Findings
This simulation study investigated the impact of ignoring a higher level nesting structure in multilevel mixture modeling on hit rates, the estimated latent class means, and the corresponding standard errors. We examined the impact of three potential factors under the true and misspecified models: the magnitude of latent class differences, the ICC between the lower and higher levels of data, and the imbalance type.
Our results indicate, first, that ignoring a higher level structure may result in less accurate classification of individuals into the class to which they belong. When the variance and size of the two classes in the generated samples are balanced, the true model has higher hit rates than the misspecified model, and the difference between the true and misspecified models is affected by group differences and the ICC. When group size and variance are unbalanced, the true model still has higher hit rates than the misspecified model; in addition, the hit rate is higher when larger size is associated with smaller variance and smaller size with larger variance than when larger size is associated with larger variance and smaller size with smaller variance. Second, there is bias in the group mean estimates under both the true and misspecified models, but the difference in bias between the two models is not large. The difference between the true and misspecified models is especially small when the group difference is small, when the ICC is lower, or when smaller variance is associated with larger size. Third, ignoring a higher level structure causes the variance at the higher level to be redistributed to the lower level and results in the inflation of standard errors for estimated group means, which in turn reduces the statistical power of tests of the fixed effects. The inflation of standard errors is especially pronounced when the ICC is higher or when larger variance is associated with larger size and smaller variance is associated with smaller size.

Recommendations
These findings have practical implications for researchers. According to the findings of the study, when ICC is higher, or when large variance is associated with large size and small variance is associated with small size, or when $R^2$ is large, the misspecified model is less accurate in classification of participants, has more RB in parameter estimates, and has less statistical power to detect significant effects. In real data analysis, researchers seldom know in advance the true parameter values (i.e., the difference between groups, the true variance and size of each latent class) and the class membership. Therefore, to obtain more accurate classification and parameter estimates, and more statistical power to detect significant effects, it is important to model the nesting structure and use multilevel mixture modeling. However, there are also situations in which it is difficult to take the nesting structure into account, such as the lack of identifiers on all possible levels of nesting in the data (Moerbeek, 2004) and the difficulty in achieving convergence in model estimation (Van Landeghem, De Fraine, & Van Damme, 2005). If this is the case, then researchers should be cautious when they interpret the findings, especially when they have a marginally nonsignificant test result, because the result might be significant if the nesting structure of the data were taken into account.

Limitations and Suggestions for Future Research
In this study, we only examined the impact of ignoring a higher level structure in a mixture model, and only a two-level data structure was considered. In longitudinal studies, the data usually contain three or more levels (e.g., repeated measures nested within students nested within schools). In addition, the total sample size in the simulation studies was set to 800 and the cluster size was set to 20. Future studies could vary the cluster size and latent class size to see how sample size affects the hit rate and the bias of parameter estimates. Another limitation is that, in reality, some data structures are not strictly hierarchical; they are cross-classified in the sense that students come from varied combinations of higher level nesting factors such as schools and neighborhoods. Researchers have found that ignoring the cross-classified structure results in bias in standard error estimates although the fixed effects estimates are not affected (Luo & Kwok, 2006; Meyers & Beretvas, 2006; Van Landeghem, De Fraine, & Van Damme, 2005). However, there is no software available in the area of latent variable modeling to take the cross-classified structure into account in multilevel mixture modeling. More research and advances in software are needed in the area of multilevel mixture modeling.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research and/or authorship of this article.