Latent class models for multiple ordered categorical health data: testing violation of the local independence assumption

Empirical Economics

Abstract

Latent class models are now widely applied in health economics to analyse heterogeneity in multiple outcomes generated by subgroups of individuals who vary in unobservable characteristics, such as genetic information or latent traits. These models rely on the assumption that associations between observed outcomes are due to their relationship to underlying subgroups, captured by conditioning on a set of latent classes. This implies that outcomes are locally independent within a class. The local independence assumption, however, is sometimes violated in practical applications when unobserved heterogeneity is not fully captured, resulting in residual associations between outcomes within classes. While several approaches have been proposed for binary and continuous outcomes, little attention has been paid to multiple ordered categorical outcome variables, which are often used in health economics. In this paper, we develop an approach to test for violation of the local independence assumption with multiple ordered categorical outcomes. The approach provides a detailed decomposition of the identified residual association by allowing it to vary across latent classes and between levels of the ordered categorical outcomes within a class. We show why this level of decomposition is important for ordered categorical outcomes. We illustrate our approach in the context of health insurance and healthcare utilization in the US Medigap market.

Fig. 1

Notes

  1. Covariates may differ across equations. For the sake of generality, we assume here that all available xs may enter each equation freely.

  2. In some empirical studies in health economics, the categories of U are used to capture unobserved individual health status (Shmueli 2003; Dardanoni and Li Donni 2012b) or attitudes to health care utilization (Bago d’Uva 2005; Bago d’Uva and Jones 2009).

  3. To clarify what \((l_h-1)(l_k-1)\) stands for, suppose \(Y_h\) and \(Y_k\) take three levels (\(l_h=l_k=3\)). Then there are \((3-1)(3-1)=4\) association parameters, one for each \(2\times 2\) table obtained by collapsing the \(3\times 3\) table of \(Y_h\) and \(Y_k\) at a pair of cut-points. In the technical appendix, we provide details on how the elements of the vector \(\pi (x)\) are related to the \(\lambda \)s and to the parameters \(\alpha \), \(\beta \) and the thresholds \(\delta \).

  4. A set of Stata routines to perform these estimations is available upon request from the authors.

  5. Since our focus is on testing the local independence assumption when the Ys are ordered categorical variables and different parameterizations of residual association are used, we assume that the \(\lambda \)s are zero. Allowing the \(\lambda \)s to differ from zero would be of interest if the focus were on understanding the sources of residual correlation, which is beyond the scope of this paper.

  6. When the direction of the residual associations does not vary, misspecification of the standard LCM results in the estimation of spurious classes. In this case, the residual associations can be adequately captured by any of the models which allow for residual associations (RE-M, PRE-M or UE-M). For completeness, we show such an example in “Appendix A”.

  7. It is beyond the scope of the paper to provide a more general model which controls for residual association and also estimates \(\alpha \) and \(\beta \) by allowing them to vary between and within levels of Ys and classes of U, respectively.

  8. We also ran the same test procedure for \({\mathcal {H}}_2\), \({\mathcal {H}}_3\) and \({\mathcal {H}}_{4}\). For the sake of brevity, we do not report these tests; the results confirm that there is no local dependence when the model is “correctly” specified.

  9. Panels C and D in Table 9 coincide because \(Y_1\), \(Y_2\) and \(Y_3\) are binary, so that the residual association within each latent class is described by a single log odds ratio.

  10. This is only an assumption to keep notation simple.

References

  • Ayyagari P, Deb P, Fletcher J, Gallo W, Sindelar JL (2013) Understanding heterogeneity in price elasticities in the demand for alcohol for older individuals. Health Econ 22(1):89–105

  • Bago d’Uva T (2005) Latent class models for use of primary care: evidence from a British panel. Health Econ 14(9):873–892

  • Bago d’Uva T, Jones AM (2009) Health care utilisation in Europe: new evidence from the ECHP. J Health Econ 28(2):265–279

  • Bartolucci F, Forcina A (2006) A class of latent marginal models for capture–recapture data with continuous covariates. J Am Stat Assoc 101(474):786–794

  • Bartolucci F, Colombi R, Forcina A (2007) An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Stat Sin 17(2):691–711

  • Becker MP, Yang I (1998) Latent class marginal models for cross-classifications of counts. Sociol Methodol 28(1):293–325

  • Chiappori PA, Salanié B (2000) Testing for asymmetric information in insurance markets. J Political Econ 108(1):56–78

  • Colombi R, Forcina A (2001) Marginal regression models for the analysis of positive association of ordinal response variables. Biometrika 88(4):1007–1019

  • Conway KS, Deb P (2005) Is prenatal care really ineffective? Or, is the ‘devil’ in the distribution? J Health Econ 24(3):489–513

  • Cutler DM, Finkelstein A, McGarry K (2008) Preference heterogeneity and insurance markets: explaining a puzzle of insurance. Am Econ Rev 98(2):157–162

  • Dardanoni V, Li Donni P (2012a) Incentive and selection effects of medigap insurance on inpatient care. J Health Econ 31(3):457–470

  • Dardanoni V, Li Donni P (2012b) Reporting heterogeneity in health: an extended latent class approach. Appl Econ Lett 19(12):1129–1133

  • Dardanoni V, Li Donni P (2016) The welfare cost of unpriced heterogeneity in insurance markets. RAND J Econ 47(4):998–1028

  • Dardanoni V, Forcina A, Li Donni P (2018) Testing for asymmetric information in insurance markets: a multivariate ordered regression approach. J Risk Insur 85(1):107–125

  • Deb P, Trivedi PK (1997) Demand for medical care by the elderly: a finite mixture approach. J Appl Econom 12(3):313–336

  • Deb P, Trivedi PK (2002) The structure of demand for health care: latent class versus two-part models. J Health Econ 21(4):601–625

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38

  • Ettner SL (1997) Adverse selection and the purchase of medigap insurance by the elderly. J Health Econ 16(5):543–562

  • Fang H, Keane MP, Silverman D (2008) Sources of advantageous selection: evidence from the medigap insurance market. J Political Econ 116(2):303–350

  • Forcina A (2008) Identifiability of extended latent class models with individual covariates. Comput Stat Data Anal 52(12):5263–5268

  • Forcina A (2017) A Fisher-scoring algorithm for fitting latent class models with individual covariates. Econom Stat 3:132–140

  • Haberman SJ (1979) Analysis of qualitative data: new developments, vol 2. Academic Press, New York

  • Hagenaars JA (1988) Latent structure models with direct effects between indicators: local dependence models. Sociol Methods Res 16(3):379–405

  • Hagenaars JA, McCutcheon AL (2002) Applied latent class analysis. Cambridge University Press, Cambridge

  • Huang GH, Bandeen-Roche K (2004) Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika 69(1):5–32

  • Jiménez-Martín S, Labeaga JM, Martínez-Granado M (2002) Latent class versus two-part models in the demand for physician services across the European Union. Health Econ 11(4):301–321

  • Lang JB (1996) Maximum likelihood methods for a generalized class of log-linear models. Ann Stat 24(2):726–752

  • Lindsay B, Clogg CC, Grego J (1991) Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis. J Am Stat Assoc 86(413):96–107

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  • Meijer E, Kapteyn A, Andreyeva T (2011) Internationally comparable health indices. Health Econ 20(5):600–619

  • Morduch JJ, Stern HS (1997) Using mixture models to detect sex bias in health outcomes in Bangladesh. J Econom 77(1):259–276

  • Munkin MK, Trivedi PK (2010) Disentangling incentives effects of insurance coverage from adverse selection in the case of drug expenditure: a finite mixture approach. Health Econ 19(9):1093–1108

  • Oberski DL, Vermunt JK (2018) The expected parameter change (EPC) for local dependence assessment in binary data latent class models. https://arxiv.org/abs/1801.02400

  • Qu Y, Tan M, Kutner MH (1996) Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics 52(3):797–810

  • Reboussin BA, Ip EH, Wolfson M (2008) Locally dependent latent class models with covariates: an application to under-age drinking in the USA. J R Stat Soc Ser A (Stat Soc) 171(4):877–897

  • Shmueli A (2003) Socio-economic and demographic variation in health and in its measures: the issue of reporting heterogeneity. Soc Sci Med 57(1):125–134

  • Suppes P, Zanotti M (1981) When are probabilistic explanations possible? Synthese 48(2):191–199

  • Torrance-Rynard VL, Walter SD (1997) Effects of dependent errors in the assessment of diagnostic test performance. Stat Med 16(19):2157–2175

  • Vermunt JK, Magidson J (2004) Local independence. In: Lewis-Beck MS, Bryman A, Liao TF (eds) The SAGE encyclopedia of social science research methods, vol 1–3. SAGE Publications, Thousand Oaks, pp 580–581

  • Wouterse B, Huisman M, Meijboom BR, Deeg DJ, Polder JJ (2013) Modeling the relationship between health and health care expenditures using a latent Markov model. J Health Econ 32(2):423–439

  • Yang CC, Yang CC (2007) Separating latent classes by information criteria. J Classif 24(2):183–203

Author information

Correspondence to Paolo Li Donni.

Appendices

Appendix A: Scenario of no difference in the direction of the residual association within a latent class

In this scenario, we show how residual correlation arising from model misspecification may lead to the identification of spurious latent classes, which do not reflect the underlying unobserved heterogeneity in the population. Suppose the population consists of three subgroups that differ in their attitude towards healthcare utilization; one can think of them as low, medium and heavy healthcare users. These groups are intrinsically unobservable and account for 30%, 45% and 25% of the population, respectively. Healthcare utilization is measured by \(K=5\) indicators Y, which can be either binary or ordered categorical variables: \(Y_4\) and \(Y_5\) are ordered categorical variables taking three levels, while the remaining Ys are binary. The data generating process of \({\varvec{Y}}\) is fully described by the following system of equations:

$$\begin{aligned} Y_{1}^\star= & {} \sum \limits _{u=1}^{3}\alpha _1(u)U(u) + \epsilon _{1i} \nonumber \\ Y_{2}^\star= & {} \sum \limits _{u=1}^{3}\alpha _2(u)U(u) + \epsilon _{2i}\nonumber \\ Y_{3}^\star= & {} \sum \limits _{u=1}^{3}\alpha _3(u)U(u) + \epsilon _{3i}\nonumber \\ Y_{4}^\star= & {} \sum \limits _{u=1}^{3}\alpha _4(u)U(u) + 1.7x_1 +1.3x_2+0.7x_3 +\epsilon _{4i} \nonumber \\ Y_{5}^\star= & {} \sum \limits _{u=1}^{3}\alpha _5(u)U(u) + 2.4x_1 +0.5x_2+0.7x_3 +\epsilon _{5i} \end{aligned}$$
(11)

where \(\alpha _1=[1.5,-1,2]\), \(\alpha _2=[1,-1.5,2.5]\), \(\alpha _3=[1.5,-2.5,1]\), \(\alpha _4=[-0.2,1.2,1.4]\), \(\alpha _5=[-1.9,-1.5,2.1]\), and the two cut-points \(\delta _{4,2}\) and \(\delta _{5,2}\) are equal to \(-0.8\) and \(-1.7\), respectively.
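The data generating process of Eq. (11) can be simulated in a few lines. This is a minimal sketch, not the authors' Stata routines, and it rests on assumptions not stated in the text: the errors \(\epsilon \) are standard logistic (consistent with the logit link used in the technical notes), \(x_1\), \(x_2\), \(x_3\) are independent standard normal draws, and a three-level indicator is built as \(Y = 1\{Y^\star \ge 0\} + 1\{Y^\star + \delta \ge 0\}\), which reproduces the cumulative logits of Eq. (18). The function name `simulate_eq11` is hypothetical.

```python
import numpy as np

# Class shares and class-specific intercepts alpha_k(u) from Eq. (11).
SHARES = np.array([0.30, 0.45, 0.25])
ALPHA = np.array([
    [1.5, -1.0, 2.0],    # Y1
    [1.0, -1.5, 2.5],    # Y2
    [1.5, -2.5, 1.0],    # Y3
    [-0.2, 1.2, 1.4],    # Y4
    [-1.9, -1.5, 2.1],   # Y5
])
# Effects of (x1, x2, x3) enter only the equations for Y4 and Y5 (0-based columns 3 and 4).
BETA = {3: np.array([1.7, 1.3, 0.7]), 4: np.array([2.4, 0.5, 0.7])}
# Second cut-points delta_{4,2} and delta_{5,2} of the three-level indicators.
DELTA = {3: -0.8, 4: -1.7}


def simulate_eq11(n, seed=0):
    """Draw (u, x, Y) from Eq. (11); u is 0-based and Y has shape (n, 5)."""
    rng = np.random.default_rng(seed)
    u = rng.choice(3, size=n, p=SHARES)      # latent class membership
    x = rng.standard_normal((n, 3))          # ASSUMPTION: x1, x2, x3 ~ N(0, 1)
    Y = np.zeros((n, 5), dtype=int)
    for k in range(5):
        eta = ALPHA[k, u]
        if k in BETA:
            eta = eta + x @ BETA[k]
        y_star = eta + rng.logistic(size=n)  # ASSUMPTION: standard logistic errors
        Y[:, k] = (y_star >= 0).astype(int)
        if k in DELTA:
            # delta < 0, so the second threshold is crossed only if the first one is.
            Y[:, k] += (y_star + DELTA[k] >= 0).astype(int)
    return u, x, Y


u, x, Y = simulate_eq11(20_000)
print(np.bincount(u) / len(u))   # empirical class shares, close to 0.30, 0.45, 0.25
```

Because the true classes are returned alongside the data, the same draws can be used to check how often a fitted model recovers three classes.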

We begin by estimating the system of Eq. (11) using the SOC-M with different numbers of latent classes M and compare the AIC, BIC and sBIC to choose the appropriate number of classes. Estimating the system of Eq. (11) involves: (i) \(M-1\) parameters capturing the membership probabilities, as described in Eq. (4); (ii) \(M\times K\) random intercepts (\(\alpha \)s); and (iii) six \(\beta \)s and two cut-points \(\delta \)s.
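The parameter count in (i)–(iii) is simple arithmetic, and the sketch below makes it explicit. The helper names are hypothetical; the count follows the layout above, and the RE-M adds one \(\lambda \) per pair of indicators. AIC and BIC use their textbook definitions; the sBIC is left out because its exact form is not given here.

```python
import math
from itertools import combinations


def n_params_soc(M, K=5, n_beta=6, n_delta=2):
    """SOC-M: (M - 1) membership logits + M*K intercepts + betas + cut-points."""
    return (M - 1) + M * K + n_beta + n_delta


def n_params_re(M, K=5, n_beta=6, n_delta=2):
    """RE-M adds one residual-association parameter lambda_{k;h} per pair k < h."""
    n_pairs = len(list(combinations(range(1, K + 1), 2)))
    return n_params_soc(M, K, n_beta, n_delta) + n_pairs


def aic(loglik, p):
    return -2.0 * loglik + 2.0 * p


def bic(loglik, p, n):
    return -2.0 * loglik + p * math.log(n)


for M in (2, 3, 4):
    print(M, n_params_soc(M), n_params_re(M))
# prints: 2 19 29, then 3 25 35, then 4 31 41
```

For \(M=3\) the SOC-M therefore has 25 free parameters and the RE-M 35, which is what the penalty terms of the information criteria are computed from.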

We first estimate the “correctly” specified model described in (11) with \(M=2\), 3 and 4 latent classes. Here and in what follows, “correctly” specified refers to the model in which we know how the latent variable and the covariates affect the Ys. As expected, the model selection criteria and the numbers of parameters, reported in Panel A of Table 8, show that \(M=3\) latent classes are adequate to capture the underlying unobserved U. To evaluate potential sources of local dependence, we also estimate the RE-M, which includes a residual association parameter for each pair of Ys. In addition to the parameters of the SOC-M, this requires estimating ten \(\lambda _{k;h}\) parameters (with \(k,h=1,\ldots ,5\) and \(k<h\)). Selection criteria and model information are reported in Panel B of Table 8, while the estimated \(\lambda \)s are reported in Panel A of Table 9. The tables show that three classes are adequate to capture the underlying heterogeneity in the population, and that the \(\lambda \)s are small and not statistically significant, indicating that, conditional on U and \({\varvec{x}}\), there is no within-class residual association between the Ys. For robustness, we also test the hypothesis \({\mathcal {H}}_1\). The LR test statistics, which are asymptotically distributed as \(\chi ^2\) with 1 degree of freedom, are reported together with the corresponding p-values in Panel A of Table 9. As expected, the results indicate that when the model is correctly specified there is no residual heterogeneity conditional on U (see Note 8).
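Each LR statistic compares the maximized log-likelihood with one association parameter free against the fit with that parameter fixed at zero. A minimal sketch with a hypothetical helper and purely illustrative log-likelihood values (not numbers from Table 9): for one degree of freedom the \(\chi ^2\) survival function has the closed form \(P(\chi ^2_1 > s) = \mathrm{erfc}(\sqrt{s/2})\), so only the standard library is needed.

```python
import math


def lr_test_1df(loglik_restricted, loglik_full):
    """LR statistic and asymptotic p-value for a single restriction (chi-square, 1 dof)."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    stat = max(stat, 0.0)  # guard against tiny negative values from numerical optimisation
    # If Z ~ N(0, 1), then P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Illustrative values only: a gain of 1.92 log-likelihood units gives stat = 3.84 and p close to 0.05.
stat, p = lr_test_1df(-1000.00, -998.08)
```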

Table 8 Selection criteria for models
Table 9 Estimated association parameters

Suppose now that \(x_1\) is not directly observable, or is simply not available to the researcher, and thus cannot be controlled for, potentially causing model misspecification. Without any information on how \({\varvec{x}}\) affects \(Y_1,\ldots ,Y_5\), we now include \(x_2\) and \(x_3\) among the regressors in each equation of system (11) and estimate this model with the SOC-M for different numbers of latent classes. Notice that the omitted effect of \(x_1\) may then be absorbed by additional latent classes. Panel C of Table 8 reports model information for the misspecified SOC-M. The selection criteria point to six latent classes, whereas the population contains only three. However, when some of this residual association is taken into account within the RE-M, the model with three latent classes is again preferred to all the others (see Panel D of Table 8).

The corresponding \(\lambda \)s presented in Panel B of Table 9 reveal that the residual association is positive and statistically significant only between the last two variables. This indicates that, conditional on \(x_2\), \(x_3\) and the latent classes, there is still some local dependence between \(Y_4\) and \(Y_5\), which is consistent with Eq. (11): \(x_1\) enters only the equations for \(Y_4\) and \(Y_5\), with a positive coefficient in both.
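Why omitting \(x_1\) appears as local dependence between \(Y_4\) and \(Y_5\) only can be checked numerically. Within a class, the two indicators are independent given \(x_1\), but both probabilities increase with \(x_1\), so averaging over it induces a positive association. The sketch below assumes \(x_1 \sim N(0,1)\) and logistic errors, holds \(x_2=x_3=0\), dichotomizes both indicators at \(Y\ge 1\), and integrates over \(x_1\) with Gauss–Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`). The function name is hypothetical, and the output says nothing about statistical significance in any particular sample.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

ALPHA4 = np.array([-0.2, 1.2, 1.4])   # alpha_4(u), u = 1, 2, 3 (Eq. 11)
ALPHA5 = np.array([-1.9, -1.5, 2.1])  # alpha_5(u)


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def induced_log_or(u, b4=1.7, b5=2.4, n_nodes=60):
    """Within-class log odds ratio of 1{Y4>=1} and 1{Y5>=1} after integrating out
    x1 ~ N(0, 1), holding x2 = x3 = 0. Given (u, x1) the two indicators are independent,
    so any association comes from the omitted x1."""
    z, w = hermegauss(n_nodes)          # nodes/weights for the weight function exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)        # rescale so the weights integrate the N(0, 1) density
    p4 = logistic(ALPHA4[u] + b4 * z)   # P(Y4 >= 1 | u, x1 = z)
    p5 = logistic(ALPHA5[u] + b5 * z)   # P(Y5 >= 1 | u, x1 = z)
    p11 = np.sum(w * p4 * p5)
    p10 = np.sum(w * p4 * (1 - p5))
    p01 = np.sum(w * (1 - p4) * p5)
    p00 = np.sum(w * (1 - p4) * (1 - p5))
    return np.log(p11 * p00 / (p10 * p01))


for u in range(3):
    print(u + 1, round(float(induced_log_or(u)), 3))   # all three values are positive
print(induced_log_or(0, b4=0.0))   # x1 absent from Y4's equation: no induced association
```

Setting either coefficient on \(x_1\) to zero removes the induced association, which is why no residual association is expected between \(Y_1\), \(Y_2\) or \(Y_3\) and the other indicators.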

Finally, we estimate the PRE-M and our UE-M on the same simulated data (Panels C and D of Table 9; see Note 9). The latter clearly shows large and statistically significant residual heterogeneity between \(Y_4\) and \(Y_5\) in latent classes 1 and 3, and the LR test clearly rejects local independence (\({\mathcal {H}}_{4}\)) for \(Y_4\) and \(Y_5\) (Panel D of Table 9).

This simulated example highlights how our approach can be used as a model specification test in the basic case in which the direction of the residual association is the same across levels of the Ys.

Technical notes

To clarify how Eqs. 3, 5 and 8 are related, consider an individual i with an observed set of characteristics \({\varvec{x}}\). Assume that we observe \(K=3\) indicators: a binary \(Y_1\) taking values 0 and 1 (\(l_1=2\)), and two ordered categorical indicators \(Y_2\) and \(Y_3\), each taking values 0, 1 and 2 (\(l_2=l_3=3\)). Let U be a discrete latent variable with two classes. Suppose also that each indicator depends on the same set of covariates \({\varvec{x}}\) (see Note 10) and that we want to estimate the residual association between \(Y_2\) and \(Y_3\) conditional on U. The system of equations can then be written as:

$$\begin{aligned} \begin{array}{cl} \Pr (Y_{i1}\ge 1|U,{\varvec{x}}_i)= &{} {\varLambda }\biggl ( \sum \limits _{u=1}^{M}\alpha _1(u)U(u)+ {\varvec{x}}^\prime _i {\varvec{\beta }}_1 \biggr ) \\ \vdots &{} \vdots \\ \Pr (Y_{i3}\ge 2|U,{\varvec{x}}_i) = &{} {\varLambda }\biggl (\sum \limits _{u=1}^{M}\alpha _3(u)U(u) + \delta _{3,2} + {\varvec{x}}^\prime _i {\varvec{\beta }}_3 \biggr ) \end{array} \end{aligned}$$
(12)

where \({\varvec{\beta }}\) and \({\varvec{\alpha }}\) are vectors of unknown parameters and \({\varLambda }(\cdot )\) is the logistic function, i.e. the inverse of the logit link. The system of equations is generally completed by an additional equation that models individual class membership probabilities:

$$\begin{aligned} \Pr (U=u) = \frac{\exp (\alpha _{U}(u))}{\sum _{v=1}^M \exp (\alpha _{U}(v))}, \quad \alpha _U(1) = 0, \quad u=1,\ldots ,M. \end{aligned}$$
(13)

Estimation of Eqs. (12)–(13) requires modelling directly the complete-data likelihood, described by the joint distribution of \((U,Y_1,Y_2,Y_3)\). To this end, we use developments in marginal modelling proposed by Bartolucci and Forcina (2006) and Bartolucci et al. (2007), which define a structure of canonical parameters \({\varvec{\lambda }}\) describing relevant aspects of the joint distribution \((U,Y_1,\ldots , Y_K ) \mid {\varvec{x}}\), such as the univariate marginal distributions and their associations. This approach provides an invertible mapping \({\varvec{\lambda }}({\varvec{x}})={\varvec{\lambda }}({\varvec{\pi }}({\varvec{x}}))\), where \({\varvec{\pi }}({\varvec{x}})\) denotes the vector of dimension \(M\cdot \prod _{k=1}^3 l_k\) containing the cell probabilities of the joint distribution of \((U, Y_1,Y_2,Y_3 ) \mid {\varvec{x}}\) as modelled in Eqs. 3–4.

When the latent variable is a mixture of two groups (\(M=2\)), the vector \({\varvec{\pi }}\) has \(2\times 2 \times 3\times 3=36\) elements, one for each combination of U, \(Y_1\), \(Y_2\) and \(Y_3\), ordered as follows:

$$\begin{aligned} \left( { \begin{array}{*{20}{c}} {0,0,0,0} \\ {0,0,0,1} \\ {0,0,0,2} \\ \vdots \\ {0,1,2,2} \\ {1,0,0,0} \\ \vdots \\ {1,1,2,1} \\ {1,1,2,2} \end{array}} \right) = \left( { \begin{array}{*{20}{c}} \pi _{0000} \\ \pi _{0001} \\ \pi _{0002} \\ \vdots \\ \pi _{0122} \\ \pi _{1000} \\ \vdots \\ \pi _{1121} \\ \pi _{1122} \end{array}} \right) ={\varvec{\pi }}\left( {\varvec{x}}_i \right) \end{aligned}$$
(14)

where, e.g., \(\pi _{0020}\) is the probability that \(U=0\), \(y_1=0\), \(y_2=2\) and \(y_3=0\) for the i-th individual. In view of Eq. 8 and keeping in mind Eqs. 12–13, the simultaneous models can be specified as [see Lang (1996), page 726, eq. 1.1]:

$$\begin{aligned} {\varvec{C}}\log [ {\varvec{M}} {\varvec{\pi }}( {\varvec{x}} )] = {\varvec{X}} {\varvec{\beta }}\end{aligned}$$
(15)

where \({\varvec{M}}\) is the matrix that produces the appropriate marginal probabilities and \({\varvec{C}}\) is a matrix of linearly independent contrasts; they are often called the marginalization matrix and the contrast matrix, respectively. Colombi and Forcina (2001) provide an algorithm to construct \({\varvec{C}}\) and \({\varvec{M}}\) given the type of logit (e.g. global) for each variable and the hierarchical structure of the model. In our case, the contrast matrix is an \(8\times 24\) matrix, while \({\varvec{M}}\) is a marginalization matrix of dimension \(24\times 9\). Applying Eq. (15), the joint distribution of \((U,Y_1,Y_2,Y_3)\) can then be modelled such that:

(16)

where, e.g., \(\pi _{1\cdot \cdot \cdot }=\pi _{1000} + \pi _{1001} + \pi _{1002} + \pi _{1010} + \cdots + \pi _{1120} + \pi _{1121} + \pi _{1122}\) is the marginal probability of \(U=1\), obtained by summing \({\varvec{\pi }}\) over all configurations of \((Y_1,Y_2,Y_3)\). The first entry corresponds to the logit associated with Eq. (13), while the remaining entries correspond to Eq. (12), which models, conditional on the latent U, the marginal distributions of \(Y_1\), \(Y_2\) and \(Y_3\) and their associations. The expression above can then be rewritten as:

$$\begin{aligned} \log \left( {\frac{{{\pi _{{\mathrm{1}} \cdot \cdot \cdot }}}}{{{\pi _{{\mathrm{0}} \cdot \cdot \cdot }}}}} \right)= & {} {\lambda _{U = 1}} \nonumber \\ \log \left( {\frac{{{\pi _{{\mathrm{01}} \cdot \cdot }}}}{{{\pi _{{\mathrm{00}} \cdot \cdot }}}}} \right)= & {} {\lambda _{U = 0,{Y_1} = 1}} \nonumber \\ {\mathrm{log}}\left( {\frac{{{\pi _{{\mathrm{11}} \cdot \cdot }}}}{{{\pi _{{\mathrm{10}} \cdot \cdot }}}}} \right)= & {} {\lambda _{U = 1,{Y_1} = 1}} \nonumber \\ {\mathrm{log}}\left( {\frac{{{\pi _{{\mathrm{0}} \cdot {\mathrm{1}} \cdot }} + {\pi _{{\mathrm{0}} \cdot {\mathrm{2}} \cdot }}}}{{{\pi _{{\mathrm{0}} \cdot {\mathrm{0}} \cdot }}}}} \right)= & {} {\lambda _{U = 0,{Y_2} \ge 1}} \nonumber \\ {\mathrm{log}}\left( {\frac{{{\pi _{{\mathrm{0}} \cdot {\mathrm{2}} \cdot }}}}{{{\pi _{{\mathrm{0}} \cdot {\mathrm{0}} \cdot }} + {\pi _{{\mathrm{0}} \cdot {\mathrm{1}} \cdot }}}}} \right)= & {} {\lambda _{U = 0,{Y_2} \ge 2}} \nonumber \\ {\mathrm{log}}\left( {\frac{{{\pi _{{\mathrm{1}} \cdot {\mathrm{1}} \cdot }} + {\pi _{{\mathrm{1}} \cdot {\mathrm{2}} \cdot }}}}{{{\pi _{{\mathrm{1}} \cdot {\mathrm{0}} \cdot }}}}} \right)= & {} {\lambda _{U = 1,{Y_2} \ge 1}} \nonumber \\ {\mathrm{log}}\left( {\frac{{{\pi _{{\mathrm{1}} \cdot {\mathrm{2}} \cdot }}}}{{{\pi _{{\mathrm{1}} \cdot {\mathrm{0}} \cdot }} + {\pi _{{\mathrm{1}} \cdot {\mathrm{1}} \cdot }}}}} \right)= & {} {\lambda _{U = 1,{Y_2} \ge 2}} \nonumber \\ {\mathrm{log}}\left( {\frac{{{\pi _{{\mathrm{0}} \cdot \cdot {\mathrm{1}}}} + {\pi _{{\mathrm{0}} \cdot \cdot {\mathrm{2}}}}}}{{{\pi _{{\mathrm{0}} \cdot \cdot 0}}}}} \right)= & {} {\lambda _{U = 0,{Y_3} \ge 1}} \end{aligned}$$
$$\begin{aligned} \log \left( \frac{\pi _{0\cdot \cdot 2}}{\pi _{0\cdot \cdot 0} + \pi _{0\cdot \cdot 1}} \right)= & {} \lambda _{U = 0,Y_3 \ge 2} \nonumber \\ \log \left( \frac{\pi _{1\cdot \cdot 1} + \pi _{1\cdot \cdot 2}}{\pi _{1\cdot \cdot 0}} \right)= & {} \lambda _{U = 1,Y_3 \ge 1} \nonumber \\ \log \left( \frac{\pi _{1\cdot \cdot 2}}{\pi _{1\cdot \cdot 0} + \pi _{1\cdot \cdot 1}} \right)= & {} \lambda _{U = 1,Y_3 \ge 2} \nonumber \\
\log \left( \frac{(\pi _{0\cdot 11} + \pi _{0\cdot 12} + \pi _{0\cdot 21} + \pi _{0\cdot 22})\,\pi _{0\cdot 00}}{(\pi _{0\cdot 01} + \pi _{0\cdot 02})(\pi _{0\cdot 10} + \pi _{0\cdot 20})} \right)= & {} \lambda _{U = 0,Y_2 \ge 1,Y_3 \ge 1} \nonumber \\
\log \left( \frac{(\pi _{0\cdot 00} + \pi _{0\cdot 01})(\pi _{0\cdot 12} + \pi _{0\cdot 22})}{(\pi _{0\cdot 10} + \pi _{0\cdot 11} + \pi _{0\cdot 20} + \pi _{0\cdot 21})\,\pi _{0\cdot 02}} \right)= & {} \lambda _{U = 0,Y_2 \ge 1,Y_3 \ge 2} \nonumber \\
\log \left( \frac{(\pi _{0\cdot 00} + \pi _{0\cdot 10})(\pi _{0\cdot 21} + \pi _{0\cdot 22})}{(\pi _{0\cdot 01} + \pi _{0\cdot 02} + \pi _{0\cdot 11} + \pi _{0\cdot 12})\,\pi _{0\cdot 20}} \right)= & {} \lambda _{U = 0,Y_2 \ge 2,Y_3 \ge 1} \nonumber \\
\log \left( \frac{(\pi _{0\cdot 00} + \pi _{0\cdot 01} + \pi _{0\cdot 10} + \pi _{0\cdot 11})\,\pi _{0\cdot 22}}{(\pi _{0\cdot 02} + \pi _{0\cdot 12})(\pi _{0\cdot 20} + \pi _{0\cdot 21})} \right)= & {} \lambda _{U = 0,Y_2 \ge 2,Y_3 \ge 2} \nonumber \\
\log \left( \frac{(\pi _{1\cdot 11} + \pi _{1\cdot 12} + \pi _{1\cdot 21} + \pi _{1\cdot 22})\,\pi _{1\cdot 00}}{(\pi _{1\cdot 01} + \pi _{1\cdot 02})(\pi _{1\cdot 10} + \pi _{1\cdot 20})} \right)= & {} \lambda _{U = 1,Y_2 \ge 1,Y_3 \ge 1} \nonumber \\
\log \left( \frac{(\pi _{1\cdot 00} + \pi _{1\cdot 01})(\pi _{1\cdot 12} + \pi _{1\cdot 22})}{(\pi _{1\cdot 10} + \pi _{1\cdot 11} + \pi _{1\cdot 20} + \pi _{1\cdot 21})\,\pi _{1\cdot 02}} \right)= & {} \lambda _{U = 1,Y_2 \ge 1,Y_3 \ge 2} \nonumber \\
\log \left( \frac{(\pi _{1\cdot 00} + \pi _{1\cdot 10})(\pi _{1\cdot 21} + \pi _{1\cdot 22})}{(\pi _{1\cdot 01} + \pi _{1\cdot 02} + \pi _{1\cdot 11} + \pi _{1\cdot 12})\,\pi _{1\cdot 20}} \right)= & {} \lambda _{U = 1,Y_2 \ge 2,Y_3 \ge 1} \nonumber \\
\log \left( \frac{(\pi _{1\cdot 00} + \pi _{1\cdot 01} + \pi _{1\cdot 10} + \pi _{1\cdot 11})\,\pi _{1\cdot 22}}{(\pi _{1\cdot 02} + \pi _{1\cdot 12})(\pi _{1\cdot 20} + \pi _{1\cdot 21})} \right)= & {} \lambda _{U = 1,Y_2 \ge 2,Y_3 \ge 2} \end{aligned}$$
(17)

The first \(\lambda \) captures the membership of an individual in a specific latent class. The subsequent ten \(\lambda \)s model the conditional marginal distributions of \(Y_1\), \(Y_2\) and \(Y_3\), while the remaining eight \(\lambda \)s model the conditional residual association between \(Y_2\) and \(Y_3\), as described by Eq. 5.
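The definitions in Eq. (17) can be made concrete with a short computation. The sketch below stores \({\varvec{\pi }}({\varvec{x}})\) for the \(M=2\), \(K=3\) example as a NumPy array of shape (2, 2, 3, 3) indexed by \((U,Y_1,Y_2,Y_3)\); flattening it in C order reproduces the ordering of Eq. (14). It then computes the marginal logits and the eight global log odds ratios between \(Y_2\) and \(Y_3\) within each class. It is an illustrative sketch of the definitions, not the authors' estimation code: the probabilities are hypothetical, built under local independence within each class, in which case all eight association \(\lambda \)s are zero up to rounding.

```python
import numpy as np


def global_log_odds(t, j, k):
    """Global log odds ratio of 1{A >= j} and 1{B >= k} for a two-way table t[a, b]."""
    hh = t[j:, k:].sum()   # A >= j, B >= k
    ll = t[:j, :k].sum()   # A <  j, B <  k
    hl = t[j:, :k].sum()   # A >= j, B <  k
    lh = t[:j, k:].sum()   # A <  j, B >= k
    return np.log(hh * ll / (hl * lh))


def lambdas(pi):
    """Map pi[u, y1, y2, y3] (shape (2, 2, 3, 3)) to the 19 lambdas of Eq. (17)."""
    lam = {}
    pU = pi.sum(axis=(1, 2, 3))                    # pi_{u...}
    lam["U=1"] = np.log(pU[1] / pU[0])
    for u in range(2):
        p1 = pi[u].sum(axis=(1, 2))                # marginal of Y1 within class u
        lam[f"U={u},Y1=1"] = np.log(p1[1] / p1[0])
        for name, axes in (("Y2", (0, 2)), ("Y3", (0, 1))):
            m = pi[u].sum(axis=axes)               # marginal of Y2 or Y3 within class u
            for j in (1, 2):                       # global (cumulative) logits
                lam[f"U={u},{name}>={j}"] = np.log(m[j:].sum() / m[:j].sum())
        t23 = pi[u].sum(axis=0)                    # pi_{u . y2 y3}
        for j in (1, 2):
            for k in (1, 2):
                lam[f"U={u},Y2>={j},Y3>={k}"] = global_log_odds(t23, j, k)
    return lam


# Hypothetical joint distribution, locally independent within each class.
rng = np.random.default_rng(1)
pU = np.array([0.6, 0.4])
pi = np.zeros((2, 2, 3, 3))
for u in range(2):
    p1, p2, p3 = rng.dirichlet(np.ones(2)), rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
    pi[u] = pU[u] * np.einsum("a,b,c->abc", p1, p2, p3)

cells = pi.ravel()   # C order: pi_0000, pi_0001, pi_0002, ..., pi_1122, as in Eq. (14)
lam = lambdas(pi)
print(len(cells), len(lam))   # 36 cells, 19 lambdas
```

Replacing the product form of `pi[u]` with any table in which \(Y_2\) and \(Y_3\) move together makes the corresponding global log odds ratios non-zero, which is the residual association that the tests in the main text target.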

To see how the \(\lambda \)s modelling the conditional marginal distributions of \(Y_1\), \(Y_2\) and \(Y_3\) depend on \(\alpha \), \(\beta \) and \(\delta \), consider the \(\lambda \)s for \(Y_2\):

$$\begin{aligned} {\lambda _{U = 0,{Y_2} \ge 1}}= & {} \alpha _2(u=0) + {\varvec{x}}^\prime _i {\varvec{\beta }}_2\nonumber \\ {\lambda _{U = 0,{Y_2} \ge 2}}= & {} \alpha _2(u=0) + \delta _{2,2}+ {\varvec{x}}^\prime _i {\varvec{\beta }}_2 \nonumber \\ {\lambda _{U = 1,{Y_2} \ge 1}}= & {} \alpha _2(u=1) + {\varvec{x}}^\prime _i {\varvec{\beta }}_2 \nonumber \\ {\lambda _{U = 1,{Y_2} \ge 2}}= & {} \alpha _2(u=1) + \delta _{2,2} +{\varvec{x}}^\prime _i {\varvec{\beta }}_2 \end{aligned}$$
(18)

The logits in Eq. (18) mirror the standard ordered logit model: the random intercepts \(\alpha \) capture the effect of the latent variable, the threshold \(\delta \) enters additively and is common to both latent classes, and the \(\beta \)s are the same across levels of \(Y_2\) and across latent classes. This assumption can be relaxed in principle, but doing so is often infeasible in practice, because the number of parameters increases substantially and estimation becomes unstable.
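The mapping from \((\alpha ,\delta ,\beta )\) to probabilities in Eqs. (12), (13) and (18) can be sketched for one three-level indicator. All parameter values and function names below are hypothetical. Category probabilities are differences of adjacent cumulative probabilities, \(P(Y\ge 1)=\Lambda (\alpha (u)+{\varvec{x}}^\prime {\varvec{\beta }})\) and \(P(Y\ge 2)=\Lambda (\alpha (u)+\delta +{\varvec{x}}^\prime {\varvec{\beta }})\), so in this parameterization they are non-negative only when \(\delta \le 0\), as with the cut-points used in Appendix A.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def class_probs(alpha_U):
    """Eq. (13): multinomial logit with class 1 as reference (alpha_U(1) = 0).
    alpha_U holds the free parameters for classes 2, ..., M."""
    a = np.concatenate(([0.0], np.asarray(alpha_U, dtype=float)))
    e = np.exp(a - a.max())          # subtract the max for numerical stability
    return e / e.sum()


def ordered_probs(alpha_u, delta2, xb):
    """Eq. (18): P(Y = 0), P(Y = 1), P(Y = 2) for a three-level indicator in class u."""
    if delta2 > 0:
        raise ValueError("delta2 must be <= 0 so that P(Y >= 2) <= P(Y >= 1)")
    ge1 = logistic(alpha_u + xb)            # P(Y >= 1 | u, x)
    ge2 = logistic(alpha_u + delta2 + xb)   # P(Y >= 2 | u, x)
    return np.array([1.0 - ge1, ge1 - ge2, ge2])


# Hypothetical values for M = 2 classes and one indicator Y2.
pU = class_probs([0.4])      # [P(U = 1), P(U = 2)]
alpha2 = [-0.5, 1.0]         # alpha_2(u) for the two classes
xb = 0.3                     # x_i' beta_2 for one individual
pY2 = np.vstack([ordered_probs(a, -1.2, xb) for a in alpha2])
print(pU, pY2.sum(axis=1))   # each row of pY2 sums to 1
```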

Estimation of the system of Eq. (17) requires the invertible mapping between \({\varvec{\pi }}\) and the vector of \(\lambda \)s (see Bartolucci et al. 2007) to recover the joint distribution of \(({\varvec{Y}},U)\), together with the EM algorithm (Dempster et al. 1977) to deal with the unobservable latent U. The EM algorithm alternates two steps. The idea behind them is that, if the joint frequency table of \((U,{\varvec{Y}})\) were known, maximum likelihood estimation would be equivalent to estimating a regression model within the multinomial distribution. In the expectation step (E-step), the posterior probability of U given the observed configuration \(Y_1,\ldots ,Y_K\) is computed. The maximization step (M-step) maximizes the resulting expected complete-data likelihood via a Fisher scoring algorithm. Details on estimation and identification of the model can be found in Bartolucci and Forcina (2006) and Forcina (2008). Since the joint distribution corresponding to a parameter vector cannot be computed with an explicit formula, the Fisher scoring algorithm for marginal models can be computationally demanding, even when the observations are stratified by all combinations of the covariates. This is especially relevant when the number of observations is large and many Ys and latent classes are involved. Forcina (2017) proposes a modified Fisher scoring algorithm that can be used when estimation time increases substantially, provided that the empirical information matrix is positive definite.
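The E-step/M-step logic can be illustrated on the simplest case. The sketch below is a deliberately simplified stand-in for the estimator described above: it fits an unrestricted latent class model for categorical indicators without covariates, so the M-step has closed-form updates (posterior-weighted frequencies) instead of Fisher scoring of the marginal-model parameters. Function and variable names are hypothetical, and the simulated data are illustrative. Under local independence the E-step posterior factorizes over indicators.

```python
import numpy as np


def em_lcm(Y, n_levels, M, n_iter=200, seed=0):
    """EM for a latent class model with categorical items and no covariates.
    Y: (n, K) integer array; n_levels: number of categories of each item."""
    rng = np.random.default_rng(seed)
    n, K = Y.shape
    prior = np.full(M, 1.0 / M)
    theta = [rng.dirichlet(np.ones(L), size=M) for L in n_levels]  # theta[k][u, j] = P(Y_k = j | u)
    loglik = []
    for _ in range(n_iter):
        # E-step: log P(U = u, y_i) = log prior_u + sum_k log P(y_ik | u)  (local independence)
        log_joint = np.tile(np.log(prior), (n, 1))
        for k in range(K):
            log_joint += np.log(theta[k][:, Y[:, k]]).T
        mx = log_joint.max(axis=1, keepdims=True)
        log_marg = mx[:, 0] + np.log(np.exp(log_joint - mx).sum(axis=1))
        post = np.exp(log_joint - log_marg[:, None])   # P(U = u | y_i)
        loglik.append(log_marg.sum())
        # M-step: closed-form posterior-weighted frequencies
        prior = post.mean(axis=0)
        for k in range(K):
            counts = np.stack([post[Y[:, k] == j].sum(axis=0) for j in range(n_levels[k])], axis=1)
            theta[k] = counts / counts.sum(axis=1, keepdims=True)
    return prior, theta, np.array(loglik)


# Illustrative data: two classes, two binary items and one three-level item.
rng = np.random.default_rng(42)
n = 2000
true_u = rng.choice(2, size=n, p=[0.4, 0.6])
item_probs = [np.array([[0.9, 0.1], [0.2, 0.8]]),          # rows: classes, columns: categories
              np.array([[0.85, 0.15], [0.25, 0.75]]),
              np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])]


def draw(P, cls):
    """Inverse-CDF draw of one categorical item given each unit's class."""
    cum = np.cumsum(P[cls], axis=1)[:, :-1]
    return (rng.random(len(cls))[:, None] > cum).sum(axis=1)


Y = np.column_stack([draw(P, true_u) for P in item_probs])
prior, theta, ll = em_lcm(Y, [2, 2, 3], M=2)
print(prior)   # estimated class shares, up to label switching
```

The recorded log-likelihood never decreases across iterations, which is a useful sanity check for any EM implementation, including one whose M-step uses Fisher scoring.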

About this article

Cite this article

Li Donni, P., Thomas, R. Latent class models for multiple ordered categorical health data: testing violation of the local independence assumption. Empir Econ 59, 1903–1931 (2020). https://doi.org/10.1007/s00181-019-01685-6

  • DOI: https://doi.org/10.1007/s00181-019-01685-6
