
Minimum \(\phi \)-Divergence Estimation in Constrained Latent Class Models for Binary Data

Published in Psychometrika.

Abstract

The main purpose of this paper is to introduce and study the behavior of minimum \(\phi \)-divergence estimators as an alternative to the maximum-likelihood estimator in latent class models for binary items. As will become clear below, minimum \(\phi \)-divergence estimators are a natural extension of the maximum-likelihood estimator. The asymptotic properties of minimum \(\phi \)-divergence estimators for latent class models for binary data are developed. Finally, to compare the efficiency and robustness of these new estimators with those of the maximum-likelihood estimator when the sample size is not large enough to apply the asymptotic results, we have carried out a simulation study.


Notes

  1. For the sake of simplicity we have adopted the usual terminology: 1 for a correct answer and 0 otherwise. However, the model applies to any dichotomous item (low–high, success–fail, agree–disagree, ...). For example, in Sect. 4 we consider a situation whose possible answers are “low” and “high.”

  2. The variant we used in Step 2 consists of randomly permuting the \(t+u\) parameters \((\varvec{\lambda }, \varvec{\eta })\) for each initial point \(i.\) The additional improvement consists of seeking a better point along the vector from the initial point to the final point obtained through the full iteration of the Hooke and Jeeves algorithm, in double or half spacing steps outward or inward relative to this vector. At most \(2(t+u)+4\) evaluations of \(D_{\phi }\) are needed. The criterion \(D_{\phi }^{in}\) is used to discard non-promising initial points before the finer and more costly improvement.
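The Hooke and Jeeves (1961) direct search at the core of this step can be sketched as follows. This is a bare-bones pattern search for minimizing an objective such as \(D_{\phi }\), not the authors' exact variant (it omits the random permutation of the \(t+u\) parameters and the double/half-spacing line search described above); the function and parameter names are ours.

```python
import numpy as np

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-8, max_iter=10000):
    # Minimal Hooke-Jeeves pattern search: exploratory moves along the
    # coordinate axes, followed by a pattern move from the old base point.
    x = np.asarray(x0, float)
    fx = f(x)
    it = 0
    while step > tol and it < max_iter:
        it += 1
        # exploratory search around the current base point
        y, fy = x.copy(), fx
        for i in range(y.size):
            for s in (step, -2.0 * step):   # try +step, then -step
                y[i] += s
                fyi = f(y)
                if fyi < fy:
                    fy = fyi
                    break
            else:
                y[i] = x[i]                 # neither direction improved
        if fy < fx:
            # pattern move: jump further along the direction of improvement
            xp = y + (y - x)
            fxp = f(xp)
            x, fx = (xp, fxp) if fxp < fy else (y, fy)
        else:
            step *= shrink                  # no improvement: refine the mesh
    return x, fx
```

On a smooth objective the search contracts its step until the tolerance is reached; the stochastic restarts described in the note above are what guard against local minima of \(D_{\phi }\).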

References

  • Abar, B., & Loken, E. (2010). Self-regulated learning and self-directed study in a pre-college sample. Learning and Individual Differences, 20, 25–29.


  • Berkson, J. (1980). Minimum chi-square, not maximum likelihood! Annals of Statistics, 8(3), 482–485.


  • Biemer, P. (2011). Latent class analysis and survey error. Hoboken, NJ: Wiley.


  • Caldwell, L., Bradley, S., & Coffman, D. (2009). A person-centered approach to individualizing a school-based universal preventive intervention. American Journal of Drug and Alcohol Abuse, 35(4), 214–219.


  • Clogg, C. (1995). Latent class models: Recent developments and prospects for the future. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 311–352). New York: Plenum.


  • Coffman, D., Patrick, M., Polen, L., Rhoades, B., & Ventura, A. (2007). Why do high school seniors drink? Implication for a targeted approach to intervention. Prevention Science, 8, 1–8.


  • Coleman, J. S. (1964). Introduction to mathematical sociology. New York: Free Press.


  • Collins, L., & Lanza, S. (2010). Latent class and latent transition analysis for the social, behavioral, and health sciences. New York: Wiley.


  • Cressie, N., & Pardo, L. (2002). Phi-divergence statistics. In A. H. El-Shaarawi & W. W. Piegorsch (Eds.), Encyclopedia of environmetrics (Vol. 13, pp. 1551–1555). New York: Wiley.


  • Cressie, N., & Read, T. R. C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Series B, 46, 440–464.


  • Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2, 299–318.


  • Feldman, B., Masyn, K., & Conger, R. (2009). New approaches to studying behaviors: A comparison of methods for modeling longitudinal, categorical adolescent drinking data. Developmental Psychology, 45(3), 652–676.


  • Formann, A. (1976). Schätzung der Parameter in Lazarsfeld Latent-Class Analysis. Research Bulletin No. 18. Institut für Psychologie der Universität Wien. (In German).

  • Formann, A. (1977). Log-linear Latent Class Analyse. Research Bulletin No. 20. Institut für Psychologie der Universität Wien. (In German).

  • Formann, A. (1978). A note on parameter estimation for Lazarsfeld’s latent class analysis. Psychometrika, 43, 123–126.


  • Formann, A. (1982). Linear logistic latent class analysis. Biometrical Journal, 24, 171–190.


  • Formann, A. (1985). Constrained latent class models: Theory and applications. British Journal of Mathematical and Statistical Psychology, 38, 87–111.


  • Formann, A. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476–486.


  • Gerber, M., Witterkind, A., Grote, G., & Staffelbach, B. (2009). Exploring types of career orientation: A latent class analysis approach. Journal of Vocational Behavior, 75, 303–318.


  • Gill, P. E. & Murray, W. (1979). Conjugate-gradient methods for large-scale nonlinear optimization. Technical Report SOL 79–15. Department of Operations Research, Stanford University.

  • Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.


  • Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis. Cambridge: Cambridge University Press.


  • Hooke, R., & Jeeves, T. A. (1961). Direct search solution of numerical and statistical problems. Journal of the Association for Computing Machinery, 8, 212–229.


  • Langeheine, R., & Rost, J. (1988). Latent trait and latent class models. New York: Plenum Press.


  • Laska, M., Pasch, K., Lust, K., Story, M., & Ehlinger, E. (2009). Latent class analysis of lifestyle characteristics and health risk behaviors among college youth. Prevention Science, 10, 376–386.


  • Lazarsfeld, P., & Henry, N. (1968). Latent structure analysis. Boston: Houghton-Mifflin.


  • Lazarsfeld, P. (1950). The logical and mathematical foundation of latent structure analysis. Studies in Social Psychology in World War II (Vol. IV, pp. 362–412). Princeton, NJ: Princeton University Press.


  • McHugh, R. (1956). Efficient estimation and local identification in latent class analysis. Psychometrika, 21, 331–347.


  • Morales, D., Pardo, L., & Vajda, I. (1995). Asymptotic divergence of estimates of discrete distributions. Journal of Statistical Planning and Inference, 48, 347–369.


  • Nylund, K., Bellmore, A., Nishina, A., & Graham, S. (2007). Subtypes, severity and structural stability of peer victimization: What does latent class analysis say? Child Development, 78, 1706–1722.


  • Pardo, L. (2006). Statistical inference based on divergence measures. New York: Chapman & Hall/CRC.


  • Powell, M. (1970). A hybrid method for nonlinear algebraic equations. In P. Rabinowitz (Ed.), Numerical methods for nonlinear algebraic equations. London: Gordon and Breach.


  • Rost, J., & Langeheine, R. (1997). Applications of latent trait and latent class models in the social sciences. Münster: Waxmann.



Acknowledgments

This work was partially supported by Grant MTM2012-33740. Part of the computations of this work were performed in EOLO, the HPC of Climate Change of the International Campus of Excellence (CEI) of Moncloa, funded by MECD and MICINN; this is a contribution of CEI Moncloa. We also thank the anonymous referees for their comments and remarks that have improved the final version of the paper.


Corresponding author

Correspondence to P. Miranda.

Appendix

Remark 1

We are going to develop the calculations for \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \lambda _{\alpha }}}\) and \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \eta _{\beta }}}.\)

For \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \lambda _{\alpha }}}\) note that

$$\begin{aligned} p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta }) \!&= \! \sum _{j=1}^m w_j \prod _{i=1}^k \left( {exp \left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r \!+\! c_{ji}}\right) \over 1 \!+\!exp \left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r \!+\! c_{ji}}\right) }\right) ^{y_{\nu i}} \left( 1\!-\! {exp \left( { \sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) \over 1 +exp \left( { \sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) } \right) ^{1-y_{\nu i}} \\&= \sum _{j=1}^m w_j \prod _{i=1}^k {exp \left( y_{\nu i}\left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) \right) \over 1 +exp \left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) } \end{aligned}$$

Now,

$$\begin{aligned} {\partial \left( {exp \left( y_{\nu i}\left( { \sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) \right) \over 1 +exp \left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) } \right) \over \partial \lambda _{\alpha }}&= {exp \left( y_{\nu i}\left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) \right) q_{ji\alpha }\over 1 +exp \left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) }\\&\quad \times \left[ y_{\nu i} - {exp \left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) \over 1 +exp \left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) }\right] \\&= {exp \left( y_{\nu i}\left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) \right) \over 1 +exp \left( {\sum \nolimits _{r=1}^t q_{jir} \lambda _r + c_{ji}}\right) } q_{ji\alpha }\left( y_{\nu i} - p_{ji}\right) , \end{aligned}$$

whence

$$\begin{aligned} {\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \lambda _{\alpha }} = \sum _{j=1}^m w_j \left[ Pr (\mathbf{y_{\nu }} | P_{\nu }\in C_j ) \sum _{i=1}^k q_{ji\alpha } (y_{\nu i}- p_{ji})\right] , \, \, \alpha =1,\ldots , t. \end{aligned}$$

Similarly,

$$\begin{aligned} p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta }) = \sum _{j=1}^m {exp \left( \sum \nolimits _{r=1}^u v_{jr} \eta _r + d_{j}\right) \over {\sum \nolimits _{h=1}^m exp \left( \sum \nolimits _{r=1}^u v_{hr} \eta _r + d_{h}\right) }} Pr (\mathbf{y_{\nu }} | P_{\nu }\in C_j ). \end{aligned}$$

Now,

$$\begin{aligned} {\partial \left( {exp \left( {\sum \nolimits _{r=1}^u v_{jr} \eta _r + d_{j}}\right) \over {\sum \nolimits _{h=1}^m exp \left( \sum \nolimits _{r=1}^u v_{hr} \eta _r + d_{h}\right) }} \right) \over \partial \eta _{\beta } }&= {exp \left( {\sum \nolimits _{r=1}^u v_{jr} \eta _r + d_{j}}\right) \over {\sum \nolimits _{h=1}^m exp \left( \sum \nolimits _{r=1}^u v_{hr} \eta _r + d_{h}\right) }}\\&\quad \times \left[ v_{j\beta } - {\sum \nolimits _{h=1}^m exp \left( \sum \nolimits _{r=1}^u v_{hr} \eta _r + d_{h}\right) v_{h\beta }\over {\sum \nolimits _{h=1}^m exp \left( \sum \nolimits _{r=1}^u v_{hr} \eta _r + d_{h}\right) }}\right] \\&= w_j \left[ v_{j\beta } - \sum _{h=1}^m w_h v_{h\beta }\right] , \end{aligned}$$

whence

$$\begin{aligned} {\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \eta _{\beta }} = \sum _{j=1}^m w_j Pr (\mathbf{y_{\nu }} | P_{\nu }\in C_j )\left[ v_{j\beta } - \sum _{h=1}^m w_h v_{h\beta } \right] ,\, \, \beta =1,\ldots , u. \end{aligned}$$
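The same finite-difference check works for the \(\eta \)-derivative, which is just the softmax Jacobian of the class weights \(w_j\). Since \(Pr (\mathbf{y_{\nu }} | P_{\nu }\in C_j )\) does not depend on \(\varvec{\eta }\), the sketch below (ours, with a hypothetical design matrix `V` for \(v_{jr}\)) treats it as a fixed vector `cond`.

```python
import numpy as np

def weights(eta, V, d):
    # w_j = exp(v_j . eta + d_j) / sum_h exp(v_h . eta + d_h)
    z = V @ eta + d
    e = np.exp(z - z.max())
    return e / e.sum()

def p_y(eta, V, d, cond):
    # p(y) = sum_j w_j Pr(y | C_j); cond stands in for Pr(y | C_j)
    return float(weights(eta, V, d) @ cond)

def grad_eta(eta, V, d, cond):
    # dp/deta_beta = sum_j w_j Pr(y|C_j) [v_{j,beta} - sum_h w_h v_{h,beta}]
    w = weights(eta, V, d)
    centered = V - w @ V            # v_{j,beta} minus its weighted average
    return (w * cond) @ centered
```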

Proof of Theorem 1

Let \(l^{2^k}\) be the interior of the \(2^k\)-dimensional unit cube; then the interior of \(\Delta _{2^k}\) is contained in \(l^{2^k}.\) Let \(W\) be a neighborhood of \((\varvec{\lambda }_0, \varvec{\eta }_0 ),\) the true value of the unknown parameter \((\varvec{\lambda }, \varvec{\eta }),\) on which \(D_{\phi }({\tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta }))\) has continuous second partial derivatives. Let

$$\begin{aligned} \mathbf{F}:=(F_1,\ldots , F_{t+u}): l^{2^k} \times W \rightarrow \mathbb {R}^{t+u} \end{aligned}$$

be the function whose components \(F_j,\, j=1,\ldots , t+u,\) are defined by

$$\begin{aligned} F_j(\tilde{p}_1,\ldots , \tilde{p}_{2^k}; \lambda _1,\ldots , \lambda _t; \eta _1,\ldots , \eta _u ):= {\partial D_{\phi }({\tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta })) \over \partial s_j},\, j=1,\ldots , t+u, \end{aligned}$$

where \(s_j\) is defined in (8).
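For concreteness, \(D_{\phi }\) can be computed for the Cressie–Read power-divergence family \(\phi _{\lambda }(x)=\left( x^{\lambda +1}-x-\lambda (x-1)\right) /\left( \lambda (\lambda +1)\right) ,\) which contains the Kullback–Leibler divergence (the limit \(\lambda \rightarrow 0\), i.e., maximum likelihood) and Pearson's chi-square (\(\lambda =1\)). A minimal sketch (ours), illustrating the standing assumptions \(\phi (1)=\phi '(1)=0\) and \(D_{\phi }(\mathbf{p},\mathbf{p})=0\):

```python
import numpy as np

def phi_cr(x, lmb):
    # Cressie-Read power-divergence function; phi(1) = phi'(1) = 0, phi''(1) = 1.
    # lmb = 0 is taken as the Kullback-Leibler limit x*log(x) - x + 1.
    x = np.asarray(x, float)
    if lmb == 0.0:
        return x * np.log(x) - x + 1.0
    return (x**(lmb + 1) - x - lmb * (x - 1)) / (lmb * (lmb + 1))

def d_phi(p_tilde, p, lmb=2.0 / 3.0):
    # D_phi(p~, p) = sum_nu p_nu * phi(p~_nu / p_nu)
    p_tilde = np.asarray(p_tilde, float)
    p = np.asarray(p, float)
    return float(np.sum(p * phi_cr(p_tilde / p, lmb)))
```

For \(\lambda =1\) this reduces to half of Pearson's chi-square statistic between the two distributions, which is a convenient check.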

It holds

$$\begin{aligned} F_j(p_1(\varvec{\lambda }_0 , \varvec{\eta }_0 ),\ldots , p_{2^k}(\varvec{\lambda }_0 , \varvec{\eta }_0 );\lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)=0,\, \forall j=1,\ldots , t+u \end{aligned}$$

due to

$$\begin{aligned} {\partial D_{\phi } ({\tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta }))\over \partial \lambda _{\alpha }}&= \sum _{\nu =1}^{2^k} \left\{ \phi \left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) - {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \phi '\left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) \right\} {\partial p_{\nu }( \varvec{\lambda }, \varvec{\eta })\over \partial \lambda _{\alpha }} ,\, \alpha =1,\ldots , t.\\ {\partial D_{\phi } ({\tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta }))\over \partial \eta _{\beta }}&= \sum _{\nu =1}^{2^k} \left\{ \phi \left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) - {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \phi '\left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) \right\} {\partial p_{\nu }(\varvec{\lambda }, \varvec{\eta })\over \partial \eta _{\beta }} ,\, \beta =1,\ldots , u. \end{aligned}$$

In what follows we write these two expressions jointly as

$$\begin{aligned} {\partial D_{\phi } ({ \tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta }))\over \partial s_{j}},\, j=1,\ldots , t+u. \end{aligned}$$

Since

$$\begin{aligned} {\partial \over \partial s_r}\left( {\partial D_{\phi } ({ \tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta }))\over \partial s_{j}} \right)&= -\sum _{\nu =1}^{2^k} \phi '\left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })^2} {\partial p_{\nu }(\varvec{\lambda }, \varvec{\eta })\over \partial s_{r}} {\partial p_{\nu }(\varvec{\lambda }, \varvec{\eta })\over \partial s_{j}} \\&+ \sum _{\nu =1}^{2^k} \phi ''\left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })^2} {\partial p_{\nu }(\varvec{\lambda }, \varvec{\eta })\over \partial s_{r}} {\partial p_{\nu }(\varvec{\lambda }, \varvec{\eta })\over \partial s_{j}} {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })}\\&+ \sum _{\nu =1}^{2^k} \phi '\left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })^2} {\partial p_{\nu }(\varvec{\lambda }, \varvec{\eta })\over \partial s_{r}} {\partial p_{\nu }(\varvec{\lambda }, \varvec{\eta })\over \partial s_{j}} \\&+ \sum _{\nu =1}^{2^k} {\partial ^2 p_{\nu }(\varvec{\lambda }, \varvec{\eta })\over \partial s_{r} s_j} \left\{ \phi \left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) - \phi '\left( {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })} \right) {\tilde{p}_{\nu }\over p_{\nu }(\varvec{\lambda }, \varvec{\eta })}\right\} , \end{aligned}$$

and denoting \(\pi _{\nu }=p_{\nu }(\varvec{\lambda }_0 , \varvec{\eta }_0), \nu =1,\ldots , 2^k,\) the \((t+u)\times (t+u)\) matrix \(\mathbf{J}_\mathbf{F}\) associated with function \(\mathbf{F}\) at point \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\) is given by

$$\begin{aligned} {\partial \mathbf{F}\over \partial (\varvec{\lambda }_0 , \varvec{\eta }_0 )}&= \left( {\partial \mathbf{F}\over \partial (\varvec{\lambda }, \varvec{\eta })} \right) _{({\tilde{\mathbf{p}}}, (\varvec{\lambda }, \varvec{\eta }))= (\pi _1,\ldots , \pi _{2^k}; \lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)} \\&= \left( \left( {\partial \over \partial s_r}\left( {\partial D_{\phi } ({\tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta }))\over \partial s_{j}} \right) \right) _{\mathop {r=1,\ldots , t+u}\limits ^{j=1,\ldots , t+u}} \right) _{({\tilde{\mathbf{p}}}, (\varvec{\lambda }, \varvec{\eta })) =(\pi _1,\ldots , \pi _{2^k}; \lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)} \\&= \phi ''(1) \left( \sum _{l=1}^{2^k} {1\over p_l(\varvec{\lambda }_0 , \varvec{\eta }_0 )}{\partial p_l(\varvec{\lambda }_0 , \varvec{\eta }_0 )\over \partial s_{r}} {\partial p_l(\varvec{\lambda }_0 , \varvec{\eta }_0 )\over \partial s_{j}} \right) _{\mathop {r=1,\ldots , t+u}\limits ^{j=1,\ldots , t+u}} \end{aligned}$$

To obtain the last expression we use the fact that \(\phi (1)=\phi '(1)=0.\) Recall that if \(\mathbf{B}\) is a \(p\times q\) matrix with \(rank(\mathbf{B})=p\) and \(\mathbf{C}\) is a \(q\times s\) matrix with \(rank(\mathbf{C})=q,\) then \(rank(\mathbf{BC})=p.\) Taking

$$\begin{aligned} \mathbf{B}= \mathbf{J}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T, \, \, \mathbf{C}=\mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)}^{-{1\over 2}}, \end{aligned}$$

it follows that \(\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T=\mathbf{BC}\) has rank \(t+u\) applying the fourth condition of Birch. Also,

$$\begin{aligned} rank (\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0))=rank (\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T)=rank (\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0))=t+u. \end{aligned}$$

Therefore, the \((t+u)\times (t+u)\) matrix \({\partial \mathbf{F}\over \partial (\varvec{\lambda }_0 , \varvec{\eta }_0)}\) is nonsingular at \((\pi _1, \ldots , \pi _{2^k}; \lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)\).
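The rank identity \(rank(\mathbf{A}^T\mathbf{A})=rank(\mathbf{A}\mathbf{A}^T)=rank(\mathbf{A})\) invoked above is easy to confirm numerically for a generic full-column-rank matrix (our toy dimensions, standing in for the \(2^k\times (t+u)\) matrix \(\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)\)):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))     # generic 2^k x (t+u) matrix, full column rank

r = np.linalg.matrix_rank
assert r(A) == 3
assert r(A.T @ A) == r(A)       # (t+u) x (t+u): nonsingular
assert r(A @ A.T) == r(A)       # 2^k x 2^k: rank-deficient, but same rank
```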

By the Implicit Function Theorem, there exists a neighborhood \(U\) of \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\) on which \(\mathbf{F}\) is continuously differentiable and the matrix \(\mathbf{J}_\mathbf{F}\) is nonsingular (in our case, \(\mathbf{J}_\mathbf{F}\) is positive definite at \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\)). Moreover, there exists a continuously differentiable function

$$\begin{aligned} \tilde{\varvec{\theta }}:A\subset l^{2^k} \rightarrow \mathbb {R}^{t+u} \end{aligned}$$

such that \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)\in A\) and

$$\begin{aligned} \left\{ ({\tilde{\mathbf{p}}}, (\varvec{\lambda }, \varvec{\eta }))\in U : \mathbf{F}({\tilde{\mathbf{p}}}, (\varvec{\lambda }, \varvec{\eta }))=0\right\} =\left\{ ({\tilde{\mathbf{p}}}, \tilde{\varvec{\theta }}({\tilde{\mathbf{p}}})): { \tilde{\mathbf{p}}}\in A \right\} . \end{aligned}$$
(12)

We can observe that \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))\) is an argmin of

$$\begin{aligned} \psi (\varvec{\lambda }, \varvec{\eta }):= D_{\phi }(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \mathbf{p}(\varvec{\lambda }, \varvec{\eta })) \end{aligned}$$

because \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)\in A\) and then

$$\begin{aligned} \mathbf{F}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))) = {\partial D_{\phi }(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \mathbf{p}(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)))) \over \partial (\varvec{\lambda }, \varvec{\eta })} = \mathbf{0}. \end{aligned}$$

On the other hand, applying (12),

$$\begin{aligned} (\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))) \in U, \end{aligned}$$

and then \(\mathbf{J}_\mathbf{F}\) is positive definite at \( (\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))).\) Therefore,

$$\begin{aligned} D_{\phi }(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \mathbf{p}(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)))) = \inf _{(\varvec{\lambda }, \varvec{\eta })\in \Theta } D_{\phi }(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \mathbf{p}(\varvec{\lambda }, \varvec{\eta })), \end{aligned}$$

and by the \(\phi \)-divergence properties \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))= (\varvec{\lambda }_0 , \varvec{\eta }_0 )^T,\) and

$$\begin{aligned} {\partial \mathbf{F}\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) } - {\partial \mathbf{F} \over \partial (\varvec{\lambda }_0 , \varvec{\eta }_0)} {\partial (\varvec{\lambda }_0 , \varvec{\eta }_0 )\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)} =\mathbf{0}. \end{aligned}$$

Further, we know that

$$\begin{aligned} {\partial \mathbf{F} \over \partial (\varvec{\lambda }_0 , \varvec{\eta }_0 )} = \phi ''(1) \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0) \end{aligned}$$

and we shall establish later that the \((t+u)\times 2^k\) matrix \( {\partial \mathbf{F} \over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)}\) is

$$\begin{aligned} {\partial \mathbf{F} \over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)} = \phi '' (1) \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 )}^{-{1\over 2}}. \end{aligned}$$
(13)

Therefore, the \((t+u)\times 2^k\) matrix \({\partial (\varvec{\lambda }_0 , \varvec{\eta }_0 )\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) }\) is

$$\begin{aligned} {\partial (\varvec{\lambda }_0 , \varvec{\eta }_0 )\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) } = (\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0))^{-1} \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 )}^{-{1\over 2}}. \end{aligned}$$

The Taylor expansion of the function \(\tilde{\varvec{\theta }}\) around \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) \) yields

$$\begin{aligned} \tilde{\varvec{\theta }}({\tilde{\mathbf{p}}}) = \tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)) + \left( {\partial \tilde{\varvec{\theta }}({\tilde{\mathbf{p}}}) \over \partial {\tilde{\mathbf{p}}}}\right) _{{ \tilde{\mathbf{p}}}=\varvec{\pi }} ({\tilde{\mathbf{p}}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)) + o(\Vert {\tilde{\mathbf{p}}} -\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) \Vert ). \end{aligned}$$

As \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)) = (\varvec{\lambda }_0 , \varvec{\eta }_0 )^T,\) we obtain from here

$$\begin{aligned} \tilde{\varvec{\theta }}({\tilde{\mathbf{p}}})&= (\varvec{\lambda }_0 , \varvec{\eta }_0 )^T+ (\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0))^{-1} \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 )}^{-{1\over 2}} ({ \tilde{\mathbf{p}}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))\\&+ o(\Vert { \tilde{\mathbf{p}}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) \Vert ). \end{aligned}$$

We know that \( {\hat{\mathbf{p}}}{\overset{a.s.}{\longrightarrow }} \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 ),\) so that, for \(N\) large enough, \({\hat{\mathbf{p}}}\in A\) and, consequently, \( \tilde{\varvec{\theta }}({\hat{\mathbf{p}}})\) is the unique solution of the system of equations

$$\begin{aligned} {\partial D_{\phi }( {\hat{\mathbf{p}}}, \mathbf{p} ( \tilde{\varvec{\theta }}({\hat{\mathbf{p}}})))\over \partial s_j} =0,\, j=1,\ldots , t+u, \end{aligned}$$

and also \(( {\hat{\mathbf{p}}}, \tilde{\varvec{\theta }}({\hat{\mathbf{p}}}))\in U.\) Therefore, \(\tilde{\varvec{\theta }}({\hat{\mathbf{p}}})\) is the minimum \(\phi \)-divergence estimator, \(\hat{\varvec{\theta }}_{\phi }\), satisfying the relation

$$\begin{aligned} \hat{\varvec{\theta }}_{\phi }&= (\varvec{\lambda }_0 , \varvec{\eta }_0 )^T+ (\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0))^{-1} \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 )}^{-{1\over 2}} ({\hat{\mathbf{p}}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 ))\\&+ o(\Vert {\hat{\mathbf{p}}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 )\Vert ). \end{aligned}$$

Finally, we establish (13) by computing the \((i,j)\)-th element of the \((t+u)\times 2^k\) matrix \({\partial \mathbf{F}\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) }:\)

$$\begin{aligned} {\partial \over \partial p_i} \left( {\partial D_{\phi } ({\tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta }))\over \partial s_{j}} \right)&= {\partial \over \partial p_i} \left( \sum _{l=1}^{2^k} \left\{ \phi \left( {\tilde{p}_{l} \over p_l(\varvec{\lambda }, \varvec{\eta })} \right) - \phi ' \left( {\tilde{p}_{l}\over p_l(\varvec{\lambda }, \varvec{\eta })} \right) {\tilde{p}_{l}\over p_l(\varvec{\lambda }, \varvec{\eta })}\right\} {\partial p_l(\varvec{\lambda }, \varvec{\eta })\over \partial s_j} \right) \\&= {1\over p_i(\varvec{\lambda }, \varvec{\eta })} \left( -{p_i\over p_i (\varvec{\lambda }, \varvec{\eta })} \phi ''\left( {p_i\over p_i (\varvec{\lambda }, \varvec{\eta })}\right) \right) {\partial p_i(\varvec{\lambda }, \varvec{\eta })\over \partial s_j} \end{aligned}$$

and for \((\pi _1,\ldots , \pi _{2^k}; \lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)\) we have

$$\begin{aligned} {\partial \over \partial p_i} \left( {\partial D_{\phi } ({ \tilde{\mathbf{p}}}, \mathbf{p}(\varvec{\lambda }, \varvec{\eta }))\over \partial s_{j}} \right) = {1\over p_i (\varvec{\lambda }_0 , \varvec{\eta }_0 )} \phi ''\left( 1\right) {\partial p_i(\varvec{\lambda }_0 , \varvec{\eta }_0 )\over \partial s_j}. \end{aligned}$$

Since \(\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)=\mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 )}^{-{1\over 2}} \mathbf{J}(\varvec{\lambda }_0 , \varvec{\eta }_0),\) then (13) holds. \(\square \)

Proof of Theorem 2

Applying the previous theorem, it holds

$$\begin{aligned} \sqrt{N}(\hat{\varvec{\theta }}_{\phi } -(\varvec{\lambda }_0 , \varvec{\eta }_0)^T)&= \left( \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0 )^T \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0) \right) ^{-1} \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)}^{-{1\over 2}} \sqrt{N} (\hat{\mathbf{p}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))\\&+\, \sqrt{N} \, \, \, o(\Vert \hat{\mathbf{p}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) \Vert ). \end{aligned}$$

Note that

$$\begin{aligned} \sqrt{N} \, \, \, o(\Vert \hat{\mathbf{p}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) \Vert ) = o_p(1). \end{aligned}$$

On the other hand, as \(\hat{\mathbf{p}}\) is the sample proportion, we can apply the Central Limit Theorem to conclude

$$\begin{aligned} \sqrt{N} (\hat{\mathbf{p}} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)) {\overset{L}{\longrightarrow }} \mathcal{N}(\mathbf{0} , \varvec{\Sigma }_{\mathbf{p}(\varvec{\lambda }_0, \varvec{\eta }_0)} ), \end{aligned}$$

where \(\varvec{\Sigma }_{\mathbf{p}(\varvec{\lambda }_0, \varvec{\eta }_0)} \) is given by

$$\begin{aligned} \varvec{\Sigma }_{\mathbf{p}(\varvec{\lambda }_0, \varvec{\eta }_0)} = \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)} - \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T. \end{aligned}$$
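This \(\varvec{\Sigma }_{\mathbf{p}}=\mathbf{D}_{\mathbf{p}}-\mathbf{p}\mathbf{p}^T\) is the usual multinomial covariance of the cell proportions: its rows sum to zero (because the proportions sum to one) and it is positive semidefinite. A quick numerical illustration with an arbitrary probability vector of our choosing:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])        # stands in for p(lambda_0, eta_0)
Sigma = np.diag(p) - np.outer(p, p)       # D_p - p p^T

# rows sum to zero because sum_nu p_hat_nu = 1 identically
row_sums = Sigma @ np.ones_like(p)

# Monte Carlo check of the CLT covariance of sqrt(N) * (p_hat - p)
rng = np.random.default_rng(0)
N, B = 1000, 20000
phat = rng.multinomial(N, p, size=B) / N  # B replications of p_hat
emp = np.cov(np.sqrt(N) * (phat - p), rowvar=False)
```

The empirical covariance `emp` should be close to `Sigma` up to Monte Carlo error.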

Therefore, it follows

$$\begin{aligned} \sqrt{N}(\hat{\varvec{\theta }}_{\phi } -(\varvec{\lambda }_0 , \varvec{\eta }_0 )) {\overset{L}{\longrightarrow }} \mathcal{N}(\mathbf{0} , \varvec{\Sigma }^*), \end{aligned}$$

where \(\varvec{\Sigma }^* \) is given by

$$\begin{aligned} \varvec{\Sigma }^* = \left( \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0 )^T \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0) \right) ^{-1} - \mathbf{B} \mathbf{B}^T \end{aligned}$$

with \(\mathbf{B}:=\left( \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0 )^T \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0) \right) ^{-1} \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)}^{-{1\over 2}} \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0).\)

It is not difficult to see that

$$\begin{aligned} \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)}^{-{1\over 2}} \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)=\mathbf{0}^T, \end{aligned}$$

whence \(\mathbf{B}=\mathbf{0}\) and the result holds. \(\square \)

Proof of Theorem 3

By Theorem 2, it suffices to apply the delta method to conclude that

$$\begin{aligned} \sqrt{N}(\mathbf{p}( \hat{\varvec{\theta }}_{\phi })- \mathbf{p} (\varvec{\lambda }_0, \varvec{\eta }_0 )) {\overset{L}{\longrightarrow }} \mathcal{N}(\mathbf{0} , \nabla \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T\left( \mathbf{A}(\varvec{\lambda }_0, \varvec{\eta }_0 )^T \mathbf{A}(\varvec{\lambda }_0, \varvec{\eta }_0 ) \right) ^{-1} \nabla \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)). \end{aligned}$$

Now, as \(\nabla \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)= \mathbf{J}(\varvec{\lambda }_0 , \varvec{\eta }_0)\), the theorem is proved. \(\square \)
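The delta method used here is the generic fact that if \(\sqrt{N}(\hat{\varvec{\theta }}-\varvec{\theta }_0)\overset{L}{\longrightarrow }\mathcal{N}(\mathbf{0},\varvec{\Sigma })\) and \(g\) is differentiable, then \(\sqrt{N}(g(\hat{\varvec{\theta }})-g(\varvec{\theta }_0))\overset{L}{\longrightarrow }\mathcal{N}(\mathbf{0},\mathbf{G}\varvec{\Sigma }\mathbf{G}^T)\) with \(\mathbf{G}\) the Jacobian of \(g\). A generic sketch of the covariance propagation (our helper, with the Jacobian taken by central differences):

```python
import numpy as np

def delta_method_cov(g, theta, sigma, eps=1e-6):
    # First-order (delta-method) covariance of g(theta_hat) when
    # sqrt(N)(theta_hat - theta) -> N(0, sigma): returns G sigma G^T,
    # with G the Jacobian of g at theta, taken by central differences.
    theta = np.asarray(theta, float)
    sigma = np.asarray(sigma, float)
    cols = []
    for a in range(theta.size):
        e = np.zeros_like(theta)
        e[a] = eps
        cols.append((np.asarray(g(theta + e), float)
                     - np.asarray(g(theta - e), float)) / (2 * eps))
    G = np.stack(cols, axis=1)    # shape (dim g, dim theta)
    return G @ sigma @ G.T
```

For a linear map \(g(\varvec{\theta })=\mathbf{A}\varvec{\theta }\) the result is exactly \(\mathbf{A}\varvec{\Sigma }\mathbf{A}^T\), which makes a convenient correctness check.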


Cite this article

Felipe, A., Miranda, P. & Pardo, L. Minimum \(\phi \)-Divergence Estimation in Constrained Latent Class Models for Binary Data. Psychometrika 80, 1020–1042 (2015). https://doi.org/10.1007/s11336-015-9450-4
