Abstract
The main purpose of this paper is to introduce and study the behavior of minimum \(\phi \)-divergence estimators as an alternative to the maximum-likelihood estimator in latent class models for binary items. As will become clear below, minimum \(\phi \)-divergence estimators are a natural extension of the maximum-likelihood estimator. The asymptotic properties of minimum \(\phi \)-divergence estimators for latent class models for binary data are developed. Finally, to compare the efficiency and robustness of these new estimators with those of the maximum-likelihood estimator when the sample size is not large enough to apply the asymptotic results, we have carried out a simulation study.
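For readers unfamiliar with \(\phi \)-divergences, the family can be sketched numerically. The illustration below is not taken from the paper; it uses the standard Cressie–Read power-divergence choice of \(\phi \) (cf. Cressie & Read, 1984; Pardo, 2006) and shows how the Kullback–Leibler case, which underlies maximum likelihood, arises as the limit \(\lambda \rightarrow 0\):

```python
import numpy as np

def phi_cressie_read(x, lam):
    """Cressie-Read phi function; lam -> 0 recovers the Kullback-Leibler case."""
    if abs(lam) < 1e-12:  # limit case: x*log(x) - x + 1 yields KL divergence
        return x * np.log(x) - x + 1
    return (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))

def phi_divergence(p, q, lam=0.0):
    """D_phi(p, q) = sum_nu q_nu * phi(p_nu / q_nu) for probability vectors."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * phi_cressie_read(p / q, lam)))
```

With \(\lambda = 1\) this reduces to one half of the Pearson chi-square distance, and with \(\lambda = 0\) to the Kullback–Leibler divergence, so minimizing it over the model recovers maximum likelihood.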
Notes
For the sake of simplicity we have adopted the usual terminology: 1 for correct answers and 0 otherwise. However, the model applies to any dichotomous question (low–high, success–fail, agree–disagree, ...). For example, in Sect. 4 we consider a situation whose possible answers are “low” and “high.”
The variant we used in step 2 consists of randomly permuting the \(t+u\) parameters \((\varvec{\lambda }, \varvec{\eta })\) for each initial point \(i.\) The additional improvement seeks a better point along the vector from the initial point to the final point obtained through the full iteration of the Hooke and Jeeves algorithm, using double or half spacing steps outward or inward along this vector. At most \(2(t+u)+4\) evaluations of \(D_{\phi }\) are needed. The criterion \(D_{\phi }^{in}\) is used to discard non-promising initial points before the finer and more costly improvement.
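The underlying Hooke and Jeeves (1961) direct (pattern) search can be sketched as follows. The quadratic test objective, step sizes, and stopping rule below are illustrative placeholders, not the paper's actual \(D_{\phi }\) objective or tuning:

```python
import numpy as np

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-8, max_iter=1000):
    """Hooke-Jeeves direct search: exploratory coordinate moves around the
    base point, followed by a pattern move through any improved point."""
    def explore(base, fbase, h):
        y, fy = base.copy(), fbase
        for i in range(len(y)):
            for d in (h, -h):                 # try +h, then -h, on coordinate i
                trial = y.copy()
                trial[i] += d
                ft = f(trial)
                if ft < fy:
                    y, fy = trial, ft
                    break
        return y, fy

    x = np.asarray(x0, float)
    fx = f(x)
    while step > tol and max_iter > 0:
        max_iter -= 1
        y, fy = explore(x, fx, step)
        if fy < fx:
            # pattern move: jump through the improved point, then re-explore
            z = 2 * y - x
            zy, fzy = explore(z, f(z), step)
            if fzy < fy:
                x, fx = zy, fzy
            else:
                x, fx = y, fy
        else:
            step *= shrink                    # no improvement: refine the mesh
    return x, fx
```

Being derivative-free, this kind of search is well suited to objectives such as \(D_{\phi }\) whose gradients are costly or awkward to code, at the price of more function evaluations.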
References
Abar, B., & Loken, E. (2010). Self-regulated learning and self-directed study in a pre-college sample. Learning and Individual Differences, 20, 25–29.
Berkson, J. (1980). Minimum chi-square, not maximum likelihood! Annals of Statistics, 8(3), 482–485.
Biemer, P. (2011). Latent class analysis and survey error. Hoboken, NJ: Wiley.
Caldwell, L., Bradley, S., & Coffman, D. (2009). A person-centered approach to individualizing a school-based universal preventive intervention. American Journal of Drug and Alcohol Abuse, 35(4), 214–219.
Clogg, C. (1995). Latent class models: Recent developments and prospects for the future. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 311–352). New York: Plenum.
Coffman, D., Patrick, M., Polen, L., Rhoades, B., & Ventura, A. (2007). Why do high school seniors drink? Implications for a targeted approach to intervention. Prevention Science, 8, 1–8.
Coleman, J. S. (1964). Introduction to mathematical sociology. New York: Free Press.
Collins, L., & Lanza, S. (2010). Latent class and latent transition analysis for the social, behavioral, and health sciences. New York: Wiley.
Cressie, N., & Pardo, L. (2002). Phi-divergence statistics. In A. H. El-Shaarawi & W. W. Piegorsch (Eds.), Encyclopedia of environmetrics (Vol. 3, pp. 1551–1555). New York: Wiley.
Cressie, N., & Read, T. R. C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Series B, 46, 440–464.
Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2, 299–318.
Feldman, B., Masyn, K., & Conger, R. (2009). New approaches to studying problem behaviors: A comparison of methods for modeling longitudinal, categorical adolescent drinking data. Developmental Psychology, 45(3), 652–676.
Formann, A. (1976). Schätzung der Parameter in Lazarsfelds Latent-Class-Analyse (Res. Bull. No. 18). Vienna: Institut für Psychologie der Universität Wien. (In German).
Formann, A. (1977). Log-linear Latent Class Analyse (Res. Bull. No. 20). Vienna: Institut für Psychologie der Universität Wien. (In German).
Formann, A. (1978). A note on parameter estimation for Lazarsfeld’s latent class analysis. Psychometrika, 43, 123–126.
Formann, A. (1982). Linear logistic latent class analysis. Biometrical Journal, 24, 171–190.
Formann, A. (1985). Constrained latent class models: Theory and applications. British Journal of Mathematical and Statistical Psychology, 38, 87–111.
Formann, A. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476–486.
Gerber, M., Wittekind, A., Grote, G., & Staffelbach, B. (2009). Exploring types of career orientation: A latent class analysis approach. Journal of Vocational Behavior, 75, 303–318.
Gill, P. E. & Murray, W. (1979). Conjugate-gradient methods for large-scale nonlinear optimization. Technical Report SOL 79–15. Department of Operations Research, Stanford University.
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis. Cambridge: Cambridge University Press.
Hooke, R., & Jeeves, T. A. (1961). Direct search solution of numerical and statistical problems. Journal of the Association for Computing Machinery, 8, 212–229.
Langeheine, R., & Rost, J. (1988). Latent trait and latent class models. New York: Plenum Press.
Laska, M., Pasch, K., Lust, K., Story, M., & Ehlinger, E. (2009). Latent class analysis of lifestyle characteristics and health risk behaviors among college youth. Prevention Science, 10, 376–386.
Lazarsfeld, P., & Henry, N. (1968). Latent structure analysis. Boston: Houghton-Mifflin.
Lazarsfeld, P. (1950). The logical and mathematical foundation of latent structure analysis. In Studies in social psychology in World War II (Vol. IV, pp. 362–412). Princeton, NJ: Princeton University Press.
McHugh, R. (1956). Efficient estimation and local identification in latent class analysis. Psychometrika, 21, 331–347.
Morales, D., Pardo, L., & Vajda, I. (1995). Asymptotic divergence of estimates of discrete distributions. Journal of Statistical Planning and Inference, 48, 347–369.
Nylund, K., Bellmore, A., Nishina, A., & Graham, S. (2007). Subtypes, severity and structural stability of peer victimization: What does latent class analysis say? Child Development, 78, 1706–1722.
Pardo, L. (2006). Statistical inference based on divergence measures. New York: Chapman & Hall CRC.
Powell, M. (1970). A hybrid method for nonlinear algebraic equations. In P. Rabinowitz (Ed.), Numerical methods for nonlinear algebraic equations. London: Gordon and Breach.
Rost, J., & Langeheine, R. (1997). Applications of latent trait and latent class models in the social sciences. Münster: Waxmann.
Acknowledgments
This work was partially supported by Grant MTM2012-33740. Part of the computations of this work were performed in EOLO, the HPC of Climate Change of the International Campus of Excellence (CEI) of Moncloa, funded by MECD and MICINN; this is a contribution of CEI Moncloa. We also thank the anonymous referees for their comments and remarks that have improved the final version of the paper.
Appendix
Remark 1
We now develop the calculations for \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \lambda _{\alpha }}}\) and \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \eta _{\beta }}}.\)
For \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \lambda _{\alpha }}}\) note that
Now,
whence
Similarly,
Now,
whence
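Although the paper works with Formann's constrained parameterization \((\varvec{\lambda }, \varvec{\eta }),\) derivative calculations of this kind can be sanity-checked numerically in an unconstrained latent class model, where each cell probability is a mixture of products of Bernoulli terms. The class weights `w` and item probabilities `q` below are hypothetical values chosen only for illustration:

```python
import numpy as np
from itertools import product

def cell_prob(y, w, q):
    """p(y) = sum_j w_j * prod_i q_ij^{y_i} * (1 - q_ij)^{1 - y_i}."""
    y = np.asarray(y)
    q = np.asarray(q, float)
    bern = np.where(y == 1, q, 1 - q)       # shape (classes, items)
    return float(np.sum(np.asarray(w) * bern.prod(axis=1)))

def d_cell_prob_dq(y, w, q, j, i):
    """Analytic derivative of p(y) with respect to q[j, i]."""
    y = np.asarray(y)
    q = np.asarray(q, float)
    bern = np.where(y == 1, q, 1 - q)
    others = np.prod(np.delete(bern[j], i))  # product over the other items
    sign = 1.0 if y[i] == 1 else -1.0        # d bern / d q is +1 or -1
    return w[j] * sign * others
```

Comparing `d_cell_prob_dq` against a central finite difference of `cell_prob` confirms the closed-form derivative, the same kind of check one would apply to the \((\varvec{\lambda }, \varvec{\eta })\) derivatives above.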
Proof of Theorem 1
Let \(l^{2^k}\) be the interior of the \(2^k\)-dimensional unit cube; then, the interior of \(\Delta _{2^k}\) is contained in \(l^{2^k}.\) Let \(W\) be a neighborhood of \((\varvec{\lambda }_0, \varvec{\eta }_0 ),\) the true value of the unknown parameter \((\varvec{\lambda }, \varvec{\eta }),\) on which
has continuous second partial derivatives. Let
whose components \(F_j,\, j=1,\ldots , t+u\) are defined by
where \(s_j\) is defined in (8).
It holds
due to
In the following we shall rewrite the two previous expressions by
Since
and denoting \(\pi _{\nu }=p_{\nu }(\varvec{\lambda }_0 , \varvec{\eta }_0), \nu =1,\ldots , 2^k,\) the \((t+u)\times (t+u)\) matrix \(\mathbf{J}_\mathbf{F}\) associated with function \(\mathbf{F}\) at point \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\) is given by
To obtain the last expression we use that \(\phi (1)=\phi '(1)=0.\) Recall that if \(\mathbf{B}\) is a \(p\times q\) matrix with \(rank(\mathbf{B})=p\) and \(\mathbf{C}\) is a \(q\times s\) matrix with \(rank(\mathbf{C})=q,\) then \(rank(\mathbf{BC})=p.\) Taking
it follows that \(\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T=\mathbf{BC}\) has rank \(t+u\) applying the fourth condition of Birch. Also,
Therefore, the \((t+u)\times (t+u)\) matrix \({\partial \mathbf{F}\over \partial (\varvec{\lambda }_0 , \varvec{\eta }_0)}\) is nonsingular at \((\pi _1, \ldots , \pi _{2^k}; \lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)\).
Applying the Implicit Function Theorem, there exists a neighborhood \(U\) of \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\) on which the matrix \(\mathbf{J}_\mathbf{F}\) is nonsingular (in our case, \(\mathbf{J}_\mathbf{F}\) is positive definite, and hence nonsingular, at \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\), and it is continuously differentiable). Also, there exists a continuously differentiable function
such that \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)\in A\) and
We can observe that \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))\) is an argmin of
because \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)\in A\) and then
On the other hand, applying (12),
and then \(\mathbf{J}_\mathbf{F}\) is positive definite at \( (\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))).\) Therefore,
and by the \(\phi \)-divergence properties \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))= (\varvec{\lambda }_0 , \varvec{\eta }_0 )^T,\) and
Further, we know that
and we shall establish later that the \((t+u)\times 2^k\) matrix \( {\partial \mathbf{F} \over \partial \varvec{\pi }}\) is
Therefore, the \((t+u)\times 2^k\) matrix \({\partial (\varvec{\lambda }_0 , \varvec{\eta }_0 )\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) }\) is
The Taylor expansion of the function \(\tilde{\varvec{\theta }}\) around \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) \) yields
As \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)) = (\varvec{\lambda }_0 , \varvec{\eta }_0 )^T,\) we obtain from here
We know that \( {\hat{\mathbf{p}}}{\overset{a.s.}{\longrightarrow }} \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 ),\) so that, for \(n\) large enough, \({\hat{\mathbf{p}}}\in A\) and, consequently, \( \tilde{\varvec{\theta }}({\hat{\mathbf{p}}})\) is the unique solution of the system of equations
and also \(( {\hat{\mathbf{p}}}, \tilde{\varvec{\theta }}({\hat{\mathbf{p}}}))\in U.\) Therefore, \(\tilde{\varvec{\theta }}({\hat{\mathbf{p}}})\) is the minimum \(\phi \)-divergence estimator, \(\hat{\varvec{\theta }}_{\phi }\), satisfying the relation
Finally, we are going to establish (13). We compute the \((i,j)\)-th element of the \((t+u)\times 2^k\) matrix \({\partial \mathbf{F}\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) }.\)
and for \((\pi _1,\ldots , \pi _{2^k}; \lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)\) we have
Since \(\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)=\mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 )}^{-{1\over 2}} \mathbf{J}(\varvec{\lambda }_0 , \varvec{\eta }_0),\) then (13) holds. \(\square \)
Proof of Theorem 2
Applying the previous theorem, it holds
Note that
On the other hand, as \(\hat{\mathbf{p}}\) is the vector of sample proportions, we can apply the Central Limit Theorem to conclude
where \(\varvec{\Sigma }_{\mathbf{p}(\varvec{\lambda }_0, \varvec{\eta }_0)} \) is given by
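The covariance matrix in this Central Limit Theorem step is the standard multinomial one, \(\varvec{\Sigma }_{\mathbf{p}} = \mathbf{D}_{\mathbf{p}} - \mathbf{p}\mathbf{p}^T.\) A quick Monte Carlo check, with an arbitrary probability vector rather than the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.3, 0.4])        # arbitrary cell probabilities
n, reps = 200, 20000

# empirical covariance of sqrt(n) * (p_hat - p) over many multinomial samples
samples = rng.multinomial(n, p, size=reps) / n
emp = np.cov((np.sqrt(n) * (samples - p)).T)

# theoretical multinomial covariance: Sigma_p = D_p - p p^T
theory = np.diag(p) - np.outer(p, p)
```

The empirical and theoretical matrices agree up to Monte Carlo error, since for multinomial sampling the covariance of \(\sqrt{n}(\hat{\mathbf{p}} - \mathbf{p})\) is exact at every \(n.\)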
Therefore, it follows
where \(\varvec{\Sigma }^* \) is given by
with \(\mathbf{B}:=\left( \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0 )^T \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0) \right) ^{-1} \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)}^{1\over 2}.\)
It is not difficult to see that
whence \(\mathbf{B}=\mathbf{0}\) and the result holds. \(\square \)
Proof of Theorem 3
Using Theorem 2, it suffices to apply the delta method. Then, we can conclude that
Now, as \(\nabla \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)= \mathbf{J}(\varvec{\lambda }_0 , \varvec{\eta }_0)\), the theorem is proved. \(\square \)
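The delta-method step used here can also be illustrated numerically: for a smooth functional \(g,\) the asymptotic variance of \(g(\hat{\mathbf{p}})\) is \(\nabla g(\mathbf{p})^T \varvec{\Sigma }_{\mathbf{p}} \nabla g(\mathbf{p})/n.\) The functional \(g\) and the probability vector below are arbitrary examples, not the paper's specific \(\mathbf{p}(\varvec{\lambda }, \varvec{\eta })\):

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.2, 0.3, 0.5])
n, reps = 500, 40000

# arbitrary smooth functional of the probability vector and its gradient at p
g = lambda v: v[0] * np.log(v[2] / v[1])
grad = np.array([np.log(p[2] / p[1]),     # dg/dp1
                 -p[0] / p[1],            # dg/dp2
                 p[0] / p[2]])            # dg/dp3

# delta-method variance: grad^T (D_p - p p^T) grad / n
sigma = np.diag(p) - np.outer(p, p)
asym_var = grad @ sigma @ grad / n

# Monte Carlo variance of g(p_hat) over repeated multinomial samples
phat = rng.multinomial(n, p, size=reps) / n
emp_var = np.var([g(v) for v in phat])
```

For moderate \(n\) the simulated variance of \(g(\hat{\mathbf{p}})\) already matches the delta-method approximation to within a few percent.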
Felipe, A., Miranda, P. & Pardo, L. Minimum \(\phi \)-Divergence Estimation in Constrained Latent Class Models for Binary Data. Psychometrika 80, 1020–1042 (2015). https://doi.org/10.1007/s11336-015-9450-4