Abstract
The main purpose of this paper is to introduce and study the behavior of minimum \(\phi \)-divergence estimators as an alternative to the maximum-likelihood estimator in latent class models for binary items. As will become clear below, minimum \(\phi \)-divergence estimators are a natural extension of the maximum-likelihood estimator. The asymptotic properties of minimum \(\phi \)-divergence estimators for latent class models for binary data are developed. Finally, to compare the efficiency and robustness of these new estimators with those of the maximum-likelihood estimator when the sample size is not large enough to apply the asymptotic results, we have carried out a simulation study.
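For readers unfamiliar with \(\phi \)-divergences, the family can be sketched numerically. The illustration below is not taken from the paper; it uses the standard Cressie–Read power-divergence choice of \(\phi \) (cf. Cressie & Read, 1984; Pardo, 2006) and shows how the Kullback–Leibler case, which underlies maximum likelihood, arises as the limit \(\lambda \rightarrow 0\):

```python
import numpy as np

def phi_cressie_read(x, lam):
    """Cressie-Read phi function; lam -> 0 recovers the Kullback-Leibler case."""
    if abs(lam) < 1e-12:  # limit case: x*log(x) - x + 1 yields KL divergence
        return x * np.log(x) - x + 1
    return (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))

def phi_divergence(p, q, lam=0.0):
    """D_phi(p, q) = sum_nu q_nu * phi(p_nu / q_nu) for probability vectors."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * phi_cressie_read(p / q, lam)))
```

With \(\lambda = 1\) this reduces to one half of the Pearson chi-square distance, and with \(\lambda = 0\) to the Kullback–Leibler divergence, so minimizing it over the model recovers maximum likelihood.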
Notes
For the sake of simplicity we have adopted the usual terminology: 1 for correct answers and 0 otherwise. However, the model applies to any dichotomous question (low–high, success–fail, agree–disagree, ...). For example, in Sect. 4 we consider a situation whose possible answers are “low” and “high.”
The variant we used in step 2 consists of randomly permuting the \(t+u\) parameters \((\varvec{\lambda }, \varvec{\eta })\) for each initial point \(i.\) The additional improvement seeks a better point along the vector from the initial point to the final point obtained through the full iteration of the Hooke and Jeeves algorithm, using double or half spacing steps outward or inward along this vector. At most \(2(t+u)+4\) evaluations of \(D_{\phi }\) are needed. The criterion \(D_{\phi }^{in}\) is used to discard non-promising initial points before the finer and more costly improvement.
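The underlying Hooke and Jeeves (1961) direct (pattern) search can be sketched as follows. The quadratic test objective, step sizes, and stopping rule below are illustrative placeholders, not the paper's actual \(D_{\phi }\) objective or tuning:

```python
import numpy as np

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-8, max_iter=1000):
    """Hooke-Jeeves direct search: exploratory coordinate moves around the
    base point, followed by a pattern move through any improved point."""
    def explore(base, fbase, h):
        y, fy = base.copy(), fbase
        for i in range(len(y)):
            for d in (h, -h):                 # try +h, then -h, on coordinate i
                trial = y.copy()
                trial[i] += d
                ft = f(trial)
                if ft < fy:
                    y, fy = trial, ft
                    break
        return y, fy

    x = np.asarray(x0, float)
    fx = f(x)
    while step > tol and max_iter > 0:
        max_iter -= 1
        y, fy = explore(x, fx, step)
        if fy < fx:
            # pattern move: jump through the improved point, then re-explore
            z = 2 * y - x
            zy, fzy = explore(z, f(z), step)
            if fzy < fy:
                x, fx = zy, fzy
            else:
                x, fx = y, fy
        else:
            step *= shrink                    # no improvement: refine the mesh
    return x, fx
```

Being derivative-free, this kind of search is well suited to objectives such as \(D_{\phi }\) whose gradients are costly or awkward to code, at the price of more function evaluations.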
References
Abar, B., & Loken, E. (2010). Self-regulated learning and self-directed study in a pre-college sample. Learning and Individual Differences, 20, 25–29.
Berkson, J. (1980). Minimum chi-square, not maximum likelihood! Annals of Statistics, 8(3), 482–485.
Biemer, P. (2011). Latent class analysis and survey error. Hoboken, NJ: Wiley.
Caldwell, L., Bradley, S., & Coffman, D. (2009). A person-centered approach to individualizing a school-based universal preventive intervention. American Journal of Drug and Alcohol Abuse, 35(4), 214–219.
Clogg, C. (1995). Latent class models: Recent developments and prospects for the future. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 311–352). New York: Plenum.
Coffman, D., Patrick, M., Polen, L., Rhoades, B., & Ventura, A. (2007). Why do high school seniors drink? Implications for a targeted approach to intervention. Prevention Science, 8, 1–8.
Coleman, J. S. (1964). Introduction to mathematical sociology. New York: Free Press.
Collins, L., & Lanza, S. (2010). Latent class and latent transition analysis for the social, behavioral, and health sciences. New York: Wiley.
Cressie, N., & Pardo, L. (2002). Phi-divergence statistics. In A. H. El-Shaarawi & W. W. Piegorsch (Eds.), Encyclopedia of environmetrics (Vol. 3, pp. 1551–1555). New York: Wiley.
Cressie, N., & Read, T. R. C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Series B, 46, 440–464.
Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2, 299–318.
Feldman, B., Masyn, K., & Conger, R. (2009). New approaches to studying problem behaviors: A comparison of methods for modeling longitudinal, categorical adolescent drinking data. Developmental Psychology, 45(3), 652–676.
Formann, A. (1976). Schätzung der Parameter in Lazarsfelds Latent-Class-Analyse (Res. Bull. No. 18). Vienna: Institut für Psychologie der Universität Wien. (In German).
Formann, A. (1977). Log-linear Latent Class Analyse (Res. Bull. No. 20). Vienna: Institut für Psychologie der Universität Wien. (In German).
Formann, A. (1978). A note on parameter estimation for Lazarsfeld’s latent class analysis. Psychometrika, 43, 123–126.
Formann, A. (1982). Linear logistic latent class analysis. Biometrical Journal, 24, 171–190.
Formann, A. (1985). Constrained latent class models: Theory and applications. British Journal of Mathematical and Statistical Psychology, 38, 87–111.
Formann, A. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476–486.
Gerber, M., Wittekind, A., Grote, G., & Staffelbach, B. (2009). Exploring types of career orientation: A latent class analysis approach. Journal of Vocational Behavior, 75, 303–318.
Gill, P. E. & Murray, W. (1979). Conjugate-gradient methods for large-scale nonlinear optimization. Technical Report SOL 79–15. Department of Operations Research, Stanford University.
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis. Cambridge: Cambridge University Press.
Hooke, R., & Jeeves, T. A. (1961). Direct search solution of numerical and statistical problems. Journal of the Association for Computing Machinery, 8, 212–229.
Langeheine, R., & Rost, J. (1988). Latent trait and latent class models. New York: Plenum Press.
Laska, M., Pasch, K., Lust, K., Story, M., & Ehlinger, E. (2009). Latent class analysis of lifestyle characteristics and health risk behaviors among college youth. Prevention Science, 10, 376–386.
Lazarsfeld, P., & Henry, N. (1968). Latent structure analysis. Boston: Houghton-Mifflin.
Lazarsfeld, P. (1950). The logical and mathematical foundation of latent structure analysis. In Studies in social psychology in World War II (Vol. IV, pp. 362–412). Princeton, NJ: Princeton University Press.
McHugh, R. (1956). Efficient estimation and local identification in latent class analysis. Psychometrika, 21, 331–347.
Morales, D., Pardo, L., & Vajda, I. (1995). Asymptotic divergence of estimates of discrete distributions. Journal of Statistical Planning and Inference, 48, 347–369.
Nylund, K., Bellmore, A., Nishina, A., & Graham, S. (2007). Subtypes, severity and structural stability of peer victimization: What does latent class analysis say? Child Development, 78, 1706–1722.
Pardo, L. (2006). Statistical inference based on divergence measures. New York: Chapman & Hall CRC.
Powell, M. (1970). A hybrid method for nonlinear algebraic equations. In P. Rabinowitz (Ed.), Numerical methods for nonlinear algebraic equations. London: Gordon and Breach.
Rost, J., & Langeheine, R. (1997). Applications of latent trait and latent class models in the social sciences. Münster: Waxmann.
Acknowledgments
This work was partially supported by Grant MTM2012-33740. Part of the computations of this work were performed in EOLO, the HPC of Climate Change of the International Campus of Excellence (CEI) of Moncloa, funded by MECD and MICINN; this is a contribution of CEI Moncloa. We also thank the anonymous referees for their comments and remarks that have improved the final version of the paper.
Appendix
Remark 1
We now develop the calculations for \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \lambda _{\alpha }}}\) and \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \eta _{\beta }}}.\)
For \({{\partial p(\mathbf{y_{\nu }}, \varvec{\lambda }, \varvec{\eta })\over \partial \lambda _{\alpha }}}\) note that
Now,
whence
Similarly,
Now,
whence
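Although the paper works with Formann's constrained parameterization \((\varvec{\lambda }, \varvec{\eta }),\) derivative calculations of this kind can be sanity-checked numerically in an unconstrained latent class model, where each cell probability is a mixture of products of Bernoulli terms. The class weights `w` and item probabilities `q` below are hypothetical values chosen only for illustration:

```python
import numpy as np
from itertools import product

def cell_prob(y, w, q):
    """p(y) = sum_j w_j * prod_i q_ij^{y_i} * (1 - q_ij)^{1 - y_i}."""
    y = np.asarray(y)
    q = np.asarray(q, float)
    bern = np.where(y == 1, q, 1 - q)       # shape (classes, items)
    return float(np.sum(np.asarray(w) * bern.prod(axis=1)))

def d_cell_prob_dq(y, w, q, j, i):
    """Analytic derivative of p(y) with respect to q[j, i]."""
    y = np.asarray(y)
    q = np.asarray(q, float)
    bern = np.where(y == 1, q, 1 - q)
    others = np.prod(np.delete(bern[j], i))  # product over the other items
    sign = 1.0 if y[i] == 1 else -1.0        # d bern / d q is +1 or -1
    return w[j] * sign * others
```

Comparing `d_cell_prob_dq` against a central finite difference of `cell_prob` confirms the closed-form derivative, the same kind of check one would apply to the \((\varvec{\lambda }, \varvec{\eta })\) derivatives above.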
Proof of Theorem 1
Let \(l^{2^k}\) be the interior of the \(2^k\)-dimensional unit cube; then, the interior of \(\Delta _{2^k}\) is contained in \(l^{2^k}.\) Let \(W\) be a neighborhood of \((\varvec{\lambda }_0, \varvec{\eta }_0 ),\) the true value of the unknown parameter \((\varvec{\lambda }, \varvec{\eta }),\) on which
has continuous second partial derivatives. Let
whose components \(F_j,\, j=1,\ldots , t+u\) are defined by
where \(s_j\) is defined in (8).
It holds
due to
In the following we shall rewrite the two previous expressions by
Since
and denoting \(\pi _{\nu }=p_{\nu }(\varvec{\lambda }_0 , \varvec{\eta }_0), \nu =1,\ldots , 2^k,\) the \((t+u)\times (t+u)\) matrix \(\mathbf{J}_\mathbf{F}\) associated with function \(\mathbf{F}\) at point \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\) is given by
To obtain the last expression we use that \(\phi (1)=\phi '(1)=0.\) Recall that if \(\mathbf{B}\) is a \(p\times q\) matrix with \(rank(\mathbf{B})=p\) and \(\mathbf{C}\) is a \(q\times s\) matrix with \(rank(\mathbf{C})=q,\) then \(rank(\mathbf{BC})=p.\) Taking
it follows that \(\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T=\mathbf{BC}\) has rank \(t+u\) applying the fourth condition of Birch. Also,
Therefore, the \((t+u)\times (t+u)\) matrix \({\partial \mathbf{F}\over \partial (\varvec{\lambda }_0 , \varvec{\eta }_0)}\) is nonsingular at \((\pi _1, \ldots , \pi _{2^k}; \lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)\).
Applying the Implicit Function Theorem, there exists a neighborhood \(U\) of \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\) on which the matrix \(\mathbf{J}_\mathbf{F}\) is nonsingular (in our case, \(\mathbf{J}_\mathbf{F}\) is positive definite, and hence nonsingular, at \((\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), (\varvec{\lambda }_0 , \varvec{\eta }_0 ))\), and it is continuously differentiable). Also, there exists a continuously differentiable function
such that \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)\in A\) and
We can observe that \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))\) is an argmin of
because \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)\in A\) and then
On the other hand, applying (12),
and then \(\mathbf{J}_\mathbf{F}\) is positive definite at \( (\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0), \tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))).\) Therefore,
and by the \(\phi \)-divergence properties \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0))= (\varvec{\lambda }_0 , \varvec{\eta }_0 )^T,\) and
Further, we know that
and we shall establish later that the \((t+u)\times 2^k\) matrix \( {\partial \mathbf{F} \over \partial \varvec{\pi }}\) is
Therefore, the \((t+u)\times 2^k\) matrix \({\partial (\varvec{\lambda }_0 , \varvec{\eta }_0 )\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) }\) is
The Taylor expansion of the function \(\tilde{\varvec{\theta }}\) around \(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) \) yields
As \(\tilde{\varvec{\theta }}(\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)) = (\varvec{\lambda }_0 , \varvec{\eta }_0 )^T,\) we obtain from here
We know that \( {\hat{\mathbf{p}}}{\overset{a.s.}{\longrightarrow }} \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 ),\) so that, for \(n\) large enough, \({\hat{\mathbf{p}}}\in A\) and, consequently, \( \tilde{\varvec{\theta }}({\hat{\mathbf{p}}})\) is the unique solution of the system of equations
and also \(( {\hat{\mathbf{p}}}, \tilde{\varvec{\theta }}({\hat{\mathbf{p}}}))\in U.\) Therefore, \(\tilde{\varvec{\theta }}({\hat{\mathbf{p}}})\) is the minimum \(\phi \)-divergence estimator, \(\hat{\varvec{\theta }}_{\phi }\), satisfying the relation
Finally, we are going to establish (13). We compute the \((i,j)\)-th element of the \((t+u)\times 2^k\) matrix \({\partial \mathbf{F}\over \partial \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0) }.\)
and for \((\pi _1,\ldots , \pi _{2^k}; \lambda _1^0,\ldots , \lambda _t^0; \eta _1^0,\ldots , \eta _u^0)\) we have
Since \(\mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)=\mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0 )}^{-{1\over 2}} \mathbf{J}(\varvec{\lambda }_0 , \varvec{\eta }_0),\) then (13) holds. \(\square \)
Proof of Theorem 2
Applying the previous theorem, it holds
Note that
On the other hand, as \(\hat{\mathbf{p}}\) is the vector of sample proportions, we can apply the Central Limit Theorem to conclude
where \(\varvec{\Sigma }_{\mathbf{p}(\varvec{\lambda }_0, \varvec{\eta }_0)} \) is given by
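The covariance matrix in this Central Limit Theorem step is the standard multinomial one, \(\varvec{\Sigma }_{\mathbf{p}} = \mathbf{D}_{\mathbf{p}} - \mathbf{p}\mathbf{p}^T.\) A quick Monte Carlo check, with an arbitrary probability vector rather than the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.3, 0.4])        # arbitrary cell probabilities
n, reps = 200, 20000

# empirical covariance of sqrt(n) * (p_hat - p) over many multinomial samples
samples = rng.multinomial(n, p, size=reps) / n
emp = np.cov((np.sqrt(n) * (samples - p)).T)

# theoretical multinomial covariance: Sigma_p = D_p - p p^T
theory = np.diag(p) - np.outer(p, p)
```

The empirical and theoretical matrices agree up to Monte Carlo error, since for multinomial sampling the covariance of \(\sqrt{n}(\hat{\mathbf{p}} - \mathbf{p})\) is exact at every \(n.\)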
Therefore, it follows
where \(\varvec{\Sigma }^* \) is given by
with \(\mathbf{B}:=\left( \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0 )^T \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0) \right) ^{-1} \mathbf{A}(\varvec{\lambda }_0 , \varvec{\eta }_0)^T \mathbf{D}_{\mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)}^{1\over 2}.\)
It is not difficult to see that
whence \(\mathbf{B}=\mathbf{0}\) and the result holds. \(\square \)
Proof of Theorem 3
Using Theorem 2, it suffices to apply the delta method. Then, we can conclude that
Now, as \(\nabla \mathbf{p}(\varvec{\lambda }_0 , \varvec{\eta }_0)= \mathbf{J}(\varvec{\lambda }_0 , \varvec{\eta }_0)\), the theorem is proved. \(\square \)
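The delta-method step used here can also be illustrated numerically: for a smooth functional \(g,\) the asymptotic variance of \(g(\hat{\mathbf{p}})\) is \(\nabla g(\mathbf{p})^T \varvec{\Sigma }_{\mathbf{p}} \nabla g(\mathbf{p})/n.\) The functional \(g\) and the probability vector below are arbitrary examples, not the paper's specific \(\mathbf{p}(\varvec{\lambda }, \varvec{\eta })\):

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.2, 0.3, 0.5])
n, reps = 500, 40000

# arbitrary smooth functional of the probability vector and its gradient at p
g = lambda v: v[0] * np.log(v[2] / v[1])
grad = np.array([np.log(p[2] / p[1]),     # dg/dp1
                 -p[0] / p[1],            # dg/dp2
                 p[0] / p[2]])            # dg/dp3

# delta-method variance: grad^T (D_p - p p^T) grad / n
sigma = np.diag(p) - np.outer(p, p)
asym_var = grad @ sigma @ grad / n

# Monte Carlo variance of g(p_hat) over repeated multinomial samples
phat = rng.multinomial(n, p, size=reps) / n
emp_var = np.var([g(v) for v in phat])
```

For moderate \(n\) the simulated variance of \(g(\hat{\mathbf{p}})\) already matches the delta-method approximation to within a few percent.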
Felipe, A., Miranda, P. & Pardo, L. Minimum \(\phi \)-Divergence Estimation in Constrained Latent Class Models for Binary Data. Psychometrika 80, 1020–1042 (2015). https://doi.org/10.1007/s11336-015-9450-4