Abstract
This paper presents an extension of the factor analysis model based on the normal mean–variance mixture of the Birnbaum–Saunders distribution in the presence of nonresponses and missing data. This model serves as a powerful tool for modeling non-normal features of data, such as strongly skewed and heavy-tailed noise. Missing data may occur due to operator error or incomplete data capture and therefore cannot be ignored in factor analysis modeling. We implement an EM-type algorithm for maximum likelihood estimation and propose single imputation of possible missing values under a missing at random mechanism. The potential and applicability of our proposed method are illustrated through the analysis of both simulated and real datasets.
References
Anderson, T.W.: An introduction to multivariate statistical analysis, 3rd edn. Wiley, Hoboken (2003)
Barndorff-Nielsen, O., Halgreen, C.: Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 38(4), 309–311 (1977)
Basilevsky, A.T.: Statistical factor analysis and related methods: theory and applications. Wiley, New York (2009)
Desmond, A.F.: On the relationship between two fatigue-life models. IEEE Trans. Reliab. 35(2), 167–169 (1986)
Fokoué, E., Titterington, D.: Mixtures of factor analysers. Bayesian estimation and inference by stochastic simulation. Mach. Learn. 50(1), 73–94 (2003)
Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40(3–4), 237–264 (1953)
Hashemi, F., Naderi, M., Jamalizadeh, A., Lin, T.I.: A skew factor analysis model based on the normal mean–variance mixture of Birnbaum-Saunders distribution. J. Appl. Stat. 47(16), 3007–3029 (2020)
Hashemi, F., Naderi, M., Mashinchi, M.: Clustering right-skewed data stream via Birnbaum-Saunders mixture models: a flexible approach based on fuzzy clustering algorithm. Appl. Soft Comput. 82, 105539 (2019). https://doi.org/10.1016/j.asoc.2019.105539
Kibler, D., Aha, D.W., Albert, M.K.: Instance-based prediction of real-valued attributes. Comput. Intell. 5(2), 51–57 (1989)
Lawley, D.N.: The estimation of factor loadings by the method of maximum likelihood. Proc. R. Soc. Edinb. 60(1), 64–82 (1940)
Lawley, D.N., Maxwell, A.E.: Factor analysis as a statistical method. J. R. Stat. Soc. Ser. D (The Statistician) 12(3), 209–229 (1962)
Lee, S.X., McLachlan, G.J.: On mixtures of skew normal and skew t-distributions. Adv. Data Anal. Classif. 7(3), 241–266 (2013)
Lin, T.I., Ho, H.J., Lee, C.R.: Flexible mixture modelling using the multivariate skew-t-normal distribution. Stat. Comput. 24(4), 531–546 (2014)
Lin, T.I., Wang, W.L., McLachlan, G.J., Lee, S.X.: Robust mixtures of factor analysis models using the restricted multivariate skew-t distribution. Stat. Model. 18(1), 50–72 (2018)
Little, R., Rubin, D.: Statistical analysis with missing data. Wiley, London (2002)
Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4), 633–648 (1994)
Liu, M., Lin, T.: Skew-normal factor analysis models with incomplete data. J. Appl. Stat. 42(4), 789–805 (2015)
McLachlan, G.J., Bean, R., Jones, L.B.T.: Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Comput. Stat. Data Anal. 51(11), 5327–5338 (2007)
Meng, X.L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2), 267–278 (1993)
Murray, P.M., Browne, R.P., McNicholas, P.D.: Mixtures of skew-t factor analyzers. Comput. Stat. Data Anal. 77, 326–335 (2014a)
Murray, P.M., McNicholas, P.D., Browne, R.P.: A mixture of common skew-t factor analysers. Stat 3(1), 68–82 (2014b)
Pourmousa, R., Jamalizadeh, A., Rezapour, M.: Multivariate normal mean-variance mixture distribution based on Birnbaum-Saunders distribution. J. Stat. Comput. Simul. 85(13), 2736–2749 (2015)
Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
Rubin, D.B., Thayer, D.T.: EM algorithms for ML factor analysis. Psychometrika 47(1), 69–76 (1982)
Schafer, J.L.: Analysis of incomplete multivariate data. CRC Press (1997)
Tortora, C., McNicholas, P.D., Browne, R.P.: A mixture of generalized hyperbolic factor analyzers. Adv. Data Anal. Classif. 10(4), 423–440 (2015). https://doi.org/10.1007/s11634-015-0204-z
Villasenor Alva, J.A., Estrada, E.G.: A generalization of Shapiro–Wilk's test for multivariate normality. Commun. Stat. Theory Methods 38(11), 1870–1883 (2009)
Wang, W.L., Liu, M., Lin, T.I.: Robust skew-t factor analysis models for handling missing data. Stat. Methods Appl. 26(4), 649–672 (2017)
Wei, Y., Tang, Y., McNicholas, P.D.: Flexible high-dimensional unsupervised learning with missing data. IEEE Trans. Pattern Anal. Mach. Intell. 42(3), 610–621 (2020)
Acknowledgements
The authors would like to sincerely thank the Editor, anonymous Associate Editor and two reviewers for their constructive and insightful comments, which significantly improved the presentation. This work was based on research supported by the National Research Foundation, South Africa (SRUG190308422768 Grant No. 120839 and IFR170227223754 Grant No. 109214), the South African NRF SARChI Research Chair in Computational and Methodological Statistics (UID: 71199). The research of the corresponding author is supported by a grant from Ferdowsi University of Mashhad (N.2/54034). We would like to thank Prof. McNicholas (Department of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada) for his valuable comments on the earlier version of this paper.
Appendix A. Proof of Proposition 1
(a)
Let \( {\varvec{Y}}_j\;(j=1,\ldots ,n) \), \( {\varvec{\mu }} \), \( {\varvec{\Sigma }} \) and \( {\varvec{\lambda }} \) be partitioned as
$$\begin{aligned}&{\varvec{Y}}_j=\begin{bmatrix} {\varvec{Y}}_j^o\\ {\varvec{Y}}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{Y}}_j\\ {\varvec{M}}_j{\varvec{Y}}_j \end{bmatrix},\quad {\varvec{\mu }}=\begin{bmatrix} {\varvec{\mu }}_j^o\\ {\varvec{\mu }}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\mu }}\\ {\varvec{M}}_j{\varvec{\mu }} \end{bmatrix},\quad {\varvec{\lambda }}=\begin{bmatrix} {\varvec{\lambda }}_j^o\\ {\varvec{\lambda }}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\lambda }}\\ {\varvec{M}}_j{\varvec{\lambda }} \end{bmatrix},\\&{\varvec{\Sigma }}=\begin{bmatrix} {\varvec{\Sigma }}_j^{oo}&{} {\varvec{\Sigma }}_j^{om}\\ {\varvec{\Sigma }}_j^{mo}&{} {\varvec{\Sigma }}_j^{mm} \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\Sigma }} {\varvec{O}}_j^\top &{}{\varvec{O}}_j{\varvec{\Sigma }} {\varvec{M}}_j^\top \\ {\varvec{M}}_j{\varvec{\Sigma }} {\varvec{O}}_j^\top &{}{\varvec{M}}_j{\varvec{\Sigma }} {\varvec{M}}_j^\top \end{bmatrix}. \end{aligned}$$Based on [22], the marginal distribution of the observed component \( {\varvec{Y}}_j^o \) is
$$\begin{aligned} {\varvec{Y}}_j^o \sim \text {NMVBS}_{p_j^o}({\varvec{\mu }}_{j}^o -a_{\alpha }{\varvec{\eta }}_{j}^o,{\varvec{\Sigma }}_{j}^{oo}, {\varvec{\eta }}_{j}^o,\alpha ). \end{aligned}$$
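For implementation, the selection matrices \( {\varvec{O}}_j \) and \( {\varvec{M}}_j \) can be constructed directly from the missingness pattern of \( {\varvec{Y}}_j \) as row subsets of the identity matrix. The following numpy sketch illustrates this (function and variable names are ours, not from the paper):

```python
import numpy as np

def selection_matrices(miss_mask):
    """O_j selects observed coordinates, M_j the missing ones,
    from a boolean mask with True marking a missing entry."""
    mask = np.asarray(miss_mask, dtype=bool)
    eye = np.eye(mask.size)
    return eye[~mask], eye[mask]  # (O_j, M_j)

# Example with p = 4 and the third coordinate missing
mask = [False, False, True, False]
O, M = selection_matrices(mask)

mu = np.array([1.0, 2.0, 3.0, 4.0])
Sigma = np.array([[2.0, 0.5, 0.3, 0.1],
                  [0.5, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.0, 0.6],
                  [0.1, 0.4, 0.6, 2.5]])

mu_o, mu_m = O @ mu, M @ mu        # partitioned mean
Sigma_oo = O @ Sigma @ O.T         # observed-observed block
Sigma_om = O @ Sigma @ M.T         # observed-missing block
```

The same pattern yields the partitions of \( {\varvec{\lambda }} \) and the remaining blocks of \( {\varvec{\Sigma }} \).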
(b)
Part (b) follows directly from part (a) together with (2.7); the details are omitted.
(c)
We have \( {\varvec{Y}}_j=\begin{bmatrix} {\varvec{Y}}_j^o\\ {\varvec{Y}}_j^m \end{bmatrix}\mid (\tilde{{\varvec{u}}}_j, w_j) \sim N_p({\varvec{\mu }} +\tilde{{\varvec{B}}}\tilde{{\varvec{u}}}_j,w_j{\varvec{D}}) \). By Theorem 2.5.1 of [1], we can show that
$$\begin{aligned} {\varvec{\varphi }}_{j}^{m.o}=&E({\varvec{Y}}_j^m\mid {\varvec{Y}}_j^o,\tilde{{\varvec{u}}}_j, w_j)={\varvec{M}}_j {\varvec{\mu }}\\ {}&+{\varvec{M}}_j \tilde{{\varvec{B}}} \tilde{{\varvec{u}}}_{j}+{\varvec{M}}_j{\varvec{D}} {\varvec{O}}_j^\top ({\varvec{O}}_j{\varvec{D}} {\varvec{O}}_j^\top )^{-1}({\varvec{O}}_j{\varvec{Y}}_j- {\varvec{O}}_j{\varvec{\mu }} - {\varvec{O}}_j\tilde{{\varvec{B}}} \tilde{{\varvec{u}}}_{j})\\ =&{\varvec{M}}_j\left[ {\varvec{\mu }} +\tilde{{\varvec{B}}} \tilde{{\varvec{u}}}_{j}+{\varvec{D}} {\varvec{O}}_j^\top ({\varvec{O}}_j{\varvec{D}} {\varvec{O}}_j^\top )^{-1}{\varvec{O}}_j({\varvec{Y}}_j-{\varvec{\mu }} - \tilde{{\varvec{B}}} \tilde{{\varvec{u}}}_{j})\right] \end{aligned}$$and
$$\begin{aligned} {\varvec{D}}_{j}^{mm.o}=&cov({\varvec{Y}}_j^m\mid {\varvec{Y}}_j^o,\tilde{{\varvec{u}}}_j, w_j)={\varvec{M}}_j{\varvec{D}} {\varvec{M}}_j^\top -{\varvec{M}}_j{\varvec{D}} {\varvec{O}}_j^\top ({\varvec{O}}_j{\varvec{D}} {\varvec{O}}_j^\top )^{-1}{\varvec{O}}_j{\varvec{D}} {\varvec{M}}_j^\top \\ =&{\varvec{M}}_j\left( {\varvec{I}}_p-{\varvec{D}} {\varvec{O}}_j^\top ({\varvec{O}}_j{\varvec{D}} {\varvec{O}}_j^\top )^{-1}{\varvec{O}}_j\right) {\varvec{D}} {\varvec{M}}_j^\top . \end{aligned}$$Thus, \( {\varvec{Y}}_j^m\mid ({\varvec{Y}}_j^o, \tilde{{\varvec{u}}}_{j}, w_j) \sim N_{p-p_j^o}({\varvec{\varphi }}_{j}^{m.o}, w_j{\varvec{D}}_{j}^{mm.o}) \).
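The second equality in each display is a routine factoring of \( {\varvec{M}}_j \); it can also be checked numerically. A short sketch comparing the classical partitioned form of the conditional covariance with the factored selection-matrix form above (all matrices synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
D = A @ A.T + p * np.eye(p)        # symmetric positive definite, standing in for D

miss = np.array([False, True, False, False, True])
eye = np.eye(p)
O, M = eye[~miss], eye[miss]       # O_j and M_j

Doo_inv = np.linalg.inv(O @ D @ O.T)

# (i) classical partitioned form: D_mm - D_mo D_oo^{-1} D_om
cov_part = M @ D @ M.T - (M @ D @ O.T) @ Doo_inv @ (O @ D @ M.T)
# (ii) the factored selection-matrix form from part (c)
cov_fact = M @ (eye - D @ O.T @ Doo_inv @ O) @ D @ M.T

print(np.allclose(cov_part, cov_fact))  # True
```

When \( {\varvec{D}} \) is diagonal, \( {\varvec{M}}_j{\varvec{D}}{\varvec{O}}_j^\top \) vanishes and both expressions reduce to \( {\varvec{M}}_j{\varvec{D}}{\varvec{M}}_j^\top \).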
The proofs of parts (d), (e) and (f) follow in a straightforward manner from part (a) and [7], and are therefore omitted.
(g)
It follows from part (d) that the distribution of \(W_j\mid {\varvec{y}}_j^o\) has a mixture of two GIG distributions. Thus, \( E(W_j^r\mid {\varvec{y}}_j^o)=\pi _j^o E(V_{1j}^r)+ (1-\pi _j^o) E(V_{2j}^r),\) where \(V_{1j} \sim GIG\left( \frac{1-p_j^o}{2},\chi _{j}^o,\psi _{j}^o\right) \) and \(V_{2j} \sim GIG\left( -\frac{1+p_j^o}{2},\chi _{j}^o,\psi _{j}^o\right) \). Therefore,
$$\begin{aligned} E(W^r_j \mid {\varvec{y}}_j^o) =&\pi _j^o \left( \frac{\chi _j^o}{\psi _j^o}\right) ^{r/2}\frac{K_{(1-p_j^o)/2+r}\left( \sqrt{\psi _j^o\chi _j^o} \right) }{K_{(1-p_j^o)/2}\left( \sqrt{\psi _j^o\chi _j^o} \right) }\\&+ (1-\pi _j^o) \left( \frac{\chi _j^o}{\psi _j^o}\right) ^{r/2}\frac{K_{-(1+p_j^o)/2+r}\left( \sqrt{\psi _j^o\chi _j^o}\right) }{K_{-(1+p_j^o)/2}\left( \sqrt{\psi _j^o\chi _j^o}\right) },\quad r=\pm 1. \end{aligned}$$From part (f), we can see that \( E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j)={\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(W_j-a_{\alpha })\right\} .\) Applying the law of iterative expectations, we get
$$\begin{aligned} E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o)=&E\left\{ E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o\right\} =E\left\{ {\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(W_j-a_{\alpha })\right\} \mid {\varvec{y}}_j^o \right\} \\ =&{\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(E(W_j\mid {\varvec{y}}_j^o)-a_{\alpha })\right\} . \end{aligned}$$Using the law of iterative expectations and part (f), it is easy to verify that
$$\begin{aligned} E(W_j^{-1}\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o)=&E\left\{ W_j^{-1}E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o\right\} \\ =&E\left\{ W_j^{-1}\left[ {\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(W_j-a_{\alpha })\right\} \right] \mid {\varvec{y}}_j^o\right\} \nonumber \\ =&{\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o E(W_j^{-1}\mid {\varvec{y}}_j^o)+{\varvec{\lambda }}\left[ 1-a_{\alpha }E(W_j^{-1}\mid {\varvec{y}}_j^o)\right] \right\} . \end{aligned}$$Using the law of iterative expectations, we obtain
$$\begin{aligned} E(W_j^{-1}\tilde{{{\varvec{U}}}}_{j}&\tilde{{{\varvec{U}}}}_{j}^{\top }\mid {\varvec{y}}_j^o)= E(W_j^{-1}E(\tilde{{{\varvec{U}}}}_{j}\tilde{{{\varvec{U}}}}_{j}^{\top }\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o)\nonumber \\ =&E\left\{ W_j^{-1} \Big [ E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j) E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j)^{\top } + W_j{\varvec{R}}_{j}^{oo}\Big ]\mid {\varvec{y}}_j^o \right\} \nonumber \\ =&E\left\{ W_j^{-1} \Big [ E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j) \left( {\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(W_j-a_{\alpha })\right\} \right) ^{\top } + W_j{\varvec{R}}_{j}^{oo}\Big ]\mid {\varvec{y}}_j^o\right\} \nonumber \\ =&\Bigg \{E(W_j^{-1}\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o) {\varvec{b}}_{j}^{o\top }+\big [E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o) -a_{\alpha }E(W_j^{-1}\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o)\big ]{\varvec{\lambda }}^{\top } +{\varvec{I}}_q\Bigg \}{\varvec{R}}_{j}^{oo}. \end{aligned}$$
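The Bessel-function ratios above instantiate the standard GIG moment formula \( E(V^r)=(\chi /\psi )^{r/2}K_{\lambda +r}(\sqrt{\psi \chi })/K_{\lambda }(\sqrt{\psi \chi }) \). As a sanity check, this ratio form can be compared with direct numerical integration of the GIG density; a sketch with arbitrary parameter values (not taken from the paper):

```python
import numpy as np
from scipy.special import kv          # modified Bessel function K_v
from scipy.integrate import quad

def gig_moment(r, lam, chi, psi):
    """E(W^r) for W ~ GIG(lam, chi, psi) via the Bessel-K ratio."""
    s = np.sqrt(chi * psi)
    return (chi / psi) ** (r / 2) * kv(lam + r, s) / kv(lam, s)

def gig_moment_quad(r, lam, chi, psi):
    """Same moment by numerical integration of the GIG density."""
    s = np.sqrt(chi * psi)
    const = (psi / chi) ** (lam / 2) / (2 * kv(lam, s))
    dens = lambda w: const * w ** (lam - 1) * np.exp(-(chi / w + psi * w) / 2)
    return quad(lambda w: w ** r * dens(w), 0, np.inf)[0]

# e.g. lam = -(1 + p_j^o)/2 with p_j^o = 2; chi, psi chosen arbitrarily
lam, chi, psi = -1.5, 2.0, 3.0
for r in (1, -1):
    assert np.isclose(gig_moment(r, lam, chi, psi),
                      gig_moment_quad(r, lam, chi, psi), rtol=1e-6)
```

The mixture expectation \( E(W_j^r\mid {\varvec{y}}_j^o) \) then follows by weighting two such ratios with \( \pi _j^o \) and \( 1-\pi _j^o \), as in the display for part (g).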
Cite this article
Bekker, A., Hashemi, F. & Arashi, M. Flexible Factor Model for Handling Missing Data in Supervised Learning. Commun. Math. Stat. 11, 477–501 (2023). https://doi.org/10.1007/s40304-021-00260-9
Keywords
- Automobile dataset
- Asymmetry
- ECME algorithm
- Factor analysis model
- Heavy tails
- Incomplete data
- Liver disorders dataset