
Flexible Factor Model for Handling Missing Data in Supervised Learning

Published in: Communications in Mathematics and Statistics

Abstract

This paper presents an extension of the factor analysis model based on the normal mean–variance mixture of the Birnbaum–Saunders distribution in the presence of nonresponses and missing data. The model is a powerful tool for capturing non-normal features of data, such as strong skewness and heavy-tailed noise. Missing data may occur due to operator error or incomplete data capture and therefore cannot be ignored in factor analysis modeling. We implement an EM-type algorithm for maximum likelihood estimation and propose single imputation of missing values under a missing at random mechanism. The potential and applicability of the proposed method are illustrated by analyzing both simulated and real datasets.
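To convey the flavor of the EM-with-imputation scheme, the sketch below illustrates it in the simplest special case of a multivariate normal model: each E-step fills the missing entries of an observation with their conditional expectation given the observed entries, and the fill at convergence is the single imputation. This is a minimal illustration of the idea only, not the authors' implementation; the NMVBS factor model requires the richer conditional expectations derived in Appendix A, and all names below (e.g. em_normal_impute) are hypothetical.

```python
# Minimal sketch: EM for N(mu, Sigma) with MAR missing entries (np.nan),
# returning ML estimates and the singly imputed data matrix. Illustrative
# only; the paper's NMVBS factor model has a far richer E-step.
import numpy as np

def em_normal_impute(Y, n_iter=200):
    n, p = Y.shape
    X = np.where(np.isnan(Y), np.nanmean(Y, axis=0), Y)  # crude initial fill
    mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False, bias=True)
    for _ in range(n_iter):
        C = np.zeros((p, p))  # accumulated conditional covariances (E-step)
        for j in range(n):
            m = np.isnan(Y[j])
            if not m.any():
                continue
            o = ~m
            S_oo_inv = np.linalg.inv(Sigma[np.ix_(o, o)])
            # E-step: conditional mean of the missing block given the observed
            X[j, m] = mu[m] + Sigma[np.ix_(m, o)] @ S_oo_inv @ (Y[j, o] - mu[o])
            C[np.ix_(m, m)] += (Sigma[np.ix_(m, m)]
                                - Sigma[np.ix_(m, o)] @ S_oo_inv @ Sigma[np.ix_(o, m)])
        # M-step: update mu and Sigma from completed data plus E-step covariances
        mu = X.mean(axis=0)
        R = X - mu
        Sigma = (R.T @ R + C) / n
    return mu, Sigma, X  # X holds the single imputation at convergence
```

Under MAR, the matrix X returned at convergence provides the single imputation of the missing values under the fitted (here Gaussian) model.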


References

1. Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 3rd edn. Wiley Series in Probability and Statistics. Wiley (2003)

2. Barndorff-Nielsen, O., Halgreen, C.: Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 38(4), 309–311 (1977)

3. Basilevsky, A.T.: Statistical Factor Analysis and Related Methods: Theory and Applications. Wiley, New York (2009)

4. Desmond, A.F.: On the relationship between two fatigue-life models. IEEE Trans. Reliab. 35(2), 167–169 (1986)

5. Fokoué, E., Titterington, D.: Mixtures of factor analysers: Bayesian estimation and inference by stochastic simulation. Mach. Learn. 50(1), 73–94 (2003)

6. Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40(3–4), 237–264 (1953)

7. Hashemi, F., Naderi, M., Jamalizadeh, A., Lin, T.I.: A skew factor analysis model based on the normal mean–variance mixture of Birnbaum–Saunders distribution. J. Appl. Stat. 47(16), 3007–3029 (2020)

8. Hashemi, F., Naderi, M., Mashinchi, M.: Clustering right-skewed data stream via Birnbaum–Saunders mixture models: a flexible approach based on fuzzy clustering algorithm. Appl. Soft Comput. 82, 105539 (2019). https://doi.org/10.1016/j.asoc.2019.105539

9. Kibler, D., Aha, D.W., Albert, M.K.: Instance-based prediction of real-valued attributes. Comput. Intell. 5(2), 51–57 (1989)

10. Lawley, D.N.: The estimation of factor loadings by the method of maximum likelihood. Proc. R. Soc. Edinb. 60(1), 64–82 (1940)

11. Lawley, D.N., Maxwell, A.E.: Factor analysis as a statistical method. J. R. Stat. Soc. Ser. D (The Statistician) 12(3), 209–229 (1962)

12. Lee, S.X., McLachlan, G.J.: On mixtures of skew normal and skew t-distributions. Adv. Data Anal. Classif. 7(3), 241–266 (2013)

13. Lin, T.I., Ho, H.J., Lee, C.R.: Flexible mixture modelling using the multivariate skew-t-normal distribution. Stat. Comput. 24(4), 531–546 (2014)

14. Lin, T.I., Wang, W.L., McLachlan, G.J., Lee, S.X.: Robust mixtures of factor analysis models using the restricted multivariate skew-t distribution. Stat. Model. 18(1), 50–72 (2018)

15. Little, R., Rubin, D.: Statistical Analysis with Missing Data. Wiley, London (2002)

16. Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4), 633–648 (1994)

17. Liu, M., Lin, T.: Skew-normal factor analysis models with incomplete data. J. Appl. Stat. 42(4), 789–805 (2015)

18. McLachlan, G.J., Bean, R., Jones, L.B.T.: Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Comput. Stat. Data Anal. 51(11), 5327–5338 (2007)

19. Meng, X.L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2), 267–278 (1993)

20. Murray, P.M., Browne, R.P., McNicholas, P.D.: Mixtures of skew-t factor analyzers. Comput. Stat. Data Anal. 77, 326–335 (2014)

21. Murray, P.M., McNicholas, P.D., Browne, R.P.: A mixture of common skew-t factor analysers. Stat 3(1), 68–82 (2014)

22. Pourmousa, R., Jamalizadeh, A., Rezapour, M.: Multivariate normal mean–variance mixture distribution based on Birnbaum–Saunders distribution. J. Stat. Comput. Simul. 85(13), 2736–2749 (2015)

23. Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)

24. Rubin, D.B., Thayer, D.T.: EM algorithms for ML factor analysis. Psychometrika 47(1), 69–76 (1982)

25. Schafer, J.L.: Analysis of Incomplete Multivariate Data. CRC Press (1997)

26. Tortora, C., McNicholas, P.D., Browne, R.P.: A mixture of generalized hyperbolic factor analyzers. Adv. Data Anal. Classif. 10(4), 423–440 (2015). https://doi.org/10.1007/s11634-015-0204-z

27. Villasenor Alva, J.A., Estrada, E.G.: A generalization of Shapiro–Wilk's test for multivariate normality. Commun. Stat. Theory Methods 38(11), 1870–1883 (2009)

28. Wang, W.L., Liu, M., Lin, T.I.: Robust skew-t factor analysis models for handling missing data. Stat. Methods Appl. 26(4), 649–672 (2017)

29. Wei, Y., Tang, Y., McNicholas, P.D.: Flexible high-dimensional unsupervised learning with missing data. IEEE Trans. Pattern Anal. Mach. Intell. 42(3), 610–621 (2020)


Acknowledgements

The authors would like to sincerely thank the Editor, the anonymous Associate Editor, and the two reviewers for their constructive and insightful comments, which significantly improved the presentation. This work was based on research supported by the National Research Foundation, South Africa (SRUG190308422768, Grant No. 120839, and IFR170227223754, Grant No. 109214), and the South African NRF SARChI Research Chair in Computational and Methodological Statistics (UID: 71199). The research of the corresponding author is supported by a grant from Ferdowsi University of Mashhad (N.2/54034). We would like to thank Prof. McNicholas (Department of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada) for his valuable comments on an earlier version of this paper.

Author information

Correspondence to Mohammad Arashi.

Appendix A. Proof of Proposition 1

(a)

    Let \( {\varvec{Y}}_j\;(j=1,\ldots ,n) \), \( {\varvec{\mu }} \), \( {\varvec{\Sigma }} \) and \( {\varvec{\lambda }} \) be partitioned as

$$\begin{aligned}&{\varvec{Y}}_j=\begin{bmatrix} {\varvec{Y}}_j^o\\ {\varvec{Y}}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{Y}}_j\\ {\varvec{M}}_j{\varvec{Y}}_j \end{bmatrix},\quad {\varvec{\mu}}=\begin{bmatrix} {\varvec{\mu}}_j^o\\ {\varvec{\mu}}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\mu}}\\ {\varvec{M}}_j{\varvec{\mu}} \end{bmatrix},\quad {\varvec{\lambda}}=\begin{bmatrix} {\varvec{\lambda}}_j^o\\ {\varvec{\lambda}}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\lambda}}\\ {\varvec{M}}_j{\varvec{\lambda}} \end{bmatrix},\\ &{\varvec{\Sigma}}=\begin{bmatrix} {\varvec{\Sigma}}_j^{oo} & {\varvec{\Sigma}}_j^{om}\\ {\varvec{\Sigma}}_j^{mo} & {\varvec{\Sigma}}_j^{mm} \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\Sigma}}{\varvec{O}}_j^\top & {\varvec{O}}_j{\varvec{\Sigma}}{\varvec{M}}_j^\top\\ {\varvec{M}}_j{\varvec{\Sigma}}{\varvec{O}}_j^\top & {\varvec{M}}_j{\varvec{\Sigma}}{\varvec{M}}_j^\top \end{bmatrix}. \end{aligned}$$

    Based on [22], the marginal distribution of the observed component \( {\varvec{Y}}_j^o \) is

    $$\begin{aligned} {\varvec{Y}}_j^o \sim \text {NMVBS}_{p_j^o}({\varvec{\mu }}_{j}^o -a_{\alpha }{\varvec{\eta }}_{j}^o,{\varvec{\Sigma }}_{j}^{oo}, {\varvec{\eta }}_{j}^o,\alpha ). \end{aligned}$$
(b)

The result follows directly from part (a) and (2.7); the proof of part (b) is therefore omitted.

(c)

We have \( {\varvec{Y}}_j=\begin{bmatrix} {\varvec{Y}}_j^o\\ {\varvec{Y}}_j^m \end{bmatrix}\mid (\tilde{{\varvec{u}}}_j, w_j) \sim N_p({\varvec{\mu}}+\tilde{{\varvec{B}}}\tilde{{\varvec{u}}}_j,\,w_j{\varvec{D}}) \). By Theorem 2.5.1 of [1], we can show that

$$\begin{aligned} {\varvec{\varphi}}_{j}^{m.o}=&\,E({\varvec{Y}}_j^m\mid {\varvec{Y}}_j^o,\tilde{{\varvec{u}}}_j, w_j)\\ =&\,{\varvec{M}}_j{\varvec{\mu}}+{\varvec{M}}_j\tilde{{\varvec{B}}}\tilde{{\varvec{u}}}_{j}+{\varvec{M}}_j{\varvec{D}}{\varvec{O}}_j^\top({\varvec{O}}_j{\varvec{D}}{\varvec{O}}_j^\top)^{-1}({\varvec{O}}_j{\varvec{Y}}_j-{\varvec{O}}_j{\varvec{\mu}}-{\varvec{O}}_j\tilde{{\varvec{B}}}\tilde{{\varvec{u}}}_{j})\\ =&\,{\varvec{M}}_j\left[{\varvec{\mu}}+\tilde{{\varvec{B}}}\tilde{{\varvec{u}}}_{j}+{\varvec{D}}{\varvec{O}}_j^\top({\varvec{O}}_j{\varvec{D}}{\varvec{O}}_j^\top)^{-1}{\varvec{O}}_j({\varvec{Y}}_j-{\varvec{\mu}}-\tilde{{\varvec{B}}}\tilde{{\varvec{u}}}_{j})\right] \end{aligned}$$

and

$$\begin{aligned} {\varvec{D}}_{j}^{mm.o}=&\,\mathrm{cov}({\varvec{Y}}_j^m\mid {\varvec{Y}}_j^o,\tilde{{\varvec{u}}}_j, w_j)={\varvec{M}}_j{\varvec{D}}{\varvec{M}}_j^\top-{\varvec{M}}_j{\varvec{D}}{\varvec{O}}_j^\top({\varvec{O}}_j{\varvec{D}}{\varvec{O}}_j^\top)^{-1}{\varvec{O}}_j{\varvec{D}}{\varvec{M}}_j^\top\\ =&\,{\varvec{M}}_j\left({\varvec{I}}_p-{\varvec{D}}{\varvec{O}}_j^\top({\varvec{O}}_j{\varvec{D}}{\varvec{O}}_j^\top)^{-1}{\varvec{O}}_j\right){\varvec{D}}{\varvec{M}}_j^\top. \end{aligned}$$

Thus, \( {\varvec{Y}}_j^m\mid ({\varvec{Y}}_j^o, \tilde{{\varvec{u}}}_{j}, w_j) \sim N_{p-p_j^o}({\varvec{\varphi}}_{j}^{m.o}, w_j{\varvec{D}}_{j}^{mm.o}) \); a numerical sketch of these conditional moments is given after this list.
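As a concrete check of parts (a) and (c), the following minimal NumPy sketch (ours, not from the paper; the names selection_matrices and conditional_moments are hypothetical) builds \({\varvec{O}}_j\) and \({\varvec{M}}_j\) from a missingness mask and evaluates \({\varvec{\varphi}}_{j}^{m.o}\) and \(w_j{\varvec{D}}_{j}^{mm.o}\).

```python
# Hedged sketch of parts (a) and (c): O_j and M_j are the rows of I_p
# indexing the observed and missing coordinates of Y_j, and the conditional
# moments follow the displayed formulas. All names are hypothetical.
import numpy as np

def selection_matrices(miss_mask):
    """O_j (observed rows of I_p) and M_j (missing rows of I_p)."""
    I = np.eye(len(miss_mask))
    return I[~miss_mask], I[miss_mask]

def conditional_moments(y, miss_mask, mu, B_tilde, u_tilde, D, w):
    """phi_j^{m.o} and w_j * D_j^{mm.o} as in part (c) of the proof."""
    O_j, M_j = selection_matrices(miss_mask)
    ODO_inv = np.linalg.inv(O_j @ D @ O_j.T)     # (O_j D O_j^T)^{-1}
    center = mu + B_tilde @ u_tilde              # mean of Y_j given (u_j, w_j)
    # nan_to_num zeroes the missing entries of y; O_j then selects only the
    # observed coordinates, so the zeroed entries never enter the residual.
    resid = O_j @ (np.nan_to_num(y) - center)    # O_j(Y_j - mu - B u_j)
    phi = M_j @ (center + D @ O_j.T @ ODO_inv @ resid)
    P = np.eye(len(mu)) - D @ O_j.T @ ODO_inv @ O_j
    return phi, w * (M_j @ P @ D @ M_j.T)
```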

The proofs of parts (d), (e), and (f) are straightforward from part (a) and [7], and are therefore omitted.

(g)

It follows from part (d) that the conditional distribution of \(W_j\mid {\varvec{y}}_j^o\) is a mixture of two GIG distributions. Thus, \( E(W_j^r\mid {\varvec{y}}_j^o)=\pi_j^o E(V_{1j}^r)+(1-\pi_j^o)E(V_{2j}^r), \) where \(V_{1j} \sim \mathrm{GIG}\left(\frac{1-p_j^o}{2},\chi_{j}^o,\psi_{j}^o\right)\) and \(V_{2j} \sim \mathrm{GIG}\left(-\frac{1+p_j^o}{2},\chi_{j}^o,\psi_{j}^o\right)\). Therefore,

$$\begin{aligned} E(W^r_j \mid {\varvec{y}}_j^o) =&\,\pi_j^o \left(\frac{\chi_j^o}{\psi_j^o}\right)^{r/2}\frac{K_{(1-p_j^o)/2+r}\left(\sqrt{\psi_j^o\chi_j^o}\right)}{K_{(1-p_j^o)/2}\left(\sqrt{\psi_j^o\chi_j^o}\right)}\\ &+(1-\pi_j^o)\left(\frac{\chi_j^o}{\psi_j^o}\right)^{r/2}\frac{K_{-(1+p_j^o)/2+r}\left(\sqrt{\psi_j^o\chi_j^o}\right)}{K_{-(1+p_j^o)/2}\left(\sqrt{\psi_j^o\chi_j^o}\right)},\quad r=\pm 1. \end{aligned}$$

From part (f), we can see that \( E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o,W_j)={\varvec{R}}_{j}^{oo}\left\{{\varvec{b}}_{j}^o+{\varvec{\lambda}}(W_j-a_{\alpha})\right\}. \) Applying the law of iterated expectations, we get

$$\begin{aligned} E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o)=&\,E\left\{E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o\right\}=E\left\{{\varvec{R}}_{j}^{oo}\left\{{\varvec{b}}_{j}^o+{\varvec{\lambda}}(W_j-a_{\alpha})\right\}\mid {\varvec{y}}_j^o\right\}\\ =&\,{\varvec{R}}_{j}^{oo}\left\{{\varvec{b}}_{j}^o+{\varvec{\lambda}}\left(E(W_j\mid {\varvec{y}}_j^o)-a_{\alpha}\right)\right\}. \end{aligned}$$

Using the law of iterated expectations and part (f), it is easy to verify that

$$\begin{aligned} E(W_j^{-1}\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o)=&\,E\left\{W_j^{-1}E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o\right\}\\ =&\,E\left\{W_j^{-1}\left[{\varvec{R}}_{j}^{oo}\left\{{\varvec{b}}_{j}^o+{\varvec{\lambda}}(W_j-a_{\alpha})\right\}\right]\mid {\varvec{y}}_j^o\right\}\\ =&\,{\varvec{R}}_{j}^{oo}\left\{{\varvec{b}}_{j}^o E(W_j^{-1}\mid {\varvec{y}}_j^o)+{\varvec{\lambda}}\left[1-a_{\alpha}E(W_j^{-1}\mid {\varvec{y}}_j^o)\right]\right\}. \end{aligned}$$

Using the law of iterated expectations, we obtain

$$\begin{aligned} E(W_j^{-1}\tilde{{\varvec{U}}}_{j}\tilde{{\varvec{U}}}_{j}^{\top}\mid {\varvec{y}}_j^o)=&\,E\left(W_j^{-1}E(\tilde{{\varvec{U}}}_{j}\tilde{{\varvec{U}}}_{j}^{\top}\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o\right)\\ =&\,E\left\{W_j^{-1}\left[E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o,W_j)E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o,W_j)^{\top}+W_j{\varvec{R}}_{j}^{oo}\right]\mid {\varvec{y}}_j^o\right\}\\ =&\,E\left\{W_j^{-1}\left[E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o,W_j)\left({\varvec{R}}_{j}^{oo}\left\{{\varvec{b}}_{j}^o+{\varvec{\lambda}}(W_j-a_{\alpha})\right\}\right)^{\top}+W_j{\varvec{R}}_{j}^{oo}\right]\mid {\varvec{y}}_j^o\right\}\\ =&\,\Big\{E(W_j^{-1}\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o){\varvec{b}}_{j}^{o\top}+\big[E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o)-a_{\alpha}E(W_j^{-1}\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o)\big]{\varvec{\lambda}}^{\top}+{\varvec{I}}_q\Big\}{\varvec{R}}_{j}^{oo}. \end{aligned}$$
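The conditional moments in part (g) are straightforward to evaluate numerically. The following hedged sketch (our illustration, assuming SciPy's kv for the modified Bessel function \(K_\nu\); gig_moment, cond_W_moment, and cond_U_mean are hypothetical names) computes \(E(W_j^r\mid{\varvec{y}}_j^o)\) for \(r=\pm 1\) via the two-component GIG mixture, and \(E(\tilde{{\varvec{U}}}_j\mid{\varvec{y}}_j^o)\) from the expression above.

```python
# Hedged sketch of part (g): conditional moments of W_j and U_j given the
# observed data, using the GIG mixture derived above. kv is the modified
# Bessel function of the second kind, K_nu. Names are hypothetical.
import numpy as np
from scipy.special import kv

def gig_moment(r, lam, chi, psi):
    """E(V^r) for V ~ GIG(lam, chi, psi)."""
    s = np.sqrt(chi * psi)
    return (chi / psi) ** (r / 2) * kv(lam + r, s) / kv(lam, s)

def cond_W_moment(r, pi_o, chi_o, psi_o, p_o):
    """E(W_j^r | y_j^o): mixture of GIG((1-p_j^o)/2, .) and GIG(-(1+p_j^o)/2, .)."""
    return (pi_o * gig_moment(r, (1 - p_o) / 2, chi_o, psi_o)
            + (1 - pi_o) * gig_moment(r, -(1 + p_o) / 2, chi_o, psi_o))

def cond_U_mean(R_oo, b_o, lam_vec, E_W, a_alpha):
    """E(U_j | y_j^o) = R^{oo} { b^o + lambda (E(W_j | y_j^o) - a_alpha) }."""
    return R_oo @ (b_o + lam_vec * (E_W - a_alpha))
```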


Cite this article

Bekker, A., Hashemi, F. & Arashi, M. Flexible Factor Model for Handling Missing Data in Supervised Learning. Commun. Math. Stat. 11, 477–501 (2023). https://doi.org/10.1007/s40304-021-00260-9
