Abstract
This paper presents an extension of the factor analysis model based on the normal mean–variance mixture of the Birnbaum–Saunders distribution in the presence of nonresponses and missing data. This model serves as a powerful tool for modeling non-normal features of data, such as strongly skewed and heavy-tailed noise. Missing data may occur due to operator error or incomplete data capture and therefore cannot be ignored in factor analysis modeling. We implement an EM-type algorithm for maximum likelihood estimation and propose single imputation of possible missing values under a missing at random mechanism. The potential and applicability of our proposed method are illustrated through the analysis of both simulated and real datasets.
References
Anderson, T.W.: An introduction to multivariate statistical analysis, 3rd edn. Wiley, Hoboken (2003)
Barndorff-Nielsen, O., Halgreen, C.: Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 38(4), 309–311 (1977)
Basilevsky, A.T.: Statistical factor analysis and related methods: theory and applications. Wiley, New York (2009)
Desmond, A.F.: On the relationship between two fatigue-life models. IEEE Trans. Reliab. 35(2), 167–169 (1986)
Fokoué, E., Titterington, D.: Mixtures of factor analysers. Bayesian estimation and inference by stochastic simulation. Mach. Learn. 50(1), 73–94 (2003)
Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40(3–4), 237–264 (1953)
Hashemi, F., Naderi, M., Jamalizadeh, A., Lin, T.I.: A skew factor analysis model based on the normal mean–variance mixture of Birnbaum-Saunders distribution. J. Appl. Stat. 47(16), 3007–3029 (2020)
Hashemi, F., Naderi, M., Mashinchi, M.: Clustering right-skewed data stream via Birnbaum-Saunders mixture models: a flexible approach based on fuzzy clustering algorithm. Appl. Soft Comput. 82, 105539 (2019). https://doi.org/10.1016/j.asoc.2019.105539
Kibler, D., Aha, D.W., Albert, M.K.: Instance-based prediction of real-valued attributes. Comput. Intell. 5(2), 51–57 (1989)
Lawley, D.N.: The estimation of factor loadings by the method of maximum likelihood. Proc. R. Soc. Edinb. 60(1), 64–82 (1940)
Lawley, D.N., Maxwell, A.E.: Factor analysis as a statistical method. J. R. Stat. Soc. Ser. D (The Statistician) 12(3), 209–229 (1962)
Lee, S.X., McLachlan, G.J.: On mixtures of skew normal and skew t-distributions. Adv. Data Anal. Classif. 7(3), 241–266 (2013)
Lin, T.I., Ho, H.J., Lee, C.R.: Flexible mixture modelling using the multivariate skew-t-normal distribution. Stat. Comput. 24(4), 531–546 (2014)
Lin, T.I., Wang, W.L., McLachlan, G.J., Lee, S.X.: Robust mixtures of factor analysis models using the restricted multivariate skew-t distribution. Stat. Model. 18(1), 50–72 (2018)
Little, R., Rubin, D.: Statistical analysis with missing data. Wiley, London (2002)
Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4), 633–648 (1994)
Liu, M., Lin, T.: Skew-normal factor analysis models with incomplete data. J. Appl. Stat. 42(4), 789–805 (2015)
McLachlan, G.J., Bean, R., Jones, L.B.T.: Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Comput. Stat. Data Anal. 51(11), 5327–5338 (2007)
Meng, X.L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2), 267–278 (1993)
Murray, P.M., Browne, R.P., McNicholas, P.D.: Mixtures of skew-t factor analyzers. Comput. Stat. Data Anal. 77, 326–335 (2014a)
Murray, P.M., McNicholas, P.D., Browne, R.P.: A mixture of common skew-t factor analysers. Stat 3(1), 68–82 (2014b)
Pourmousa, R., Jamalizadeh, A., Rezapour, M.: Multivariate normal mean-variance mixture distribution based on Birnbaum-Saunders distribution. J. Stat. Comput. Simul. 85(13), 2736–2749 (2015)
Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
Rubin, D.B., Thayer, D.T.: EM algorithms for ML factor analysis. Psychometrika 47(1), 69–76 (1982)
Schafer, J.L.: Analysis of incomplete multivariate data. CRC Press (1997)
Tortora, C., McNicholas, P.D., Browne, R.P.: A mixture of generalized hyperbolic factor analyzers. Adv. Data Anal. Classif. 10(4), 423–440 (2015). https://doi.org/10.1007/s11634-015-0204-z
Villasenor Alva, J.A., Estrada, E.G.: A generalization of Shapiro–Wilk's test for multivariate normality. Commun. Stat. Theory Methods 38(11), 1870–1883 (2009)
Wang, W.L., Liu, M., Lin, T.I.: Robust skew-t factor analysis models for handling missing data. Stat. Methods Appl. 26(4), 649–672 (2017)
Wei, Y., Tang, Y., McNicholas, P.D.: Flexible high-dimensional unsupervised learning with missing data. IEEE Trans. Pattern Anal. Mach. Intell. 42(3), 610–621 (2020)
Acknowledgements
The authors would like to sincerely thank the Editor, anonymous Associate Editor and two reviewers for their constructive and insightful comments, which significantly improved the presentation. This work was based on research supported by the National Research Foundation, South Africa (SRUG190308422768 Grant No. 120839 and IFR170227223754 Grant No. 109214), the South African NRF SARChI Research Chair in Computational and Methodological Statistics (UID: 71199). The research of the corresponding author is supported by a grant from Ferdowsi University of Mashhad (N.2/54034). We would like to thank Prof. McNicholas (Department of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada) for his valuable comments on the earlier version of this paper.
Appendix A. Proof of Proposition 1
(a)
Let \( {\varvec{Y}}_j\;(j=1,\ldots ,n) \), \( {\varvec{\mu }} \), \( {\varvec{\Sigma }} \) and \( {\varvec{\lambda }} \) be partitioned as
$$\begin{aligned}&{\varvec{Y}}_j=\begin{bmatrix} {\varvec{Y}}_j^o\\ {\varvec{Y}}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{Y}}_j\\ {\varvec{M}}_j{\varvec{Y}}_j \end{bmatrix},\quad {\varvec{\mu }}=\begin{bmatrix} {\varvec{\mu }}_j^o\\ {\varvec{\mu }}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\mu }}\\ {\varvec{M}}_j{\varvec{\mu }} \end{bmatrix},\quad {\varvec{\lambda }}=\begin{bmatrix} {\varvec{\lambda }}_j^o\\ {\varvec{\lambda }}_j^m \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\lambda }}\\ {\varvec{M}}_j{\varvec{\lambda }} \end{bmatrix},\\&{\varvec{\Sigma }}=\begin{bmatrix} {\varvec{\Sigma }}_j^{oo}&{} {\varvec{\Sigma }}_j^{om}\\ {\varvec{\Sigma }}_j^{mo}&{} {\varvec{\Sigma }}_j^{mm} \end{bmatrix}=\begin{bmatrix} {\varvec{O}}_j{\varvec{\Sigma }} {\varvec{O}}_j^\top &{}{\varvec{O}}_j{\varvec{\Sigma }} {\varvec{M}}_j^\top \\ {\varvec{M}}_j{\varvec{\Sigma }} {\varvec{O}}_j^\top &{}{\varvec{M}}_j{\varvec{\Sigma }} {\varvec{M}}_j^\top \end{bmatrix}. \end{aligned}$$Based on [22], the marginal distribution of the observed component \( {\varvec{Y}}_j^o \) is
$$\begin{aligned} {\varvec{Y}}_j^o \sim \text {NMVBS}_{p_j^o}({\varvec{\mu }}_{j}^o -a_{\alpha }{\varvec{\eta }}_{j}^o,{\varvec{\Sigma }}_{j}^{oo}, {\varvec{\eta }}_{j}^o,\alpha ). \end{aligned}$$
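For implementation, the selection matrices \( {\varvec{O}}_j \) and \( {\varvec{M}}_j \) can be constructed directly from the missingness pattern of \( {\varvec{Y}}_j \) as row subsets of the identity matrix. The following numpy sketch illustrates this (function and variable names are ours, not from the paper):

```python
import numpy as np

def selection_matrices(miss_mask):
    """O_j selects observed coordinates, M_j the missing ones,
    from a boolean mask with True marking a missing entry."""
    mask = np.asarray(miss_mask, dtype=bool)
    eye = np.eye(mask.size)
    return eye[~mask], eye[mask]  # (O_j, M_j)

# Example with p = 4 and the third coordinate missing
mask = [False, False, True, False]
O, M = selection_matrices(mask)

mu = np.array([1.0, 2.0, 3.0, 4.0])
Sigma = np.array([[2.0, 0.5, 0.3, 0.1],
                  [0.5, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.0, 0.6],
                  [0.1, 0.4, 0.6, 2.5]])

mu_o, mu_m = O @ mu, M @ mu        # partitioned mean
Sigma_oo = O @ Sigma @ O.T         # observed-observed block
Sigma_om = O @ Sigma @ M.T         # observed-missing block
```

The same pattern yields the partitions of \( {\varvec{\lambda }} \) and the remaining blocks of \( {\varvec{\Sigma }} \).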
(b)
Part (b) follows directly from part (a) together with (2.7); the details are omitted.
(c)
We have \( {\varvec{Y}}_j=\begin{bmatrix} {\varvec{Y}}_j^o\\ {\varvec{Y}}_j^m \end{bmatrix}\mid (\tilde{{\varvec{u}}}_j, w_j) \sim N_p({\varvec{\mu }} +\tilde{{\varvec{B}}}\tilde{{\varvec{u}}}_j,w_j{\varvec{D}}) \). By Theorem 2.5.1 of [1], we can show that
$$\begin{aligned} {\varvec{\varphi }}_{j}^{m.o}=&E({\varvec{Y}}_j^m\mid {\varvec{Y}}_j^o,\tilde{{\varvec{u}}}_j, w_j)={\varvec{M}}_j {\varvec{\mu }}\\ {}&+{\varvec{M}}_j \tilde{{\varvec{B}}} \tilde{{\varvec{u}}}_{j}+{\varvec{M}}_j{\varvec{D}} {\varvec{O}}_j^\top ({\varvec{O}}_j{\varvec{D}} {\varvec{O}}_j^\top )^{-1}({\varvec{O}}_j{\varvec{Y}}_j- {\varvec{O}}_j{\varvec{\mu }} - {\varvec{O}}_j\tilde{{\varvec{B}}} \tilde{{\varvec{u}}}_{j})\\ =&{\varvec{M}}_j\left[ {\varvec{\mu }} +\tilde{{\varvec{B}}} \tilde{{\varvec{u}}}_{j}+{\varvec{D}} {\varvec{O}}_j^\top ({\varvec{O}}_j{\varvec{D}} {\varvec{O}}_j^\top )^{-1}{\varvec{O}}_j({\varvec{Y}}_j-{\varvec{\mu }} - \tilde{{\varvec{B}}} \tilde{{\varvec{u}}}_{j})\right] \end{aligned}$$and
$$\begin{aligned} {\varvec{D}}_{j}^{mm.o}=&cov({\varvec{Y}}_j^m\mid {\varvec{Y}}_j^o,\tilde{{\varvec{u}}}_j, w_j)={\varvec{M}}_j{\varvec{D}} {\varvec{M}}_j^\top -{\varvec{M}}_j{\varvec{D}} {\varvec{O}}_j^\top ({\varvec{O}}_j{\varvec{D}} {\varvec{O}}_j^\top )^{-1}{\varvec{O}}_j{\varvec{D}} {\varvec{M}}_j^\top \\ =&{\varvec{M}}_j\left( {\varvec{I}}_p-{\varvec{D}} {\varvec{O}}_j^\top ({\varvec{O}}_j{\varvec{D}} {\varvec{O}}_j^\top )^{-1}{\varvec{O}}_j\right) {\varvec{D}} {\varvec{M}}_j^\top . \end{aligned}$$Thus, \( {\varvec{Y}}_j^m\mid ({\varvec{Y}}_j^o, \tilde{{\varvec{u}}}_{j}, w_j) \sim N_{p-p_j^o}({\varvec{\varphi }}_{j}^{m.o}, w_j{\varvec{D}}_{j}^{mm.o}) \).
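The second equality in each display is a routine factoring of \( {\varvec{M}}_j \); it can also be checked numerically. A short sketch comparing the classical partitioned form of the conditional covariance with the factored selection-matrix form above (all matrices synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
D = A @ A.T + p * np.eye(p)        # symmetric positive definite, standing in for D

miss = np.array([False, True, False, False, True])
eye = np.eye(p)
O, M = eye[~miss], eye[miss]       # O_j and M_j

Doo_inv = np.linalg.inv(O @ D @ O.T)

# (i) classical partitioned form: D_mm - D_mo D_oo^{-1} D_om
cov_part = M @ D @ M.T - (M @ D @ O.T) @ Doo_inv @ (O @ D @ M.T)
# (ii) the factored selection-matrix form from part (c)
cov_fact = M @ (eye - D @ O.T @ Doo_inv @ O) @ D @ M.T

print(np.allclose(cov_part, cov_fact))  # True
```

When \( {\varvec{D}} \) is diagonal, \( {\varvec{M}}_j{\varvec{D}}{\varvec{O}}_j^\top \) vanishes and both expressions reduce to \( {\varvec{M}}_j{\varvec{D}}{\varvec{M}}_j^\top \).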
The proofs of parts (d), (e) and (f) follow in a straightforward manner from part (a) and [7], and are therefore omitted.
(g)
It follows from part (d) that the distribution of \(W_j\mid {\varvec{y}}_j^o\) has a mixture of two GIG distributions. Thus, \( E(W_j^r\mid {\varvec{y}}_j^o)=\pi _j^o E(V_{1j}^r)+ (1-\pi _j^o) E(V_{2j}^r),\) where \(V_{1j} \sim GIG\left( \frac{1-p_j^o}{2},\chi _{j}^o,\psi _{j}^o\right) \) and \(V_{2j} \sim GIG\left( -\frac{1+p_j^o}{2},\chi _{j}^o,\psi _{j}^o\right) \). Therefore,
$$\begin{aligned} E(W^r_j \mid {\varvec{y}}_j^o) =&\pi _j^o \left( \frac{\chi _j^o}{\psi _j^o}\right) ^{r/2}\frac{K_{(1-p_j^o)/2+r}\left( \sqrt{\psi _j^o\chi _j^o} \right) }{K_{(1-p_j^o)/2}\left( \sqrt{\psi _j^o\chi _j^o} \right) }\\&+ (1-\pi _j^o) \left( \frac{\chi _j^o}{\psi _j^o}\right) ^{r/2}\frac{K_{-(1+p_j^o)/2+r}\left( \sqrt{\psi _j^o\chi _j^o}\right) }{K_{-(1+p_j^o)/2}\left( \sqrt{\psi _j^o\chi _j^o}\right) },\quad r=\pm 1. \end{aligned}$$From part (f), we can see that \( E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j)={\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(W_j-a_{\alpha })\right\} .\) Applying the law of iterative expectations, we get
$$\begin{aligned} E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o)=&E\left\{ E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o\right\} =E\left\{ {\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(W_j-a_{\alpha })\right\} \mid {\varvec{y}}_j^o \right\} \\ =&{\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(E(W_j\mid {\varvec{y}}_j^o)-a_{\alpha })\right\} . \end{aligned}$$Using the law of iterative expectations and part (f), it is easy to verify that
$$\begin{aligned} E(W_j^{-1}\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o)=&E\left\{ W_j^{-1}E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o\right\} \\ =&E\left\{ W_j^{-1}\left[ {\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(W_j-a_{\alpha })\right\} \right] \mid {\varvec{y}}_j^o\right\} \nonumber \\ =&{\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o E(W_j^{-1}\mid {\varvec{y}}_j^o)+{\varvec{\lambda }}\left[ 1-a_{\alpha }E(W_j^{-1}\mid {\varvec{y}}_j^o)\right] \right\} . \end{aligned}$$Using the law of iterative expectations, we obtain
$$\begin{aligned} E(W_j^{-1}\tilde{{{\varvec{U}}}}_{j}&\tilde{{{\varvec{U}}}}_{j}^{\top }\mid {\varvec{y}}_j^o)= E(W_j^{-1}E(\tilde{{{\varvec{U}}}}_{j}\tilde{{{\varvec{U}}}}_{j}^{\top }\mid {\varvec{y}}_j^o,W_j)\mid {\varvec{y}}_j^o)\nonumber \\ =&E\left\{ W_j^{-1} \Big [ E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j) E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j)^{\top } + W_j{\varvec{R}}_{j}^{oo}\Big ]\mid {\varvec{y}}_j^o \right\} \nonumber \\ =&E\left\{ W_j^{-1} \Big [ E(\tilde{{{\varvec{U}}}}_{j}\mid {\varvec{y}}_j^o,W_j) \left( {\varvec{R}}_{j}^{oo}\left\{ {\varvec{b}}_{j}^o+{\varvec{\lambda }}(W_j-a_{\alpha })\right\} \right) ^{\top } + W_j{\varvec{R}}_{j}^{oo}\Big ]\mid {\varvec{y}}_j^o\right\} \nonumber \\ =&\Bigg \{E(W_j^{-1}\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o) {\varvec{b}}_{j}^{o\top }+\big [E(\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o) -a_{\alpha }E(W_j^{-1}\tilde{{\varvec{U}}}_{j}\mid {\varvec{y}}_j^o)\big ]{\varvec{\lambda }}^{\top } +{\varvec{I}}_q\Bigg \}{\varvec{R}}_{j}^{oo}. \end{aligned}$$
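The Bessel-function ratios above instantiate the standard GIG moment formula \( E(V^r)=(\chi /\psi )^{r/2}K_{\lambda +r}(\sqrt{\psi \chi })/K_{\lambda }(\sqrt{\psi \chi }) \). As a sanity check, this ratio form can be compared with direct numerical integration of the GIG density; a sketch with arbitrary parameter values (not taken from the paper):

```python
import numpy as np
from scipy.special import kv          # modified Bessel function K_v
from scipy.integrate import quad

def gig_moment(r, lam, chi, psi):
    """E(W^r) for W ~ GIG(lam, chi, psi) via the Bessel-K ratio."""
    s = np.sqrt(chi * psi)
    return (chi / psi) ** (r / 2) * kv(lam + r, s) / kv(lam, s)

def gig_moment_quad(r, lam, chi, psi):
    """Same moment by numerical integration of the GIG density."""
    s = np.sqrt(chi * psi)
    const = (psi / chi) ** (lam / 2) / (2 * kv(lam, s))
    dens = lambda w: const * w ** (lam - 1) * np.exp(-(chi / w + psi * w) / 2)
    return quad(lambda w: w ** r * dens(w), 0, np.inf)[0]

# e.g. lam = -(1 + p_j^o)/2 with p_j^o = 2; chi, psi chosen arbitrarily
lam, chi, psi = -1.5, 2.0, 3.0
for r in (1, -1):
    assert np.isclose(gig_moment(r, lam, chi, psi),
                      gig_moment_quad(r, lam, chi, psi), rtol=1e-6)
```

The mixture expectation \( E(W_j^r\mid {\varvec{y}}_j^o) \) then follows by weighting two such ratios with \( \pi _j^o \) and \( 1-\pi _j^o \), as in the display for part (g).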
Cite this article
Bekker, A., Hashemi, F. & Arashi, M. Flexible Factor Model for Handling Missing Data in Supervised Learning. Commun. Math. Stat. 11, 477–501 (2023). https://doi.org/10.1007/s40304-021-00260-9
Keywords
- Automobile dataset
- Asymmetry
- ECME algorithm
- Factor analysis model
- Heavy tails
- Incomplete data
- Liver disorders dataset