Abstract
Sparse principal component analysis (SPCA) is a popular tool for dimensionality reduction of high-dimensional data. However, there is still a lack of theoretically justified Bayesian SPCA methods that scale well computationally. A major challenge in Bayesian SPCA is choosing an appropriate prior for the loadings matrix, since the principal components are mutually orthogonal. We propose a novel parameter-expanded coordinate ascent variational inference (PX-CAVI) algorithm, which uses a spike and slab prior and incorporates parameter expansion to handle the orthogonality constraint. In addition to comparing against two popular SPCA approaches, we introduce the PX-EM algorithm as an EM analogue of the PX-CAVI algorithm. Through extensive numerical simulations, we demonstrate that the PX-CAVI algorithm outperforms these SPCA approaches. We also study the posterior contraction rate of the variational posterior, a novel contribution to the existing literature. The PX-CAVI algorithm is then applied to a lung cancer gene expression dataset. The \(\textsf{R}\) package \(\textsf{VBsparsePCA}\), which implements the algorithm, is available on the Comprehensive R Archive Network (CRAN).
Acknowledgements
We warmly thank Drs. Ryan Martin and Botond Szabó for their helpful suggestions on an early version of this paper. Bo Ning gratefully acknowledges funding support from NASA XRP 80NSSC18K0443. The authors also thank two anonymous reviewers and the Editors for their very constructive comments on this lengthy work, which greatly improved the quality of this paper.
Funding
The research of Ning was partially supported by NIH grant 1R21AI180492-01 and the Individual Research Grant at Texas A&M University.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
Ning Ning serves as an Associate Editor for Statistics and Computing.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Derivation of (12)–(18)
First, we need the following result:
where \(H_i = {\widetilde{w}}_i{\widetilde{w}}_i' + {\widetilde{V}}_w\) and the expressions of \({\widetilde{w}}_i\) and \({\widetilde{V}}_w\) are given in (10).
Since the ELBO is a summation of p terms, we solve for \(u_j\) and \(M_j\) separately for each j. As the posterior conditional on \(\gamma _j =0\) degenerates to the Dirac measure at zero, we only need to consider the case \(\gamma _j = 1\). This leads to minimizing the function
where \(\kappa _j^\circ = \int \pi (\gamma _j|\kappa ) d\Pi (\kappa )\). We then take derivatives with respect to \({\widetilde{u}}_j\) and \({\widetilde{M}}_j\) to obtain (12) and (13). The solutions in (14) are obtained by replacing \(\lambda _1 \sum _{k=1}^r f({\widetilde{u}}_{jk}, {\widetilde{M}}_{j,kk})\) in the last display with \(\frac{\lambda _1}{2\sigma ^2} \left( {\widetilde{u}}_j {\widetilde{u}}_j' + \sigma ^2 {{\,\textrm{Tr}\,}}(M_j)\right) \).
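The per-coordinate structure described above (closed-form slab updates for \({\widetilde{u}}_j\) and \({\widetilde{M}}_j\) computed only under \(\gamma _j = 1\), combined with an inclusion weight) can be illustrated with a minimal sketch on a toy spike-and-slab normal-means model. This is an illustrative assumption, not the paper's actual updates (12)–(14): the likelihood, the Gaussian slab, and the function name are all hypothetical.

```python
import numpy as np

def spike_slab_update(y, sigma2=1.0, tau2=4.0, kappa=0.1):
    """Per-coordinate variational update for a toy spike-and-slab
    normal-means model (illustrative, not the paper's model):
        y_j ~ N(theta_j, sigma2),
        theta_j ~ kappa * N(0, tau2) + (1 - kappa) * delta_0.
    Returns slab means u, slab variance M, and inclusion probs z."""
    y = np.asarray(y, dtype=float)
    # Conditional on gamma_j = 1, the slab update is a conjugate normal fit;
    # conditional on gamma_j = 0 the posterior is the Dirac mass at zero,
    # so only the gamma_j = 1 case needs to be computed.
    M = 1.0 / (1.0 / sigma2 + 1.0 / tau2)   # posterior variance under the slab
    u = M * y / sigma2                      # posterior mean under the slab
    # Inclusion log-odds: prior odds plus the marginal-likelihood ratio
    # N(y; 0, sigma2 + tau2) versus N(y; 0, sigma2).
    log_odds = (np.log(kappa / (1.0 - kappa))
                + 0.5 * np.log(sigma2 / (sigma2 + tau2))
                + 0.5 * y**2 * (1.0 / sigma2 - 1.0 / (sigma2 + tau2)))
    z = 1.0 / (1.0 + np.exp(-log_odds))     # posterior inclusion probability
    return u, M, z
```

For instance, a small observation such as \(y_j = 0.1\) yields an inclusion probability near zero, while \(y_j = 5\) yields one near unity, mirroring how the per-coordinate objective selects which loadings are active.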
To derive (15), we have
The solution \({\widehat{h}}_j\) is obtained by minimizing \(z_j\) in the last line of the above display. Similarly, (16) is obtained by minimizing \(z_j\) in the following expression
Last, to obtain (17), we first sum the expressions in (37) over all \(j =1, \dots , p\). Next, we write down the explicit expression of C, which involves \(\sigma ^2\), i.e.,
Finally, we plug in the above expression and solve for \(\sigma ^2\). The solution (18) can be obtained similarly using (38).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ning, YC.B., Ning, N. Spike and slab Bayesian sparse principal component analysis. Stat Comput 34, 118 (2024). https://doi.org/10.1007/s11222-024-10430-8