
Spike and slab Bayesian sparse principal component analysis


Abstract

Sparse principal component analysis (SPCA) is a popular tool for dimensionality reduction in high-dimensional data. However, there is still a lack of theoretically justified Bayesian SPCA methods that also scale well computationally. A major challenge in Bayesian SPCA is choosing an appropriate prior for the loadings matrix, since the principal components are mutually orthogonal. We propose a novel parameter-expanded coordinate ascent variational inference (PX-CAVI) algorithm. The algorithm uses a spike and slab prior and employs parameter expansion to cope with the orthogonality constraint. Besides comparing with two popular SPCA approaches, we introduce the PX-EM algorithm, an EM analogue of PX-CAVI, as an additional benchmark. Extensive numerical simulations show that the PX-CAVI algorithm outperforms these alternatives. We also study the posterior contraction rate of the variational posterior, a novel contribution to the existing literature. The PX-CAVI algorithm is then applied to study a lung cancer gene expression dataset. The \(\textsf{R}\) package \(\textsf{VBsparsePCA}\), which implements the algorithm, is available on the Comprehensive R Archive Network (CRAN).
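For readers who wish to try the method, the sketch below simulates a jointly row-sparse spiked-covariance dataset and fits it with the CRAN package \(\textsf{VBsparsePCA}\). It assumes the package exposes a main routine named VBsparsePCA() taking the data matrix and the target rank; consult the package manual for the exact argument names and the structure of the returned object.

```r
## Minimal usage sketch. The interface of VBsparsePCA() (argument names `dat`
## and `r`, and the contents of the returned list) is assumed here; check the
## CRAN documentation before use.
# install.packages("VBsparsePCA")
library(VBsparsePCA)

set.seed(1)
n <- 100; p <- 200; r <- 2
U <- matrix(0, p, r)                         # loadings: only the first 20 rows are nonzero
U[1:20, ] <- matrix(rnorm(20 * r), 20, r)
U <- qr.Q(qr(U))                             # orthonormalise (the sparsity pattern is preserved)
X <- matrix(rnorm(n * r), n, r) %*% t(U) * 3 + matrix(rnorm(n * p), n, p)

fit <- VBsparsePCA(dat = X, r = r)           # PX-CAVI under the spike-and-slab prior
str(fit)                                     # inspect estimated loadings and inclusion probabilities
```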



Acknowledgements

We would like to warmly thank Drs. Ryan Martin and Botond Szabó for their helpful suggestions on an early version of this paper. Bo Ning gratefully acknowledges the funding support provided by NASA XRP 80NSSC18K0443. The authors would like to thank two anonymous reviewers and the Editors for their very constructive comments and efforts on this lengthy work, which greatly improved the quality of this paper.

Funding

The research of Ning was partially supported by NIH grant 1R21AI180492-01 and the Individual Research Grant at Texas A&M University.

Author information

Corresponding author

Correspondence to Ning Ning.

Ethics declarations

Conflict of interest

Ning Ning serves as an Associate Editor for Statistics and Computing.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 601 KB)

Appendix A: Derivation of (12)–(18)

First, we need the following result:

$$\begin{aligned}&\mathbb {E}_{w|\Theta ^{(t)}} \left[ \frac{1}{2\sigma ^2} \sum _{i=1}^n (X_{ij} - {{\widetilde{\beta }}}_j w_i)^2 \right] \nonumber \\&= \frac{1}{2\sigma ^2} \sum _{i=1}^n \left( X_{ij}^2 - 2X_{ij} {{\widetilde{\beta }}}_j {{\widetilde{\omega }}}_i + {{\widetilde{\beta }}}_j H_i {{\widetilde{\beta }}}_j' \right) , \end{aligned}$$
(36)

where \(H_i = {{\widetilde{\omega }}}_i{{\widetilde{\omega }}}_i' + {\widetilde{V}}_w\) and the expressions of \({{\widetilde{\omega }}}_i\) and \({\widetilde{V}}_w\) are given in (10).
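Identity (36) uses only the first two moments of \(w_i\): for any distribution with mean \({{\widetilde{\omega }}}_i\) and covariance \({\widetilde{V}}_w\), \(\mathbb {E}(X_{ij} - {{\widetilde{\beta }}}_j w_i)^2 = X_{ij}^2 - 2X_{ij}{{\widetilde{\beta }}}_j{{\widetilde{\omega }}}_i + {{\widetilde{\beta }}}_j H_i {{\widetilde{\beta }}}_j'\). The short \(\textsf{R}\) check below verifies this by Monte Carlo with a Gaussian draw for \(w_i\); all numerical values are arbitrary and purely illustrative.

```r
## Monte Carlo check of identity (36) for a single entry X_ij.
## E[(x - beta' w)^2] = x^2 - 2 x beta' wbar + beta' H beta,  with H = wbar wbar' + Vw.
set.seed(2)
r    <- 3
beta <- rnorm(r)                          # j-th row of the loadings matrix
wbar <- rnorm(r)                          # mean of w_i
A    <- matrix(rnorm(r * r), r, r)
Vw   <- crossprod(A) / r                  # covariance of w_i (positive definite)
x    <- 1.5                               # the observed entry X_ij

H   <- tcrossprod(wbar) + Vw
rhs <- x^2 - 2 * x * sum(beta * wbar) + drop(t(beta) %*% H %*% beta)

w.draws <- MASS::mvrnorm(1e5, mu = wbar, Sigma = Vw)   # requires MASS (shipped with R)
lhs <- mean((x - w.draws %*% beta)^2)
c(monte.carlo = lhs, closed.form = rhs)   # the two numbers agree up to simulation error
```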

Since the ELBO is a summation of \(p\) terms, we solve for \(u_j\) and \(M_j\) separately for each \(j\). As the posterior conditional on \(\gamma _j = 0\) is the point mass (Dirac measure) at zero, we only need to consider the case \(\gamma _j = 1\). This leads to minimizing the function

$$\begin{aligned}&\mathbb {E}_{{\widetilde{u}}_j, {\widetilde{M}}_j, z_j|\gamma _j = 1}\Bigg [ \frac{1}{2\sigma ^2} \sum _{i=1}^n \left( - 2X_{ij} {{\widetilde{\beta }}}_j {{\widetilde{\omega }}}_i + {{\widetilde{\beta }}}_j H_i {{\widetilde{\beta }}}_j' \right) \\&\qquad \qquad \qquad + \log \frac{N({\widetilde{u}}_j, \sigma ^2 {\widetilde{M}}_j)}{\kappa _j^\circ g( {{\widetilde{\beta }}}_j|\lambda _1)} \Bigg ]\\ {}&\quad = C - \frac{1}{\sigma ^2} \sum _{i=1}^n X_{ij} {\widetilde{u}}_j {{\widetilde{\omega }}}_i \\&\hspace{1cm}+ \frac{1}{2\sigma ^2} \sum _{i=1}^n \left( {\widetilde{u}}_j H_i {\widetilde{u}}_j' + {{\,\text {Tr}\,}}\left( \sigma ^2{\widetilde{M}}_jH_i\right) \right) \\ {}&\hspace{1cm} + \lambda _1 \sum _{k=1}^r f({\widetilde{u}}_{jk}, {\widetilde{M}}_{j,kk}), \end{aligned}$$

where \(\kappa _j^\circ = \int \pi (\gamma _j|\kappa ) d\Pi (\kappa )\). Then we take derivatives with respect to \({\widetilde{u}}_j\) and \({\widetilde{M}}_j\) and set them to zero to obtain (12) and (13). The solutions in (14) are obtained by replacing \(\lambda _1 \sum _{k=1}^r f({\widetilde{u}}_{jk}, {\widetilde{M}}_{j,kk})\) in the last display with \(\frac{\lambda _1}{2\sigma ^2} \left( {\widetilde{u}}_j {\widetilde{u}}_j' + \sigma ^2 {{\,\textrm{Tr}\,}}(M_j)\right) \).
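To make this concrete, the following sketch implements the ridge-type update obtained by setting the gradient of the displayed objective to zero in the Gaussian-slab case corresponding to (14): \({\widetilde{u}}_j = (\sum _i H_i + \lambda _1 I_r)^{-1}\sum _i X_{ij}{{\widetilde{\omega }}}_i\) and \({\widetilde{M}}_j = (\sum _i H_i + \lambda _1 I_r)^{-1}\). This is an illustrative re-derivation from the display above; the parameterization of (12)–(14) in the main text may differ, so treat the function below as a sketch rather than the package implementation.

```r
## Coordinate update implied by the objective above in the Gaussian-slab case
## of (14); the gradients in u_j and M_j are set to zero. Illustrative only.
update_uj_Mj <- function(Xj, wbar, Vw, lambda1) {
  # Xj:      length-n vector, the j-th column of X
  # wbar:    n x r matrix whose i-th row is the mean of w_i
  # Vw:      r x r covariance shared across the w_i
  # lambda1: slab tuning parameter
  n <- length(Xj); r <- ncol(wbar)
  sumH <- crossprod(wbar) + n * Vw              # sum_i H_i = sum_i wbar_i wbar_i' + n * Vw
  Ainv <- solve(sumH + lambda1 * diag(r))
  list(u = drop(Ainv %*% crossprod(wbar, Xj)),  # (sum_i H_i + lambda1 I)^{-1} sum_i X_ij wbar_i
       M = Ainv)                                # posterior covariance of beta_j is sigma^2 * M
}
```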

To derive (15), we have

$$\begin{aligned}&\mathbb {E}_P \left( \mathbb {E}_{w|\Theta ^{(t)}} \pi ({{\widetilde{\beta }}}_j, w, X) - \log q({{\widetilde{\beta }}}_j) \right) \nonumber \\&= C + \mathbb {E}_{{{\widetilde{\mu }}}_j, {\widetilde{M}}_j, z_j} \Bigg [ \frac{1}{2\sigma ^2} \sum _{i=1}^n \left( -2X_{ij} {{\widetilde{\beta }}}_j {{\widetilde{\omega }}}_i + {{\widetilde{\beta }}}_j H_i {{\widetilde{\beta }}}_j' \right) \nonumber \\&\quad + \mathbb {1}_{\{\gamma _j = 0\}}\log \frac{1 - z_j}{1-\kappa _j^\circ } \nonumber \\&\quad + \mathbb {1}_{\{\gamma _j = 1\}}\log \frac{z_j N({{\widetilde{\mu }}}_j, \sigma ^2 M_j)}{\kappa _j^\circ g({{\widetilde{\beta }}}_j|\lambda _1)} \Bigg ]\nonumber \\&= C+ (1-z_j) \log \frac{1-z_j}{1-\kappa _j^\circ }\nonumber \\&\quad +z_j \Bigg \{ \frac{1}{2\sigma ^2} \sum _{i=1}^n \left( {{\widetilde{\mu }}}_j H_i {{\widetilde{\mu }}}_j' +\sigma ^2 {{\,\text {Tr}\,}}({\widetilde{M}}_j H_i) - 2X_{ij} {{\widetilde{\mu }}}_j {{\widetilde{\omega }}}_i \right) \nonumber \\&\quad + r\log \left( \frac{\sqrt{2}}{\sqrt{\pi } \sigma \lambda _1}\right) - \frac{1}{2}\log \det ({\widetilde{M}}_j) - \frac{1}{2}\nonumber \\&\quad + \lambda _1 \sum _{k=1}^r f({{\widetilde{\mu }}}_{jk},\sigma ^2 {\widetilde{M}}_{j,kk}) + \log \frac{z_j}{\kappa _j^\circ } \Bigg \}. \end{aligned}$$
(37)

The solution of \({\widehat{h}}_j\) can be obtained by minimizing the last line of the above display with respect to \(z_j\). Similarly, (16) is obtained by minimizing the following expression with respect to \(z_j\):

$$\begin{aligned}&C+z_j\Bigg \{ \frac{1}{2\sigma ^2} \sum _{i=1}^n \left( {{\widetilde{\mu }}}_j H_i {{\widetilde{\mu }}}_j' +\sigma ^2 {{\,\text {Tr}\,}}({\widetilde{M}}_j H_i) - 2X_{ij} {{\widetilde{\mu }}}_j {{\widetilde{\omega }}}_i \right) \nonumber \\&- \frac{r\log \lambda _1 + 1}{2}- \frac{1}{2}\log \det ({\widetilde{M}}_j) + \frac{\lambda _1}{2\sigma ^2} \left( {\widetilde{u}}_j {\widetilde{u}}_j' + {{\,\text {Tr}\,}}(\sigma ^2 {\widetilde{M}}_j) \right) \nonumber \\&+ \log \frac{z_j}{\kappa _j^\circ }\Bigg \} + (1-z_j) \log \frac{1-z_j}{1-\kappa _j^\circ }. \end{aligned}$$
(38)
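Minimizing (38) over \(z_j\) gives a logistic-form solution: writing \(B_j\) for everything inside the braces except \(\log (z_j/\kappa _j^\circ )\), the first-order condition is \(B_j + \log \frac{z_j}{1-z_j} - \log \frac{\kappa _j^\circ }{1-\kappa _j^\circ } = 0\), so \(z_j = \textrm{expit}\big (\textrm{logit}(\kappa _j^\circ ) - B_j\big )\). The one-line sketch below encodes this form; the precise definition of \(B_j\) entering (16) should be read off the main text.

```r
## Inclusion-probability update implied by (38): logit(z_j) = logit(kappa_j) - B_j,
## where B_j collects the braced terms other than log(z_j / kappa_j).
update_zj <- function(Bj, kappaj) {
  plogis(qlogis(kappaj) - Bj)        # expit(logit(kappa_j) - B_j), a value in (0, 1)
}
update_zj(Bj = 2.3, kappaj = 0.1)    # a large residual/penalty term B_j drives z_j toward 0
```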

Last, to obtain (17), we first sum the expressions in (37) over all \(j = 1, \dots , p\). Next, we write down the explicit expression of the constant \(C\) that involves \(\sigma ^2\), i.e.,

$$\begin{aligned} pC_{\sigma ^2} = \frac{(np + 2\sigma _a +2)\log \sigma ^2}{2} + \frac{{{\,\textrm{Tr}\,}}(X'X) + 2\sigma _b}{2\sigma ^2}. \end{aligned}$$

Finally, we plug in the above expression and solve for \(\sigma ^2\). The solution (18) can be obtained similarly using (38).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ning, YC.B., Ning, N. Spike and slab Bayesian sparse principal component analysis. Stat Comput 34, 118 (2024). https://doi.org/10.1007/s11222-024-10430-8
