
Poisson reduced-rank models with sparse loadings

  • Research Article
Journal of the Korean Statistical Society

Abstract

High-dimensional Poisson reduced-rank models have been used for statistical inference on the low-dimensional locations of individuals based on observed high-dimensional count vectors. In this study, we assume sparsity of the so-called loading matrix to enhance its interpretability. The sparsity assumption motivates an \(L_1\) penalty for estimating the loadings. We provide novel computational and theoretical analyses of the corresponding penalized Poisson maximum likelihood estimation. We establish theoretical convergence rates for the parameters under weak-dependence conditions; these rates imply consistency even in large-dimensional problems. Implementing the proposed method raises several computational issues, including nonconvex log-likelihoods, the \(L_1\) penalty, and orthogonality constraints, and we develop an iterative algorithm to address them. Further, we propose a Bayesian information criterion (BIC)-based selection of the penalty parameter, which works well in our implementation. Numerical evidence is provided by real-data-based simulation analyses, and the proposed method is illustrated with an analysis of German party manifesto data.
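
To make the ingredients named in the abstract concrete, the following is a minimal sketch of an \(L_1\)-penalized Poisson reduced-rank fit. It is not the authors' algorithm: the log-link parametrization \(\log \lambda_{ij} = (UV^\top)_{ij}\), the proximal-gradient update, the QR re-orthonormalization, the fixed step size, and every function name here are illustrative assumptions. The `bic_score` helper uses a generic BIC-type form (twice the negative log-likelihood plus \(\log(np)\) times the number of nonzero loadings), which may differ from the criterion proposed in the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def penalized_nll(Y, U, V, lam):
    """Poisson negative log-likelihood (constant log(Y!) dropped) plus L1 penalty on V."""
    theta = U @ V.T                      # assumed log-intensity matrix, n x p
    return np.sum(np.exp(theta) - Y * theta) + lam * np.sum(np.abs(V))

def fit_sparse_poisson_rr(Y, rank, lam, n_iter=200, step=1e-3, seed=0):
    """Hypothetical alternating scheme: sparse loadings V, orthonormal scores U."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U, _ = np.linalg.qr(rng.standard_normal((n, rank)))   # orthonormal columns
    V = 0.1 * rng.standard_normal((p, rank))
    for _ in range(n_iter):
        # Proximal-gradient step on the loadings (handles the L1 penalty).
        resid = np.exp(U @ V.T) - Y      # gradient of the NLL w.r.t. theta
        V = soft_threshold(V - step * (resid.T @ U), step * lam)
        # Gradient step on the scores, then QR to restore orthonormality.
        resid = np.exp(U @ V.T) - Y
        U, _ = np.linalg.qr(U - step * (resid @ V))
    return U, V

def bic_score(Y, U, V):
    """Generic BIC-type score (an assumption, not the paper's criterion)."""
    n, p = Y.shape
    nll = penalized_nll(Y, U, V, lam=0.0)
    return 2.0 * nll + np.log(n * p) * np.count_nonzero(V)

# Usage on synthetic counts: pick the penalty with the smallest BIC-type score.
rng = np.random.default_rng(1)
Y = rng.poisson(1.0, size=(30, 20)).astype(float)
scores = {lam: bic_score(Y, *fit_sparse_poisson_rr(Y, rank=2, lam=lam))
          for lam in (0.1, 1.0, 10.0)}
best_lam = min(scores, key=scores.get)
```

The soft-thresholding step is what produces exact zeros in the loadings, and the QR step is one simple way to keep the scores orthonormal; a fixed step size is used only for brevity and carries no convergence guarantee for this nonconvex problem.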

Acknowledgements

The first two authors, Eun Ryung Lee and Seyoung Park, contributed equally to the paper. This research was supported by a National Research Foundation of Korea grant funded by the Korean government (MSIP) (No. NRF-2019R1C1C1003805). Eun Ryung Lee was supported by a National Research Foundation of Korea grant funded by the Korean government (MSIT) (No. NRF-2019R1F1A1062795). Seyoung Park was supported by the Sungkyun Research Fund, Sungkyunkwan University, 2018.

Author information

Corresponding author

Correspondence to Eun Ryung Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lee, E.R., Park, S. Poisson reduced-rank models with sparse loadings. J. Korean Stat. Soc. 50, 1079–1097 (2021). https://doi.org/10.1007/s42952-021-00106-8

  • DOI: https://doi.org/10.1007/s42952-021-00106-8
