Abstract
High-dimensional Poisson reduced-rank models have been considered for statistical inference on the low-dimensional locations of individuals based on observations of high-dimensional count vectors. In this study, we assume sparsity of the so-called loading matrix to enhance its interpretability. The sparsity assumption leads to the use of an \(L_1\) penalty for the estimation of the loading matrix. We provide novel computational and theoretical analyses for the corresponding penalized Poisson maximum likelihood estimation. We establish theoretical convergence rates for the parameters under weak-dependence conditions; this implies consistency even in large-dimensional problems. Implementing the proposed method involves several computational issues, including nonconvex log-likelihoods, the \(L_1\) penalty, and orthogonality constraints; to address them, we develop an iterative algorithm. Further, we propose a penalty parameter selection based on the Bayesian information criterion (BIC), which works well in the implementation. Numerical evidence is provided by real-data-based simulation analyses, and the proposed method is illustrated with an analysis of German party manifesto data.
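To make the role of the \(L_1\) penalty concrete, the following is a minimal sketch of one proximal-gradient (soft-thresholding) update for an \(L_1\)-penalized Poisson log-likelihood. It is an illustration only and not the iterative algorithm of the paper: it assumes a log link with rate `exp(F @ B.T)`, holds the scores fixed, and ignores the orthogonality constraints. The names `F`, `B`, `lam`, `step`, and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def poisson_nll(Y, F, B):
    """Negative Poisson log-likelihood, up to a constant, under a log link.

    Assumed shapes: Y is (n, p) counts, F is (n, r) scores, B is (p, r) loadings.
    """
    eta = F @ B.T                      # linear predictor, shape (n, p)
    return np.sum(np.exp(eta) - Y * eta)

def prox_grad_step(Y, F, B, lam, step):
    """One proximal-gradient update of B for nll + lam * ||B||_1, with F fixed."""
    eta = F @ B.T
    grad = (np.exp(eta) - Y).T @ F     # gradient of the nll w.r.t. B, shape (p, r)
    return soft_threshold(B - step * grad, step * lam)
```

Larger values of `lam` set more entries of `B` exactly to zero, which is the mechanism by which an \(L_1\) penalty yields a sparse loading matrix.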
Acknowledgements
The first two authors, Eun Ryung Lee and Seyoung Park, contributed equally to the paper. This research was supported by a National Research Foundation of Korea grant funded by the Korean government (MSIP) (No. NRF-2019R1C1C1003805). Eun Ryung Lee is supported by a National Research Foundation of Korea grant funded by the Korean government (MSIT) (No. NRF-2019R1F1A1062795). Seyoung Park was supported by the Sungkyun Research Fund, Sungkyunkwan University, 2018.
Cite this article
Lee, E.R., Park, S. Poisson reduced-rank models with sparse loadings. J. Korean Stat. Soc. 50, 1079–1097 (2021). https://doi.org/10.1007/s42952-021-00106-8
DOI: https://doi.org/10.1007/s42952-021-00106-8