Abstract
This paper considers a Bayesian approach for selecting the number of factors in a factor analysis model with continuous and polytomous variables. A procedure for computing the key statistic in model selection, the Bayes factor, is developed via path sampling. The main computational effort lies in simulating observations from the appropriate posterior distribution. This task is accomplished by a hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm. Bayesian estimates of thresholds, factor loadings, unique variances, and latent factor scores, as well as their standard errors, are produced as by-products. The empirical performance of the proposed procedure is illustrated by means of a simulation study and a real example.
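The path-sampling idea at the heart of the procedure can be illustrated on a toy problem. Path sampling expresses the log ratio of two normalizing constants as an integral, log(Z1/Z0) = ∫₀¹ E_t[U(x, t)] dt, where U is the derivative of the log unnormalized density along a path linking the two models; the expectations are estimated by simulation and the integral by a numerical rule such as the trapezoidal rule. The sketch below is illustrative only: it links two Gaussian densities and uses exact draws where the paper would use MCMC output from the factor model's posterior, and all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                  # scale of the t=1 density; true log-ratio is log(sigma)
c = 1.0 / sigma**2 - 1.0     # coefficient of t in the linked exponent

# Path of unnormalized densities q_t(x) = exp(-x^2 * a_t / 2) with a_t = 1 + c*t,
# connecting N(0, 1) at t = 0 to an unnormalized N(0, sigma^2) at t = 1.
t_grid = np.linspace(0.0, 1.0, 21)
u_bar = []
for t in t_grid:
    a_t = 1.0 + c * t
    # Exact draws from q_t stand in for the MCMC draws used in the paper.
    x = rng.normal(0.0, 1.0 / np.sqrt(a_t), size=20000)
    u = -0.5 * c * x**2      # U(x, t) = d/dt log q_t(x)
    u_bar.append(u.mean())

# Trapezoidal rule over the grid gives the path-sampling estimate of log(Z1/Z0).
u_bar = np.array(u_bar)
log_ratio = float(np.sum(0.5 * (u_bar[1:] + u_bar[:-1]) * np.diff(t_grid)))
print(log_ratio)             # close to log(2) ~ 0.693
```

In the paper's setting the two endpoints of the path are competing factor models (differing in the number of factors), the draws at each grid point come from the hybrid Gibbs/Metropolis-Hastings sampler, and the resulting estimate is the log Bayes factor used for model selection.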
Additional information
The research is fully supported by a grant (CUHK 4346/01H) from the Research Grants Council of the Hong Kong Special Administrative Region. We are indebted to the Editor for valuable comments.
Cite this article
Lee, SY., Song, XY. Bayesian Selection on the Number of Factors in a Factor Analysis Model. Behaviormetrika 29, 23–39 (2002). https://doi.org/10.2333/bhmk.29.23