Abstract
The present paper proposes a hierarchical, multi-unidimensional two-parameter logistic item response theory (2PL-MUIRT) model extended for a large number of groups. The proposed model was motivated by a large-scale integrative data analysis (IDA) study that combined data (N = 24,336) from 24 independent alcohol intervention studies. IDA projects face challenges that individual studies do not, such as the need to establish a common scoring metric across studies and to handle missingness in the pooled data. To address these challenges, we developed a Markov chain Monte Carlo (MCMC) algorithm for a hierarchical 2PL-MUIRT model for multiple groups that estimates not only the item parameters and latent traits but also the means and covariance structures of the multiple dimensions across groups. Whereas the few existing MCMC algorithms for multidimensional IRT models constrain the item parameters to facilitate estimation of the covariance matrix, we adapted an MCMC algorithm so that the correlation matrix for the anchor group could be estimated directly, without any constraints on the item parameters. The feasibility of the MCMC algorithm and the validity of the basic calibration procedure were examined in a simulation study. Results showed that model parameters could be adequately recovered and that estimated latent trait scores closely approximated true latent trait scores. The algorithm was then applied to real data (69 items across 20 studies for 22,608 participants). The posterior predictive model check showed that the model fit all items well, and the correlations between the MCMC scores and the original scores were generally high. An additional simulation study demonstrated that the MCMC procedures were robust to the high proportion of missing data. The Bayesian hierarchical IRT model using the MCMC algorithms developed in the current study has the potential to be widely implemented for IDA studies or multi-site studies, and can be further refined to meet more complicated needs in applied research.
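To make the model structure concrete, the following is a minimal sketch of the response function of a multi-unidimensional 2PL model, in which every item measures exactly one latent dimension. The parameterization a_j(theta - b_j), the function names, and the use of `None` to mark items a respondent was never administered are illustrative assumptions, not details taken from the paper.

```python
import math

def prob_2pl_muirt(theta, a, b, dim):
    """P(X_j = 1 | theta) for each item j under a multi-unidimensional 2PL.

    theta : list of latent trait values, one per dimension
    a, b  : lists of item discriminations and difficulties (assumed form)
    dim   : dim[j] is the single dimension that item j measures
    """
    probs = []
    for a_j, b_j, d_j in zip(a, b, dim):
        # Each item depends on exactly one latent dimension.
        z = a_j * (theta[d_j] - b_j)
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs

def log_likelihood(responses, theta, a, b, dim):
    """Log-likelihood of one respondent; None marks an item not administered."""
    ll = 0.0
    for x, p in zip(responses, prob_2pl_muirt(theta, a, b, dim)):
        if x is None:
            continue  # items missing by design contribute nothing
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll
```

Skipping unadministered items in the likelihood is one simple way to handle pooled data in which each study used only a subset of the item pool.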
Notes
Multiple shorter chains were used instead of a single longer chain. Once the chains converged, the magnitude of the autocorrelation did not affect the estimates; it was therefore not necessary to compute the autocorrelation.
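One common way to judge convergence from multiple chains is the potential scale reduction factor of Gelman and Rubin (1992), which compares between-chain and within-chain variability. The sketch below is a generic illustration for a single scalar parameter; the function name is hypothetical, and the note above does not state which diagnostic or threshold the authors applied.

```python
import statistics

def potential_scale_reduction(chains):
    """Gelman-Rubin statistic for one scalar parameter.

    chains : list of m equal-length lists of post-burn-in draws
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain sample variance.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    b = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

Values close to 1 suggest that the chains are sampling from the same distribution; values well above 1 indicate that the chains have not yet mixed.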
Each of the 20 studies administered a subset of the 66 items (16 items in the study with the fewest items and 52 in the study with the most).
Several computer programs are available, such as mlirt and WinBUGS, but they are not specifically designed for the problems we faced. For example, the mlirt program is better suited to analyzing the within and between variability in multilevel IRT models. The WinBUGS program can be used for a variety of Bayesian IRT models, but it did not meet our needs. The MCMC algorithms we programmed gave us full control over every aspect of estimation (e.g., determining the candidate variances for greater convergence efficiency), which allowed us to tailor the program to the specific problems in our work.
Latent trait scores can be estimated simultaneously with the other structural model parameters. However, for computational efficiency, we split the MCMC procedure into two stages: calibration and scoring. Three studies had much larger samples than the others (together, more than half of the total sample), so computing time was much longer when all observations were used in a single combined stage. Using only 10% of the sample from these three studies, together with all participants from the remaining studies, made the first stage computationally more efficient, especially because we fine-tuned the algorithms several times along the way. We therefore needed a second stage to score all respondents using the same MCMC procedure. Once calibration on the baseline subsample was complete, we used the structural parameter estimates from the calibration stage to derive latent trait scores for all participants, not only at baseline but also at all subsequent follow-ups.
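The scoring stage can be pictured as sampling each respondent's latent trait while the item parameters stay fixed at their calibrated values. The sketch below is a deliberately simplified illustration, not the authors' algorithm: it assumes a single dimension, a standard normal prior, a random-walk Metropolis sampler, and hypothetical names (`log_posterior`, `score_respondent`, `proposal_sd`).

```python
import math
import random

def log_posterior(theta, responses, a, b):
    """Log posterior (up to a constant) of one trait, item parameters fixed."""
    lp = -0.5 * theta * theta  # standard normal prior (an assumption)
    for x, a_j, b_j in zip(responses, a, b):
        if x is None:
            continue  # item not administered to this respondent
        z = a_j * (theta - b_j)
        # 2PL log-probability: x*z - log(1 + exp(z)), computed stably.
        lp += x * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return lp

def score_respondent(responses, a, b, n_iter=2000, burn_in=500,
                     proposal_sd=1.0, seed=0):
    """Posterior-mean trait score from a random-walk Metropolis chain."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, responses, a, b)
    draws = []
    for t in range(n_iter):
        # proposal_sd plays the role of the candidate (proposal) spread.
        candidate = theta + rng.gauss(0.0, proposal_sd)
        cand_lp = log_posterior(candidate, responses, a, b)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, cand_lp - current)):
            theta, current = candidate, cand_lp
        if t >= burn_in:
            draws.append(theta)
    return sum(draws) / len(draws)
```

Because the item parameters are held fixed, respondents can be scored independently of one another, which is one reason a separate scoring stage can be cheaper than re-estimating everything jointly.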
The amount of bias can be affected by group sizes and the magnitudes of the parameters. In our study, groups with relatively small sample sizes were more susceptible to this problem given that considerable missingness existed in our data. The five largest biases were observed in three small studies.
References
Adams, R. J., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Bauer, D. J., & Hussong, A. M. (2009). Psychometric approaches for developing commensurate measures across independent studies: Traditional and new models. Psychological Methods, 14, 101–125.
Béguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66, 541–562.
Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395–414.
Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO for Windows [Computer software]. Lincolnwood, IL: Scientific Software International.
Casella, G., & George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46, 167–174.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. http://www.jstatsoft.org/v48/i06/.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327–335.
Curran, P. J., & Hussong, A. M. (2009). Integrative data analysis: The simultaneous analysis of multiple data sets. Psychological Methods, 14, 81–100.
Curran, P. J., Hussong, A. M., Cai, L., Huang, W., Chassin, L., Sher, K. J., et al. (2008). Pooling data from multiple longitudinal studies: The role of item response theory in integrative data analysis. Developmental Psychology, 44, 365–380.
de la Torre, J. (2009). Improving the quality of ability estimates through multidimensional scoring and incorporation of ancillary variables. Applied Psychological Measurement, 33, 465–485.
de la Torre, J., & Hong, Y. (2009). Parameter estimation with small sample size: A higher-order IRT model approach. Applied Psychological Measurement, 34, 267–285.
de la Torre, J., & Song, H. (2009). Simultaneous estimation of overall and domain abilities: A higher-order IRT model approach. Applied Psychological Measurement, 33, 620–639.
de la Torre, J., & Patz, R. J. (2005). Making the most of what we have: A practical application of multidimensional item response theory in test scoring. Journal of Educational and Behavioral Statistics, 30, 295–311.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B, 39, 1–38.
Dimeff, L. A., Baer, J. S., Kivlahan, D. R., & Marlatt, G. A. (1999). Brief alcohol screening and intervention for college students: A harm reduction approach. New York, NY: Guilford Press.
Doornik, J. A. (2009). Object-oriented matrix programming using Ox (Version 3.1) [Computer software]. London: Timberlake Consultants Press.
Fox, J.-P., & Glas, C. A. W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 271–288.
Finch, H. (2008). Estimation of item response theory parameters in the presence of missing data. Journal of Educational Measurement, 45, 225–245.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian data analysis (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
Gill, J. (2002). Bayesian methods: A social and behavioral sciences approach (1st ed.). Boca Raton, FL: Chapman & Hall/CRC.
Hartig, J., & Höhler, J. (2009). Multidimensional IRT models for the assessment of competencies. Studies in Educational Evaluation, 35, 57–63.
Hurlbut, S. C., & Sher, K. J. (1992). Assessing alcohol problems in college students. Journal of American College Health, 41(2), 49–58. doi:10.1080/07448481.1992.10392818.
Kahler, C. W., Strong, D. R., & Read, J. P. (2005). Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire. Alcoholism: Clinical and Experimental Research, 29(7), 1180–1189. doi:10.1097/01.alc.0000171940.95813.a5.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Liu, X. (2008). Parameter expansion for sampling a correlation matrix: An efficient GPX-RPMH algorithm. Journal of Statistical Computation and Simulation, 78, 1065–1076.
Liu, X., & Daniels, M. J. (2006). A new efficient algorithm for sampling a correlation matrix based on parameter expansion and re-parameterization. Journal of Computational and Graphical Statistics, 15, 897–914.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
McArdle, J. J., Grimm, K., Hamagami, F., Bowles, R., & Meredith, W. (2009). Modeling life-span growth curves of cognition using longitudinal data with multiple samples and changing scales of measurement. Psychological Methods, 14, 126–149.
McDonald, R. P. (1997). Normal-ogive multidimensional model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 257–269). New York, NY: Springer.
Meng, X. L. (1994). Posterior predictive p-values. The Annals of Statistics, 22, 1142–1160.
Millsap, R., & Maydeu-Olivares, A. (2009). Handbook of quantitative methods in psychology. London, UK: Sage.
Mislevy, R. (1991). Randomization-based inferences about latent variables from complex samples. Psychometrika, 56, 177–196.
Mun, E. Y., White, H. R., de la Torre, J., Atkins, D. C., Larimer, M., Jiao, Y., et al. (2011). Overview of integrative analysis of brief alcohol interventions for college students. Alcoholism: Clinical and Experimental Research, 35, 147.
Oshima, T. C., Raju, N. S., & Flowers, C. P. (1997). Development and demonstration of multidimensional IRT-based internal measures of differential functioning of items and tests. Journal of Educational Measurement, 34, 253–272.
Reckase, M. D. (1996). A linear logistic multidimensional model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 271–286). New York, NY: Springer.
Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
Rubin, D. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
Saunders, J. B., Aasland, O. G., Babor, T. F., & Grant, M. (1993). Development of the alcohol use disorders identification test (AUDIT): WHO Collaborative Project on early detection of persons with harmful alcohol consumption-II. Addiction, 88(6), 791–804. doi:10.1111/j.1360-0443.1993.tb02093.x.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: Chapman & Hall/CRC.
Sheng, Y., & Wikle, C. K. (2007). Comparing unidimensional and multi-unidimensional IRT models. Educational and Psychological Measurement, 67, 899–919.
Sheng, Y., & Wikle, C. K. (2008). Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement, 68, 413–430.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30, 298–321.
Skinner, H. A., & Allen, B. A. (1982). Alcohol dependence syndrome: Measurement and validation. Journal of Abnormal Psychology, 91(3), 199–209.
Skinner, H. A., & Horn, J. L. (1984). Alcohol dependence scale: Users guide. Toronto: Addiction Research Foundation.
Thomas, N. (2002). The role of secondary covariates when estimating latent trait population distributions. Psychometrika, 67, 33–48.
Van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York, NY: Springer.
Wang, W., Wilson, M., & Adams, R. J. (1995). Item response modeling for multidimensional between-items and multidimensional within-items. Paper presented at the International Objective Measurement Conference, Berkeley, CA.
White, H. R., & Labouvie, E. W. (1989). Towards the assessment of adolescent problem drinking. Journal of Studies on Alcohol, 50(1), 30–37.
Zeger, L. M., & Thomas, N. (1997). Efficient matrix sampling instruments for correlated latent traits: Examples from the National Assessment of Educational Progress. Journal of the American Statistical Association, 92, 416–425.
Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG 3 [Computer software]. Lincolnwood, IL: Scientific Software International.
Acknowledgments
We would like to thank the following investigators who generously contributed their data to Project INTEGRATE: John S. Baer, Department of Psychology, The University of Washington, and Veterans’ Affairs Puget Sound Health Care System; Nancy P. Barnett, Center for Alcohol and Addiction Studies, Brown University; M. Dolores Cimini, University Counseling Center, The University at Albany, State University of New York; William R. Corbin, Department of Psychology, Arizona State University; Kim Fromme, Department of Psychology, The University of Texas, Austin; Joseph W. LaBrie, Department of Psychology, Loyola Marymount University; Mary E. Larimer, Department of Psychiatry and Behavioral Sciences, The University of Washington; Matthew P. Martens, Department of Educational, School, and Counseling Psychology, The University of Missouri; James G. Murphy, Department of Psychology, The University of Memphis; Scott T. Walters, Department of Behavioral and Community Health, The University of North Texas Health Science Center; Helene R. White, Center of Alcohol Studies, Rutgers, The State University of New Jersey; and Mark D. Wood, Department of Psychology, The University of Rhode Island. The project described was supported by Award Number R01 AA019511 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIAAA or the National Institutes of Health.
Cite this article
Huo, Y., de la Torre, J., Mun, EY. et al. A Hierarchical Multi-Unidimensional IRT Approach for Analyzing Sparse, Multi-Group Data for Integrative Data Analysis. Psychometrika 80, 834–855 (2015). https://doi.org/10.1007/s11336-014-9420-2
DOI: https://doi.org/10.1007/s11336-014-9420-2