A Hierarchical Multi-Unidimensional IRT Approach for Analyzing Sparse, Multi-Group Data for Integrative Data Analysis

Psychometrika

Abstract

The present paper proposes a hierarchical, multi-unidimensional two-parameter logistic item response theory (2PL-MUIRT) model extended for a large number of groups. The proposed model was motivated by a large-scale integrative data analysis (IDA) study that combined data (N = 24,336) from 24 independent alcohol intervention studies. IDA projects face challenges that individual studies do not, such as the need to establish a common scoring metric across studies and to handle missingness in the pooled data. To address these challenges, we developed a Markov chain Monte Carlo (MCMC) algorithm for a hierarchical 2PL-MUIRT model for multiple groups that estimates not only the item parameters and latent traits but also the means and covariance structures of the multiple dimensions across groups. In contrast to the few existing MCMC algorithms for multidimensional IRT models, which constrain the item parameters to facilitate estimation of the covariance matrix, we adapted an MCMC algorithm so that we could directly estimate the correlation matrix for the anchor group without any constraints on the item parameters. The feasibility of the MCMC algorithm and the validity of the basic calibration procedure were examined in a simulation study. Results showed that model parameters could be adequately recovered and that estimated latent trait scores closely approximated true latent trait scores. The algorithm was then applied to real data (69 items across 20 studies for 22,608 participants). The posterior predictive model check showed that the model fit all items well, and the correlations between the MCMC scores and the original scores were overall quite high. An additional simulation study demonstrated that the MCMC procedures were robust to the high proportion of missing data. The Bayesian hierarchical IRT model using the MCMC algorithms developed in the current study has the potential to be widely implemented for IDA studies or multi-site studies, and it can be further refined to meet more complicated needs in applied research.
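
To make the model structure described above concrete, the following is a minimal sketch of a multi-unidimensional 2PL response function and a log-likelihood that skips items a study did not administer. It is an illustration under stated assumptions, not the authors' implementation: the slope-difficulty form a(θ − b), the function names, and the array layout are all hypothetical choices, and the hierarchical priors and group-specific means and covariances are omitted.

```python
import numpy as np

def response_prob(theta, a, b, dim_of_item):
    """Probability of endorsing each item under a multi-unidimensional 2PL.

    theta:       (N, D) latent traits, one column per dimension
    a, b:        (J,) item discriminations and difficulties (assumed form)
    dim_of_item: (J,) integer index of the single dimension each item measures
    """
    # Each item loads on exactly one dimension, so pick that column per item.
    theta_item = theta[:, dim_of_item]            # shape (N, J)
    return 1.0 / (1.0 + np.exp(-a * (theta_item - b)))

def masked_log_likelihood(x, observed, theta, a, b, dim_of_item):
    """Log-likelihood over observed responses only.

    x:        (N, J) binary responses; entries where observed is False are ignored
    observed: (N, J) boolean mask, False where an item was not administered
    """
    p = np.clip(response_prob(theta, a, b, dim_of_item), 1e-12, 1 - 1e-12)
    x_filled = np.where(observed, x, 0)           # placeholder for missing cells
    ll = x_filled * np.log(p) + (1 - x_filled) * np.log1p(-p)
    return float(np.sum(np.where(observed, ll, 0.0)))
```

Because unadministered items contribute nothing to the sum, studies that share only some items can be pooled in a single response matrix, which is the situation the abstract describes.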

Notes

  1. Multiple shorter chains were used instead of a single longer chain. Once the chains converged, the magnitude of the autocorrelation did not affect the estimates, so it was not necessary to compute the autocorrelation.

  2. Each of the 20 studies administered a subset of the 66 items (16 items in the study with the fewest items and 52 in the study with the most).

  3. Several computer programs are available, such as mlirt and WinBUGS, but they are not specifically designed for the problems we faced. For example, the mlirt program is more appropriate for analyzing the within and between variability in multilevel IRT models. The WinBUGS program can be used for a variety of Bayesian IRT models, but it did not meet our needs. The MCMC algorithms we programmed gave us full control over every aspect of estimation (e.g., determining the candidate variances for greater convergence efficiency), which allowed us to tailor the program to the specific problems in our work.

  4. Latent trait scores can be estimated simultaneously with the other structural model parameters. However, for computational efficiency, we split the MCMC procedure into two stages: calibration and scoring. Because three studies had much larger samples than the others (together accounting for more than half of the total sample), using all observations in one combined stage required much longer computing time. Using only 10% of the sample from these three studies and all participants from the remaining studies in the first stage was computationally more efficient, especially because we fine-tuned the algorithms several times along the way. As a result, a second stage was needed to score all respondents using the same MCMC procedure. Once the calibration using the baseline subsample was completed, we used the structural parameter estimates from the calibration stage to derive latent trait scores for all participants, not only at baseline but also at all subsequent follow-ups.

  5. The amount of bias can be affected by group sizes and the magnitudes of the parameters. In our study, groups with relatively small sample sizes were more susceptible to this problem given that considerable missingness existed in our data. The five largest biases were observed in three small studies.
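
Note 1 refers to judging convergence from multiple chains. One common way to do this is the potential scale reduction factor of Gelman and Rubin, sketched below for a single scalar parameter. This page does not state which diagnostic or threshold the authors used, so the choice of statistic and the function name `gelman_rubin` are assumptions made for illustration only.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: (M, T) array with M parallel chains of T post-burn-in draws each.
    Values near 1 suggest the chains are sampling the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    m, t = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()    # W: mean within-chain variance
    between = t * chain_means.var(ddof=1)         # B: between-chain variance
    # Pooled estimate of the posterior variance.
    var_hat = (t - 1) / t * within + between / t
    return float(np.sqrt(var_hat / within))

# Illustrative usage with simulated draws (not data from the study).
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                   # chains agree
stuck = mixed + np.array([[0.0], [5.0], [10.0], [15.0]])  # chains disagree
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

In practice the statistic would be computed for every monitored parameter, and chains would be extended until all values are close to 1.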

Acknowledgments

We would like to thank the following investigators who generously contributed their data to Project INTEGRATE: John S. Baer, Department of Psychology, The University of Washington, and Veterans’ Affairs Puget Sound Health Care System; Nancy P. Barnett, Center for Alcohol and Addiction Studies, Brown University; M. Dolores Cimini, University Counseling Center, The University at Albany, State University of New York; William R. Corbin, Department of Psychology, Arizona State University; Kim Fromme, Department of Psychology, The University of Texas, Austin; Joseph W. LaBrie, Department of Psychology, Loyola Marymount University; Mary E. Larimer, Department of Psychiatry and Behavioral Sciences, The University of Washington; Matthew P. Martens, Department of Educational, School, and Counseling Psychology, The University of Missouri; James G. Murphy, Department of Psychology, The University of Memphis; Scott T. Walters, Department of Behavioral and Community Health, The University of North Texas Health Science Center; Helene R. White, Center of Alcohol Studies, Rutgers, The State University of New Jersey; and Mark D. Wood, Department of Psychology, The University of Rhode Island. The project described was supported by Award Number R01 AA019511 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIAAA or the National Institutes of Health.

Author information

Corresponding author

Correspondence to Yan Huo.

About this article

Cite this article

Huo, Y., de la Torre, J., Mun, EY. et al. A Hierarchical Multi-Unidimensional IRT Approach for Analyzing Sparse, Multi-Group Data for Integrative Data Analysis. Psychometrika 80, 834–855 (2015). https://doi.org/10.1007/s11336-014-9420-2

  • DOI: https://doi.org/10.1007/s11336-014-9420-2
