Abstract
International large-scale assessment in education aims to compare educational achievement across many countries. Differences between countries in language, culture, and education give rise to differential item functioning (DIF). For many decades, DIF has been regarded as a nuisance and a threat to validity. In this paper, we take a different stance and argue that DIF holds essential information about the differences between countries. To uncover this information, we explore multivariate analysis techniques as a way to analyze DIF, with an emphasis on visualization. PISA 2012 data are used for illustration.
Notes
Orthonormal means that \(\mathbf {U^{\prime }U}=\mathbf {V^{\prime }V}=\mathbf {I}\), implying that the columns of both \(\mathbf {U}\) and \(\mathbf {V}\) are mutually orthogonal (perpendicular) and that the sum of squares of each column is 1.
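As a quick numerical check of this definition, the following sketch applies base R's `svd()` to a small random matrix and confirms that both cross-products equal the identity up to rounding error. The matrix is made-up data for illustration only and has nothing to do with the PISA results.

```r
# Toy matrix standing in for an items-by-countries table (random, illustrative only)
set.seed(1)
x <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)

s <- svd(x)   # x = U diag(d) V'
U <- s$u      # 6 x 4 matrix of left singular vectors
V <- s$v      # 4 x 4 matrix of right singular vectors

# U'U and V'V should both be the 4 x 4 identity matrix
utu <- crossprod(U)
vtv <- crossprod(V)

# Largest absolute deviation from the identity for each cross-product
max(abs(utu - diag(4)))
max(abs(vtv - diag(4)))

# Sum of squares of each column of U
colSums(U^2)
```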
References
Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67–91.
Angoff, W., & Ford, S. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95–105.
Bechger, T., & Maris, G. (2015). A statistical test for differential item pair functioning. Psychometrika, 80(2), 317–340.
Bechger, T., Hox, J., van den Wittenboer, G., & de Glopper, C. (1999). The validity of comparative educational studies. Educational Measurement: Issues and Practice, 18(3), 18–26.
Behrisch, M., Bach, B., Henry Riche, N., Schreck, T., & Fekete, J. (2016). Matrix reordering methods for table and network visualization. Computer Graphics Forum, 35(3), 693–716. https://doi.org/10.1111/cgf.12935.
Brazma, A., & Vilo, J. (2000). Gene expression data analysis. FEBS Letters, 480(1), 17–24.
Brinkhuis, M.J., Bakker, M., & Maris, G. (2015). Filtering data for detecting differential development. Journal of Educational Measurement, 52(3), 319–338.
Cadima, J., & Jolliffe, I. (2009). On relationships between uncentred and column-centred principal component analysis. Pakistan Journal of Statistics, 25(4), 473–503.
Doebler, A. (2019). Looking at DIF from a new perspective: A structure-based approach acknowledging inherent indefinability. Applied Psychological Measurement, 43(4), 303–321.
Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211–218.
Everitt, B., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis. Chichester: Wiley.
Gabriel, K. (1971). The biplot graphical display of matrices with application to principal component analysis. Biometrika, 58(3), 453–467.
Glas, C.A.W., & Verhelst, N.D. (1995). Testing the Rasch model. In Fischer, G.H., & Molenaar, I.W. (Eds.) Rasch models: Foundations, recent developments, and applications, chap 5 (pp. 69–95). New York: Springer.
Golub, G.H., & Van Loan, C.F. (1996). Matrix computations, 3rd edn. Johns Hopkins University Press.
Greenacre, M. (2010). Biplots in practice. Bilbao: BBVA Foundation. http://www.multivariatestatistics.org.
Hastie, T., Tibshirani, R., & Friedman, J. (2013). The elements of statistical learning: Data mining, inference and prediction. Springer Series in Statistics. New York: Springer.
Holland, P., & Wainer, H. (2012). Differential item functioning. Taylor & Francis. https://books.google.nl/books?id=6YAXJfswvfYC.
Jolliffe, I. (2002). Principal component analysis. Springer Series in Statistics, Springer, Berlin.
Kolde, R. (2019). pheatmap: Pretty Heatmaps. https://CRAN.R-project.org/package=pheatmap, R package version 1.0.12.
Koops, J., Bechger, T., & Maris, G. (in press). Research for practical issues and solutions in computerized multistage testing (chap 19). In von Davier, A., & Duanli, Y (Eds.) (pp. 201–216). London: Routledge.
Lele, S., & Richtsmeier, J. (2001). An invariant approach to statistical analysis of shapes. Chapman & Hall/CRC Interdisciplinary Statistics, CRC Press.
Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum Associates.
Madeira, S., & Oliveira, A. (2004). Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 24–45.
Maris, G., Bechger, T., & San Martin, E. (2015). A Gibbs sampler for the (extended) marginal Rasch model. Psychometrika, 80(4), 859–879.
Maris, G., Bechger, T., Koops, J., & Partchev, I. (2019). dexter: Data management and analysis of tests. https://CRAN.R-project.org/package=dexter, R package version 1.0.0.
Millsap, R. (2012). Statistical approaches to measurement invariance. Routledge.
OECD. (2014). PISA 2012 technical report.
Oshima, T., & Miller, M.D. (1992). Multidimensionality and item bias in item response theory. Applied Psychological Measurement, 16(3), 237–248.
Padilha, V., & Campello, R. (2017). A systematic comparative evaluation of biclustering techniques. BMC Bioinformatics, 18. https://doi.org/10.1186/s12859-017-1487-1.
R. Core Team. (2019). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/.
San Martín, E., & Rolin, J. (2013). Identification of parametric Rasch-type models. Journal of Statistical Planning and Inference, 143(1), 116–130.
Thompson, D.R., Huntley, M.A., & Suurtamm, C. (2017). International perspectives on mathematics curriculum. IAP.
Travers, K.J., & Westbury, I. (1989). The IEA study of mathematics I: Analysis of mathematics curricula. Pergamon Press.
Verhelst, N. (2012). Profile analysis: A closer look at the PISA 2000 reading data. Scandinavian Journal of Educational Research, 56(3), 315–332. https://doi.org/10.1080/00313831.2011.583937.
Wang, T., Strobl, C., Zeileis, A., & Merkle, E. (2018). Score-based tests of differential item functioning via pairwise maximum likelihood estimation. Psychometrika, 83(1), 132–155.
Zwitser, R., Glaser, S., & Maris, G. (2017). Monitoring countries in a changing world: A new look at DIF in international surveys. Psychometrika, 82(1), 210–232.
Funding
This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 765400.
Ethics declarations
Conflict of interest
The authors have no competing interests.
Appendices
Appendix 1: R code
Here is some minimal R code to produce the plot in Fig. 6. It assumes that the item difficulties are in a matrix, here called x, with items in the rows and countries in the columns, and that the rows and columns are appropriately labeled.
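    # Column-center the difficulties (subtract each country's mean)
    x = scale(x, center=TRUE, scale=FALSE)
    # Singular value decomposition, keeping the first four left and right singular vectors
    s = svd(x, 4, 4)
    # Rebuild the matrix from components 2 to 4. The source line breaks off after
    # s$u[,2:4]; the remainder of this call is reconstructed here as an assumption.
    m = tcrossprod(s$u[,2:4] %*% diag(s$d[2:4]), s$v[,2:4])
    library(pheatmap)
    attr(m,'dimnames') = attr(x,'dimnames')
    # Row annotation: the first character of each item label
    dom = data.frame(Content = factor(substr(row.names(m),1,1)))
    row.names(dom) = row.names(m)
    pheatmap(m, scale='none', annotation_row = dom, cutree_cols = 6)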
For Fig. 7, change the last line to:
pheatmap(m, scale='none', annotation_row = dom, cutree_cols = 6, clustering_method = 'average')
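To try the steps without the PISA data or the pheatmap package, here is a minimal sketch on a made-up matrix. The item and country names are hypothetical, and the difficulties are random numbers. It also assumes that the plotted matrix is the product of singular components 2 to 4, and it mimics the column grouping with base R's `hclust()` and `cutree()`, using Euclidean distance and complete linkage (the pheatmap defaults, as far as we are aware).

```r
# Hypothetical toy data: 12 items (rows) by 8 countries (columns)
set.seed(42)
items     <- paste0(rep(c("C", "Q", "S", "U"), each = 3), 1:3)
countries <- paste0("Country", 1:8)
x <- matrix(rnorm(12 * 8), nrow = 12, dimnames = list(items, countries))

# Column-center, then decompose, keeping four singular vectors on each side
x <- scale(x, center = TRUE, scale = FALSE)
s <- svd(x, nu = 4, nv = 4)

# Assumed reconstruction: the part of x carried by components 2 to 4
m <- s$u[, 2:4] %*% diag(s$d[2:4]) %*% t(s$v[, 2:4])
dimnames(m) <- dimnames(x)

# Group the countries (columns) by hierarchical clustering and cut the tree
hc_complete <- hclust(dist(t(m)), method = "complete")
groups      <- cutree(hc_complete, k = 3)

# The Fig. 7 variant switches the linkage to 'average'
hc_average     <- hclust(dist(t(m)), method = "average")
groups_average <- cutree(hc_average, k = 3)

table(groups)
```

With only eight toy countries the tree is cut into three groups rather than six; the value of `k` plays the role of `cutree_cols` in the appendix code.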
Appendix 2: Correspondence with the original PISA labels
ItemID | Label | ItemID | Label | ItemID | Label |
---|---|---|---|---|---|
PM00GQ01 | SCFZ1 | PM446Q02 | CSFA2 | PM909Q02 | QPEB1 |
PM00KQ02 | SCFC1 | PM447Q01 | SPEA2 | PM909Q03 | CPIB1 |
PM033Q01 | SCIN1 | PM462Q01D | SSES1 | PM915Q01 | UPEW1 |
PM034Q01T | SOFN1 | PM464Q01T | SPFS1 | PM915Q02 | CPEW1 |
PM155Q01 | CSIN2 | PM474Q01 | QCEC2 | PM949Q01T | SOEA1 |
PM155Q02D | CSEN1 | PM496Q01T | QPFA2 | PM949Q02T | SOEA2 |
PM155Q03D | CSEN2 | PM496Q02 | QPEA2 | PM949Q03 | SOFA2 |
PM155Q04T | CSIN1 | PM559Q01 | QPII1 | PM955Q01 | UPIA5 |
PM192Q01T | CSFG1 | PM564Q01 | QPFI1 | PM955Q02 | UPIA3 |
PM273Q01T | SOEZ1 | PM564Q02 | UPFI1 | PM955Q03 | UPEA2 |
PM305Q01 | SPEA1 | PM571Q01 | CSIG1 | PM982Q01 | UPEA1 |
PM406Q01 | SPEA3 | PM603Q01T | QSEO1 | PM982Q02 | UPEA3 |
PM406Q02 | SPFA1 | PM800Q01 | QCEC1 | PM982Q03T | UPIA2 |
PM408Q01T | UPIA4 | PM803Q01T | UOFC1 | PM982Q04 | UPFA1 |
PM411Q01 | QPEA3 | PM828Q01 | CSEN3 | PM992Q01 | SOFF2 |
PM411Q02 | UPIA1 | PM828Q02 | USEN1 | PM992Q02 | SOFF1 |
PM420Q01T | UCIA3 | PM828Q03 | QSEN1 | PM992Q03 | COFF1 |
PM423Q01 | UCIA1 | PM906Q01 | QSEA2 | PM998Q02 | CCIL1 |
PM442Q02 | QPIA1 | PM906Q02 | QSEA1 | PM998Q04T | CCEL1 |
PM446Q01 | CSFA1 | PM909Q01 | QPIB1 | | |
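The table can be used as a lookup from the original PISA item IDs to the labels. The sketch below does this with a named character vector holding four rows copied from the table; the object names `labels`, `ids`, and `content` are illustrative. The last step extracts the first character of each label, which is what the code in Appendix 1 uses for the `Content` annotation.

```r
# Four rows copied from the table above (ItemID -> Label)
labels <- c(
  PM00GQ01  = "SCFZ1",
  PM155Q01  = "CSIN2",
  PM474Q01  = "QCEC2",
  PM408Q01T = "UPIA4"
)

# Translate original PISA item IDs into the labels
ids <- c("PM474Q01", "PM00GQ01")
unname(labels[ids])

# First character of each label, as used for the row annotation in Appendix 1
content <- factor(substr(labels, 1, 1))
table(content)
```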
Cite this article
Cuellar, E., Partchev, I., Zwitser, R. et al. Making sense out of measurement non-invariance: how to explore differences among educational systems in international large-scale assessments. Educ Asse Eval Acc 33, 9–25 (2021). https://doi.org/10.1007/s11092-021-09355-x
DOI: https://doi.org/10.1007/s11092-021-09355-x