Making sense out of measurement non-invariance: how to explore differences among educational systems in international large-scale assessments

Educational Assessment, Evaluation and Accountability

Abstract

International large-scale assessment in education aims to compare educational achievement across many countries. Differences between countries in language, culture, and education give rise to differential item functioning (DIF). For many decades, DIF has been regarded as a nuisance and a threat to validity. In this paper, we take a different stance and argue that DIF holds essential information about the differences between countries. To uncover this information, we explore the use of multivariate analysis techniques as ways to analyze DIF, with an emphasis on visualization. PISA 2012 data are used for illustration.

Notes

  1. Orthonormal means that \(\mathbf{U}^{\prime}\mathbf{U}=\mathbf{V}^{\prime}\mathbf{V}=\mathbf{I}\), so the columns of both \(\mathbf{U}\) and \(\mathbf{V}\) are mutually orthogonal (perpendicular), and the sum of squares of each column is 1.

  2. This information is provided in the PISA 2012 technical report (see Appendix A, OECD 2014).
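
The property in Note 1 can be checked numerically. The sketch below is an illustration only: the matrix `x` is simulated (it is not the PISA data), and its dimensions are arbitrary. It uses base R's `svd()` and confirms that U'U and V'V equal the identity matrix up to rounding error.

```r
# Illustration only: a simulated 10 x 4 matrix, not the PISA data
set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)

s <- svd(x)   # s$u is 10 x 4, s$v is 4 x 4, s$d holds the singular values

# U'U and V'V: crossprod(a) computes t(a) %*% a
utu <- crossprod(s$u)
vtv <- crossprod(s$v)

# Both should equal the 4 x 4 identity matrix up to floating-point error
isTRUE(all.equal(utu, diag(4)))
isTRUE(all.equal(vtv, diag(4)))

# The sum of squares of each column is 1
colSums(s$u^2)
```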

References

  • Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67–91.

  • Angoff, W., & Ford, S. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95–105.

  • Bechger, T., & Maris, G. (2015). A statistical test for differential item pair functioning. Psychometrika, 80(2), 317–340.

  • Bechger, T., Hox, J., van den Wittenboer, G., & de Glopper, C. (1999). The validity of comparative educational studies. Educational Measurement: Issues and Practice, 18(3), 18–26.

  • Behrisch, M., Bach, B., Henry Riche, N., Schreck, T., & Fekete, J. (2016). Matrix reordering methods for table and network visualization. Computer Graphics Forum, 35(3), 693–716. https://doi.org/10.1111/cgf.12935.

  • Brazma, A., & Vilo, J. (2000). Gene expression data analysis. FEBS Letters, 480, 117–24.

  • Brinkhuis, M.J., Bakker, M., & Maris, G. (2015). Filtering data for detecting differential development. Journal of Educational Measurement, 52(3), 319–338.

  • Cadima, J., & Jolliffe, I. (2009). On relationships between uncentred and column-centred principal component analysis. Pakistan Journal of Statistics, 25(4), 473–503.

  • Doebler, A. (2019). Looking at DIF from a new perspective: A structure-based approach acknowledging inherent indefinability. Applied Psychological Measurement, 43(4), 303–321.

  • Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.

  • Everitt, B., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis. Chichester: Wiley.

  • Gabriel, K. (1971). The biplot graphical display of matrices with application to principal component analysis. Biometrika, 456–467.

  • Glas, C.A.W., & Verhelst, N.D. (1995). Testing the Rasch model. In Fischer, G. H., & Molenaar, I. W. (Eds.) Rasch models: Foundations, recent developments, and applications, chap 5 (pp. 69–95). New York: Springer.

  • Golub, G.H., & Van Loan, C.F. (1996). Matrix computations, 3rd edn. Johns Hopkins University Press.

  • Greenacre, M. (2010). Biplots in practice. Bilbao: BBVA Foundation. http://www.multivariatestatistics.org.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2013). The elements of statistical learning: Data mining, inference and prediction. Springer Series in Statistics. New York: Springer.

  • Holland, P., & Wainer, H. (2012). Differential item functioning. Taylor & Francis. https://books.google.nl/books?id=6YAXJfswvfYC.

  • Jolliffe, I. (2002). Principal component analysis. Springer Series in Statistics, Springer, Berlin.

  • Kolde, R. (2019). pheatmap: Pretty Heatmaps. https://CRAN.R-project.org/package=pheatmap, R package version 1.0.12.

  • Koops, J., Bechger, T., & Maris, G. (in press). Research for practical issues and solutions in computerized multistage testing (chap 19). In von Davier, A., & Duanli, Y (Eds.) (pp. 201–216). London: Routledge.

  • Lele, S., & Richtsmeier, J. (2001). An invariant approach to statistical analysis of shapes. Chapman & Hall/CRC Interdisciplinary Statistics, CRC Press.

  • Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum Associates.

  • Madeira, S., & Oliveira, A. (2004). Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 24–45.

  • Maris, G., Bechger, T., & San Martin, E. (2015). A Gibbs sampler for the (extended) marginal Rasch model. Psychometrika, 80(4), 859–879.

  • Maris, G., Bechger, T., Koops, J., & Partchev, I. (2019). dexter: Data management and analysis of tests. https://CRAN.R-project.org/package=dexter, R package version 1.0.0.

  • Millsap, R. (2012). Statistical approaches to measurement invariance. Routledge.

  • OECD. (2014). PISA 2012 technical report.

  • Oshima, T., & Miller, M.D. (1992). Multidimensionality and item bias in item response theory. Applied Psychological Measurement, 16(3), 237–248.

  • Padilha, V., & Campello, R. (2017). A systematic comparative evaluation of biclustering techniques. BMC Bioinformatics, 18. https://doi.org/10.1186/s12859-017-1487-1.

  • R. Core Team. (2019). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/.

  • San Martín, E., & Rolin, J. (2013). Identification of parametric Rasch-type models. Journal of Statistical Planning and Inference, 143(1), 116–130.

  • Thompson, D.R., Huntley, M.A., & Suurtamm, C. (2017). International perspectives on mathematics curriculum. IAP.

  • Travers, K.J., & Westbury, I. (1989). The IEA study of mathematics I: Analysis of mathematics curricula. Pergamon Press.

  • Verhelst, N. (2012). Profile analysis: A closer look at the PISA 2000 reading data. Scandinavian Journal of Educational Research, 56(3), 315–332. https://doi.org/10.1080/00313831.2011.583937.

  • Wang, T., Strobl, C., Zeileis, A., & Merkle, E. (2018). Score-based tests of differential item functioning via pairwise maximum likelihood estimation. Psychometrika, 83(1), 132–155.

  • Zwitser, R., Glaser, S., & Maris, G. (2017). Monitoring countries in a changing world: A new look at DIF in international surveys. Psychometrika, 82(1), 210–232.

Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 765400.

Author information

Corresponding author

Correspondence to Edwin Cuellar.

Ethics declarations

Conflict of interest

The authors have no competing interests.

Appendices

Appendix 1: R code

Here is some minimal R code to produce the plot in Fig. 6. It assumes that the item difficulties are in a matrix, here called x, with items in the rows and countries in the columns, and that the rows and columns are appropriately labeled.


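library(pheatmap)

# Center each column (country) of item difficulties; no rescaling
x = scale(x, center=TRUE, scale=FALSE)
# Singular value decomposition, keeping 4 left and 4 right singular vectors
s = svd(x, 4, 4)
# Items-by-countries product of components 2 to 4.
# Assumed completion: this line is truncated in the source after s$u[,2:4]
m = tcrossprod(s$u[,2:4] %*% diag(s$d[2:4]), s$v[,2:4])
attr(m,'dimnames') = attr(x,'dimnames')
# Row annotation: the first character of each item label
dom = data.frame(Content = factor(substr(row.names(m),1,1)))
row.names(dom) = row.names(m)
# Clustered heat map; the column dendrogram is cut into 6 groups
pheatmap(m, scale='none', annotation_row = dom, cutree_cols = 6)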

For Fig. 7, change the last line to:


pheatmap(m, scale='none', annotation_row = dom, cutree_cols = 6, clustering_method = 'average')
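
The two calls differ only in the linkage used to build the dendrograms. As a rough sketch on simulated data (the matrix, its labels, and its size are invented for illustration), the column clustering can be mimicked with base R's `dist()`, `hclust()`, and `cutree()`. Two assumptions are made: pheatmap's documented defaults of Euclidean distance and complete linkage, and a reading of the Fig. 6 code in which `m` is the product of components 2 to 4 of the decomposition.

```r
# Illustration only: simulated "difficulties" for 30 items and 12 countries
set.seed(42)
x <- matrix(rnorm(30 * 12), nrow = 30, ncol = 12,
            dimnames = list(paste0("item", 1:30), paste0("country", 1:12)))

# Column-center, then form the product of SVD components 2 to 4 (assumed reading)
x <- scale(x, center = TRUE, scale = FALSE)
s <- svd(x, 4, 4)
m <- s$u[, 2:4] %*% diag(s$d[2:4]) %*% t(s$v[, 2:4])
dimnames(m) <- dimnames(x)

# Countries are columns, so cluster the rows of t(m)
d <- dist(t(m))                                  # Euclidean distances
hc_complete <- hclust(d, method = "complete")    # pheatmap's default linkage
hc_average  <- hclust(d, method = "average")     # clustering_method = 'average'

# Cut each dendrogram into 6 groups, as cutree_cols = 6 does
groups_complete <- cutree(hc_complete, k = 6)
groups_average  <- cutree(hc_average,  k = 6)

# Cross-tabulate to see where the two linkages disagree
table(complete = groups_complete, average = groups_average)
```

Off-diagonal counts in the cross-tabulation mark countries whose group membership depends on the linkage, which is the kind of difference to look for when comparing Fig. 6 with Fig. 7.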

Appendix 2: Correspondence with the original PISA labels

| ItemID | Label | ItemID | Label | ItemID | Label |
|---|---|---|---|---|---|
| PM00GQ01 | SCFZ1 | PM446Q02 | CSFA2 | PM909Q02 | QPEB1 |
| PM00KQ02 | SCFC1 | PM447Q01 | SPEA2 | PM909Q03 | CPIB1 |
| PM033Q01 | SCIN1 | PM462Q01D | SSES1 | PM915Q01 | UPEW1 |
| PM034Q01T | SOFN1 | PM464Q01T | SPFS1 | PM915Q02 | CPEW1 |
| PM155Q01 | CSIN2 | PM474Q01 | QCEC2 | PM949Q01T | SOEA1 |
| PM155Q02D | CSEN1 | PM496Q01T | QPFA2 | PM949Q02T | SOEA2 |
| PM155Q03D | CSEN2 | PM496Q02 | QPEA2 | PM949Q03 | SOFA2 |
| PM155Q04T | CSIN1 | PM559Q01 | QPII1 | PM955Q01 | UPIA5 |
| PM192Q01T | CSFG1 | PM564Q01 | QPFI1 | PM955Q02 | UPIA3 |
| PM273Q01T | SOEZ1 | PM564Q02 | UPFI1 | PM955Q03 | UPEA2 |
| PM305Q01 | SPEA1 | PM571Q01 | CSIG1 | PM982Q01 | UPEA1 |
| PM406Q01 | SPEA3 | PM603Q01T | QSEO1 | PM982Q02 | UPEA3 |
| PM406Q02 | SPFA1 | PM800Q01 | QCEC1 | PM982Q03T | UPIA2 |
| PM408Q01T | UPIA4 | PM803Q01T | UOFC1 | PM982Q04 | UPFA1 |
| PM411Q01 | QPEA3 | PM828Q01 | CSEN3 | PM992Q01 | SOFF2 |
| PM411Q02 | UPIA1 | PM828Q02 | USEN1 | PM992Q02 | SOFF1 |
| PM420Q01T | UCIA3 | PM828Q03 | QSEN1 | PM992Q03 | COFF1 |
| PM423Q01 | UCIA1 | PM906Q01 | QSEA2 | PM998Q02 | CCIL1 |
| PM442Q02 | QPIA1 | PM906Q02 | QSEA1 | PM998Q04T | CCEL1 |
| PM446Q01 | CSFA1 | PM909Q01 | QPIB1 | | |
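
If the difficulty matrix still carries the original PISA item IDs as row names, a lookup vector built from this table can translate them into the labels used here. The sketch below is a hypothetical illustration: it includes only four rows of the table, and the matrix `x` and its country names are filled with made-up values.

```r
# Lookup from original PISA item ID to the label used in this paper
# (only four entries of the table are shown; extend with the remaining rows)
labels <- c(PM00GQ01 = "SCFZ1",
            PM446Q02 = "CSFA2",
            PM909Q02 = "QPEB1",
            PM955Q01 = "UPIA5")

# Made-up difficulties for illustration: 4 items x 2 countries
x <- matrix(c(0.3, -1.2, 0.8, 0.1,
              0.5, -0.9, 0.4, 0.6),
            nrow = 4,
            dimnames = list(names(labels), c("A", "B")))

# Replace the row names; unname() drops the IDs carried by the lookup vector
rownames(x) <- unname(labels[rownames(x)])

# The first character of each label is what the Appendix 1 code
# uses for the row annotation
content <- factor(substr(rownames(x), 1, 1))
```

Indexing the lookup vector by the existing row names keeps the translation independent of the order in which items appear in the matrix.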

  

Cite this article

Cuellar, E., Partchev, I., Zwitser, R. et al. Making sense out of measurement non-invariance: how to explore differences among educational systems in international large-scale assessments. Educ Asse Eval Acc 33, 9–25 (2021). https://doi.org/10.1007/s11092-021-09355-x

  • DOI: https://doi.org/10.1007/s11092-021-09355-x
