Making sense out of measurement non-invariance: how to explore differences among educational systems in international large-scale assessments

Cuellar, Edwin; Partchev, Ivailo; Zwitser, Robert; Bechger, Timo

doi:10.1007/s11092-021-09355-x

Published: 08 February 2021

Volume 33, pages 9–25, (2021)

Educational Assessment, Evaluation and Accountability

Edwin Cuellar¹ (ORCID: orcid.org/0000-0002-6486-7709), Ivailo Partchev¹, Robert Zwitser² & Timo Bechger³

Abstract

International large-scale assessment in education aims to compare educational achievement across many countries. Differences between countries in language, culture, and education give rise to differential item functioning (DIF). For many decades, DIF has been regarded as a nuisance and a threat to validity. In this paper, we take a different stance and argue that DIF holds essential information about the differences between countries. To uncover this information, we explore multivariate analysis techniques as ways to analyze DIF, with an emphasis on visualization. PISA 2012 data are used for illustration.

Notes

Orthonormal means that $\mathbf{U}^{\prime}\mathbf{U} = \mathbf{V}^{\prime}\mathbf{V} = \mathbf{I}$, implying that the columns of both $\mathbf{U}$ and $\mathbf{V}$ are mutually orthogonal (perpendicular), and that the sum of squares of each of their columns is 1.
This information is provided in the PISA technical report (see Appendix A, OECD 2014).
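
The orthonormality property described in the first note can be checked numerically. The following is a minimal sketch in base R; the matrix a and its dimensions are arbitrary illustrative assumptions, not data from the paper.

```r
# Illustrative check of orthonormality in a singular value decomposition.
# The matrix `a` is arbitrary random data, not taken from the paper.
set.seed(1)
a <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)

s <- svd(a)  # a = U diag(d) V'

# U'U and V'V should both equal the identity matrix (up to rounding error).
utu <- crossprod(s$u)
vtv <- crossprod(s$v)
ortho_u <- isTRUE(all.equal(utu, diag(4)))
ortho_v <- isTRUE(all.equal(vtv, diag(4)))

# Each column of U has a sum of squares equal to 1.
col_ss <- colSums(s$u^2)

# The three factors together reproduce the original matrix.
recon_ok <- isTRUE(all.equal(s$u %*% diag(s$d) %*% t(s$v), a))
```

Here crossprod(s$u) computes U'U directly, so the comparison with diag(4) is the orthonormality condition itself.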

References

Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67–91.
Angoff, W., & Ford, S. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95–105.
Bechger, T., & Maris, G. (2015). A statistical test for differential item pair functioning. Psychometrika, 80(2), 317–340.
Bechger, T., Hox, J., van den Wittenboer, G., & de Glopper, C. (1999). The validity of comparative educational studies. Educational Measurement: Issues and Practice, 18(3), 18–26.
Behrisch, M., Bach, B., Henry Riche, N., Schreck, T., & Fekete, J. (2016). Matrix reordering methods for table and network visualization. Computer Graphics Forum, 35(3), 693–716. https://doi.org/10.1111/cgf.12935.
Brazma, A., & Vilo, J. (2000). Gene expression data analysis. FEBS Letters, 480(1), 17–24.
Brinkhuis, M.J., Bakker, M., & Maris, G. (2015). Filtering data for detecting differential development. Journal of Educational Measurement, 52(3), 319–338.
Cadima, J., & Jolliffe, I. (2009). On relationships between uncentred and column-centred principal component analysis. Pakistan Journal of Statistics, 25(4), 473–503.
Doebler, A. (2019). Looking at DIF from a new perspective: A structure-based approach acknowledging inherent indefinability. Applied Psychological Measurement, 43(4), 303–321.
Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211–218.
Everitt, B., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis. Chichester: Wiley.
Gabriel, K. (1971). The biplot graphical display of matrices with application to principal component analysis. Biometrika, 456–467.
Glas, C.A.W., & Verhelst, N.D. (1995). Testing the Rasch model. In Fischer, G. H., & Molenaar, I W (Eds.) Rasch models: Foundations, recent developments, and applications, chap 5 (pp. 69–95). New York: Springer.
Golub, G.H., & Van Loan, C.F. (1996). Matrix computations, 3rd edn. Johns Hopkins University Press.
Greenacre, M. (2010). Biplots in practice. Bilbao: BBVA Foundation. http://www.multivariatestatistics.org.
Hastie, T., Tibshirani, R., & Friedman, J. (2013). The elements of statistical learning: Data mining, inference and prediction. Springer Series in Statistics. New York: Springer.
Holland, P., & Wainer, H. (2012). Differential item functioning. Taylor & Francis. https://books.google.nl/books?id=6YAXJfswvfYC.
Jolliffe, I. (2002). Principal component analysis. Springer Series in Statistics, Springer, Berlin.
Kolde, R. (2019). pheatmap: Pretty Heatmaps. https://CRAN.R-project.org/package=pheatmap, r package version 1.0.12.
Koops, J., Bechger, T., & Maris, G. (in press). Research for practical issues and solutions in computerized multistage testing (chap 19). In von Davier, A., & Duanli, Y (Eds.) (pp. 201–216). London: Routledge.
Lele, S., & Richtsmeier, J. (2001). An invariant approach to statistical analysis of shapes. Chapman & Hall/CRC Interdisciplinary Statistics, CRC Press.
Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum Associates.
Madeira, S., & Oliveira, A. (2004). Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 24–45.
Maris, G., Bechger, T., & San Martin, E. (2015). A Gibbs sampler for the (extended) marginal Rasch model. Psychometrika, 80(4), 859–879.
Maris, G., Bechger, T., Koops, J., & Partchev, I. (2019). dexter: Data management and analysis of tests. https://CRAN.R-project.org/package=dexter, r package version 1.0.0.
Millsap, R. (2012). Statistical approaches to measurement invariance. Routledge.
OECD. (2014). PISA 2012 technical report.
Oshima, T., & Miller, M.D. (1992). Multidimensionality and item bias in item response theory. Applied Psychological Measurement, 16(3), 237–248.
Padilha, V., & Campello, R. (2017). A systematic comparative evaluation of biclustering techniques. BMC Bioinformatics, 18. https://doi.org/10.1186/s12859-017-1487-1.
R. Core Team. (2019). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/.
San Martín, E., & Rolin, J. (2013). Identification of parametric Rasch-type models. Journal of Statistical Planning and Inference, 143(1), 116–130.
Thompson, D.R., Huntley, M.A., & Suurtamm, C. (2017). International perspectives on mathematics curriculum. IAP.
Travers, K.J., & Westbury, I. (1989). The IEA study of mathematics I: Analysis of mathematics curricula. Pergamon Press.
Verhelst, N. (2012). Profile analysis: A closer look at the PISA 2000 reading data. Scandinavian Journal of Educational Research, 56(3), 315–332. https://doi.org/10.1080/00313831.2011.583937.
Wang, T., Strobl, C., Zeileis, A., & Merkle, E. (2018). Score-based tests of differential item functioning via pairwise maximum likelihood estimation. Psychometrika, 83(1), 132–155.
Zwitser, R., Glaser, S., & Maris, G. (2017). Monitoring countries in a changing world: A new look at DIF in international surveys. Psychometrika, 82(1), 210–232.

Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 765400.

Author information

Authors and Affiliations

Cito, Arnhem, The Netherlands
Edwin Cuellar & Ivailo Partchev
University of Amsterdam, Amsterdam, The Netherlands
Robert Zwitser
ACTNext by ACT, Iowa City, IA, USA
Timo Bechger

Corresponding author

Correspondence to Edwin Cuellar.

Ethics declarations

Conflict of interest

The authors have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: R code

Here is some minimal R code to produce the plot in Fig. 6. It assumes that the item difficulties are in a matrix, here called x, with items in the rows and countries in the columns, and that the rows and columns are appropriately labeled.

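library(pheatmap)
x = scale(x, center=TRUE, scale=FALSE)  # center each column (country)
s = svd(x, 4, 4)
# Truncated in the source; the closing arguments are an assumed reconstruction
m = tcrossprod(s$u[,2:4] %*% diag(s$d[2:4]), s$v[,2:4])
dimnames(m) = dimnames(x)
dom = data.frame(Content = factor(substr(row.names(m),1,1)))  # first letter of item label
row.names(dom) = row.names(m)
pheatmap(m, scale='none', annotation_row = dom, cutree_cols = 6)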

For Fig. 7, change the last line to:

pheatmap(m, scale='none', annotation_row = dom, cutree_cols = 6, clustering_method = 'average')
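
To try these steps without access to the PISA data, a toy matrix can stand in for x. The sketch below makes several assumptions that are not from the paper: the difficulties are random numbers, the item and country labels are made up, the reduced matrix is built from singular components 2 to 4 (as the indices 2:4 in the appendix code suggest), and base R hclust and cutree are used in place of pheatmap so that no extra package is needed.

```r
# Toy stand-in for the matrix of item difficulties (items x countries).
# All values and labels here are made up for illustration.
set.seed(42)
n_items <- 12
n_countries <- 8
x <- matrix(rnorm(n_items * n_countries), n_items, n_countries,
            dimnames = list(
              paste0(rep(c("C", "Q", "S", "U"), each = 3), "XX", 1:3),
              paste0("Country", seq_len(n_countries))))

# Column-centre: remove each country's mean difficulty.
xc <- scale(x, center = TRUE, scale = FALSE)

# Singular value decomposition; keep components 2 to 4 (an assumption).
s <- svd(xc)
k <- 2:4
m <- s$u[, k] %*% diag(s$d[k]) %*% t(s$v[, k])
dimnames(m) <- dimnames(x)

# Content domain taken from the first character of each item label.
dom <- data.frame(Content = factor(substr(rownames(m), 1, 1)),
                  row.names = rownames(m))

# Cluster the countries (columns) on the reduced matrix and cut the tree.
hc <- hclust(dist(t(m)), method = "complete")
groups <- cutree(hc, k = 3)
```

Passing method = "average" to hclust corresponds to the change of clustering method made for Fig. 7.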

Appendix 2: Correspondence with the original PISA labels

ItemID	Label	ItemID	Label	ItemID	Label
PM00GQ01	SCFZ1	PM446Q02	CSFA2	PM909Q02	QPEB1
PM00KQ02	SCFC1	PM447Q01	SPEA2	PM909Q03	CPIB1
PM033Q01	SCIN1	PM462Q01D	SSES1	PM915Q01	UPEW1
PM034Q01T	SOFN1	PM464Q01T	SPFS1	PM915Q02	CPEW1
PM155Q01	CSIN2	PM474Q01	QCEC2	PM949Q01T	SOEA1
PM155Q02D	CSEN1	PM496Q01T	QPFA2	PM949Q02T	SOEA2
PM155Q03D	CSEN2	PM496Q02	QPEA2	PM949Q03	SOFA2
PM155Q04T	CSIN1	PM559Q01	QPII1	PM955Q01	UPIA5
PM192Q01T	CSFG1	PM564Q01	QPFI1	PM955Q02	UPIA3
PM273Q01T	SOEZ1	PM564Q02	UPFI1	PM955Q03	UPEA2
PM305Q01	SPEA1	PM571Q01	CSIG1	PM982Q01	UPEA1
PM406Q01	SPEA3	PM603Q01T	QSEO1	PM982Q02	UPEA3
PM406Q02	SPFA1	PM800Q01	QCEC1	PM982Q03T	UPIA2
PM408Q01T	UPIA4	PM803Q01T	UOFC1	PM982Q04	UPFA1
PM411Q01	QPEA3	PM828Q01	CSEN3	PM992Q01	SOFF2
PM411Q02	UPIA1	PM828Q02	USEN1	PM992Q02	SOFF1
PM420Q01T	UCIA3	PM828Q03	QSEN1	PM992Q03	COFF1
PM423Q01	UCIA1	PM906Q01	QSEA2	PM998Q02	CCIL1
PM442Q02	QPIA1	PM906Q02	QSEA1	PM998Q04T	CCEL1
PM446Q01	CSFA1	PM909Q01	QPIB1
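
The correspondence table can also be applied programmatically. The following is a minimal sketch, assuming that the matrix of difficulties initially carries the original PISA item IDs as row names; only the first three pairs from the table are typed in, and the difficulty values and column names are placeholders.

```r
# Lookup from original PISA item IDs to the labels used in the plots.
# Only three pairs from the table are included here for brevity.
lookup <- c(PM00GQ01 = "SCFZ1",
            PM00KQ02 = "SCFC1",
            PM033Q01 = "SCIN1")

# Placeholder difficulties (made-up numbers), rows named by PISA item ID.
x <- matrix(c(0.1, -0.3, 0.5, 0.2, 0.0, -0.4), nrow = 3,
            dimnames = list(names(lookup), c("A", "B")))

# Replace the row names by the corresponding labels.
rownames(x) <- unname(lookup[rownames(x)])

# The first character of each label is what the code in Appendix 1
# extracts as the row annotation.
content <- factor(substr(rownames(x), 1, 1))
```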

About this article

Cite this article

Cuellar, E., Partchev, I., Zwitser, R. et al. Making sense out of measurement non-invariance: how to explore differences among educational systems in international large-scale assessments. Educ Asse Eval Acc 33, 9–25 (2021). https://doi.org/10.1007/s11092-021-09355-x

Received: 30 November 2019
Accepted: 07 January 2021
Published: 08 February 2021
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11092-021-09355-x
