A Comparison of χ², RFA and IRT Based Procedures in the Detection of DIF

Gómez-Benito, Juana; Navas-Ara, María José

doi:10.1023/A:1004703709442

Published: February 2000

Quality and Quantity, Volume 34, pages 17–31 (2000)

Abstract

Bias research began at the end of the 1960s and developed rapidly in the following decades for obvious social and political reasons, and because of the important impact that this issue has on the field of psychological and educational measurement. Since then, several methods have been proposed for the study and detection of item bias or differential item functioning (DIF). This paper presents a simulation study comparing the potential of some of these methods for detecting DIF: two IRT-based techniques (area measures), three χ²-based procedures (Mantel-Haenszel, Logit Model and Logistic Regression) and the Restricted Factor Analysis method. The results show that the technique that appears to do the best job is the Mantel-Haenszel statistic. Moreover, all detection techniques tend to overidentify DIF items, that is, some of the items labeled with DIF may in fact be without DIF. This tendency is slightly reversed in the Logistic Regression procedure.
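
To make the Mantel-Haenszel statistic named in the abstract concrete, the following is a minimal sketch of how it is commonly computed for a single item. It assumes examinees have already been matched on total score, so that each score level yields one 2×2 table of group (reference or focal) by item response (correct or incorrect). The function name `mantel_haenszel`, the tuple layout and the sample counts are hypothetical and chosen for illustration only; none of them come from the paper, and the paper's simulation design is not reproduced here.

```python
from typing import Iterable, Tuple

# One score stratum: (ref_correct, ref_wrong, focal_correct, focal_wrong).
# This layout is an assumption made for the sketch.
Stratum = Tuple[int, int, int, int]


def mantel_haenszel(strata: Iterable[Stratum]) -> Tuple[float, float]:
    """Return (common odds ratio, continuity-corrected MH chi-square)."""
    num = den = 0.0                    # parts of the common odds ratio
    sum_a = sum_e = sum_var = 0.0      # observed, expected, variance of A_k
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue                   # stratum carries no information
        n_ref, n_foc = a + b, c + d    # group sizes in this stratum
        m1, m0 = a + c, b + d          # correct / incorrect totals
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m1 / n
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    if den == 0 or sum_var == 0:
        raise ValueError("tables are too sparse to compute the statistic")
    alpha = num / den
    # Continuity correction of 0.5, floored at zero.
    chi2 = max(0.0, abs(sum_a - sum_e) - 0.5) ** 2 / sum_var
    return alpha, chi2


if __name__ == "__main__":
    # Hypothetical counts: the reference group does better at both score levels.
    alpha, chi2 = mantel_haenszel([(40, 10, 30, 20), (45, 5, 35, 15)])
    print(f"alpha_MH = {alpha:.3f}, MH chi-square = {chi2:.3f}")
```

An odds ratio near 1 suggests no DIF. The chi-square value is referred to a χ² distribution with one degree of freedom, so values above about 3.84 would be flagged at the 5% level.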

Author information

Authors and Affiliations

Universitat de Barcelona, Spain
Juana Gómez-Benito
Universidad Nacional de Educación a Distancia, Spain
María José Navas-Ara

About this article

Cite this article

Gómez-Benito, J., Navas-Ara, M.J. A Comparison of χ², RFA and IRT Based Procedures in the Detection of DIF. Quality & Quantity 34, 17–31 (2000). https://doi.org/10.1023/A:1004703709442

Issue Date: February 2000
DOI: https://doi.org/10.1023/A:1004703709442

differential item functioning
