A Comparison of χ², RFA and IRT Based Procedures in the Detection of DIF

Quality and Quantity

Abstract

Bias research began at the end of the 1960s and developed rapidly in the following decades for obvious social and political reasons, and due to the important impact that this issue has on the field of psychological and educational measurement. Since then, several methods have been proposed for the study and detection of item bias or differential item functioning (DIF). This paper presents a simulation study comparing the potential of some of these methods for detecting DIF: two IRT-based techniques (area measures), three χ²-based procedures (Mantel-Haenszel, Logit Model and Logistic Regression) and the Restricted Factor Analysis method. The results show that the technique that appears to do the best job is the Mantel-Haenszel statistic. Moreover, all detection techniques tend to overidentify DIF items, that is, some of the items labeled with DIF may in fact be without DIF. This tendency is slightly reversed in the Logistic Regression procedure.
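For readers unfamiliar with the Mantel-Haenszel statistic named in the abstract, the following is a minimal sketch of its usual textbook form: examinees are matched on an ability proxy (typically total score), a 2×2 table of group by item response is built at each matched level, and the tables are pooled into a continuity-corrected χ² with one degree of freedom plus a common odds ratio. The function name, the tuple layout and the toy counts are assumptions made for illustration. This is not the code or the simulation design used in the paper.

```python
from typing import Iterable, Tuple

# One stratum = one matched ability level (e.g. one total-score value).
# Layout (hypothetical, for illustration only):
# (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel(strata: Iterable[Stratum]) -> Tuple[float, float]:
    """Return (chi2, alpha): the continuity-corrected MH chi-square
    (1 degree of freedom) and the MH common odds ratio."""
    sum_a = sum_expected = sum_var = 0.0
    odds_num = odds_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        n_ref, n_foc = a + b, c + d          # group sizes
        n_right, n_wrong = a + c, b + d      # response totals
        sum_a += a
        sum_expected += n_ref * n_right / n  # E(A) under no DIF
        sum_var += n_ref * n_foc * n_right * n_wrong / (n * n * (n - 1))
        odds_num += a * d / n
        odds_den += b * c / n
    if sum_var == 0 or odds_den == 0:
        raise ValueError("strata are degenerate; statistic is undefined")
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    alpha = odds_num / odds_den
    return chi2, alpha


# Toy counts (made up): identical odds in both groups, so no DIF.
no_dif = [(30, 10, 30, 10), (20, 20, 20, 20)]
# Toy counts (made up): the reference group has higher odds at each level.
with_dif = [(30, 10, 20, 20), (20, 20, 10, 30)]

chi2_no, alpha_no = mantel_haenszel(no_dif)
chi2_dif, alpha_dif = mantel_haenszel(with_dif)
print(f"no DIF:   chi2={chi2_no:.3f}, alpha={alpha_no:.3f}")
print(f"with DIF: chi2={chi2_dif:.3f}, alpha={alpha_dif:.3f}")
```

An item would typically be flagged when the χ² exceeds the critical value for one degree of freedom (about 3.84 at the 5% level). An odds ratio near 1 indicates that matched examinees in the two groups have similar odds of answering correctly.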

Cite this article

Gómez-Benito, J., Navas-Ara, M.J. A Comparison of χ², RFA and IRT Based Procedures in the Detection of DIF. Quality & Quantity 34, 17–31 (2000). https://doi.org/10.1023/A:1004703709442

  • DOI: https://doi.org/10.1023/A:1004703709442
