Abstract
Bias research began at the end of the 1960s and developed rapidly in the following decades for obvious social and political reasons, and due to the important impact that this issue has on the field of psychological and educational measurement. Since then, several methods have been proposed for the study and detection of item bias or differential item functioning (DIF). This paper presents a simulation study comparing the potential of some of these methods for detecting DIF: two IRT-based techniques (area measures), three χ2-based procedures (Mantel-Haenszel, Logit Model and Logistic Regression) and the Restricted Factor Analysis method. The results show that the technique that appears to do the best job is the Mantel-Haenszel statistic. Moreover, all detection techniques tend to overidentify DIF items, that is, some of the items labeled with DIF may in fact be without DIF. This tendency is slightly reversed in the Logistic Regression procedure.
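As an illustration of the kind of statistic being compared, the following is a minimal sketch of the standard Mantel-Haenszel computation for a single item, assuming examinees have already been matched on total score. It is not taken from the paper's simulation code; the function name `mantel_haenszel` and the input layout (one 2×2 table of counts per score level) are assumptions made for this example.

```python
def mantel_haenszel(tables):
    """Mantel-Haenszel common odds ratio and chi-square for one item.

    `tables` is an iterable of (a, b, c, d) counts, one tuple per matched
    score level (hypothetical layout chosen for this sketch):
      a = reference group, correct    b = reference group, incorrect
      c = focal group, correct        d = focal group, incorrect
    """
    num = den = 0.0                    # pieces of the common odds ratio
    sum_a = sum_exp = sum_var = 0.0    # pieces of the chi-square
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue                   # stratum carries no information
        n_ref, n_foc = a + b, c + d    # group sizes at this score level
        m1, m0 = a + c, b + d          # total correct / incorrect
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_exp += n_ref * m1 / n      # E(a) under the no-DIF hypothesis
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    # Note: this sketch does not guard against den == 0 or sum_var == 0.
    alpha = num / den
    # Continuity-corrected statistic, referred to chi-square with 1 df.
    chi2 = (abs(sum_a - sum_exp) - 0.5) ** 2 / sum_var
    return alpha, chi2


# Usage with made-up counts for two score levels.
alpha, chi2 = mantel_haenszel([(15, 5, 10, 10), (22, 3, 18, 7)])
print(alpha, chi2)
```

An odds ratio near 1 indicates no DIF; with the usual 0.05 level, an item is flagged when the chi-square exceeds 3.84. The overidentification noted in the abstract corresponds to items without DIF exceeding that threshold.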
Cite this article
Gómez-Benito, J., Navas-Ara, M.J. A Comparison of χ2, RFA and IRT Based Procedures in the Detection of DIF. Quality & Quantity 34, 17–31 (2000). https://doi.org/10.1023/A:1004703709442