Abstract
Objective
To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis.
Study design and setting
A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification.
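The two-part DIF criterion (statistical significance plus ΔR² > 0.02) can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that the log-likelihoods of nested ordinal logistic models have already been obtained from a fitting routine, that the pseudo-R² is Nagelkerke's, that the group variable has two levels (so each likelihood-ratio test has one degree of freedom), and that the significance level is 0.05. The function names and the log-likelihood values in the usage lines are hypothetical and are not taken from the study.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2 from log-likelihoods (assumed R2 definition)."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_r2


def lr_test_p_df1(ll_reduced, ll_full):
    """P-value of a likelihood-ratio test with 1 degree of freedom.

    For 1 df the chi-square survival function is erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (ll_full - ll_reduced))
    return math.erfc(math.sqrt(stat / 2.0))


def flag_dif(ll_null, ll_reduced, ll_full, n, alpha=0.05, min_delta_r2=0.02):
    """Flag DIF only if the added term is significant AND ΔR2 exceeds the cutoff.

    Uniform DIF:    reduced = item ~ score,         full = item ~ score + group
    Nonuniform DIF: reduced = item ~ score + group, full = ... + score * group
    alpha is an assumed significance level; 0.02 is the cutoff from the text.
    """
    delta_r2 = (nagelkerke_r2(ll_null, ll_full, n)
                - nagelkerke_r2(ll_null, ll_reduced, n))
    p_value = lr_test_p_df1(ll_reduced, ll_full)
    flagged = p_value < alpha and delta_r2 > min_delta_r2
    return flagged, p_value, delta_r2


# Hypothetical log-likelihoods for one item, n = 838 respondents.
strong = flag_dif(ll_null=-1200.0, ll_reduced=-1000.0, ll_full=-985.0, n=838)
weak = flag_dif(ll_null=-1200.0, ll_reduced=-1000.0, ll_full=-997.0, n=838)
print("large effect flagged:", strong[0])
print("significant but small effect flagged:", weak[0])
```

The second call shows why the magnitude criterion matters: with several hundred respondents a group term can be statistically significant while adding very little explained variation, and such an item is not flagged.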
Results
Twenty instances of DIF were found in 17 of the 84 items. Eight were related to diagnosis; the goiter scale was the most affected, possibly because patients with autoimmune thyroid diseases perceive these items differently from patients with simple goiter. Eight were related to age, five of them in positively worded items, which younger patients were more likely to endorse. One was related to gender: women were more likely to report crying. Three were related to educational level. The vast majority of DIF instances had only a minor influence on the scale scores (0.1–2.3 points on the 0–100 scales), but two corresponded to differences of 4.6 and 9.8 points, respectively.
Conclusion
Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.
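The dependence of DIF impact on scale length can be illustrated with simple arithmetic. The sketch below assumes that each five-point item is scored 0 to 4 and that a scale score is the item sum rescaled linearly to 0–100; both are assumptions made for illustration, and the function names are hypothetical.

```python
def rescale_0_100(item_scores, max_item_score=4):
    """Linearly rescale a sum of item scores to the 0-100 range.

    Assumes every item is scored from 0 to max_item_score.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    return 100.0 * sum(item_scores) / (max_item_score * len(item_scores))


def shift_from_one_item(n_items, item_shift=1.0, max_item_score=4):
    """Change in the 0-100 scale score when one item shifts by item_shift."""
    return 100.0 * item_shift / (max_item_score * n_items)


# The same one-category shift in a single item moves a short scale further.
for n_items in (10, 5):
    print(n_items, "items:", shift_from_one_item(n_items), "points")
```

Under these assumptions, a one-category shift in a single item moves a 10-item scale by 2.5 points and a 5-item scale by 5 points, which is the sense in which an abbreviated scale is more exposed to an item with DIF.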
Acknowledgments
This study has been supported by grants from the Danish Agency for Science, Technology and Innovation: Council for Strategic Research and Council for Independent Research. LH is supported by an unrestricted research grant from the Novo Nordisk Foundation.
Conflict of interest
None of the authors have any financial conflicts of interest to declare. The ThyPRO was developed by the research team authoring this paper.
Cite this article
Watt, T., Groenvold, M., Hegedüs, L. et al. Few items in the thyroid-related quality of life instrument ThyPRO exhibited differential item functioning. Qual Life Res 23, 327–338 (2014). https://doi.org/10.1007/s11136-013-0462-1
DOI: https://doi.org/10.1007/s11136-013-0462-1