
Few items in the thyroid-related quality of life instrument ThyPRO exhibited differential item functioning

Quality of Life Research

Abstract

Objective

To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis.

Study design and setting

A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification.
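The two-part criterion described above (statistical significance plus a magnitude threshold of ΔR² > 0.02) can be sketched as a comparison of three nested regression models: item on scale score, then adding the group variable (uniform DIF), then adding the group-by-score interaction (nonuniform DIF). The sketch below is illustrative only and is not the authors' code. It assumes the log-likelihoods have already been obtained from fitted ordinal logistic models, that the pseudo-R² is Nagelkerke's, that the group variable is binary (1 degree of freedom per step), and that the significance level is 0.05; none of these details are stated in the abstract, and all function and parameter names are hypothetical.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R² from null and fitted log-likelihoods (assumed variant)."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / upper_bound


def lr_test_p(ll_smaller, ll_larger):
    """p-value of a likelihood ratio test with 1 degree of freedom.

    For 1 df the chi-square survival function equals erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (ll_larger - ll_smaller))
    return math.erfc(math.sqrt(stat / 2.0))


def classify_dif(ll_null, ll_score, ll_group, ll_interaction, n,
                 alpha=0.05, min_delta_r2=0.02):
    """Flag uniform and nonuniform DIF for one item and one binary covariate.

    ll_score:       item ~ scale score
    ll_group:       item ~ scale score + group
    ll_interaction: item ~ scale score + group + score x group
    alpha is an assumed significance level; min_delta_r2 follows the abstract.
    """
    r2_score = nagelkerke_r2(ll_null, ll_score, n)
    r2_group = nagelkerke_r2(ll_null, ll_group, n)
    r2_inter = nagelkerke_r2(ll_null, ll_interaction, n)

    # An instance of DIF must be both significant and large enough.
    uniform = (lr_test_p(ll_score, ll_group) < alpha
               and (r2_group - r2_score) > min_delta_r2)
    nonuniform = (lr_test_p(ll_group, ll_interaction) < alpha
                  and (r2_inter - r2_group) > min_delta_r2)
    return {
        "uniform": uniform,
        "nonuniform": nonuniform,
        "delta_r2_uniform": r2_group - r2_score,
        "delta_r2_nonuniform": r2_inter - r2_group,
    }


# Made-up log-likelihoods for illustration only (not data from the study).
result = classify_dif(ll_null=-1200.0, ll_score=-1000.0,
                      ll_group=-970.0, ll_interaction=-969.5, n=838)
print(result)
```

Requiring both conditions guards against flagging trivially small effects that reach significance only because the sample is large.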

Results

Twenty instances of DIF were found in 17 of the 84 items. Eight were according to diagnosis; the goiter scale was the most affected, possibly because patients with autoimmune thyroid diseases perceive the items differently from patients with simple goiter. Eight were according to age, five of them in positively worded items, which younger patients were more likely to endorse. One was according to gender (women were more likely to report crying), and three were according to educational level. The vast majority of DIF had only a minor influence on the scale scores (0.1–2.3 points on the 0–100 scales), but two instances of DIF corresponded to differences of 4.6 and 9.8 points, respectively.

Conclusion

Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.
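The point about abbreviated scales can be illustrated with simple arithmetic. The sketch below assumes that each five-point item is coded 0–4 and that a scale score is the item sum rescaled linearly to 0–100; the abstract does not state the scoring details, so both assumptions and the function names are hypothetical. Under these assumptions, a one-category shift on a single item moves the scale score by 100 / (4 × number of items), so the same item-level bias weighs more heavily in a shorter scale.

```python
def scale_score(item_responses, max_per_item=4):
    """Sum score rescaled linearly to 0-100 (assumed scoring, items coded 0-4)."""
    return 100.0 * sum(item_responses) / (max_per_item * len(item_responses))


def shift_from_one_item(n_items, category_shift=1, max_per_item=4):
    """Change in the 0-100 score when one item moves by `category_shift` categories."""
    return 100.0 * category_shift / (max_per_item * n_items)


# Compare a hypothetical 10-item scale with a hypothetical 4-item short form.
for n_items in (10, 4):
    print(n_items, shift_from_one_item(n_items))
```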




Acknowledgments

This study has been supported by grants from the Danish Agency for Science, Technology and Innovation: Council for Strategic Research and Council for Independent Research. LH is supported by an unrestricted research grant from the Novo Nordisk Foundation.

Conflict of interest

None of the authors have any financial conflicts of interest to declare. The ThyPRO was developed by the research team authoring this paper.

Author information

Corresponding author

Correspondence to Torquil Watt.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 193 kb)

Supplementary material 2 (PDF 307 kb)


About this article

Cite this article

Watt, T., Groenvold, M., Hegedüs, L. et al. Few items in the thyroid-related quality of life instrument ThyPRO exhibited differential item functioning. Qual Life Res 23, 327–338 (2014). https://doi.org/10.1007/s11136-013-0462-1


  • DOI: https://doi.org/10.1007/s11136-013-0462-1
