Abstract
Purpose
To determine whether persons with arthritis differ systematically from persons without arthritis in how they respond to questions on three depression questionnaires that include somatic items such as fatigue and sleep disturbance.
Methods
We extracted data on the Center for Epidemiologic Studies Depression (CES-D) scale, the Patient Health Questionnaire-9 (PHQ-9), and the Kessler-6 (K-6) scale from three large population-based national surveys. We assessed items on these questionnaires for differential item functioning (DIF) between persons with and without self-reported physician-diagnosed arthritis using multiple indicators, multiple causes (MIMIC) models, which controlled for the underlying level of depression and important confounders. We also examined whether DIF by arthritis status was similar between women and men.
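The logic of a MIMIC test for DIF can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes continuous items, a single binary group indicator with no other covariates, a known DIF-free anchor item, and a simple moment-based estimator in place of a fitted structural equation model. The function names (simulate, dif_direct_effects) and all parameter values are hypothetical. It separates the group difference on each item into the part transmitted through the latent trait and the remaining direct effect, which is the DIF.

```python
import numpy as np


def simulate(n=50_000, seed=0):
    """Simulate four continuous items driven by one latent trait (hypothetical values)."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)            # 1 = arthritis (assumed coding)
    eta = 0.4 * group + rng.normal(size=n)        # latent depression, higher in group 1
    loadings = np.array([1.0, 0.8, 0.9, 0.7])     # item 0 is the anchor (loading 1)
    direct = np.array([0.0, 0.0, 0.3, 0.0])       # only item 2 has DIF
    noise = rng.normal(scale=0.6, size=(n, 4))
    items = eta[:, None] * loadings + group[:, None] * direct + noise
    return items, group


def dif_direct_effects(items, group, anchor=0):
    """Estimate each item's direct group effect, holding the latent trait fixed.

    Model: y_j = loading_j * eta + beta_j * group + error, with beta_anchor = 0
    and loading_anchor = 1. The group mean difference on item j is then
    loading_j * (latent mean difference) + beta_j.
    """
    g1 = group == 1
    mean1 = items[g1].mean(axis=0)
    mean0 = items[~g1].mean(axis=0)
    diff = mean1 - mean0

    # Pooled within-group covariance: direct effects shift means only,
    # so off-diagonal covariances depend on the loadings alone.
    centered = items - np.where(g1[:, None], mean1, mean0)
    cov = centered.T @ centered / (len(items) - 2)

    k = items.shape[1]
    betas = np.zeros(k)
    for j in range(k):
        if j == anchor:
            continue
        others = [m for m in range(k) if m not in (anchor, j)]
        # cov(y_j, y_m) / cov(y_anchor, y_m) recovers loading_j / loading_anchor.
        loading = np.mean([cov[j, m] / cov[anchor, m] for m in others])
        betas[j] = diff[j] - loading * diff[anchor]
    return betas


if __name__ == "__main__":
    items, group = simulate()
    for j, beta in enumerate(dif_direct_effects(items, group)):
        print(f"item {j}: estimated direct effect = {beta:.3f}")
```

In this setup, a direct effect near zero means that the group difference on the item is fully explained by the difference in the latent trait. A clearly nonzero direct effect flags DIF. With survey-sized samples even tiny direct effects can reach statistical significance, so the size of the estimate matters more than its p value.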
Results
Although five items of the CES-D, one item of the PHQ-9, and five items of the K-6 scale showed evidence of DIF based on statistical comparisons, the magnitude of each difference was less than the threshold of a small effect. The statistical differences were a function of the very large sample sizes in the surveys. Effect sizes for DIF were similar between women and men except for two items on the PHQ-9. For each questionnaire, DIF accounted for 8% or less of the arthritis-depression association, and excluding items with DIF did not reduce the difference in depression scores between those with and without arthritis.
Conclusions
Persons with arthritis respond to items on the CES-D, PHQ-9, and K-6 depression scales similarly to persons without arthritis, despite the inclusion of somatic items in these scales.
Funding
This study was supported by the Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health (ZIA-AR-041153).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest in this paper. All data are in the public domain. The research was exempted from human subjects review by the NIH Office of Human Subjects Research Protection.
Cite this article
Hu, J., Ward, M.M. Screening for depression in arthritis populations: an assessment of differential item functioning in three self-reported questionnaires. Qual Life Res 26, 2507–2517 (2017). https://doi.org/10.1007/s11136-017-1601-x
DOI: https://doi.org/10.1007/s11136-017-1601-x