Abstract
Purpose
The Quality-of-life (QOL) Disease Impact Scale (QDIS®) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the invariance of the standardized QDIS IRT item parameters in an independent sample.
Method
The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. “ACS-specific” IRT item parameters were calibrated and linearly transformed to compare to “standardized” IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis.
Results
Item parameters estimated in the ACS sample showed lower discrimination than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005–0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample.
Conclusion
The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
Abbreviations
- ACS: Acute coronary syndrome
- ACS-LT: ACS-specific linearly transformed
- CAT: Computerized adaptive testing
- CDIF: Compensatory differential item functioning
- CFA: Confirmatory factor analysis
- DFIT: Differential functioning of items and tests
- DICAT: The Computerized Adaptive Assessment of Disease Impact project
- DIF: Differential item functioning
- DTF: Differential test (scale) functioning
- GPCM: Generalized partial credit model
- ICC: Item characteristic curve
- IPD: Item parameter drift
- IRT: Item response theory
- MLHFQ: Minnesota Living with Heart Failure Questionnaire
- NCDIF: Non-compensatory differential item functioning
- PRO: Patient-reported outcome
- PROMIS: Patient Reported Outcomes Measurement Information System
- QDIS®: Quality-of-life Disease Impact Scale
- QDIS-7: 7-item short-form of QDIS®
- QOL: Quality-of-life
- RMSD: Root-mean-square difference
- SAQ: Seattle Angina Questionnaire
- TCC: Test characteristic curve
- TRACE-CORE: The Transitions, Risks, and Actions in Coronary Events-Center for Outcomes Research and Education project
Acknowledgments
TRACE-CORE is supported by the National Institutes of Health National Heart, Lung, and Blood Institute (1U01HL105268). DICAT is supported by the National Institutes of Health National Institute on Aging (2R44AG025589). Partial salary support was provided by TRACE-CORE and a PhRMA Foundation Research Starter Grant (M.D.A.). Additional support was provided by NIH Grant KL2TR000160 (M.E.W.). The authors are very grateful to the editor and reviewers for their comments, and to Jakob Bjørner for personal communication.
Appendices
Appendix 1: Illustration of calculations
Item response theory indeterminacy
The logit function in an IRT model is defined by

$$\text{logit}\left[P_i(\theta_j)\right] = a_i(\theta_j - b_i)$$

where \(P_i(\theta_j)\) is the probability of endorsing item response category i for person j, \(a_i\) and \(b_i\) are the item discrimination and threshold parameters for response category i, respectively, and \(\theta_j\) is the IRT-based theta score for person j. The logit preserves the same value if \(a_i\), \(\theta_j\), and \(b_i\) are replaced by \(a_i^{'}\), \(\theta_j^{'}\), and \(b_i^{'}\), respectively, which follow the set of linear transformations

$$a_i^{'} = a_i/A, \qquad \theta_j^{'} = A\theta_j + B, \qquad b_i^{'} = Ab_i + B$$

where A is the slope and B is the intercept of the linear transformations.
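This indeterminacy can be checked numerically. A minimal sketch in Python, using hypothetical parameter values (the `category_logit` helper is illustrative, not from the study):

```python
def category_logit(theta, a, b):
    """Logit of endorsing a response category: a * (theta - b)."""
    return a * (theta - b)

# Hypothetical values for one response category and one person
a, b, theta = 1.2, -0.5, 0.8
A, B = 1.5, 0.3  # slope and intercept of the linear transformation

# Transformed parameters: a' = a / A, b' = A*b + B, theta' = A*theta + B
a_t, b_t, theta_t = a / A, A * b + B, A * theta + B

# The logit is identical under both parameterizations (indeterminacy)
same = abs(category_logit(theta, a, b) - category_logit(theta_t, a_t, b_t)) < 1e-12
print(same)  # → True
```

Because the model depends on the parameters only through \(a_i(\theta_j - b_i)\), any such linear rescaling of the metric leaves all model-expected probabilities unchanged.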
Slope (A) and intercept (B) of linear transformation using IRT theta scores

$$A = \frac{\sigma(\theta_{ST})}{\sigma(\theta_{ACS})}, \qquad B = \mu(\theta_{ST}) - A\,\mu(\theta_{ACS})$$

where \(\sigma(\theta_{ST})\) and \(\sigma(\theta_{ACS})\) are the SDs of the IRT scores of ACS patients using the standardized (\(\theta_{ST}\)) and the ACS-specific (\(\theta_{ACS}\)) IRT item parameters, respectively, and \(\mu(\theta_{ST})\) and \(\mu(\theta_{ACS})\) are the corresponding means.
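A sketch of the mean/sigma linking step, assuming only the definitions above; the theta scores here are simulated for illustration and are not the study's actual scores:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical theta scores for the same ACS patients under the two
# parameter sets (simulated; not the study's data)
theta_st = rng.normal(0.5, 1.1, size=1544)   # scored with standardized parameters
theta_acs = rng.normal(0.0, 1.0, size=1544)  # scored with ACS-specific parameters

# Mean/sigma linking: match the SDs, then the means
A = theta_st.std(ddof=1) / theta_acs.std(ddof=1)
B = theta_st.mean() - A * theta_acs.mean()

# ACS-specific scores placed on the standardized metric (ACS-LT)
theta_acs_lt = A * theta_acs + B
```

By construction, the transformed scores have the same mean and SD as the scores produced with the standardized parameters, which puts the two parameter sets on a common metric before comparing them.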
Non-compensatory differential item functioning (NCDIF)

$$\text{NCDIF}_i = \frac{1}{N}\sum_{j=1}^{N}\left(S_{ST,i,j} - S_{ACS,i,j}\right)^2$$

where \(S_{ST,i,j}\) is the IRT model-expected item response score of item i for respondent j using the standardized IRT item parameters, \(S_{ACS,i,j}\) is the corresponding score using the ACS-LT IRT item parameters, and N is the sample size of ACS patients.
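A minimal sketch of this computation, using simulated expected item scores (hypothetical values; the 0.096 cutoff is the flagging threshold reported in the Results):

```python
import numpy as np

def ncdif(s_st, s_acs):
    """NCDIF_i: mean squared difference between model-expected item
    scores computed under the two item-parameter sets."""
    d = np.asarray(s_st) - np.asarray(s_acs)
    return float(np.mean(d ** 2))

rng = np.random.default_rng(2)
# Hypothetical expected item scores (1-5 range) for one item, N = 500 respondents
s_st = rng.uniform(1.0, 5.0, size=500)
s_acs = s_st + rng.normal(0.0, 0.1, size=500)  # small simulated discrepancy

value = ncdif(s_st, s_acs)
flagged = value > 0.096  # cutoff used to flag items in this study
print(flagged)  # → False
```

Because the differences are squared before averaging, positive and negative discrepancies cannot cancel, which is what makes the index "non-compensatory."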
Differential test (scale) functioning (DTF)

$$\text{DTF} = \frac{1}{N}\sum_{j=1}^{N}\left(TS_{ST,j} - TS_{ACS,j}\right)^2$$

where \(TS_{ST,j}\) is the IRT model-expected scale (test) score for respondent j using the standardized IRT item parameters, \(TS_{ACS,j}\) is the corresponding score using the ACS-LT IRT item parameters, and N is the sample size of ACS patients.
Compensatory differential item functioning (CDIF)

$$\text{CDIF}_i = \text{COV}(d_i, D) + \mu(d_i)\,\mu(D)$$

where \(d_i\) is the difference in the IRT model-expected item response score of item i between the standardized and the ACS-LT IRT item parameters, D is the difference in the IRT model-expected scale score between the standardized and the ACS-LT IRT item parameters, \(\text{COV}(d_i, D)\) is the covariance of \(d_i\) and D, and \(\mu(d_i)\) and \(\mu(D)\) are their respective means. Note that DTF is equivalent to the sum of \(\text{CDIF}_i\) across all items: \({\text{DTF}} = \mathop \sum \nolimits_{i = 1}^{I} {\text{CDIF}}_{i}\).
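The CDIF formula and its additive relation to DTF can be sketched numerically, using simulated per-respondent differences (hypothetical data; population covariance is used so the identity holds exactly):

```python
import numpy as np

def cdif(d_i, D):
    """CDIF_i = COV(d_i, D) + mu(d_i) * mu(D), with population covariance."""
    d_i, D = np.asarray(d_i), np.asarray(D)
    return float(np.cov(d_i, D, bias=True)[0, 1] + d_i.mean() * D.mean())

rng = np.random.default_rng(1)
# Hypothetical per-respondent item-score differences d_i: 7 items x 200 respondents
d = rng.normal(0.0, 0.1, size=(7, 200))
D = d.sum(axis=0)  # the scale-score difference is the sum of item differences

dtf = float(np.mean(D ** 2))
cdif_sum = sum(cdif(d_i, D) for d_i in d)
print(np.isclose(dtf, cdif_sum))  # → True
```

Unlike NCDIF, the item terms here carry sign, so DIF in opposite directions on different items can compensate at the scale level, mirroring the cancellation of item differences reported in the Results.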
Appendix 2
Raw residual plots of fitting the IRT model in the ACS patients for the QDIS-7 (ordered by row from Item 1 to Item 7)
Cite this article
Deng, N., Anatchkova, M.D., Waring, M.E. et al. Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS®) in acute coronary syndrome patients: differential functioning of items and test. Qual Life Res 24, 1809–1822 (2015). https://doi.org/10.1007/s11136-015-0916-8