A comparison of computer adaptive tests (CATs) and short forms in terms of accuracy and number of items administrated using PROMIS profile

Segawa, Eisuke; Schalet, Benjamin; Cella, David

doi:10.1007/s11136-019-02312-8

Published: 08 October 2019

Quality of Life Research, volume 29, pages 213–221 (2020)

Eisuke Segawa¹,
Benjamin Schalet² &
David Cella²

Abstract

Purpose

In the Patient-Reported Outcomes Measurement Information System (PROMIS), seven domains (Physical Function, Anxiety, Depression, Fatigue, Sleep Disturbance, Social Function, and Pain Interference) are packaged together as profiles. Each of these domains can also be assessed using computer adaptive tests (CATs) or short forms (SFs) of varying length (e.g., 4, 6, and 8 items). We compared the accuracy and the number of items administered for CATs versus each SF.

Methods

PROMIS instruments are scored using item response theory (IRT) with the graded response model and reported as T scores (mean = 50, SD = 10). In each domain, we simulated 10,000 subjects from a normal distribution with standard deviation 10 and mean 60 for symptom scales or 40 for function scales. We considered a subject’s score to be accurate when the standard error (SE) was less than 3.0. We recorded the range of accurate scores (accurate range) and the number of items administered.

Results

The average number of items administered in CAT was 4.7 across all domains. The accurate range was wider for CAT than for any SF in each domain. CAT was notably better at extending the accurate range into very poor health for Fatigue, Physical Function, and Pain Interference. Most SFs provided a reasonably wide accurate range.

Conclusions

Relative to SFs, CATs provided the widest accurate range, with slightly more items than SF4 and fewer than SF6 and SF8. Most SFs, especially longer ones, provided a reasonably wide accurate range.

Notes

IRT software computes on the Z-scale (mean 0 and standard deviation 1) and converts the final results to the T-scale. The prior is specified on the Z-scale and corresponds to a normal prior with mean 50 and standard deviation 10 on the T-scale.
Although the SE curves are available analytically as the reciprocal square root of the test information functions, we did not use the analytic curves because they are available only for short forms and not for CATs. Further, the regression SE curves include the floors and ceilings, but the analytic curves do not.
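
The scale arithmetic behind these two notes can be sketched as follows. This is a minimal Python illustration, not the authors' software. The function names are hypothetical; the linear map T = 50 + 10 Z follows from the means and standard deviations stated in the first note; SE = 1/sqrt(information) is the standard IRT relation; and the 3.0 cutoff is the accuracy criterion given in the Methods.

```python
import math


def z_to_t(z: float) -> float:
    """Map a score from the Z-scale (mean 0, SD 1) to the T-scale (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


def se_z_to_t(se_z: float) -> float:
    """Standard errors are rescaled by the SD ratio only; the shift of 50 drops out."""
    return 10.0 * se_z


def se_from_information(info_z: float) -> float:
    """Analytic SE on the Z-scale from test information at a given latent value."""
    return 1.0 / math.sqrt(info_z)


def is_accurate(se_t: float, threshold: float = 3.0) -> bool:
    """Accuracy criterion from the Methods: SE below 3.0 on the T-scale."""
    return se_t < threshold


# Hypothetical example: test information of 16 on the Z-scale.
se_t = se_z_to_t(se_from_information(16.0))  # 0.25 on Z, 2.5 on T
print(z_to_t(1.0), se_t, is_accurate(se_t))
```

Under these definitions, a score needs test information above roughly 11.1 on the Z-scale (SE below 0.3) to meet the SE < 3.0 criterion on the T-scale.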

Funding

This study was funded by the National Institutes of Health (U2CCA186878, Recipient David Cella).

Author information

Authors and Affiliations

SK Data, Chicago, USA
Eisuke Segawa
Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, USA
Benjamin Schalet & David Cella

Corresponding author

Correspondence to Eisuke Segawa.

Ethics declarations

Conflict of interest

Dr. Cella is an unpaid board member of the PROMIS Health Organization (PHO). He declares no other conflict of interest. Eisuke Segawa declares that he has no conflict of interest. Benjamin David Schalet declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Procedure and fee for using PROMIS profile instruments

Details of the procedures for using PROMIS profile instruments are found at www.HealthMeasures.net. Regarding cost in particular, the official answer to the question “Can I access PROMIS measures for free?” (http://www.healthmeasures.net/resource-center/user-community/forum/promis/128-can-i-access-promis-measures-for-free#336) is reproduced below.

Through www.HealthMeasures.net, you gain free access to hundreds of self- and proxy-report measures from the four measurement systems (PROMIS, NIH Toolbox, Neuro-QoL, and ASCQ-Me), along with information to help you select, administer, score, and interpret measures. We encourage you to go to the Search and View Measures to get more information about individual measures. Assessment Center and other assessment delivery services carry fees associated with maintaining and updating these technologies. HealthMeasures consultation, training, custom software development, and translation services are available, with pricing by quote. Please go to the HealthMeasures Pricing page (http://www.healthmeasures.net/resource-center/data-collection-tools/pricing-for-tools) for more information. All services will be performed under a cost-recovery business model with no profit motivation. For more information, contact help@healthmeasures.net.

Item selection for short forms

The selection of items was initially based on two psychometric criteria: (1) maximum interval information and (2) CAT simulations. These two criteria resulted in similar item rankings. For the maximum interval criterion, each item information function was integrated (without weighting) for the interval from 50 to 70 (for symptom banks) or 30 to 50 (for function banks). For the CAT simulations, responses to all items in each bank were generated using a random sample of 1000 simulees drawn separately for each bank (centered on 0.5 SD lower [or higher] than the general population mean). Items were rank ordered based on their average administration rank frequency over the simulees. Content experts from each of the seven domains reviewed the items and rankings and selected 4, 6, and 8 items, considering not only rank, but also content coverage and theta (severity) range.
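
The maximum interval information criterion can be illustrated with a short sketch. The Python code below is an assumption-laden illustration rather than the procedure actually used: the item names and parameters are hypothetical, the information formula is the standard one for the graded response model, and the unweighted integral is approximated with the trapezoidal rule after converting the T-score interval to the Z-scale.

```python
import math


def grm_item_information(theta, a, thresholds):
    """Graded response model item information at latent value theta (Z-scale)."""
    # Cumulative probabilities P*(X >= k), padded with 1 below and 0 above.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]  # probability of response category k
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += dp * dp / p
    return info


def interval_information(a, thresholds, t_low, t_high, steps=200):
    """Unweighted integral of item information over a T-score interval."""
    z_low, z_high = (t_low - 50.0) / 10.0, (t_high - 50.0) / 10.0
    h = (z_high - z_low) / steps
    total = 0.5 * (grm_item_information(z_low, a, thresholds)
                   + grm_item_information(z_high, a, thresholds))
    for i in range(1, steps):
        total += grm_item_information(z_low + i * h, a, thresholds)
    return total * h


# Hypothetical symptom bank: name -> (discrimination, category thresholds).
bank = {
    "item_A": (3.0, [0.2, 0.8, 1.4, 1.9]),
    "item_B": (1.0, [-2.0, -1.0, 0.0, 1.0]),
    "item_C": (2.0, [-1.5, -0.5, 0.5, 1.5]),
}

# Rank items by information integrated from T = 50 to T = 70 (symptom banks).
ranking = sorted(bank, key=lambda name: interval_information(*bank[name], 50, 70),
                 reverse=True)
print(ranking)
```

For a function bank, the same call with the interval 30 to 50 would apply. The ranking is only one input: as described above, content experts made the final selection.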

Specifications for the regression of SEs on scores

This section describes the specifications of the regression of SEs on scores. We use a local regression implemented in the loess function in R. The local regression fits each point using its neighboring points. The smaller the neighborhood, the more closely the curve follows the points. We specify a small neighborhood (span = 0.2) so that the minimum and maximum accurate scores are close to the simulated scores. We use a large number of simulees (10,000) whose smallest latent value is approximately equal to the floor and whose latent values increase in equal increments up to a value approximately equal to the ceiling. We use these specifications because the large number of simulees minimizes the sampling variation of the SE curve, and because a minimum and maximum that cover the entire score range allow us to avoid having an excessive number of scores at either the floor or the ceiling. Finally, to reduce the computational time of loess with so many simulees, the data are organized as weighted data consisting of distinct scores and their frequencies. Because the number of distinct scores is less than one tenth of the number of simulees in the original data, the reduction in computational time is substantial.
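
The weighted-data idea and the resulting accurate range can be sketched in Python. This is a simplified stand-in for R's loess, not a reimplementation of it: it fits a local linear (degree 1) regression with tricube weights, takes the neighborhood as a fraction of the distinct scores, and averages the SE within each distinct score. All function names are hypothetical, and the data at the end are synthetic.

```python
import math
from collections import defaultdict


def collapse(scores, ses):
    """Group identical scores into sorted (score, mean SE, frequency) rows."""
    acc = defaultdict(lambda: [0.0, 0])
    for s, e in zip(scores, ses):
        acc[s][0] += e
        acc[s][1] += 1
    return [(s, tot / n, n) for s, (tot, n) in sorted(acc.items())]


def local_linear_fit(x0, xs, ys, ws, span=0.2):
    """Weighted local linear fit at x0 using the nearest span fraction of points."""
    k = max(2, math.ceil(span * len(xs)))
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    h = max(abs(xs[i] - x0) for i in nearest) or 1.0
    sw = swx = swy = swxx = swxy = 0.0
    for i in nearest:
        d = xs[i] - x0
        w = ws[i] * (1.0 - (abs(d) / h) ** 3) ** 3  # frequency times tricube
        sw += w
        swx += w * d
        swy += w * ys[i]
        swxx += w * d * d
        swxy += w * d * ys[i]
    if sw == 0.0:  # every neighbor sits on the edge of the window
        return sum(ys[i] for i in nearest) / len(nearest)
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:  # not enough spread for a slope; use weighted mean
        return swy / sw
    return (swy * swxx - swx * swxy) / denom  # intercept, i.e. fitted value at x0


def accurate_range(scores, ses, threshold=3.0, span=0.2):
    """Smallest and largest distinct score whose smoothed SE is below threshold."""
    rows = collapse(scores, ses)
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    ws = [r[2] for r in rows]
    ok = [x for x in xs if local_linear_fit(x, xs, ys, ws, span) < threshold]
    return (min(ok), max(ok)) if ok else None


# Synthetic U-shaped SE curve on the T-scale, three simulees per score.
scores = [s for s in range(20, 81) for _ in range(3)]
ses = [2.0 + ((s - 50) / 15.0) ** 2 for s in scores]
print(accurate_range(scores, ses))
```

Collapsing to distinct scores means the smoother loops over far fewer rows than the raw simulee data, which is the source of the time savings described above.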

Table 3 Percentages of the number of CAT items answered (from 2 to 12) and the weighted average number of CAT items answered for each of the seven domains

About this article

Cite this article

Segawa, E., Schalet, B. & Cella, D. A comparison of computer adaptive tests (CATs) and short forms in terms of accuracy and number of items administrated using PROMIS profile. Qual Life Res 29, 213–221 (2020). https://doi.org/10.1007/s11136-019-02312-8

Accepted: 30 April 2019
Published: 08 October 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s11136-019-02312-8
