
A comparison of computer adaptive tests (CATs) and short forms in terms of accuracy and number of items administrated using PROMIS profile


Abstract

Purpose

In the Patient-Reported Outcomes Measurement Information System (PROMIS), seven domains (Physical Function, Anxiety, Depression, Fatigue, Sleep Disturbance, Social Function, and Pain Interference) are packaged together as profiles. Each of these domains can also be assessed using computer adaptive tests (CATs) or short forms (SFs) of varying length (e.g., 4, 6, and 8 items). We compared the accuracy and the number of items administered for CAT versus each SF.

Methods

PROMIS instruments are scored using item response theory (IRT) with the graded response model and reported as T scores (mean = 50, SD = 10). In each domain we simulated 10,000 subjects from a normal distribution with standard deviation 10 and mean 60 for symptom scales or 40 for function scales. We considered a subject's score to be accurate when its standard error (SE) was less than 3.0. We recorded the range of accurate scores (the accurate range) and the number of items administered.

Results

The average number of items administered by CAT was 4.7 across all domains. The accurate range was wider for CAT than for every SF in each domain. CAT was notably better at extending the accurate range into very poor health for Fatigue, Physical Function, and Pain Interference. Most SFs provided a reasonably wide accurate range.

Conclusions

Relative to the SFs, CATs provided the widest accurate range, with slightly more items than SF4 and fewer than SF6 and SF8. Most SFs, especially the longer ones, provided a reasonably wide accurate range.


Notes

  1. IRT software computes scores on the Z-scale (mean 0, standard deviation 1) and converts the final results to the T-scale. The prior is specified on the Z-scale and corresponds to a normal prior with mean 50 and standard deviation 10 on the T-scale.

  2. Although the SE curves are available analytically as the reciprocal of the square root of the test information function (see the relations below), we did not use the analytic curves because they are available only for short forms and not for CATs. Further, the regression SE curves reflect the floors and ceilings of the score range, whereas the analytic curves do not.
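
For reference, the standard relations underlying these two notes are the linear Z-to-T rescaling and the usual expression for the SE in terms of the test information function I(θ):

```latex
T = 10Z + 50, \qquad Z \sim N(0,1) \iff T \sim N(50,\,10^2), \qquad
SE(\theta) = \frac{1}{\sqrt{I(\theta)}}.
```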


Funding

This study was funded by the National Institutes of Health (U2CCA186878; recipient: David Cella).

Author information


Corresponding author

Correspondence to Eisuke Segawa.

Ethics declarations

Conflict of interest

Dr. Cella is an unpaid board member of the PROMIS Health Organization (PHO). He declares no other conflict of interest. Eisuke Segawa declares that he has no conflict of interest. Benjamin David Schalet declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Procedure and fee to use PROMIS profile instruments

Details of the procedures for using PROMIS profile instruments can be found at www.HealthMeasures.net. Regarding cost in particular, the official answer to the question “Can I access PROMIS measures for free?” (http://www.healthmeasures.net/resource-center/user-community/forum/promis/128-can-i-access-promis-measures-for-free#336) is reproduced below.

Through www.HealthMeasures.net, you gain free access to hundreds of self- and proxy-report measures from the four measurement systems (PROMIS, NIH Toolbox, Neuro-QoL, and ASCQ-Me), along with information to help you select, administer, score, and interpret measures. We encourage you to use Search and View Measures to get more information about individual measures. Assessment Center and other assessment delivery services carry fees associated with maintaining and updating these technologies. HealthMeasures also offers consultation, training, custom software development, and translation services, with pricing available by quote. Please go to the HealthMeasures Pricing page (http://www.healthmeasures.net/resource-center/data-collection-tools/pricing-for-tools) for more information. All services are performed under a cost-recovery business model with no profit motivation. For more information, contact help@healthmeasures.net.

Item selection for short forms

The selection of items was initially based on two psychometric criteria: (1) maximum interval information and (2) CAT simulations. These two criteria resulted in similar item rankings. For the maximum interval information criterion, each item information function was integrated (without weighting) over the interval from 50 to 70 (for symptom banks) or from 30 to 50 (for function banks). For the CAT simulations, responses to all items in each bank were generated for a random sample of 1,000 simulees drawn separately for each bank (centered 0.5 SD lower [or higher] than the general population mean). Items were rank ordered by their average administration rank frequency across the simulees. Content experts from each of the seven domains reviewed the items and rankings and selected 4, 6, and 8 items, considering not only rank but also content coverage and theta (severity) range.
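
To make the first criterion concrete, the following R sketch (illustrative only, not the code used to build the actual banks) integrates a graded-response-model item information function, unweighted, over the symptom-bank interval of T scores 50 to 70, i.e., theta from 0 to 2 on the Z-scale; the item parameters are hypothetical.

```r
# Sketch of the "maximum interval information" criterion for one hypothetical item.
grm_item_info <- function(theta, a, b) {
  # Boundary (cumulative) probabilities of the graded response model,
  # padded with 1 and 0 at the extremes
  p_star <- c(1, plogis(a * (theta - b)), 0)
  q_star <- 1 - p_star
  # Category probabilities are differences of adjacent boundary probabilities
  p_cat <- p_star[-length(p_star)] - p_star[-1]
  # Samejima's item information:
  #   a^2 * sum_j (P*_{j-1} Q*_{j-1} - P*_j Q*_j)^2 / P_j
  num <- (p_star[-length(p_star)] * q_star[-length(q_star)] -
            p_star[-1] * q_star[-1])^2
  a^2 * sum(num / p_cat)
}

# Hypothetical 5-category symptom item: slope a and four ordered thresholds b
a <- 2.1
b <- c(-0.5, 0.2, 0.9, 1.6)

# Unweighted integral of item information over theta in [0, 2] (T scores 50-70);
# items would be rank ordered by this value
integrate(function(th) sapply(th, grm_item_info, a = a, b = b),
          lower = 0, upper = 2)$value
```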

Specifications for the regression of SEs on scores

This section describes the specification of the regression of SEs on scores. We use a local regression, implemented with the loess function in R. Local regression fits each point using its neighboring points; the smaller the neighborhood, the more closely the fitted curve follows the points. We specify a small neighborhood (span = 0.2) so that the minimum and maximum accurate scores are close to the simulated scores. We use a large number of simulees (10,000) whose smallest latent value is approximately equal to the floor and whose latent values increase in equal increments up to a value approximately equal to the ceiling. We use these specifications because the large number of simulees minimizes the sampling variation of the SE curve, and spanning the entire score range avoids an excessive number of scores piling up at either the floor or the ceiling. Finally, to reduce the computational time of loess with this many simulees, the data are organized as weighted data consisting of the distinct scores and their frequencies. Because the number of distinct scores is less than one-tenth of the number of simulees in the original data, the reduction in computational time is substantial.
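
The following R sketch illustrates this specification (it is not the code used for the reported results): a loess fit with span = 0.2 to weighted data of distinct scores and their frequencies, followed by extraction of the accurate range where the smoothed SE falls below the 3.0 criterion. A data frame sim with one row per simulee and columns score and se is assumed.

```r
# Collapse simulees into distinct (score, se) pairs with frequencies as weights
wt_dat <- aggregate(cnt ~ score + se, data = transform(sim, cnt = 1), FUN = sum)

# Local regression of SE on score; span = 0.2 keeps the fitted curve close to
# the points, and the frequencies enter as case weights
fit <- loess(se ~ score, data = wt_dat, weights = wt_dat$cnt, span = 0.2)

# Accurate range: scores whose smoothed SE is below the 3.0 criterion
grid <- data.frame(score = seq(min(wt_dat$score), max(wt_dat$score), by = 0.1))
grid$se_hat <- predict(fit, newdata = grid)
range(grid$score[grid$se_hat < 3.0])
```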

Table 3 Percentage distribution of the number of CAT items administered (from 2 to 12) and the weighted average number of CAT items administered, for each of the seven domains


About this article


Cite this article

Segawa, E., Schalet, B. & Cella, D. A comparison of computer adaptive tests (CATs) and short forms in terms of accuracy and number of items administrated using PROMIS profile. Qual Life Res 29, 213–221 (2020). https://doi.org/10.1007/s11136-019-02312-8
