Abstract
Purpose
In the Patient-Reported Outcomes Measurement Information System (PROMIS), seven domains (Physical Function, Anxiety, Depression, Fatigue, Sleep Disturbance, Social Function, and Pain Interference) are packaged together as profiles. Each of these domains can also be assessed using computer adaptive tests (CATs) or short forms (SFs) of varying length (e.g., 4, 6, and 8 items). We compared CATs with each SF in terms of accuracy and the number of items administered.
Methods
PROMIS instruments are scored using item response theory (IRT) with the graded response model and reported as T scores (mean = 50, SD = 10). In each domain, we simulated 10,000 subjects from a normal distribution with standard deviation 10 and mean 60 for symptom scales or 40 for function scales. We considered a subject’s score to be accurate when its standard error (SE) was less than 3.0. We recorded the range of accurate scores (accurate range) and the number of items administered.
Results
The average number of items administered in CAT was 4.7 across all domains. The accurate range was wider for CAT than for any SF in each domain. CAT was notably better at extending the accurate range into very poor health for Fatigue, Physical Function, and Pain Interference. Most SFs provided a reasonably wide accurate range.
Conclusions
Relative to SFs, CATs provided the widest accurate range, with slightly more items than SF4 and fewer than SF6 and SF8. Most SFs, especially longer ones, provided a reasonably wide accurate range.
Notes
IRT software computes on the Z-scale (mean 0 and standard deviation 1) and converts the final results to the T-scale. The prior is specified on the Z-scale and corresponds to a normal prior with mean 50 and standard deviation 10 on the T-scale.
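The conversion described in this note can be illustrated with a minimal sketch. The function name z_to_t and the example values are hypothetical and are not taken from the IRT software used in the study; only the linear transformation (mean 50, standard deviation 10) and the SE threshold of 3.0 come from the text.

```python
def z_to_t(z, se_z):
    """Convert a Z-scale score and its standard error to the T-scale."""
    t = 50.0 + 10.0 * z   # shift and rescale the score
    se_t = 10.0 * se_z    # the SE is rescaled but not shifted
    return t, se_t


# Hypothetical example: a score one SD above the mean with SE 0.25 on the Z-scale
t, se_t = z_to_t(1.0, 0.25)
print(t, se_t)            # 60.0 2.5

# Accuracy criterion used in this study: SE below 3.0 on the T-scale
is_accurate = se_t < 3.0
```

Under this conversion, the SE threshold of 3.0 on the T-scale corresponds to 0.3 on the Z-scale.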
Although the SE curves are available analytically as the inverse square root of the test information functions, we did not use the analytic curves because they are available only for short forms and not for CATs. Further, the regression SE curves include the floors and ceilings, whereas the analytic curves do not.
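For reference, the analytic relationship mentioned in this note can be written as follows. The notation is ours rather than the article's: I_i(θ) is the information function of item i, n is the number of items in a short form, and θ is the latent score on the Z-scale. The expression is the standard large-sample result for maximum likelihood scoring; with a prior, as used here, the SE would be somewhat smaller.

```latex
\mathrm{SE}(\theta) = \frac{1}{\sqrt{I(\theta)}}, \qquad
I(\theta) = \sum_{i=1}^{n} I_i(\theta), \qquad
\mathrm{SE}_T(\theta) = 10\,\mathrm{SE}(\theta)
```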
Funding
This study was funded by the National Institutes of Health (U2CCA186878, Recipient David Cella).
Ethics declarations
Conflict of interest
Dr. Cella is an unpaid board member of the PROMIS Health Organization (PHO). He declares no other conflict of interest. Eisuke Segawa declares that he has no conflict of interest. Benjamin David Schalet declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Appendix
Procedure and Fee to use PROMIS profile instruments
Details of the procedures for using PROMIS profile instruments are available at www.HealthMeasures.net. Regarding cost in particular, the official answer to the question “Can I access PROMIS measures for free?” (http://www.healthmeasures.net/resource-center/user-community/forum/promis/128-can-i-access-promis-measures-for-free#336) is reproduced below.
Through www.HealthMeasures.net, you gain free access to hundreds of self- and proxy-report measures from the four measurement systems (PROMIS, NIH Toolbox, Neuro-QoL, and ASCQ-Me), along with information to help you select, administer, score, and interpret measures. We encourage you to go to the Search and View Measures to get more information about individual measures. Assessment Center and other assessment delivery services carry fees associated with maintaining and updating these technologies. HealthMeasures also offers consultation, training, custom software development, and translation services, with pricing available by quote. Please go to the HealthMeasures Pricing page (http://www.healthmeasures.net/resource-center/data-collection-tools/pricing-for-tools) for more information. All services will be performed under a cost-recovery business model with no profit motivation. For more information, contact help@healthmeasures.net.
Item selection for short forms
The selection of items was initially based on two psychometric criteria: (1) maximum interval information and (2) CAT simulations. These two criteria resulted in similar item rankings. For the maximum interval criterion, each item information function was integrated (without weighting) for the interval from 50 to 70 (for symptom banks) or 30 to 50 (for function banks). For the CAT simulations, responses to all items in each bank were generated using a random sample of 1000 simulees drawn separately for each bank (centered on 0.5 SD lower [or higher] than the general population mean). Items were rank ordered based on their average administration rank frequency over the simulees. Content experts from each of the seven domains reviewed the items and rankings and selected 4, 6, and 8 items, considering not only rank, but also content coverage and theta (severity) range.
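The maximum interval information criterion can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the item names and parameters (discrimination a, thresholds b) are hypothetical, the information formula is the standard one for the graded response model, and the integral is taken on the Z-scale with a simple trapezoid rule (integrating on the T-scale instead only multiplies every item's value by 10, so the ranking is unchanged).

```python
import numpy as np


def grm_item_information(theta, a, b):
    """Fisher information of one graded response model item at each theta."""
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P*(X >= k), padded with 1 (lowest) and 0 (highest)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    p_star = np.hstack([np.ones((theta.size, 1)), p_star,
                        np.zeros((theta.size, 1))])
    # Category probabilities and their derivatives with respect to theta
    p_cat = p_star[:, :-1] - p_star[:, 1:]
    d_star = a * p_star * (1.0 - p_star)
    d_cat = d_star[:, :-1] - d_star[:, 1:]
    return np.sum(d_cat ** 2 / p_cat, axis=1)


def interval_information(a, b, t_low, t_high, n_grid=201):
    """Unweighted integral of item information over a T-score interval."""
    # Convert the T-score interval to the Z-scale used by the IRT model
    theta = np.linspace((t_low - 50.0) / 10.0, (t_high - 50.0) / 10.0, n_grid)
    info = grm_item_information(theta, a, b)
    # Trapezoid rule on the evenly spaced grid
    return float(np.sum((info[1:] + info[:-1]) / 2.0 * np.diff(theta)))


# Hypothetical symptom-bank items: name -> (discrimination, thresholds)
items = {
    "item_A": (3.0, [0.0, 0.7, 1.4, 2.0]),
    "item_B": (1.5, [0.0, 0.7, 1.4, 2.0]),
    "item_C": (3.0, [-2.5, -2.0, -1.5, -1.0]),
}

# Symptom banks use the interval from 50 to 70; function banks would use 30 to 50
scores = {name: interval_information(a, b, 50.0, 70.0)
          for name, (a, b) in items.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy example, an item with high discrimination and thresholds inside the target interval ranks first, and an item whose thresholds lie outside the interval ranks last, which is the behavior the criterion is meant to capture.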
Specifications for the regression of SEs on scores
This section describes the specifications of the regression of SEs on scores. We use a local regression implemented in the loess function in R. The local regression fits each point using its neighboring points. The smaller the neighborhood, the closer the curve is to the points. We specify a small neighborhood (span = 0.2) so that the minimum and maximum accurate scores are close to the simulated scores. We use a large number of simulees (10,000) whose smallest latent value is approximately equal to the floor and whose latent values increase in equal increments up to a value approximately equal to the ceiling. We use these specifications because the large number of simulees minimizes the sampling variation of the SE curve, and because a minimum and maximum that cover the entire score range allow us to avoid an excessive number of scores at either the floor or the ceiling. Finally, to reduce the computational time of loess with this many simulees, the data are organized as weighted data consisting of distinct scores and their frequencies. Because the number of distinct scores is less than one tenth of the number of simulees in the original data, the reduction in computational time is substantial.
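The weighted-data idea can be sketched in Python. This is not the R loess call used in the study: it swaps in a simplified local linear regression with tricube weights (R's loess fits local quadratics by default), and the function name, the synthetic U-shaped SE curve, and the score range of 20 to 80 are all assumptions made for illustration. Only the span of 0.2, the 10,000 equally spaced simulees, the frequency weighting, and the SE threshold of 3.0 come from the text.

```python
import numpy as np


def weighted_local_linear(x, y, freq, span=0.2):
    """Local linear fit of y on x at each distinct x, with tricube kernel
    weights multiplied by frequency weights (a simplified loess-like smoother)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    freq = np.asarray(freq, dtype=float)
    n = x.size
    k = max(int(np.ceil(span * n)), 3)      # neighborhood size in distinct points
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]            # distance to the k-th nearest point
        u = np.clip(dist / h, 0.0, 1.0)
        w = (1.0 - u ** 3) ** 3 * freq      # tricube kernel times frequency
        # Weighted least squares for an intercept and slope centered at x[i]
        design = np.column_stack([np.ones(n), x - x[i]])
        root_w = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w,
                                   rcond=None)
        fitted[i] = beta[0]                 # fitted value at x[i]
    return fitted


# Hypothetical simulees: latent values in equal increments from floor to ceiling
latent = np.linspace(20.0, 80.0, 10_000)
score = np.round(latent)                    # stand-in for estimated T scores

# Collapse to weighted data: distinct scores and their frequencies
scores, counts = np.unique(score, return_counts=True)
# Synthetic SE curve, lowest near the middle of the score range
se = 2.0 + 0.005 * (scores - 50.0) ** 2

fitted_se = weighted_local_linear(scores, se, counts, span=0.2)

# Accurate range: scores whose smoothed SE is below 3.0
accurate = scores[fitted_se < 3.0]
low, high = accurate.min(), accurate.max()
print(len(scores), low, high)
```

Here the smoother runs over 61 distinct scores instead of 10,000 simulees, which is the source of the time saving described above. Because the smoothed curve differs slightly from the raw SEs, the endpoints of the accurate range depend on the span and on the degree of the local fit.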
Cite this article
Segawa, E., Schalet, B. & Cella, D. A comparison of computer adaptive tests (CATs) and short forms in terms of accuracy and number of items administrated using PROMIS profile. Qual Life Res 29, 213–221 (2020). https://doi.org/10.1007/s11136-019-02312-8
DOI: https://doi.org/10.1007/s11136-019-02312-8