Original Article
Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments

https://doi.org/10.1016/j.jclinepi.2004.12.004

Abstract

Background and Objective

To develop computerized adaptive tests (CATs) designed to assess lower extremity functional status (FS) in people with lower extremity impairments using items from the Lower Extremity Functional Scale and compare discriminant validity of FS measures generated using all items analyzed with a rating scale Item Response Theory model (θIRT) and measures generated using the simulated CATs (θCAT).

Methods

Secondary analysis of retrospective intake rehabilitation data.

Results

Unidimensionality of items was strong, and local independence of items was adequate. Differential item functioning (DIF) affected item calibration related to body part, that is, hip, knee, or foot/ankle, but DIF did not affect item calibration for symptom acuity, gender, age, or surgical history. Therefore, patients were separated into three body part specific groups. The rating scale model fit all three data sets well. Three body part specific CATs were developed: each was 70% more efficient than using all LEFS items to estimate FS measures. θIRT and θCAT measures discriminated patients by symptom acuity, age, and surgical history in similar ways. θCAT measures were as precise as θIRT measures.

Conclusion

Body part-specific simulated CATs were efficient and produced precise measures of FS with good discriminant validity.

Introduction

Computerized adaptive testing (CAT) has transformed the process of estimating latent traits [1]. Latent traits or abilities cannot be directly observed, but can be estimated by analyzing a person's performance on a set of items [2]. For the purpose of this study of patients with lower extremity impairments, the latent trait of interest is lower extremity functional status (FS), which we operationally define as the patient's perception of their ability to perform functional tasks described in the FS items. FS is of interest because many people seek rehabilitation to improve functional deficits caused by lower extremity impairments [3].

CAT has its origins in mental [4], educational [5], and military [6] testing, but inexpensive, powerful computers have facilitated development of computerized adaptive tests (CATs) [1], [6]. CATs have recently emerged in the medical [7], [8] and rehabilitation [9], [10] fields, and development of CAT measures of function in rehabilitation has been recommended [11], [12], [13].

CATs offer advantages compared to computer-administered or paper-and-pencil outcomes instruments. CATs (1) administer informative items, the difficulty of which is matched to the patient's level of ability, reducing the number of inappropriate items administered; (2) administer fewer items, reducing respondent burden with little reduction in precision of patient ability estimates; (3) allow the level of measure precision to be established before testing, improving control of measurement error during testing; and (4) simplify test revision by allowing new items to be added and tested as needed [6], [14]. CATs provide an efficient alternative to traditional paper-and-pencil or computer-administered tests, and allow outcomes data to be collected during the clinical encounter with reduced patient and scoring burden. Therefore, CAT facilitates management of a central conflict in scale development: good measurement precision with low response burden [6], [7], and is applicable to assessment of outcomes, that is, change in FS in patients receiving rehabilitation [9], [10], [15], [16]. Recent symposia in health outcomes methodology and computer-based testing have emphasized the need to (1) improve outcomes assessment for advancing the science and practice of treatment-effectiveness evaluation [17], and (2) chart a path to development of better computer-based tests [18].
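
The adaptive logic in points (1) to (3) can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the item difficulties, category thresholds, normal prior, theta grid, and stopping value below are all hypothetical, maximum-information item selection is assumed as a common choice, and ability is estimated by expected a posteriori (EAP) scoring purely for simplicity.

```python
import math
import random

GRID = [-4.0 + 0.1 * i for i in range(81)]  # hypothetical theta grid from -4 to 4

def category_probs(theta, b, taus):
    """Rating scale model: probabilities of categories 0..M for one item."""
    logits = [0.0]  # the category-0 term is defined as 0
    for tau in taus:
        logits.append(logits[-1] + (theta - b - tau))
    top = max(logits)
    expd = [math.exp(v - top) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]

def item_information(theta, b, taus):
    """Item information at theta, equal to the variance of the category score."""
    p = category_probs(theta, b, taus)
    mean = sum(k * pk for k, pk in enumerate(p))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(p))

def eap_estimate(responses, items, taus, prior_sd=2.0):
    """Posterior mean and SD of theta on GRID (normal prior is an assumption)."""
    log_post = []
    for t in GRID:
        lp = -0.5 * (t / prior_sd) ** 2
        for idx, x in responses:
            lp += math.log(category_probs(t, items[idx], taus)[x])
        log_post.append(lp)
    top = max(log_post)
    w = [math.exp(v - top) for v in log_post]
    total = sum(w)
    mean = sum(t * wi for t, wi in zip(GRID, w)) / total
    var = sum((t - mean) ** 2 * wi for t, wi in zip(GRID, w)) / total
    return mean, math.sqrt(var)

def run_cat(items, taus, answer, se_target=0.5):
    """Administer items until the posterior SD falls to se_target or items run out."""
    responses = []
    remaining = set(range(len(items)))
    theta, se = 0.0, float("inf")
    while remaining and se > se_target:
        # Pick the unused item that is most informative at the current estimate.
        nxt = max(sorted(remaining),
                  key=lambda i: item_information(theta, items[i], taus))
        remaining.remove(nxt)
        responses.append((nxt, answer(nxt)))
        theta, se = eap_estimate(responses, items, taus)
    return theta, se, [i for i, _ in responses]

def simulate_answer(true_theta, items, taus, rng):
    """Return a function that draws a simulated response to item i."""
    def answer(i):
        p = category_probs(true_theta, items[i], taus)
        return rng.choices(range(len(p)), weights=p)[0]
    return answer

if __name__ == "__main__":
    rng = random.Random(0)
    items = [-2.0 + 4.0 * i / 19 for i in range(20)]  # 20 hypothetical difficulties
    taus = [-1.5, -0.5, 0.5, 1.5]                      # hypothetical thresholds, 5 categories
    theta, se, used = run_cat(items, taus, simulate_answer(1.0, items, taus, rng))
    print(f"theta={theta:.2f} se={se:.2f} items used={len(used)} of {len(items)}")
```

Lowering `se_target` trades respondent burden for precision: more items are administered before the loop stops. As a point of comparison, a CAT that stops after 6 of 20 items administers 1 - 6/20 = 70% fewer items, which is consistent with the efficiency figure reported in the abstract.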

The foundation of CAT lies in Item Response Theory (IRT) methods [19], [20], [21], [22]. Briefly, IRT comprises a set of mathematical models and associated statistical procedures that connect observed survey responses to a person's location on an unmeasured, underlying latent trait like FS. IRT models produce item and latent trait estimates that do not vary with population characteristics with respect to the underlying trait, standard errors conditional on trait level, and trait estimates linked to item content. IRT also facilitates evaluation of differential item functioning (DIF), that is, whether items measure the trait of interest similarly in different subgroups of respondents, and assessment of data fit to the model [23].
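
As an illustration of how such a model links a response to the latent trait, the rating scale model named in the abstract is commonly written in Andrich's form, shown below. The notation is an assumption introduced here, not taken from the article: θ_n is the FS level of patient n, b_i is the difficulty of item i, τ_j is the j-th category threshold shared by all items, and M is the highest response category.

```latex
P(X_{ni} = k \mid \theta_n)
  = \frac{\exp\!\Big(\sum_{j=0}^{k} (\theta_n - b_i - \tau_j)\Big)}
         {\sum_{m=0}^{M} \exp\!\Big(\sum_{j=0}^{m} (\theta_n - b_i - \tau_j)\Big)},
\qquad k = 0, 1, \dots, M,
\qquad \text{with } \sum_{j=0}^{0} (\theta_n - b_i - \tau_j) \equiv 0 .
```

Because the thresholds τ_j are common to every item, each item contributes only one parameter, b_i. Under this form, DIF would appear as b_i taking different values in different subgroups of patients at the same θ_n.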

This article describes development of CATs using items from the Lower Extremity Functional Scale (LEFS), a common paper-and-pencil outcomes instrument for patients with lower extremity impairments receiving rehabilitation [3]. No articles have described IRT analyses or CAT applications of the LEFS. The overall purpose of this study was to develop CATs of the LEFS. Specific purposes were to (1) test unidimensionality and local independence of the LEFS items, (2) test LEFS item DIF, (3) develop CATs using LEFS items, and (4) compare the discriminant validity of FS measures generated using all LEFS items analyzed with an IRT rating scale model with measures generated from the simulated CATs.

Study design and setting

A secondary analysis of retrospective data collected from patients with lower extremity impairments prior to rehabilitation was conducted. Focus On Therapeutic Outcomes, Inc. (FOTO) Institutional Review Board approved the project.

Subjects

Patients (n = 1772, 48 ± 17 years, 14 to 89 years, 64% female) with lower extremity impairments were analyzed (Table 1). Patients, who represent a sample of convenience, received rehabilitation in 81 outpatient clinics in 20 states (United States) in the consecutive 24

Unidimensionality and local independence

EFA of the 1,772 patients with complete scores on all 20 LEFS items produced a scree plot analysis that supported one dominant factor (first three eigenvalues = 13.1, 1.7, 0.7) with the first three factors explaining 66, 9, and 4% of data variance. In CFA, a three-factor model fit better than a one-factor model, but the correlations between the three factors were high (>0.62) suggesting one dominant factor. Fit statistics from the one- to three-factor models were CFI = 0.93, 0.94, 0.94, TLI = 0.98,

Discussion

Results support that (1) body part specific (hip, knee, foot/ankle) CATs can be generated from LEFS items; (2) measures of lower extremity FS generated using these CATs can discriminate known groups of patients in clinically logical ways; and (3) θCAT measures were similar to θIRT measures in their discriminating abilities, but (4) because θCAT measures were estimated using on average six LEFS items, the CATs were 67% more efficient compared to using 18 unidimensional LEFS items and 70% more

Acknowledgments

The authors thank John M. Linacre, PhD, for his comments regarding statistical analyses, and Karon F. Cook, PhD, for her insightful comments regarding statistical analyses, results, and manuscript edits.

References (72)

  • H. Wainer. Introduction and history.
  • R.K. Hambleton. Emergence of item response modeling in instrument development and data analysis. Med Care (2000).
  • J.M. Binkley et al. The lower extremity functional scale (LEFS): scale development, measurement properties, and clinical application. Phys Ther (1999).
  • F.M. Lord et al. Statistical theories of mental test scores (1968).
  • F.M. Lord. Some test theory for tailored testing.
  • J.E. Ware et al. Practical implications of Item Response Theory and computerized adaptive testing. A brief summary of ongoing studies of widely used headache impact scales. Med Care (2000).
  • J.E. Ware et al. Applications of computerized adaptive testing (CAT) to the assessment of headache impact. Qual Life Res (2003).
  • S. Haley et al. Extending the frontier of rehabilitation outcome measurement and research. J Rehabil Outcome Meas (2000).
  • C.A. Velozo et al. The use of Rasch analysis to produce scale-free measurement of functional activity. Am J Occup Ther (1999).
  • B.G. Dodd et al. Computerized adaptive testing with polytomous items. Appl Psychol Meas (1995).
  • K.F. Cook et al. Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function. Med Care (2003).
  • D.L. Patrick et al. Convening health outcomes methodologists. Med Care (2000).
  • F.M. Lord. Applications of Item Response Theory to practical testing problems (1980).
  • W. van der Linden et al. Handbook of modern Item Response Theory (1997).
  • S.E. Embretson et al. Item Response Theory for psychologists (2000).
  • R.K. Hambleton et al. Fundamentals of Item Response Theory (1991).
  • R.D. Hays et al. Item response theory and health outcomes measurement in the 21st century. Med Care (2000).
  • L. Resnik et al. Using clinical outcomes to identify expert physical therapists. Phys Ther (2003).
  • G.K. Alcock et al. Validation of the Lower Extremity Functional Scale on athletic subjects with ankle sprains. Physiother Can (2002).
  • P.W. Stratford et al. Validation of the LEFS on patients with total joint arthroplasty. Physiother Can (2000).
  • P.W. Stratford. Getting more from the literature: estimating the standard error of measurement from reliability studies. Physiother Can (2004).
  • World Health Organization. International classification of functioning, disability and health (2001).
  • P.F. Lazarsfeld et al. Latent structure analysis (1968).
  • H. Wainer et al. Item response theory, item calibration, and proficiency estimation.
