Concordance of Self-Report and Performance-Based Measures of Function and Differences between Clinic and Home among Wheelchair Users

Objective: The main objective of this study was to investigate concordance and differences among self-report and performance-based measures for wheelchair users. Method: Three measures were used: the Functioning Everyday with a Wheelchair (FEW), a self-report measure; the FEW-Capacity (FEW-C), a performance-based measure for the clinic; and the FEW-Performance (FEW-P), a performance-based measure of clients’ skills in the home. We examined the concordance of the FEW and the FEW-C with the FEW-P as the criterion measure, and investigated the differences between the FEW-C and the FEW-P at pretest and at posttest following the provision of a new wheeled mobility and seating device. Results: Our results suggested that the FEW-C was more concordant than the FEW with the FEW-P for the majority of the items. At both pretest and posttest, for most of the tasks, the FEW-C and FEW-P were comparable, suggesting that the environment may have a neutral effect. However, at posttest, the clients’ safety scores for the Outdoor Mobility task and the clients’ quality scores for the Personal Care task were significantly better in the clinic than in the home, suggesting that the standard supportive environment of the clinic may have an enabling effect on activity performance. Conclusion: Clinically, rehabilitation clinicians may get a more accurate estimation of functional performance in the home from a clinic assessment than from self-report, and they are cautioned that self-report assessments and other data obtained from clients’ perceptions may be discrepant with actual performance. We also concluded that the impact of the environment on the activity performance of wheelchair users can be neutral or enabling, depending on the time of assessment and the tasks being assessed.

clutter-free spaces. Their findings suggested that the impact of the environment on activity performance can be neutral, enabling, or disabling depending on the level of analysis, and the activity being analyzed. Also, they concluded that if a rehabilitation clinician wants to know how a person performs IADLs, the clinician should evaluate that person's performance in the environment in which the client will be functioning. In contrast to the previous studies, in a sample of adults with diagnosed or suspected dementia, they found no overall difference in IADL performance between the clinic and home settings [12].
Overall, research studies comparing ADL and IADL performance across clinic, self-report, and home settings have yielded conflicting results. Despite the importance of assessing functional performance in persons who have been prescribed a wheeled mobility and seating device, little is known about the relative concordance of the different methods used to obtain this information (self-report and performance-based outcome measures). Previous studies have reported that self-reports of performance with a wheeled mobility and seating device do not always agree with clinic and home measures of the same performance [6,7,13]. One study compared self-report and performance-based instruments for measuring change in function following the provision of wheeled mobility and seating interventions for adults with disabilities who used manual or power wheelchairs or scooters as their primary mobility and seating device. Both the self-report and the clinic performance measure detected significant changes in function over time following the provision of a new wheeled mobility and seating device. However, the self-report often significantly underestimated function and therefore documented greater changes in function over time than did the performance measure at the clinic [14].
The specific aims of this study were (1) to examine the concordance of the self-report Functioning Everyday with a Wheelchair (FEW) and the FEW-Capacity (FEW-C, a performance-based measure for the clinic) with the criterion measure, the FEW-Performance (FEW-P, a performance-based measure for the home), and (2) to investigate the differences between the clinic and home performance-based measures (the FEW-C and the FEW-P) at pretest and at posttest following the provision of a new wheeled mobility and seating device.

Hypothesis
Aim 1 is descriptive. For Aim 2, our null hypothesis was that there would be no differences between the FEW-C and the FEW-P for independence, safety and quality data at pretest and posttest following the provision of a new wheeled mobility and seating device.

Design
This study was a secondary analysis of data collected in two previous studies [14,15]. The data were examined to explore the concordance of the FEW and the FEW-C with the FEW-P, and to investigate the differences between the clinic and home performance-based measures (the FEW-C and the FEW-P) at pretest and at posttest following the provision of a new wheeled mobility and seating device.
In-home performance (FEW-P) was selected as the criterion method because (1) the home is the environment where persons usually perform their routine activities of daily living, and it either offers the most support for functional performance or challenges it, and (2) the home is a familiar real-world environment where persons wish to remain [3].

Participants
Participants for this study were a subset of participants from the studies by Mills and Schmeler [14,15]. The study sample consisted of 19 wheelchair users with progressive or non-progressive conditions who needed a new wheeled mobility and seating device. Nine were male and 10 were female. The average participant was Caucasian, 53.1 years old, and had used a wheelchair for 9.5 years. Participants with multiple sclerosis comprised over one-third of the sample (Table 1). At pretest, 16 of the wheelchairs were manual and 3 were power. The manual wheelchairs, on average, were 3.7 years old, and most of them had no seat functions. At posttest, all participants had power wheelchairs, and most of these wheelchairs were equipped with multiple seat functions (Table 2).

Instruments
The FEW, FEW-C, and FEW-P were the measures used in this study. The FEW is a 10-item self-report measure of the perceived functional independence of individuals who use a wheelchair or scooter as their primary mobility and seating device (Table 3). The FEW-C is a 10-item performance-based measure for the clinic; items 2-10 are performance-based, and item 1 is a self-report. The FEW-C was designed to measure function based on the ICF construct of capacity. The FEW-P is a 10-item performance-based measure for the home; as in the FEW-C, items 2-10 are performance-based and item 1 is a self-report. The FEW-P was designed to measure function in the "lived in" environment according to the ICF. The trio of FEW tools has been used in research and has proved to be reliable, valid, and useful [14-19].

Procedures
After study procedures were explained and written informed consent was obtained, trained occupational therapists conducted the FEW and FEW-C pretest assessments during a regularly scheduled clinic visit for a seating evaluation, followed by the FEW-P assessment within 1 week. The posttest assessments occurred in the same sequence (FEW, FEW-C, and FEW-P) after participants received the new wheelchair. A fixed rather than a random order of assessment methods was followed, with self-report before performance, because perceptions (self-reports) are more likely to be biased by performance than the reverse. The FEW tools have demonstrated excellent interrater reliability. The mean duration between pretest and posttest was 57 days [14,15].

Data Analysis
Percent agreement statistics were computed at both pretest and posttest to determine the concordance among items 2-10 of the three instruments (FEW, FEW-C, and FEW-P) across the 19 participants. Percent agreement was calculated by dividing the number of participant agreements by the sum of the number of participant agreements and disagreements. For each method, the percentage of ratings that overestimated or underestimated ability relative to the criterion was calculated to identify the bias and the direction of disagreement. We then examined the differences between the FEW-C and the FEW-P for independence, safety, and quality data for the 9 items at pretest and at posttest following the provision of a new wheeled mobility and seating device by analyzing the average total scores using paired t tests. Differences between the FEW and FEW-C and between the FEW and FEW-P have been reported elsewhere [14,15]. The Stability, Durability, and Dependability item (item 1) was not included because it is a self-report item and differs from all other items of the FEW-C and FEW-P. To control for the effect of multiple comparisons, we used a Bonferroni adjustment [20].
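The concordance statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `concordance`, the example ratings, and the choice to skip pairs with a missing rating are all hypothetical. Bias is computed as percent overestimation minus percent underestimation, following the note to Table 4, and higher ratings are taken to mean better performance.

```python
from collections import namedtuple

# Percentages of usable rating pairs for one item and one method.
Concordance = namedtuple("Concordance", "agreement over under bias")


def concordance(ratings, criterion):
    """Compare one method's ratings with the criterion, participant by participant.

    ratings   -- ratings from the FEW or FEW-C for a single item
    criterion -- FEW-P ratings for the same item and the same participants
    Pairs with a missing value (None) are skipped; this is an assumption,
    since the source does not say how missing data were handled.
    """
    pairs = [(r, c) for r, c in zip(ratings, criterion)
             if r is not None and c is not None]
    if not pairs:
        raise ValueError("no complete rating pairs")

    n = len(pairs)
    agree = sum(r == c for r, c in pairs)   # same rating as the criterion
    over = sum(r > c for r, c in pairs)     # rated higher than the criterion
    under = sum(r < c for r, c in pairs)    # rated lower than the criterion

    pct_agree = 100.0 * agree / n           # agreements / (agreements + disagreements)
    pct_over = 100.0 * over / n
    pct_under = 100.0 * under / n
    return Concordance(pct_agree, pct_over, pct_under, pct_over - pct_under)


if __name__ == "__main__":
    # Hypothetical ratings for five participants on one item.
    few_c = [3, 2, 2, 1, 3]
    few_p = [3, 3, 2, 2, 2]
    print(concordance(few_c, few_p))
```

A negative bias indicates that the method tended to underestimate the criterion, and a positive bias indicates overestimation.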

Concordance and Bias
Tables 4 and 5 present percent agreement, percent overestimation, percent underestimation, and bias for each of the items 2-10 of the FEW and FEW-C relative to the criterion method (FEW-P) at pretest and posttest respectively.
At pretest, the FEW-C was more concordant with the FEW-P than the FEW was for 8 of 9 items, the exception being Indoor Mobility. When there was disagreement, the clinic measure underestimated home performance for 7 of 9 items (all but Outdoor Mobility and Transportation); for Outdoor Mobility, it underestimated and overestimated equally. Moreover, self-report underestimated home performance for 8 of 9 items (all but Transportation). Overall, when the FEW and FEW-C were not concordant with the FEW-P, they consistently underestimated it, with the exception of Transportation, for which performance was overestimated.
At posttest, the FEW-C was more concordant with the FEW-P than the FEW was for 7 of 9 items (all except Transfer and Outdoor Mobility). However, when the FEW and FEW-C were not concordant with the FEW-P, they had different tendencies. The FEW-C consistently overestimated the FEW-P, with the exception of Reach. The FEW underestimated the FEW-P for 4 of 9 items (Comfort Needs, Reach, Personal Care, and Indoor Mobility) and overestimated the FEW-P for 5 of 9 items (Health Needs, Operate, Transfer, Outdoor Mobility, and Transportation).
At both pretest and posttest, the FEW-C was more concordant with the FEW-P for the majority of the items compared to the FEW.
At pretest, the FEW-C was most concordant with the FEW-P for the Personal Care task and least concordant for the Indoor Mobility task. In contrast, the FEW was most concordant with the FEW-P for the Outdoor Mobility task and least concordant for the Reach task.
At posttest, the FEW-C was most concordant with the FEW-P for the Comfort Needs task and least concordant for the Transfer task. In contrast, the FEW was most concordant with the FEW-P for the Operate and Indoor Mobility tasks and least concordant for the Reach and Personal Care tasks.

Differences between the FEW-C and FEW-P at Pretest and Posttest
Below are the results of the paired t-tests of the FEW-C and FEW-P total independence, safety, and quality scores and of the individual items at pretest and posttest (Tables 6-15).

Table 2 (rows for scooters and seat functions; number of wheelchairs at pretest, posttest)
Scooter: 0, 0
Seat functions
Power tilt in space only: 1, 3
Power reclining backrest only: 0, 0
Power seat elevator only: 1, 1
Tilt-in-space and reclining back only: 0, 1
All of the above: 0, 9
All of the above plus passive standing: 0, 1

For the total scores, at pretest, there was no significant difference between the FEW-C and the FEW-P, whereas, at posttest, the total safety and quality scores differed significantly, with the FEW-C scores being significantly better than the FEW-P scores.
For the individual items, the FEW-C and FEW-P generally had consistent results at pretest and posttest. At pretest, the FEW-C and FEW-P did not differ significantly for independence, safety, or quality. At posttest, they did not differ significantly for independence, safety, or quality, except for the quality scores for the Personal Care item (Table 12) and the safety scores for the Outdoor Mobility item (Table 14), both of which were significantly better in the clinic (data not shown).
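The paired comparisons reported in this section can be written out explicitly. The block below is the standard textbook form of the paired t statistic with a Bonferroni-adjusted significance threshold, not a reproduction of the authors' computations. The symbols $C_i$ and $P_i$ (a participant's FEW-C and FEW-P scores), $d_i$, $n$, $m$, and $\alpha$ are introduced here for illustration only; the source does not state the number of comparisons $m$ or the unadjusted $\alpha$.

```latex
\begin{align*}
d_i &= C_i - P_i, \qquad i = 1, \dots, n \\
\bar{d} &= \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2} \\
t &= \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \mathrm{df} = n - 1 \\
\alpha_{\mathrm{adj}} &= \frac{\alpha}{m}
\end{align*}
```

With complete data from all 19 participants, $n = 19$ and $\mathrm{df} = 18$. Under this sign convention, a positive $t$ corresponds to higher scores in the clinic than in the home.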

Discussion
Our hypothesis that there would be no differences between the FEW-C and the FEW-P for independence, safety, and quality data at pretest and posttest was partially confirmed. For the total scores, at pretest there were no significant differences, but at posttest the total safety and quality scores differed significantly. At first glance, these findings may seem unexpected because the same items were used to structure both the FEW-C and the FEW-P to observe the functional performance of wheelchair users in both performance situations: the clinic and the home. The primary difference in the testing procedure was that the clinic was an unfamiliar, supportive environment, whereas the home was a familiar, naturalistic one. Hence, the actual performance differences were most likely due to environmental factors, which is consistent with previous literature [2,3,11]. For both the total scores and the individual item scores, our results indicated that at pretest the effect of the environment was neutral. At posttest, however, the supportive environment of the clinic significantly enabled safety and quality, a result most likely driven by the quality scores for the Personal Care item and the safety scores for the Outdoor Mobility item.

Table 4: Percent agreement and bias of the FEW and FEW-C with the FEW-P at pretest.
Note: = FEW-P (home): percent agreement with the criterion (FEW-P); > FEW-P (home): percent of ratings higher than the criterion (overestimation of performance); < FEW-P (home): percent of ratings lower than the criterion (underestimation of performance); Bias: direction and magnitude of the rating bias compared with the criterion measure (computed as > FEW-P minus < FEW-P); FEW: the Functioning Everyday with a Wheelchair instrument (the self-report version); FEW-C: FEW-Capacity (the clinic version); FEW-P: FEW-Performance (the home version; the criterion).

Table 6: Differences between the FEW-C and FEW-P for the total scores at pretest and posttest.
Note: p < .01

Table 7: Differences between the FEW-C and FEW-P for Comfort Needs at pretest and posttest.
Note: p < .01

Table 8: Differences between the FEW-C and FEW-P for Health Needs at pretest and posttest.
Note: p < .01

Table 9: Differences between the FEW-C and FEW-P for Operate at pretest and posttest.
Note: p < .01

Table 11: Differences between the FEW-C and FEW-P for Transfer at pretest and posttest.
Note: p < .01

Table 12: Differences between the FEW-C and FEW-P for Personal Care at pretest and posttest.
Note: p < .01

Table 13: Differences between the FEW-C and FEW-P for Indoor Mobility at pretest and posttest.
Note: p < .01

Table 14: Differences between the FEW-C and FEW-P for Outdoor Mobility at pretest and posttest.
Note: p < .01

Our results indicated that at both pretest and posttest, the clinic performance-based rating, the FEW-C, was more concordant with the in-home performance-based rating, the FEW-P, than was the self-report FEW. The greatest concordance between the FEW-C and FEW-P was for Personal Care at pretest and for Comfort Needs at posttest. Concordance between the FEW-C and FEW-P ranged from 31.6 percent to 78.9 percent at pretest and from 42.1 percent to 89.5 percent at posttest. In contrast, the self-report FEW was less concordant with the FEW-P, ranging from 10.5 percent to 52.6 percent at pretest and from 31.6 percent to 68.4 percent at posttest. Clinically, our findings indicate that rehabilitation clinicians will get a more accurate estimation of performance in the home from a clinic performance assessment than from a self-report. Based on our findings, there was a distinct discrepancy between what clients said they could do and what they actually did; therefore, information on wheelchair function obtained from self-report should be used with caution.
At pretest, when the FEW and FEW-C were not concordant with the FEW-P, both consistently underestimated it, with the exception of the Transportation item, suggesting greater disability. The underestimation at pretest was more evident in the FEW, suggesting that participants perceived greater disability. Because the participants in our study had come to a clinical setting to be evaluated for a new wheeled mobility and seating device, their perceptions of their function as indicated on the FEW may have been worse than their actual performance as indicated on the FEW-C and FEW-P. Underestimating capabilities on the FEW self-report tool relative to pretest performance is not unusual for individuals who are seeking interventions to obtain health services or a new product and/or equipment [14,21].

Study Limitations and Future Directions
There were several limitations to this study. A major limitation was the small sample size. The results on concordance and differences between the FEW-C and FEW-P for the Transportation item should be interpreted with particular caution because of the smaller sample size and missing data for that item. Several participants were not able to complete all subtasks related to this item due to unavailability of personal and/or public transportation, inability to get the wheelchair out of the house, fatigue, or bad weather conditions at the time of the assessment.
Our sample had adequate cognitive and language status, so our findings may not be relevant to those with cognitive or communication impairments. Furthermore, the absence of new manual wheelchair users, as well as of some of the primary conditions causing disability among wheelchair users, such as osteoarthritis and spinal cord injuries [22,23], may limit the generalizability of our findings. Future studies with larger samples that examine the impact of progressive and non-progressive conditions and include less-experienced wheelchair users with more diverse diagnoses and with cognitive and communication impairments may strengthen the generalizability of future findings.

Conclusion
Our results suggested that the FEW-C was more concordant than the FEW with the FEW-P for the majority of the items. Clinically, rehabilitation clinicians may get a more accurate estimation of performance in the home from a clinic assessment than from self-report, and they are cautioned that self-report assessments and other data obtained from clients' perceptions may be discrepant with actual performance.
We also concluded that the impact of the environment on the activity performance of wheelchair users can be neutral or enabling, depending on the time of assessment and the tasks being assessed. At both pretest and posttest, for most of the tasks, the FEW-C and FEW-P were comparable, suggesting that the environment may have a neutral effect. However, at posttest, the clients' safety scores for the Outdoor Mobility task and the clients' quality scores for the Personal Care task were significantly better in the clinic than in the home, suggesting that the standard supportive environment of the clinic may have an enabling effect on activity performance. This research needs to be replicated across a wider range of wheelchair users, including those with other primary health conditions and with cognitive and language deficits, to assess the generalizability of the findings.