Validity of balance measures in cerebellar ataxia: A prospective study with 12‐month follow‐up

Balance deficits are common in cerebellar ataxia. Determining which balance outcome measures are psychometrically strong for this population remains an unmet need.


INTRODUCTION
Cerebellar ataxia is not a single disease but refers to a collection of symptoms associated with cerebellar damage or dysfunction. Depending on the underlying cause, cerebellar ataxia can be categorized as genetic, sporadic, or acquired onset. 1 The prevalence of cerebellar ataxia varies across geographical and ethnic groups. 2 Friedreich's ataxia, a genetic disorder, has a prevalence of 2 to 5 per 100,000, 3 and the incidence of spinocerebellar ataxia ranges from 0.9 to 3 per 100,000, depending on the sub-type. 4 Individuals with cerebellar ataxia report a significant decline in quality of life 5 and a greater reduction in the performance of activities of daily living. 6 Cerebellar ataxia is associated with high economic costs, with mean per-patient annual costs reported at EUR 18,776 in Spain (2004), 7 USD 12,850 in the United States (2010), 8 and EUR 59,993 in Ireland (2016). 9 The 6-month per-patient annual cost in Hong Kong is reported to be HKD 146,832 (2020). 10 Balance deficits in cerebellar ataxia are common due to a lack of rhythmic muscle contractions, 11 movement planning and initiation deficits, 12 increased body sway, 13 and poor anticipatory postural adjustments. 14 Up to 93% of patients with cerebellar ataxia report experiencing at least one fall within 12 months, 15 and 87% of patients report experiencing frequent falls. 10 Improving balance to prevent falls and fall-related consequences is a primary focus of rehabilitative interventions in this population. The accurate estimation of balance function requires psychometrically strong and standardized assessment tools, and a wide spectrum of balance outcome measures is currently available. Previous reviews reporting on the currently available balance outcome measures 16 have highlighted the use of specific recommended outcome measures in patients with cerebellar ataxia based on psychometric property testing in this population. 17
Based on the findings of our previous systematic review 17 and a Delphi survey on choosing an appropriate outcome measure for testing balance conducted among physiotherapists and neurologists specializing in cerebellar ataxia, 18 we identified and tested the reliability and validity of the Berg Balance Scale (BBS) 19 and the balance sub-components of the Scale for the Assessment and Rating of Ataxia (SARA-bal) among a population of people with cerebellar ataxia secondary to multiple sclerosis. We found these two measures to be both reliable and valid for assessing balance at the activity level in this population. 20 Genetic and sporadic diseases that result in cerebellar ataxia are often chronic and progressive; therefore, outcome measures used to assess symptoms in these populations must be responsive to the change in disease status due to natural progression. However, evaluations of the responsiveness of the BBS in this population are lacking. Two laboratory-based assessments, the Sensory Organization Test (SOT) and the Limits of Stability (LOS) test, are accurate and objective measures of balance at the body, structure and function, and activity levels. 16 However, the psychometric validation of these two measures among people with cerebellar ataxia has not yet been established. Therefore, the current study aimed to test the validity (criterion, convergent and external validity) and responsiveness of four balance measures, including two clinic-based measures (BBS and SARA-bal) and two laboratory-based measures (SOT and LOS), among individuals with genetic and sporadic cerebellar ataxia. 1 Based on the findings of our previous work, we hypothesize a moderate to strong correlation between all four balance measures. We also hypothesize that the ataxia-specific balance outcome measure (SARA-bal) will be the most responsive measure to changes in balance after 6 and 12 months.
Volunteers were included in the study if they had a confirmed diagnosis of cerebellar ataxia secondary to a genetic or sporadic cause and were ambulant either with or without an assistive walking device. We excluded patients who were (1) unwilling to reveal their personal information, (2) bed-ridden, (3) having difficulty understanding simple commands, and (4) diagnosed with cerebellar ataxia due to acquired lesions, such as cerebellar stroke, cerebral palsy, trauma, or multiple sclerosis. 1

Procedure
All participants were assessed on three occasions, and each assessment lasted for approximately 90 minutes. The timeline included assessments at baseline, and after 6 and 12 months. To ensure the repeatability of all assessments, the same assessor performed the assessments at all three assessment time points. The assessments were conducted at the neurorehabilitation research laboratory of Hong Kong Polytechnic University. The venue, lighting, types of equipment used, and instructions provided during the assessments were standardized across all three assessments.

Berg Balance Scale (BBS)
The BBS is a generic measure of balance 19 that includes 14 items relevant to static and dynamic balance, with each item scored between 0 and 4. Higher scores indicate better balance function. The BBS has been shown to be a reliable and valid measure for balance assessments in multiple neurological disorders. For patients with cerebellar ataxia, our previous study reported high intra-rater reliability (intraclass correlation coefficient [ICC] = 0.99), inter-rater reliability (ICC = 0.97), internal consistency (Cronbach's α = 0.94), and construct and criterion validity, with a minimum detectable change value of 3 points. 20

Scale for the Assessment and Rating of Ataxia (SARA) and its balance subcomponent (SARA-bal)
The severity of cerebellar ataxia was assessed using the SARA. 21 A higher score indicates increased cerebellar ataxia severity. The "gait," "standing," and "sitting" sub-components of the SARA can be scored independently and together are referred to as the SARA-bal. 20 The SARA has been reported to have high inter-rater reliability (ICC = 0.98), 21 test-retest reliability (ICC = 0.90), 21 internal consistency (Cronbach's α = 0.94), and construct validity 20 for estimating disease severity in patients with cerebellar ataxia.

Sensory Organization Test (SOT)
The SOT is an objective laboratory-based assessment of dynamic balance. We used a Bertec Balance Advantage Dynamic Posturography (CDP) System with a dual-balance force plate and an immersive virtual reality visual surround to assess the SOT. Participants were instructed to stand on the motion-sensing dual-balance force plate with both medial malleoli aligned with the foot markings. To ensure safety, we used a dynamic balance harness for all participants. To protect against fatigue, we offered one practice trial and one test assessment. The participants were free to take any number of breaks during the assessment. Assessments were considered incomplete if participants lost their balance in the middle of the assessment or informed the assessors of any inability to proceed further. The methods used to replace missing values are discussed in the statistics section. The assessor used a touchscreen monitor to conduct each assessment and record assessment results. The participants were instructed to look at the immersive virtual reality visual surround screen in front of them and to follow instructions provided by the assessor. The SOT estimates the influences of visual, vestibular, and somatosensory inputs for balance maintenance. 22 Each participant's ability to stand unsupported was tested across six sensory conditions, using combinations of occluded vision, visual reference, and a swaying surface. Each condition lasted for 20 seconds, with three repeated measurements. In this study, we assessed the Bertec-generated visual (SOT-VIS), vestibular (SOT-VES), somatosensory (SOT-SOM), and composite scores (SOT-COM). Appendix 1A illustrates the assessment set-up for the SOT. The SOT has been reported to have good test-retest reliability (ICC = 0.68) among older adults 23 and is valid for assessing balance among patients with vestibular disorders. 24

Limits of Stability (LOS)
The LOS is an objective laboratory-based assessment of dynamic balance that assesses the ability to shift one's body weight in eight different directions without moving the feet. 25 Body sway in all eight directions is guided by visual feedback, provided by the Bertec system described earlier, on the immersive virtual reality visual surround screen.
In this study, we estimated the reaction time (RT-LOS) and the maximal excursion (MXE-LOS) in four directions, including forward, right, back, and left. RT refers to the time delay between the command to move and movement onset and is expressed in milliseconds. The MXE is the maximum displacement of the center of pressure in a given direction. The MXE is expressed as a percentage, with a higher percentage indicating better dynamic control during weight shift in the tested direction. Appendix 1B illustrates the assessment set-up for the LOS.

Other measures
Independence in activities of daily living was assessed using the Barthel Index. 26,27 The EuroQol visual analog scale (EQ-VAS) was used to assess quality of life. 28,29 We used the Patients' Global Impression of Change (PGIC) scale 30 to categorize all included participants into either stable and/or improved (stable) or worsened groups, based on the assessment of disease progression at each follow-up. This classification system was used to anchor the responsiveness of the outcome measures of interest. The PGIC is a self-reported scale that requires the individual to indicate their perceptions of disease progress. In this study, we required the participants to select either "stable and/or improved," if they perceived no change or reduced cerebellar ataxia symptoms, or "worsened," if they perceived any decline in their health status compared with the baseline assessment.

Sample size and psychometric properties tested
COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) recommends a sample size of 30 to 49 to provide "fair" quality of evidence for psychometric property estimation. 31 Considering the limited number of patients with cerebellar ataxia in Hong Kong, 10 we targeted the recruitment of at least 30 participants for this study. The criterion validity is defined as the degree to which the scores of the assessment tool being investigated correlate well with a gold standard. 32 Based on the findings of previous studies supporting the reliability, validity, 20 and appropriateness 33 of the BBS and SARA-bal for assessing balance in people with cerebellar ataxia, we consider these assessments to be gold standards. To estimate criterion validity, we correlated the four balance outcome measures against each other, which is in line with a previous study. 20 For convergent validity, we correlated the measures of balance against disease severity and functional independence, and the measures of balance were correlated against disease duration and quality of life for the estimation of external validity. We define responsiveness as the degree to which the outcome measure accurately distinguishes important changes due to disease progression during the prospective assessment. In line with a similar study, we estimated responsiveness at 6 and 12 months after baseline. 34

Statistical analysis
Descriptive data are reported as the mean and standard deviation (SD) or the number and percentage. Spearman's correlation coefficient (ρS), a bivariate analysis for non-parametric samples, was used to establish criterion and convergent validity. In this study, we interpreted correlation coefficients <0.5 as low, those ranging between 0.5 and 0.79 as moderate, and those ≥0.8 as high. 35,36 Responsiveness was estimated using the area under the receiver-operating characteristic (ROC) curve (AUC ROC), which was plotted for each change in outcome measure score at 6 and 12 months, using the PGIC-based classification (stable vs. worsened) as the external criterion. An AUC value of ≥0.70 was considered to indicate satisfactory discrimination ability between the stable and/or improved group and the worsened group. This method is in line with that of a similar previous study on responsiveness. 37 The standardized response mean (SRM), an indication of the effect size, was estimated for both groups to gauge responsiveness using the formula SRM = Mean score change / SD of the change. Due to the prospective nature of the study design, we anticipated the need to replace missing data. Missing data were replaced using the series mean imputation method, 38 and all analyses were conducted using the complete data set containing imputations for missing values. Sensitivity analysis was performed to assess the robustness of the findings. A secondary analysis was conducted for the psychometrically strong measures by excluding participants with missing data (complete case analysis).
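
As an illustration of the validity analysis described above, the following minimal Python sketch fills missing values with the series mean and then computes Spearman's correlation as the Pearson correlation of average ranks. The function names and the six example scores are hypothetical and are not study data; the sketch also assumes that the "high" band means an absolute coefficient of 0.8 or more and that the bands apply to the absolute value of the coefficient.

```python
from statistics import mean

def impute_series_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def interpret(rho):
    """Label correlation strength by absolute value (assumed banding)."""
    size = abs(rho)
    if size < 0.5:
        return "low"
    if size < 0.8:
        return "moderate"
    return "high"

# Hypothetical scores for six participants (illustration only).
bbs = [50, 42, 35, None, 20, 12]   # higher = better balance
sara_bal = [2, 4, 7, 9, 12, 15]    # higher = more severe ataxia
rho = spearman_rho(impute_series_mean(bbs), sara_bal)
print(round(rho, 2), interpret(rho))
```

Because a higher BBS score indicates better balance and a higher SARA-bal score indicates greater severity, a negative coefficient between the two is the expected direction of agreement.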

RESULTS
Seventy-four potential participants were approached through the HKSCAA, and 42 volunteers responded to our invitation. Of these 42 volunteers, 2 were excluded: one was wheelchair- or bed-bound (n = 1), and one had been diagnosed with cerebellar stroke (n = 1). During the 6-month follow-up assessments, we lost four participants, three of whom withdrew and one of whom relocated. During the 12-month follow-up, we lost two additional participants: one could not be reached (n = 1), and one did not attend the scheduled assessment (n = 1). The demographic data for the included participants are presented in Table 1. The mean and SD for all outcome measures, assessed at baseline and at 6 and 12 months, are reported in Appendix S2.

Validity
We found a significant negative correlation between the two clinic-based balance outcome measures (ρS = −0.89), indicating strong criterion validity for both the BBS and the SARA-bal. The correlations between the clinic-based measures and the SOT-COM and MXE-LOS in the forward and right directions were weak to moderate (ρS range 0.11 to −0.46), indicating weak to moderate criterion validity. Table 2 reports the criterion validity estimated for the four balance outcome measures. We found significant correlations between the clinic-based outcome measures (BBS and SARA-bal) and both functional independence (Barthel index: ρS = 0.74 and −0.75, respectively) and disease severity (SARA: ρS = −0.85 and 0.92, respectively), indicating moderate to good convergent validity. The correlations between the clinic-based outcome measures and both quality of life (EQ-VAS) and disease duration were insignificant, indicating poor external validity. Among the laboratory-based outcome measures, the correlation between the SOT-COM and the SARA (ρS = 0.50) was significant and moderate, and the correlation between MXE-LOS in the forward direction and the

Responsiveness
Among the 36 participants who attended the 6-month assessments, 24 participants were categorized as stable, and the remaining were categorized as worsened, based on their PGIC responses. Two participants from the stable group were lost to follow-up at 12 months, and the data were replaced using the imputation method. The clinic-based measures for balance (BBS and SARA-bal), and the laboratory-based MXE-LOS in the right direction, demonstrated significant responsiveness for the correct classification of participants as either stable or worsened at both the 6- and 12-month follow-ups. Cerebellar ataxia-specific measures (SARA and SARA-bal) showed higher accuracy at both 6 months (AUC ROC > 0.8) and 12 months (AUC ROC > 0.7) compared with both the generic balance scale (BBS, AUC ROC > 0.7-0.8) and the other scales. Table 4 reports the SRM and responsiveness for selected balance measures and SARA with significant results at 6 and 12 months. Appendix S4 reports the SRM and responsiveness for all assessed outcome measures. The ROC curves for the BBS, SARA, and SARA-bal at 6 and 12 months are illustrated in Figure 1(A),(B). The ROC curves for the SOT and LOS variables are illustrated in Appendices S5a and 5b.

TABLE 4 Mean baseline change score, standardized response mean (SRM) for the "stable" and "worsened" group and area under the receiver-operating characteristic curve (AUC ROC) for selected measures of balance and SARA with significant findings at 6 and 12 months
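
The two responsiveness statistics used in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names and the change scores are hypothetical, the SD of the change is taken as the sample SD (n − 1 denominator), and the AUC is computed through its rank interpretation, that is, the probability that a randomly chosen worsened participant shows a larger change score than a randomly chosen stable participant, with ties counted as one half.

```python
from statistics import mean, stdev

def standardized_response_mean(baseline, follow_up):
    """SRM = mean of the change scores / SD of the change scores.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

def auc_roc(worsened_changes, stable_changes):
    """AUC as the proportion of (worsened, stable) pairs in which the
    worsened participant has the larger change score; ties count 0.5."""
    wins = 0.0
    for w in worsened_changes:
        for s in stable_changes:
            if w > s:
                wins += 1.0
            elif w == s:
                wins += 0.5
    return wins / (len(worsened_changes) * len(stable_changes))

# Hypothetical change scores (follow-up minus baseline) on a scale where
# a higher score means greater severity; illustration only.
stable = [0, 1, 0, -1, 1]
worsened = [2, 3, 1, 4]
auc = auc_roc(worsened, stable)
print(round(auc, 2))

# Hypothetical baseline and follow-up scores for three participants.
srm = standardized_response_mean([10, 10, 10], [11, 12, 13])
print(round(srm, 2))
```

For a scale on which higher scores indicate better balance, such as the BBS, the sign of the change scores would need to be reversed before computing the AUC so that larger values still correspond to worsening.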

Sensitivity analysis
The complete case analysis, performed for those measures of balance with significant findings in the main analysis, yielded results comparable to those of the main analysis. Appendices S6 and S7 report the results of criterion validity and responsiveness for the complete case analyses of selected measures at 6 and 12 months.

DISCUSSION
We aimed to estimate the validity and responsiveness of two clinic-based measures (BBS and SARA-bal) and two laboratory-based measures (SOT and LOS) for assessing balance among individuals with genetic and sporadic cerebellar ataxia. Our findings demonstrated strong criterion validity, moderate to strong convergent validity, weak external validity, and adequate responsiveness for the clinic-based outcome measures, in line with our previous findings showing good reliability and validity for the BBS and SARA-bal in cerebellar ataxia secondary to multiple sclerosis. 20 Among the laboratory-based measures, the SOT-COM and the MXE-LOS in the forward and right directions demonstrated moderate to weak criterion, convergent, and external validity. The prospective change scores measured at 6 and 12 months were represented by the SRM, which showed large effects for both the BBS and SARA-bal at 6 months and moderate effects at 12 months. The MXE-LOS in the right direction demonstrated a weak to moderate ability to discriminate participants between the stable and worsened groups at 6 and 12 months. Cerebellar ataxia is progressive, depending on the underlying cause. Therefore, identifying outcome measures that are responsive to changes secondary to natural disease progression is crucial. Although previous studies have reported the responsiveness of the SARA 21,34 and the SARA-bal to disease progression, 39 this study is the first to report the responsiveness of the BBS, SOT, and LOS in individuals with cerebellar ataxia. Fitzpatrick et al. 40 provided recommendations for choosing suitable outcome measures, which included the combination of both generic (such as the BBS) and disease-specific (such as the SARA-bal) balance measures. We observed that both clinic-based balance measures were well tolerated by all patients, resulting in minimal missing values, further supporting the acceptability of these scales.
By contrast, the laboratory-based measures demonstrated high floor effects. Finally, both of the clinic-based scales are available free of cost and require no more than 20 minutes to complete, making the outcome measures feasible for all clinical practices. Therefore, the BBS and SARA-bal appear to fulfill most of the recommended criteria, including acceptable psychometric properties. Based on the findings of this study, we recommend the use of the BBS and SARA-bal for the assessment of balance in individuals with cerebellar ataxia, both in clinical practice and research settings.
Except for the SOT-COM and MXE-LOS in the forward and right directions, the variables examined on the laboratory-based tests did not meet the criteria for acceptable validity or responsiveness. The order of assessment may have contributed to the significant findings detected for the MXE-LOS in the forward and right directions. The LOS variables are programmed and assessed in the same order for all participants, starting with the forward direction, followed by the right, back, and left directions. We speculate that the patients' performance was acceptable during the early assessments (forward and right) and may have declined due to fatigue during the later assessments (back and left). In addition, the patients' righting reactions may have contributed to better performance in the forward and right directions. However, restricting the LOS to the forward and right directions is questionable. Future studies may consider randomizing the order of directions during LOS assessments to better assess the psychometric strength of the LOS in the back and left directions.
The poor psychometric scores observed for the SOT and LOS are likely due to the patient population tested in this study. Our participants had a mean ataxia severity score of 21, as measured by the SARA, indicating moderate-to-maximal dependency for activities of daily living. 41 We did not restrict the participants in this study based on their balance abilities. Among the 40 enrolled participants, only 34 were able to complete all of the items measured during the SOT and LOS assessments. Both the SOT and LOS require the user to stand unsupported or to counteract unpredictable challenges, such as changes in the visual surroundings during the SOT. Among those participants who were assessed by the SOT and LOS, 15% of values were missing due to the inability of the participants to complete all of the test tasks, indicating significant floor effects for both outcome measures among this population. The participants who were unable to complete all items were either ambulant with an assistive walking device or were unable to stand unsupported for more than 20 seconds, suggesting that these two measures may be more suitable for patients with reasonable balance, such as those who do not use an assistive device for walking. Future studies may consider retesting the psychometric properties of these two laboratory-based scales among cerebellar ataxia patients by restricting the population to those with the ability to stand unsupported for at least 20 seconds, thereby ensuring that all participants are capable of completing the tasks required for the SOT and LOS with minimal missing values. We speculate that the psychometric properties of these measures are likely to improve in both performance and accuracy in such a population.
In a previous systematic review, the authors reported the need to identify a classification-based assessment tool for balance. 42 We used the International Classification of Functioning, Disability, and Health model to categorize balance outcome measures into those that assess the body, structure and function, activity, or participatory levels. 43 Based on the findings of this study, the BBS and SARA-bal have adequate psychometric properties to assess balance at the activity level. The SOT and LOS, which both assess balance at the body, structure and function, and activity levels, demonstrated limitations in both validity and responsiveness in the population tested. Future studies testing or developing balance outcome measures for the assessment of balance at the body and structure and function levels for people with cerebellar ataxia remain warranted. Few outcome measures that assess balance at the participatory level are currently available, 16 and no evidence has been reported regarding the psychometric properties for participatory level measures of balance in cerebellar ataxia. Future research and tool development for disease-specific balance measures for participatory level assessments are recommended.

Study limitations
The findings of this study should be interpreted with caution, and the following limitations must be carefully considered. (1) First, the relatively small sample size (n = 40) may have influenced the power of the study. Hong Kong is a small city with a low prevalence of cerebellar ataxia, and the HKSCAA is the only special interest group that exists to support people with ataxia in Hong Kong. However, the association currently includes only 250 registered patients, and our recruited sample represents one-sixth of the total population, which may be considered a representative sample.
(2) Second, the order of the assessments was not randomized, and the SOT and LOS were assessed toward the end of each session, which may have resulted in fatigue-induced underperformance for these tasks. In addition, while assessing the LOS, the sequence of directions was standardized, which may have influenced the performance of the participants in the directions tested toward the end of the assessment. Therefore, future studies should consider randomizing the order of assessments to reduce response bias.
(3) Finally, the generalizability of the study findings is restricted to patients with either genetic or sporadic underlying causes for cerebellar ataxia and to those who are ambulant with or without a walking assistive device. Balance assessment among non-ambulant patients is more challenging, and future studies need to consider including such populations.

CONCLUSION
This study demonstrates moderate to strong criterion and convergent validity and adequate responsiveness for the BBS and SARA-bal for assessing balance among people with cerebellar ataxia. The SOT and LOS have high floor effects among this population. The SOT-COM and the MXE-LOS in the forward and right directions showed moderate to weak criterion, convergent, and external validity and responsiveness. Future studies remain necessary to assess the psychometric properties of the SOT and LOS among populations restricted to ambulant people with the ability to stand unsupported for at least 20 seconds. Based on the findings of this study, we recommend the use of the BBS and SARA-bal for clinical practice and research. The SOT and LOS may not be suitable for balance assessments in all people with cerebellar ataxia, and we recommend restricting the use of the SOT and LOS to ambulant patients who either do not use an assistive device for walking or can stand unsupported for 20 seconds or more.