Reliability of a New Test of Balance Function in Healthy and Concussion Populations

Providing quantitative measures of balance and posture is a valuable aid in clinical assessment, and in recent years several devices have been introduced that accurately measure balance via deviation of the center of mass, using software algorithms and mobile devices. The purpose of this study was to assess the accuracy of EQ Balance against the Sway™ Balance System (Sway), another balance device that is currently established as an accurate measure of balance, and to evaluate the test–retest reliability of EQ Balance. Seventy individuals presenting to a sports medicine and concussion clinic volunteered to participate in the assessment of balance utilizing Sway and EQ Balance simultaneously. The group included 25 males and 45 females (mean age: 37.8 ± 14.8 years, range: 13–65) with and without concussion or other neurological conditions (39 concussed vs. 31 non-neurologically injured, or healthy). Twenty-six of the concussed participants were balance-impaired. Participants performed five postures while holding a device holder against their chest; the holder secured two devices: one iPhone 6 with EQ Balance and a second iPhone 6 with Sway Balance. The mean of the scores on all five stances was recorded as the “average balance score”. Average balance scores were in statistical agreement between the two methods across the entire group, and for sub-groups, according to the Deming regression (p < 0.01). The intra-class correlation (ICC) for the cohort was 0.87 (p < 0.001). Across the cohort, EQ Balance measured significantly worse balance scores in the balance-impaired group, composed of participants with brain injury who failed a clinical balance screening test, compared to the group without clinically-determined balance impairment (this group includes healthy and some concussed patients).
EQ Balance demonstrated safety, as it was considered safe to perform independently (i.e., without an observer) in those with impaired balance, and high test–retest reliability in the healthy and concussed patient populations. Statistical agreement was established between the two measures of EQ Balance and Sway Balance for the average balance score across all five stances. The ICC analysis demonstrates strong consistency of the task output between test sessions. Given these results, EQ Balance demonstrates strength as a new balance assessment tool to accurately measure balance performance as part of a unique and novel gamified application in healthy and neurologically injured populations.


Introduction
Balance is a term that describes the integration of sensory, motor, and bio-mechanical processes that enable a person to maintain their center of mass within their base of support [1,2]. Maintenance of balance, or postural stability, is a critical component of motor skills and ranges from the simple maintenance of posture to the performance of complex voluntary movements in athletics or daily living [3][4][5]. Clinically, improvement in balance function has been shown to support injury prevention, promote recovery from injury, and facilitate improved functional athletic performance [3,6,7]. Given

Materials and Methods
Performance testing was completed by the Highmark Health Sport and Concussion Clinic. EQ Balance (Highmark Interactive, Toronto, Canada) and Sway (Sway Medical, Aledo, TX, USA) testing were performed for a total of 70 subjects (25 male, 45 female) with and without concussion or other musculoskeletal conditions (39 concussions vs. 31 non-neurologically injured, or "healthy"). A subgroup of the concussed participants had confirmed balance impairment, determined by failure of the modified Balance Error Scoring System (mBESS) test, as part of the standard testing protocol used by the study physician (26 participants were "balance-impaired") [23]. As participants presented to the clinic at various time points following concussion, the timing of administration of the mBESS test differed amongst participants. While the authors acknowledge the limits of the mBESS test in contributing to the diagnosis of concussion beyond the initial 3–5 days post injury, the mBESS was used in the study as a screening tool for balance deficits as opposed to a diagnostic tool for concussion.
The mBESS protocol requires participants to balance in three stances: double leg, single leg and tandem stance, with eyes closed and hands placed on hips, on a firm surface, for 10 s each [23]. The maximum balance score is 30, with points subtracted for each mistake with a maximum of 10 points subtracted per stance [23]. This score was used by the clinician to diagnose balance impairment in the context of the clinical visit, while taking into account the normative data for the patient's demographic [24][25][26].
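The scoring rule described above can be illustrated with a short sketch. It assumes error counts are recorded per stance; the function name `mbess_score` and the stance keys are hypothetical, and the sketch does not encode any pass/fail cutoff, since the clinician's judgment against normative data is not reducible to a fixed number here.

```python
def mbess_score(errors_by_stance, max_errors_per_stance=10):
    """Return an mBESS-style score: 30 minus the capped error count per stance.

    errors_by_stance maps each of the three stances to the number of
    errors observed (hypothetical key names, for illustration only).
    """
    stances = ("double_leg", "single_leg", "tandem")
    deducted = 0
    for stance in stances:
        errors = errors_by_stance[stance]
        if errors < 0:
            raise ValueError("error counts cannot be negative")
        # At most 10 points may be subtracted for any one stance.
        deducted += min(errors, max_errors_per_stance)
    return max_errors_per_stance * len(stances) - deducted


# 0 errors, 4 errors, and 12 errors (capped at 10): 30 - 14 = 16
print(mbess_score({"double_leg": 0, "single_leg": 4, "tandem": 12}))
```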
In the study trials, balance measures were obtained for a total of five stances, which included the double leg stance, right tandem stance (i.e., right foot behind), left tandem stance (i.e., left foot behind), right single leg stance and left single leg stance. Each stance was held for 10 s, completed on a firm surface and with eyes closed. The amount of movement was calculated by each balance test application algorithm using data generated by the mobile device accelerometer. A device holder secured two iPhones (version 6s, Apple, Cupertino, CA, USA), one running EQ Balance and the other running Sway. The devices were held by the participant at chest level against the approximate mid-point of their sternum, and the devices were confirmed to be in equivalent triaxial planes of movement. See Figure 1 for a depiction of how the devices were held. A total of three trials were completed. The first trial was considered the practice trial and was excluded from analysis. This was immediately followed by a second trial in which EQ Balance and Sway were used simultaneously. Immediately following this, a third trial was conducted with EQ Balance alone, to collect test-retest reliability data. The EQ Balance and Sway applications were initiated and terminated simultaneously in order to maintain synchronized recording. Upon completion of the test, the scoring data generated by EQ Balance and Sway were documented for analysis.
Descriptive analysis, Spearman's rho, intraclass correlation, standard error of measurement (SEm) and all figures were completed with IBM SPSS Statistics Subscription 1.0.0.1072 (IBM, Armonk, NY, USA). The intraclass correlation coefficient (ICC) was used to calculate reliability. ICC was calculated with a two-way mixed effects model, evaluating single measurements and absolute agreement between tests [27].
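For readers who wish to reproduce the ICC form named above (two-way model, single measurement, absolute agreement), the coefficient can be computed from the two-way ANOVA mean squares. The following is a minimal sketch and not the SPSS routine used in the study; the function name `icc_absolute_single` and the example scores are hypothetical.

```python
def icc_absolute_single(scores):
    """ICC for absolute agreement, single measurement, two-way model.

    scores: list of rows, one per participant, each holding k repeated
    measurements (e.g., [T2 score, T3 score]).
    """
    n = len(scores)        # participants
    k = len(scores[0])     # sessions per participant
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-participant
    ms_cols = ss_cols / (k - 1)                 # between-session
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual

    # Absolute agreement penalizes systematic shifts between sessions.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: session 2 is uniformly 5 points higher than session 1.
example = [[80, 85], [90, 95], [70, 75]]
print(round(icc_absolute_single(example), 4))
```

In this toy example the participants are ranked identically in both sessions, yet the coefficient falls below 1 because the uniform 5-point shift counts against absolute agreement.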
SEm was calculated by multiplying the standard deviation of the scores in the group by the square root of (1 − reliability), where Cronbach's alpha was used for reliability. Deming regression was calculated using MedCalc Statistical Software version 19.0.4 (MedCalc Software bvba, Ostend, Belgium; https://www.medcalc.org/; 2019). To calculate Deming regression, coefficients of variation were calculated for each measure separately using test results from timepoint 1 (T1) and 2 (T2), based on within-subject standard deviation for each test. These coefficients were then used to calculate the Deming regression for measures at T2. The intercept and slope were calculated according to Cornbleet and Gochman [28]. Standard errors were calculated using the jackknife method described by Armitage et al. 2002. Standard errors were used to run a t-test to evaluate the difference between the Deming regression equation and y = x + 0. That is, a t-test was calculated for the slope and intercept separately, where the intercept was compared against 0 and the slope was compared against 1. This was conducted for each pose and each subgroup. For Deming regressions that included the entire cohort of 70 participants, the coefficient of variation was calculated on that dataset. However, for subgroups, the smaller number of participants made the coefficient of variation less reliable, so the coefficient of variation was taken from the full cohort's average of all poses.
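The Deming procedure described above (slope and intercept, jackknife standard errors, then t-statistics against a slope of 1 and an intercept of 0) can be sketched as follows. This is an illustrative implementation, not the MedCalc routine used in the study. The parameter `error_variance_ratio` (error variance of y divided by error variance of x) is an assumption standing in for the coefficient-of-variation inputs, and the example scores are invented.

```python
import math


def deming_fit(x, y, error_variance_ratio=1.0):
    """Return (slope, intercept) of a Deming regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    d = error_variance_ratio
    slope = (syy - d * sxx + math.sqrt((syy - d * sxx) ** 2 + 4 * d * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x


def jackknife_deming(x, y, error_variance_ratio=1.0):
    """Deming fit with leave-one-out jackknife standard errors and t-statistics."""
    x, y = list(x), list(y)
    n = len(x)
    slope, intercept = deming_fit(x, y, error_variance_ratio)
    pseudo_slopes, pseudo_intercepts = [], []
    for i in range(n):
        s_i, a_i = deming_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:], error_variance_ratio)
        # Pseudo-values: n * full estimate - (n - 1) * leave-one-out estimate.
        pseudo_slopes.append(n * slope - (n - 1) * s_i)
        pseudo_intercepts.append(n * intercept - (n - 1) * a_i)

    def standard_error(pseudo):
        mean = sum(pseudo) / n
        return math.sqrt(sum((p - mean) ** 2 for p in pseudo) / (n * (n - 1)))

    se_slope = standard_error(pseudo_slopes)
    se_intercept = standard_error(pseudo_intercepts)
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
        "t_slope": (slope - 1.0) / se_slope,     # H0: slope = 1
        "t_intercept": intercept / se_intercept,  # H0: intercept = 0
    }


# Hypothetical paired scores (x = Sway, y = EQ Balance).
sway = [60, 70, 75, 80, 85, 90, 95]
eq = [62, 69, 77, 79, 88, 89, 96]
print(jackknife_deming(sway, eq))
```

The t-statistics would then be compared against a t distribution (commonly with n − 2 degrees of freedom) to obtain p-values; that step is omitted here to keep the sketch dependency-free.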

Figure 1. The two devices were affixed to the rigid device-holder and participants were instructed to hold the device-holder against their chest.

Post-Hoc Analysis: Group Comparison
A post-hoc analysis was conducted to investigate the difference in balance performance, as measured with EQ Balance, between healthy and balance-impaired cohorts. A Mann-Whitney test was used as a non-parametric test to compare balance performance based on balance injury status (i.e., healthy vs. impaired balance). This analysis was completed with IBM SPSS Statistics Subscription 1.0.0.1072.
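As an illustration of the group comparison described above, the Mann-Whitney U statistic can be computed from pooled ranks. This is a minimal sketch rather than the SPSS procedure used in the study; the function names and the example scores are hypothetical, and the z value uses the normal approximation with a tie correction and no continuity correction.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1..N, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, U for group_b, z) for two independent samples."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_a - n1 * n2 / 2) / sigma
    return u_a, u_b, z


# Hypothetical average balance scores.
impaired = [62.0, 70.5, 74.0, 78.0]
healthy = [81.0, 85.5, 88.0, 90.0, 92.5]
print(mann_whitney_u(impaired, healthy))
```

A negative z for the first group indicates that its scores tend to rank below those of the second group.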
All testing procedures were approved by the Veritas Institutional Review Board for Research involving Human Subjects, and informed consent was obtained from all subjects prior to participation.

Instrumentation
The current study utilized the Sway Balance mobile application (Sway Medical, Aledo, TX, USA) and the EQ Balance application (Highmark Interactive, Toronto, Canada) installed on two Apple iPhones (version 6s, Apple, Cupertino, CA, USA). The Sway application, installed on the mobile device, accesses the output of the device's accelerometer to determine a balance score. The units representing the balance score are determined by undisclosed calculations from Sway Medical. A validation study of Sway compared the output of Sway to that of a static force plate (Biodex Balance System SD, Biodex Medical Systems, Shirley, NY, USA) while participants performed a single-leg stance for 10 s. The study found no significant difference between the two balance scores and a significant correlation between the two sets of results across the cohort of 30 participants [22].
Similarly, the EQ Balance application, installed on the mobile device, accesses the output generated from the accelerometers to determine a balance score. The units representing the balance score are also determined by undisclosed calculations from Highmark Interactive.
These mobile device systems incorporate accelerometers that measure the instantaneous acceleration of an object at any given point in time. These accelerometers, termed tri-axial or tri-axis accelerometers, incorporate three orthogonal measurement axes that quantify acceleration independently yet are housed in the same device.
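Because both scoring algorithms are undisclosed, the following sketch is purely illustrative of how tri-axial accelerometer samples might be reduced to a single movement quantity. It is not the EQ Balance or Sway calculation. The function name `rms_sway`, the assumption that samples are (x, y, z) tuples in units of g, and the choice of root-mean-square deviation are all hypothetical.

```python
import math


def rms_sway(samples):
    """Root-mean-square deviation of tri-axial acceleration from its mean.

    samples: list of (ax, ay, az) tuples recorded during one stance.
    Subtracting the per-axis mean removes the constant gravity component,
    leaving only fluctuation attributable to movement.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    means = [sum(s[axis] for s in samples) / n for axis in range(3)]
    squared = 0.0
    for s in samples:
        # Squared length of the deviation vector for this sample.
        squared += sum((s[axis] - means[axis]) ** 2 for axis in range(3))
    return math.sqrt(squared / n)


# A perfectly still device gives 0.0; small lateral movement gives a small value.
still = [(0.0, 0.0, 1.0)] * 5
moving = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (-0.1, 0.0, 1.0)]
print(rms_sway(still), rms_sway(moving))
```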

Descriptive Analysis
In total, 70 participants were enrolled into the study. The entire study cohort was comprised of 39 individuals with concussion and 31 healthy controls. Demographic information is presented in Table 1 below. The distributions of scores for both methods of assessment (EQ Balance and Sway) are presented in Figure 2.
Means for the assessments conducted at each time point are presented in Figure 3.

Figure 3. Bar plot representing the median scores of each test (EQ Balance, gray bars; Sway Balance, blue bars) at different time points. Timepoint 1 (T1) was used as a practice/acclimatization session. Timepoint 2 (T2) was the main test session. Timepoint 3 (T3) was used in the test-retest analysis for EQ Balance. Data are plotted as medians with 95% confidence intervals.

Analysis 1: Relationship between Measures
Methods were compared using Spearman's rank-order correlation to assess the strength and direction of association between measures, and a Deming regression to assess each pose individually.
Deming regression was derived for validity assessment. Table 2 shows Spearman's rho (rho) correlations and Deming regressions for results of each pose across all participants and two subgroups: healthy balance and impaired balance. Across all participants, a strong, statistically significant correlation was identified between the measures (rho = 0.848, p < 0.001). Separating by subgroup finds a strong correlation for all poses except for balance-impaired during right single leg stance where a weak correlation was found (rho = 0.43, p = 0.03) and in healthy participants with feet together, where there was a non-significant positive correlation between measures (rho = 0.21, p = 0.177).
By observation, this analysis identifies tight clustering of individuals both with and without concussion whose balance deficits are relatively low (i.e., those with scores > 80.0). Greater dispersion is observed with those from both groups with apparent balance deficits; however, these data maintain a strong correlation: running a correlation analysis on the subgroup of participants with an EQ result below 80 found the Spearman's correlation coefficient, rho = 0.86 (p < 0.001) between measures in this group.
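The rank-order correlation used in this analysis, including the restriction to participants with an EQ result below 80, can be sketched as follows. This is an illustrative implementation rather than the SPSS output reported here; the function names and the paired scores are hypothetical.

```python
import math


def midranks(values):
    """Rank values from 1..N, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the midranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = midranks(list(x)), midranks(list(y))
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (Sway, EQ Balance) pairs.
pairs = [(91, 93), (88, 90), (85, 84), (72, 75), (60, 66), (55, 52), (79, 70)]
rho_all = spearman_rho([s for s, _ in pairs], [e for _, e in pairs])

# Subgroup restricted to EQ Balance results below 80.
low = [(s, e) for s, e in pairs if e < 80]
rho_low = spearman_rho([s for s, _ in low], [e for _, e in low])
print(rho_all, rho_low)
```

The explicit check for a constant variable mirrors the situation noted for easy stances in healthy participants, where very low variability leaves little rank information for a correlation to detect.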
Deming regressions were run to compare Sway and EQ Balance while accounting for variability in both measures. This analysis of the average balance score across the entire group found EQ Balance = 0.78 × Sway Balance + 18.36, with the t-test finding no significant difference from y = x + 0 at p < 0.95%. Table 2 shows Deming regression results for EQ Balance vs. Sway Balance for all poses, where x represents the Sway Balance score and y represents the EQ Balance score for the given pose. Figure 4 illustrates the relationship between scores on tasks in comparison to unity (y = x).

Figure 4. Relationship between the scores on the Sway Balance task and the EQ Balance task. Data points are differentially identified by balance impairment as measured by the mBESS (blue = pass, red = fail). Unity line Y = X is shown as a black dotted line.
The median values for healthy participants are shown, by pose, in Table 2. The Wilcoxon signed ranks test was used as a non-parametric method for calculating the difference between the two measures. It failed to find a statistically significant difference between the results of each test (z = −1.958, based on negative ranks, p > 0.05 2-tailed).
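For illustration, the signed-ranks comparison of paired scores can be sketched as below. This is a simplified version and not the SPSS procedure that produced the reported z value: it drops zero differences, ranks absolute differences with midranks, and uses the normal approximation with a tie correction and no continuity correction. The function name and example data are hypothetical.

```python
import math
from collections import Counter


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z) for paired samples, using d = y - x."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_d = [abs(d) for d in diffs]

    # Midranks of the absolute differences (ties share their average rank).
    order = sorted(range(n), key=lambda i: abs_d[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    tie_term = sum(t ** 3 - t for t in Counter(abs_d).values())
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    return w_plus, w_minus, z


# Hypothetical paired scores (x = Sway, y = EQ Balance).
print(wilcoxon_signed_rank([80, 85, 90, 70], [82, 88, 91, 75]))
```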

Analysis 2: Test-Retest Analysis of the EQ Balance Task
Consistency of the EQ Balance measure was assessed by administering the assessment on two successive occasions to the entire study cohort. The intra-class correlation for the cohort was 0.87 (p < 0.001) with a 95% confidence interval of 0.81 to 0.92. The standard error of measurement was calculated as 2.72. EQ Balance task scores for each test session are presented in Figure 5. This analysis demonstrates strong consistency of the task output between test sessions.
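The standard error of measurement follows the formula given in the Methods (standard deviation multiplied by the square root of 1 − reliability). The sketch below shows that calculation and one common way of using it, namely placing an approximate 95% band of ±1.96 × SEm around an observed score. The band is a general psychometric convention offered here as an assumption, not an analysis performed in this study, and the function names and the observed score of 85 are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEm = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(score, sem, z=1.96):
    """Approximate band around an observed score: score +/- z * SEm."""
    half_width = z * sem
    return score - half_width, score + half_width


# Using the SEm reported above (2.72) around a hypothetical score of 85.
low, high = score_band(85.0, 2.72)
print(round(low, 2), round(high, 2))
```

With the reported SEm of 2.72, the half-width of such a band is 1.96 × 2.72, or roughly 5.3 points.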

Figure 5. Scatterplot depicting the test-retest data for the EQ Balance task taken at Timepoint 2 (T2) and Timepoint 3 (T3) for the balance-impaired group (red) and healthy group (blue). The unity line (y = x) is represented by the dashed line.

Post-Hoc Analysis: Group Comparison
As a post-hoc analysis, the Mann-Whitney test was used as a non-parametric test to identify a difference in balance performance, measured with EQ Balance, between healthy and balance-impaired cohorts. The results of this test for each measure and each pose are shown in Table 3.

Discussion
Balance assessment tools can be utilized to assess the neuromuscular effects of aging, identify neurological disorders, aid in the diagnosis and management of injury related to brain trauma, and identify functional deficits related to activities of daily living. Additionally, assessment tools are critical in the successful implementation of a balance training program to demonstrate the current level of function and to track progress achieved over time. Measures of balance performance can provide valuable information on recommended training for injury prevention and improving athletic performance and, therefore, can be used as a prescriptive rehabilitative tool in injury recovery [4,5,13,29,30]. However, assessment protocols must be reliable, valid, reproducible, and sensitive enough to measure significant changes, all of which are common limitations to subjective clinical tests.
Exploration of the dataset revealed that balance results, as measured by both tools, were negatively skewed (Figure 2). This is expected to occur on tests with an upper bound and where the clinical group under investigation can still be high functioning. Recent work by Inness and colleagues [31] identified that only 32–48% of adults with concussion demonstrate balance deficits on sway-related metrics. The distributions were comparable between tests (i.e., similar distributions in both EQ Balance and Sway) and between individuals (i.e., concussion and healthy controls).
The data were explored further with Spearman's rho non-parametric test of correlation. This showed a strong, positive relationship between the average balance output (average of all five stances) of EQ Balance and Sway across the entire group (rho = 0.85, p < 0.01). Subgroup analysis showed a strong to moderate positive relationship for healthy and balance-impaired groups, respectively (healthy rho = 0.67, balance-impaired rho = 0.50, p < 0.01). Analysis of each individual stance found a significant positive correlation between the two measures on all stances except for the easiest pose (feet together) in the healthy group. Review of the scatterplot of this pose (Figure 6) indicates that this low correlation is due to a very low degree of variability in the results, which is to be expected in a healthy population performing an easy balance stance. Full details are shown in Table 4.
The aim of this study was to evaluate the performance of EQ Balance, a gamified mobile balance assessment, and Sway, a more traditional mobile balance assessment, in the clinical setting. The specific objectives were to a) determine if EQ Balance results are substantially equivalent to the Sway results and b) evaluate the intrasession reliability, or test-retest reliability of the output of EQ Balance. In the context of this clinical study, test-retest reliability will be impacted by variability in the participants' balance performance as well as measurement variability introduced by the device. The analysis in support of the first objective consisted of a Deming regression. The Deming regression corroborated the correlation analysis, finding that average balance output of EQ Balance and Sway across the entire group was not significantly different from the equation EQ = Sway + 0. This supports equivalence of EQ Balance to Sway Balance for the average balance score across all five stances.
This holds true across the range of performance values when healthy and balance-impaired groups are included. However, the Deming regression identified some poses/groups where the slope and intercept were significantly different from 1 and 0, respectively. This indicates that the results of individual poses should not be interpreted interchangeably between the two devices. Instead, these results support the use of the average of all five poses as the primary clinical output. That is, during a clinical balance assessment, no single pose should be assessed in isolation. This is consistent with instructions for use for EQ Balance, Sway (the predicate device), and the mBESS assessment that was used by the clinician to assess the concussed patients included in this study. It is important that device labelling clearly indicates this precaution.
The analysis of the second objective found strong consistency of the task output between test sessions. Consistency of the EQ Balance measure was assessed by calculating the intra-class correlation (ICC) on two consecutive EQ Balance tests. The ICC for the cohort was 0.87 (p < 0.001).
While the mBESS test is a fast, reliable measure of balance, this digital assessment tool provides the potential for individuals to assess their balance independently and track performance over time.
More importantly, such tools provide objective rather than subjective measures of balance function and remove the potential effect of inter-rater variability on balance assessment and re-assessment.

Future Directions
The demonstration of repeated measurements of balance allows individuals to establish their personalized healthy balance results. By effectively comparing a new result against this individual's history, as opposed to being compared to a heterogeneous normative population dataset, it may be possible to more accurately identify subtle deviations from an individual's healthy range, thus enabling a more informed assessment of the significance and causality of change.
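The idea of comparing a new result against an individual's own history can be sketched as follows. This is a hypothetical illustration of the concept only: the function name `deviates_from_baseline`, the rule of flagging results more than two personal standard deviations from the personal mean, and the example scores are all assumptions and have not been validated in this study.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, new_score, k=2.0):
    """Flag a new score lying more than k personal SDs from the personal mean.

    history: prior scores recorded for the same individual while healthy.
    """
    if len(history) < 2:
        raise ValueError("at least two prior scores are needed")
    baseline = mean(history)
    spread = stdev(history)
    return abs(new_score - baseline) > k * spread


# Hypothetical individual with four prior healthy results.
prior_scores = [88, 90, 92, 90]
print(deviates_from_baseline(prior_scores, 70))  # large drop from baseline
print(deviates_from_baseline(prior_scores, 89))  # within usual range
```

A threshold derived from the measurement error of the instrument, rather than from the spread of a short personal history, would be an alternative design; choosing between them is part of the future work described here.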
Future work should continue to characterize typical healthy values and additional factors that may influence EQ Balance results in healthy participants. Examples include the effects of transient factors such as acute exercise and sleep, as well as personal differences including body type (e.g., height), history of injury, and physical literacy.
In addition, future studies should examine the reliability of EQ Balance over varying numbers of timepoints, as well as varying intervals between timepoints, to identify factors such as learning effects.
Future studies should compare EQ Balance to the results of a force platform, the putative gold standard of balance assessment.
Finally, future work could focus on the clinical utility of EQ Balance in screening for, assessing, and monitoring specific conditions and diseases. The post-hoc analysis presented in the current study reveals a significant difference between healthy and impaired-balance groups. Figure 5 shows a cluster of results above 80, and a long tail of results below this value, mostly including balance-impaired patients. These results could provide pilot data for an investigation into a clinically-significant EQ Balance threshold that indicates balance impairment or increased risk of falls. To further substantiate such an investigation, it would be important to collect symptom severity and time since injury.

Limitations
The study sample was limited to 70 volunteers, which could limit the generalizability of the results. Theoretically, a larger sample should yield more precise estimates of the observed relationships; however, one cannot know how the results will generalize to all populations. In addition, no effect of age or sex was observed; however, as this was not the focus of the study, the study was likely underpowered to observe such an effect. Finally, it is important to recognize that EQ Balance is not intended to replicate the complexity of a medical history and clinical neurological exam conducted by an appropriately credentialed clinician.

Conclusions
EQ Balance demonstrated safety and reliability in measuring balance function in healthy and concussed patient populations. The Deming regression showed strong agreement between average balance measures (the primary device output), and agreement on all poses except right single leg and tandem left. Test-retest reliability was shown by ICC analysis, which demonstrated strong consistency of the task output between test sessions. Poorer balance scores were observed for subjects with balance impairments, which included subjects with brain injury. Further research is therefore recommended on the clinical utility of EQ Balance in screening for, assessing, and monitoring specific conditions and diseases, and the mobile application format makes it attractive to study in many different settings. Overall, EQ Balance, as part of EQ Brain Performance, adds value to the existing balance assessment tools as an objective, accurate balance measure presented in a novel, gamified way.
Author Contributions: M.K. was responsible for conceptualization, methodology, formal analysis, investigation, resources, data curation, writing, and project administration. The author has read and agreed to the published version of the manuscript.
Funding: This research received no external funding.