Automated stage discrimination of Parkinson’s disease

Treatment plans for Parkinson’s disease are based on a disease stage scale, which is generally determined using a manual, observational procedure. Automated, sensor-based discrimination saves labour and cost in clinical settings and may offer augmented stage determination accuracy. Previous automated devices were either cumbersome or costly and were not suitable for individuals who cannot walk without support. Since 2017, a device has been available that successfully detects Parkinson’s disease and operates for people who cannot walk without support. In the present study, the suitability of this device for automated discrimination of Parkinson’s disease stages is tested. The device consists of a walking frame fitted with sensors to simultaneously support walking and monitor patient gait. Sixty-five Parkinson’s disease patients in Hoehn and Yahr (HY) stages 1 to 4 and twenty-four healthy controls were subjected to supported timed up and go (TUG) tests while using the walking frame. The walking trajectory, velocity, acceleration and force were recorded by the device throughout the tests. These physical parameters were converted into symptomatic spatio-temporal quantities that are conventionally used in Parkinson’s disease gait assessment. An ANOVA test extended by a confidence interval analysis indicated statistically significant separability between HY stages for the following spatio-temporal quantities: TUG time (p<0.001), straight-line walking time (p<0.001), turning time (p<0.001) and step count (p<0.001). A negative correlation was obtained for mean step velocity (p<0.001) and mean step length (p<0.001). Moreover, correlations were established between these, as well as additional spatio-temporal quantities, and disease duration, levodopa dose, motor fluctuation, dyskinesia and the motor part of the unified Parkinson’s disease rating scale. We have shown that stage discrimination of Parkinson’s disease can be automated, even for patients who cannot support themselves.
A similar method might be successfully applied to other gait disorders.


Introduction
The most common rating scales in Parkinson's disease (PD) are the Unified Parkinson's Disease Rating Scale (UPDRS) and the Hoehn and Yahr (HY) staging (1). The five-stage HY scale is the shorter of the two and primarily describes the progression of motor PD (2). This scale is based on the scenario that the motor symptoms of PD begin on one side of the body and then become bilateral, with compromise of balance and gait coming last. The HY scale thus grades PD progression, starting with unilateral dysfunction (stage 1), followed by bilateral involvement, initially without postural instability (stage 2); postural instability then develops (stage 3) until physical independence is lost (stage 4), and at the terminal stage (stage 5) patients become wheelchair-bound or bedridden. The HY scale is weighted heavily toward postural instability and does not sufficiently capture impairments or disability from other motor features of PD, such as manual dysfunction or tremor (3). However, where gait disorders are examined, this scale can provide a disease stage description.
HY staging involves a subjective assessment by the examining physician, which may lead to inter-rater and even intra-rater variability (4). In particular, bias has been observed in the discrimination between stages 2 and 3, due to differences in skill and interpretation between physicians. The scale is also inherently categorical rather than numerical: the scores are not interval scales, hence distances between values on the scale are not quantified. In addition, the scale is non-linear in its description of progression between stages; e.g. a stage 1 PD subject who develops postural instability before developing bilateral signs must be rated as stage 3, having never been stage 2. These characteristics further limit its capability to provide quantitative information. Last, but not least, the examination involved in HY stage determination takes considerable time of clinicians and other healthcare professionals and is hence extensive and expensive. This reduces the accessibility and affordability of this assessment for many patients.
The motor part of the Unified PD rating scale (mUPDRS) is a continuous numerical scale. This scale offers a more elaborate range of symptoms compared with the HY scale, and can complement the assessment of gait disorders (5). However, the mUPDRS still shares the limitations of the HY scale in its non-linear, unquantified intervals between scores, as well as its length, labour and cost, and probable bias (1).
The timed up and go (TUG) test is an assessment tool, often coupled with the clinical HY staging, that quantifies the gait disorder in a shorter and simpler process. Initially introduced to assess functional mobility in the elderly (6) and in subjects during rehabilitation (7), this test has proven instrumental for Parkinson's disease stage evaluation. In the standard TUG, the subject is instructed to stand up from a chair, walk 3 meters, turn back, walk 3 meters and sit down on the chair. Test completion time is measured with a stopwatch. This test has the potential to provide an objective measure of disease severity. The procedure still requires, however, the attention of a supervisor and relies on manual stopwatch operation. Moreover, as the TUG measures only completion time, it does not quantify its different segments, such as straight-line walking time and turning time, which may provide a more complete gait characterization (8).
In view of the aforementioned limitations of prevalent PD severity scales, automated assessment tools were proposed. Automated assessments are inherently more objective and quantitative, and have the potential to aid in quantifying Parkinson's stage diagnosis and to add to both the accuracy and the efficiency of the assessment process.
Sensor-based methods were suggested in former studies to quantitatively assess gait disorders in Parkinson's disease. Many of these methods use the TUG protocol, capture the subject's motion and provide quantitative models that discriminate PD subjects' gait from healthy control subjects' gait (9) (10). The sensors used by these methods are either strapped onto the patient's body (11,12), implemented as wearable sensors (13), or fitted in walkway systems that measure the pressure exerted by the patient's feet as they walk (14,15). A drawback of the first two methods is their complexity, expense and time demands. Additionally, these devices are often cumbersome and uncomfortable to wear, thereby negatively affecting the user's experience, especially for motor-impaired persons (16,17). The walkway systems offer high accuracy and lower costs but require a large physical space and a dedicated environment. All three methods are inappropriate for the assessment of severe cases of PD, when the patient requires a walking aid (16).
Previous sensor-based gait data acquisition methods extracted complex and abstract mathematical features from the sensors' outputs and used machine learning tools for feature selection and discrimination. These computational analysis studies often used combinations of features (e.g. obtained by principal component analysis), which could not be readily separated and interpreted (18)(19)(20). This limits the use of these methods in clinical practice and for providing clinical insight into gait disorders in PD.
Previous sensor-based measurements of gait were used to discriminate PD patients from controls or to detect a specific symptom in PD gait, e.g. dyskinesia (21,22). Their analysis, however, aimed to distinguish between normal and impaired gait (23,24) and did not attempt to assess disease severity or stage. One of the challenges involved in disease stage assessment is that statistical analysis methods can establish a significant difference in terms of a p-value, but this value is not indicative of the magnitude of the differences between the groups, nor does it quantify the amount of overlap between the groups.
The current study addresses all the aforementioned limitations. The data are acquired by an exo-body walking frame, fitted with sensors to monitor patient gait and support walking concurrently. This device offers a solution to the disadvantages of both strapped-on and walkway methods. In particular, being a walking aid makes this device suitable for assessments of a severe stage of the disease, i.e. HY 4. Preliminary results have shown that this device can provide accurate discrimination of PD patients and control subjects (16). The analysis of the measurements in the current study considered only features that could be observed and related to the physical properties of the movement, and thus may provide an insight into the condition studied. Extended statistical analyses were employed to quantify the capability of these measurements to discriminate the five HY stages of PD. Due to the inherent limitations of the HY scale, the automated analysis results were also tested for correlations with the mUPDRS and with complementary clinical data on the patients and their treatment.

Population
Sixty-six consecutive patients diagnosed with idiopathic PD according to the UK Brain Bank criteria, attending a Movement Disorders Institute at a tertiary medical center, were recruited for the study. This patient cohort included stages 1 to 4 of the HY scale. Twenty-four healthy age-matched control subjects (HC, also designated as stage 0 of the HY scale) were recruited from the pool of hospital staff, patients' (unrelated) family members, caregivers or accompanying friends who arrived at the clinic. The exclusion criteria were: PD patients with additional neurologic disorders or any other disorder potentially affecting gait, patients after neurosurgical intervention for PD (such as deep brain stimulation and thalamotomy), patients with a balance or gait disorder not related to PD, and patients with musculoskeletal problems causing gait impairment. The study was approved by the local institutional review board of the Sheba Medical Center (Ethics number: 3036-16-SMC). All subjects signed an informed consent form. Ethics approval for re-use of the data was obtained from the University of the Witwatersrand Human Research Ethics Committee (clearance number M180202).

Instrumentation
The instrumented walker is an off-the-shelf aluminum walker frame fitted with an instrumentation kit. The kit includes two force sensors underneath the hand grips that measure the grip force, two digital encoders on the walker's front wheels that measure the position and velocity of the walker, and a tri-axial accelerometer in the control box of the walker. An embedded microcontroller (Arduino Nano V3) in the control box executes the command and control functionality and acquires the data at a sampling rate of 21.5 Hz. The data are written to a Secure Digital (SD) card in the form of a CSV file (16). These data consist of the trajectory, velocity, acceleration and force signals, which were recorded by the sensors throughout the subjects' walking experiment. The parameters computed from these data include the following spatio-temporal parameters: step count, mean step time, mean step length, mean step velocity, mean acceleration, standard deviation (STD) of step time, STD of step length, STD of step velocity, STD of acceleration, total timed up and go (TUG) time, total walk time, total turn time and cadence. The force sensors provided the grip force, the force difference between the right and left force sensors, and the correlation between the right and left force sensors.

Protocol
The study was approved by the local institutional review board and all subjects signed an informed consent form. Each patient underwent a full neurological examination and was rated using part III (motor examination) of the Unified PD rating scale (UPDRS, yielding an mUPDRS score), and the Hoehn and Yahr (HY) stage was determined. The presence of motor fluctuations and dyskinesia was specifically assessed and noted.
All subjects underwent a TUG test while holding the instrumented walker: subjects sat comfortably on a chair with no armrest and then spontaneously held on to the instrumented walker and stood up. The subjects then (holding the walker) walked at their natural speed straight ahead towards a cone positioned on the floor (3 meters away from the start line), turned around the cone, walked back and then sat back down on the chair (still holding the walker). If a subject failed to perform the procedure correctly (e.g., due to poor understanding of the task or distraction), that trial was discarded and immediately repeated.

Clinical data
HY stage, motor UPDRS score and the presence of motor fluctuations or dyskinesia were assessed and logged. Complementary clinical data included age at PD onset, disease duration and use of L-dihydroxyphenylalanine (L-DOPA) in the medication regimen. Age and gender data were logged for all subjects.

Extracting features from the signals
Data analysis was performed on all the signals captured by the walker's sensors. The preprocessing of the signals included noise and artifact removal, segmentation of the walk into strides within the straight-line walking and turning phases, and footfall detection (25). The signals were compressed into a set of mathematical variables that represent the spatio-temporal parameters of gait, e.g. mean step time and mean step velocity. All these variables have been used in previous sensor-based studies on gait, and are easily related to clinicians' observation of gait.
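As an illustration of how detected footfalls can be compressed into spatio-temporal variables, the following minimal sketch assumes that footfall detection has already produced one timestamp and one walker position per footfall. The function name, its inputs and the exact feature definitions (for instance, cadence as steps per minute over the analysed segment) are assumptions made for the example and are not taken from the processing pipeline of (25).

```python
from statistics import mean, stdev


def step_features(footfall_times_s, footfall_positions_m):
    """Compute per-trial step statistics from footfall times and positions."""
    n = len(footfall_times_s)
    if n != len(footfall_positions_m) or n < 3:
        raise ValueError("need >= 3 footfalls with matching positions")
    # One step spans two consecutive footfalls.
    step_times = [t1 - t0 for t0, t1 in
                  zip(footfall_times_s, footfall_times_s[1:])]
    step_lengths = [abs(p1 - p0) for p0, p1 in
                    zip(footfall_positions_m, footfall_positions_m[1:])]
    step_velocities = [d / t for d, t in zip(step_lengths, step_times)]
    total_time = footfall_times_s[-1] - footfall_times_s[0]
    return {
        "step_count": len(step_times),
        "mean_step_time": mean(step_times),
        "std_step_time": stdev(step_times),
        "mean_step_length": mean(step_lengths),
        "std_step_length": stdev(step_lengths),
        "mean_step_velocity": mean(step_velocities),
        "std_step_velocity": stdev(step_velocities),
        # assumed definition: steps per minute over the analysed segment
        "cadence_steps_per_min": 60.0 * len(step_times) / total_time,
    }
```

Each returned value corresponds to a quantity that a clinician can observe directly, which is the property the feature set was chosen for.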

Statistical analysis
The study population comprised five groups: PD patients in HY stages 1-4 and healthy controls, who may be referred to as HY 0, respecting the hierarchical order between groups. The analysis aimed to determine the importance of each feature extracted from the signals, in terms of its performance in discriminating the five groups.
The first task in this analysis was to find the features that provide the largest differences between the groups, HY stages 0 to 4. The Kruskal-Wallis ("one-way ANOVA on ranks") test was used to check for a significant difference (p-value ≤ 0.05) between the five groups for each variable, where the variables included both the demographic and clinical variables and the instrumented walker features.
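For readers unfamiliar with the test, a minimal sketch of the Kruskal-Wallis computation is given below. It is a generic textbook implementation using only the Python standard library, not the software used in the study; the function names are illustrative, and the p-value relies on the usual chi-square approximation, here implemented only for an even number of degrees of freedom (which covers the five-group case, with 4 degrees of freedom).

```python
from math import exp


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, degrees of freedom) with the standard tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    acc, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        acc += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * acc - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    correction = 1 - ties / (n ** 3 - n)  # zero only if all values are equal
    return h / correction, len(groups) - 1


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form implemented for even df only")
    half, term, total = x / 2, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total
```

For one feature, the five per-group value lists (HY 0 to HY 4) would be passed as groups, and chi2_sf_even_df(h, 4) would then give the approximate p-value to compare against the 0.05 threshold. The approximation is known to be rough for very small groups.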
A limitation of ANOVA-type analysis methods is that their p-value is not indicative of the magnitude of the differences between the groups, nor does it indicate an overlap between the groups. The analysis was therefore refined, using confidence intervals (CIs), to estimate the probability that the range of values for a specific feature in one group does not overlap with the ranges of that feature in the other groups (26). Plotting the confidence intervals can provide a clear visual display of the overlaps and hence of the differences between multiple groups.
The difference between each pair of groups is based on the overlap of the two Confidence Intervals belonging to two groups and is calculated as follows: an overlap greater than 50% corresponds to no statistical significance of difference; less than 50% corresponds to a 95% statistical significance of difference and no overlap corresponds to a 99% statistical significance of the difference (27).
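The decision rule above can be sketched as follows. The text does not state how the overlap percentage is normalised or how the intervals were computed, so this sketch makes two explicit assumptions: the confidence interval of the mean uses a normal approximation, and the overlap is expressed as a percentage of the average margin of error (half-width) of the two intervals. An overlap of exactly 50% is treated as not significant, which is also an assumption. All function names are illustrative.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, level=0.95):
    """Normal-approximation CI of the mean (assumed, not from the source)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(sample)
    margin = z * stdev(sample) / len(sample) ** 0.5
    return centre - margin, centre + margin


def overlap_percent(ci_a, ci_b):
    """Overlap of two intervals as % of their average margin of error."""
    overlap = min(ci_a[1], ci_b[1]) - max(ci_a[0], ci_b[0])
    if overlap <= 0:
        return 0.0
    average_margin = ((ci_a[1] - ci_a[0]) + (ci_b[1] - ci_b[0])) / 4
    return 100.0 * overlap / average_margin


def significance_label(overlap_pct):
    """Map an overlap percentage to the three categories used in the text."""
    if overlap_pct == 0:
        return "99%"
    if overlap_pct < 50:
        return "95%"
    return "not significant"
```

With this normalisation the value can exceed 100% when one interval lies largely inside the other; such pairs fall in the "not significant" category.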
Graphs of the CIs were plotted to illustrate the statistical differences between feature value ranges in the different HY groups, and CI overlaps were computed. Both p-values and CI-overlap values were used as metrics to determine the importance of each feature extracted from the walker signals. The features were ranked according to these two measures. The ranking indicates which features are most informative in providing HY group separability.
Lastly, Pearson's correlation was employed to map the correlations between all pairs of instrumented-walker features and demographic/clinical variables. The correlation values, the corresponding p-values and their significance (p<0.05) were listed in a concluding table.
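A minimal sketch of the correlation computation is given below, using only the Python standard library. It computes Pearson's r and the associated t statistic, which has n - 2 degrees of freedom under the null hypothesis of no correlation. Converting t into a p-value requires the Student t distribution, which the standard library does not provide, so that step is left out; the function names are illustrative.

```python
from math import copysign, inf, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); infinite for a perfect fit."""
    if abs(r) >= 1:
        return copysign(inf, r)
    return r * sqrt((n - 2) / (1 - r * r))
```

A negative r, as reported for mean step velocity and mean step length against HY stage, means that the feature decreases as the stage increases.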

Results
The demographic and clinical characteristics of the study population are provided in Table 1.
Table 3 presents the pair-wise CI overlap percentages for the first six gait features in Table 2. Zero overlaps, corresponding to a 99% statistical significance of the difference, are marked by two asterisks. Overlaps of less than 50%, corresponding to a 95% statistical significance of the difference, are marked by one asterisk. All other entries have overlaps larger than 50%, corresponding to a statistically insignificant difference.
The first three rows of Table 3 indicate no statistical difference between the controls (HY stage 0) and the HY stage 1 and 2 groups. The subsequent rows in the table demonstrate a 95% to 99% statistically significant difference between all other pairs of HY groups, for all features.
A visual display of the CI analysis is presented in Figure 2. The six graphs in the figure correspond to the six features in Table 3. The overlap or lack of overlap between groups can be observed in this graphical representation, as well as the range of values for each feature and for each group. The figure conveys that the controls (HY stage 0) and HY stage 1 patients have a higher step velocity, a greater step length, a lower step count, and shorter straight-line walking, turning and total TUG times. HY stages 2, 3 and 4 demonstrate increasingly lower step velocity, smaller step length, higher step count and longer mean straight-line walking, turning and total TUG times in all six graphs of Figure 2.
Patients in HY stage 4 differ significantly in all gait features from all other groups. Table 4 presents the correlations between the sensor-acquired features and the clinical data. Of the two PD rating scales, the mUPDRS is correlated with TUG straight-line walking time (p=0.02) and with step count (p=0.03) and is inversely correlated with cadence (p=0.005), while the HY stage is correlated with turning time (p<0.001), step count (p<0.001), TUG time (p<0.001) and walking time (p<0.001) and is inversely correlated with mean step velocity (p<0.001) and mean step length (p<0.001).

Discussion
The method proposed in this study provided automated discrimination of five HY stages in PD, whereas previous studies aimed to distinguish between normal and impaired gait or to detect a specific symptom in PD gait, e.g. dyskinesia (21-24), and were not applied to disease severity or stage assessment. Importantly, the gait characteristics used in this five-class discrimination provide an easily interpretable, quantitative insight into gait change with disease progression. A modified HY scale, which contains additional "mid-scale" values of 1.5 and 2.5, has also been introduced (28). A five-stage scale is, however, still widely used in clinical practice. The experiment in this study included a ground truth of clinicians' HY stage determination. This was conducted as part of the regular clinical assessment during the patients' visit to the hospital, as described in the Methods section. This shorter scale was thus the one used for the study, which aimed at a preliminary feasibility assessment of multi-class discrimination.
The discrimination between the healthy controls and the four HY stages, as provided by the gait features, was quantified using two statistical methods: the traditional correlation and p-value analysis, and an augmented confidence interval analysis.
The correlation and p-value analysis indicated seven gait features that were significantly correlated with the HY stage (Table 2). This analysis could provide, however, only mean and standard deviation values of these gait features for the different HY groups. Moreover, the p-values are not indicative of the magnitude of the differences between the HY groups, nor do they quantify the amount of overlap between the groups.
The confidence interval analysis revealed both the range of gait feature values in each HY group and the amount of overlap between these groups (Figure 2 and Table 3). Additionally, the confidence intervals can be portrayed in a graph (Figure 2) and provide an intuitive, "at a glance" illustration of the differences between the groups and of the discrimination power of each gait feature. Table 3 highlights the features that provided no overlap between the HY groups, reflecting a 99% significance of the difference, and the features that provided overlaps of less than 50%, reflecting a 95% significance of the difference between the groups. This analysis conveys that HY stages 3 and 4 were significantly different from each other, from the earlier stages (HY 1 and 2) and from the healthy controls, according to all six gait features. Not all the features were able to significantly discriminate between the earlier stages, HY 1 and 2, and the healthy controls. This information was not apparent in Table 2, where only p-values were computed and which therefore provided only average discrimination statistics. This highlights the importance of looking beyond p-values in multi-class discrimination problems.
As its widespread usage and prolific literature convey, the HY scale effectively identifies and discriminates the stages of PD. The PD patient cohort in this study was labeled and grouped by HY stage. The four groups (stages 1 to 4) were significantly different in terms of age, disease duration and L-DOPA treatment prevalence (Table 1). Similar to previous literature, however, the patients' data in this study do not convey a significant correlation of the HY grouping with clinically observed symptoms such as motor fluctuations and dyskinesia (29,30).
The gait data acquired by the walker-mounted sensors provided insight into the features that could characterize observed symptoms and clinical information of the patients. Table 4 portrays correlations between the sensor-acquired features and the clinical data, incorporating also the motor part of the UPDRS, levodopa usage and dose, motor fluctuations and dyskinesia. Each one of the six features that were indicated in Tables 2 and 3  Unlike previous studies (18)(19)(20), the current analysis considered only features that could be observed and related to the physical properties of the movement, and thus may provide an insight into the condition studied. Additionally, this study focused on an analysis of each gait feature separately. Each of the features was considered on its own, and its relevance and contribution to the discrimination of the five HY stages was quantified. This analysis therefore simplifies the interpretation and makes it more useful clinically. Table 4 shows, however, that additional features, such as cadence and imbalance in pressure on the walker's handle sensors, may also be manifested in PD progression. These features, however, do not show a significant correlation with HY stage. Previous studies have indicated an enhanced performance of sensor-based gait features when all features were jointly analyzed using machine learning. An analysis of the full feature set will be performed when a larger sample is collected.
All the above observations and interpretations are limited by the relatively small number of subjects and need to be validated on a larger cohort. In particular, more patients from HY stages 1 and 4 are needed to provide a balanced dataset. The study is a preliminary one and the first to attempt a 5-stage discrimination of PD using the automated frame method. Two statistical analysis methods were implemented in order to show which of the features measured by the device are adequate to imply a significant discrimination between the stages. The results of this preliminary study indicate a potential to provide insights into the manifestation of gait features in PD progression. In addition, the confidence interval overlap analysis employed in this study is indicated as a reliable and useful metric to convey discrimination between multiple groups.
The present analysis was undertaken to provide insight into the manifestations of gait characteristics in PD stages. In this analysis we deviate from the use of complex features and machine learning (7, 9-16, 18-20, 22-24) and, with an integrated team of expert neurologists and engineers, investigate the task of discriminating five stages of PD severity, using only clinically interpretable features (1-5) and explanatory statistical analysis (26,27). This approach differs from earlier ones in the following traits: 1) Only features that could be clinically interpreted and used to characterize the symptoms observed by clinical experts were included in the analysis. 2) Among these features were parameters from the turning segment of the walking test, whereas most previous sensor- The results indicate that six of the gait parameters were able to provide a statistically significant discrimination of HY stages, as well as value ranges for each of the stages, as conveyed by Table 3 and Figure 2. Two of the parameters had 95%-99% significance in their discrimination between all the pairs of PD stages, one yielded 95%-99% discrimination significance for all pairs except the pair of HC and HY1, and the other three yielded 95%-99% discrimination significance for all pairs except HC-HY1, HC- The prediction power of the six gait features that provided the strongest discrimination between all PD stage groups will be further explored and generalized in future studies employing larger cohorts of patients.
Therefore, although the sample size is small, the results convey high statistical significance, implying that the recorded gait parameters and their correlation with HY stages can be used for the discrimination or prediction of HY stages in PD. These results should be validated on a larger cohort of patients. Our findings nevertheless demonstrate the clinical value of these gait parameters: the approach improves on previously reported methods by introducing only clinically interpretable parameters, explainable statistical analysis and a capability to discriminate five stages of disease severity.