Using Features Extracted From Upper Limb Reaching Tasks to Detect Parkinson’s Disease by Means of Machine Learning Models

While the literature shows much interest in investigating the lower-limb gait of patients affected by neurological diseases, such as Parkinson's Disease (PD), fewer publications address upper-limb movements. In previous studies, 24 motion signals of the upper limbs (the so-called reaching tasks) of PD patients and Healthy Controls (HCs) were used to extract several kinematic features through custom-made software; the aim of this paper is to investigate whether models built on these features can distinguish PD patients from HCs. First, a binary logistic regression was performed; then, a Machine Learning (ML) analysis was carried out by implementing five algorithms in the Knime Analytics Platform. The ML analysis was performed twice: first with leave-one-out cross-validation, and then with a wrapper feature selection method to identify the subset of features that maximizes accuracy. The binary logistic regression achieved an accuracy of 90.5%, highlighting the importance of the maximum jerk during the subjects' upper-limb motion; the Hosmer-Lemeshow test supported the validity of this model (p-value = 0.408). The first ML analysis achieved high evaluation metrics, exceeding 95% accuracy; the second ML analysis achieved a perfect classification, with 100% accuracy and 100% area under the receiver operating characteristic curve. The top-five features in terms of importance were maximum acceleration, smoothness, duration, maximum jerk and kurtosis. Our investigation demonstrated the predictive power of features extracted from upper-limb reaching tasks for distinguishing HCs from PD patients.

Index Terms-Machine learning, rehabilitation engineering, modelling.

I. INTRODUCTION
Neurological disorders, with particular reference to neurodegenerative diseases, have a negative impact on quality of life and are the leading cause of disability and death globally [1], [2], [3]. Among neurodegenerative movement disorders, Parkinson's Disease (PD) is one of the most common, after Essential Tremor [4]. The motor dysfunction in PD is due to the degeneration of dopaminergic neurons in the substantia nigra [5]; in addition, the disease involves the degeneration of neurons in brain regions controlling autonomic functions, cognition, and mood [6]. Nevertheless, among neurodegenerative disorders, PD is relatively well managed with a combination of medication and regular physiotherapy. Indeed, physical rehabilitation is considered an adjuvant to pharmacological treatment in PD, to maximize functional ability and minimize complications; exercise increases synaptic strength and influences neurotransmission, thus potentiating the functional circuitry in PD [7]. A considerable body of literature provides evidence that moderate-intensity physical exercise increases dopamine levels, which suggests that an exercise program would benefit PD patients [8]. For instance, Formisano et al. evaluated the efficacy of physical therapy combined with drug therapy in a group of parkinsonian patients, compared with a group treated with drug therapy only; at the end of the study, the patients treated with physiotherapy showed an improvement in both clinical scales and motor performance tests [9]. Along the same lines, Donisi et al. showed the positive impact of short-term gait rehabilitation on gait parameters in PD patients [10].
Several studies have focused on evaluating lower-limb parameters with gait-analysis instrumentation, since individuals with PD exhibit a gait pattern characterized by short stride length, increased cadence, and reduced velocity [11], [12]. The evaluation of upper-limb performance is equally important; indeed, several studies have specifically focused on developing new rehabilitation tasks to improve upper-limb function in PD patients [13], [14], [15], [16].

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this context, a potential obstacle to the appropriate design and implementation of rehabilitation tasks for the upper limbs is their many degrees of freedom: thanks to their large number of joints, they are among the most flexible segments of the human body. Moreover, the intrinsic complexity of their control mechanism is, among others, one of the factors that can limit the benefits (and therapeutic impact) of exercise. Despite these potential limitations, previous research has documented that the investigation of grasping tasks, aimed at studying the kinematic and dynamic aspects of spontaneous human upper-limb movement, has made it possible to identify common kinematic features and patterns that describe spontaneous human movements quite appropriately [17], [18], [19]. In this regard, several of the most important studies were proposed by Hogan, who was the first to publish research focused on analyzing the movements of healthy subjects from a theoretical point of view; the impact of this research was so great that many physicians studying spontaneous arm movements still refer to these publications [20]. Briefly, Hogan's main finding was that, when moving their upper limbs from one point in space to another, healthy subjects tend to follow a straight, regular path with smooth, uninterrupted acceleration. That study had such a strong impact on research and technological innovation that the literature is now rich in papers demonstrating that a number of useful biomechanical parameters can be extracted from biomedical signals. Such parameters are suitable for investigating the "quality" of movement of healthy and diseased subjects; even more importantly, they can be monitored in different rehabilitation settings and for different tasks.
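Hogan's observation is commonly formalized as the minimum-jerk model of point-to-point reaching. The sketch below is illustrative only and assumes the standard closed-form minimum-jerk solution for a single coordinate (the formula itself is not stated in this paper): x(t) = x0 + (xf - x0)(10τ³ - 15τ⁴ + 6τ⁵), with τ = t/T. The function name `minimum_jerk` and the 90-degree, 1-second example are hypothetical.

```python
import numpy as np

def minimum_jerk(x0: float, xf: float, duration: float, n_samples: int = 101):
    """Closed-form minimum-jerk trajectory for one coordinate (assumed model).

    Returns time, position and velocity arrays sampled uniformly over [0, duration].
    """
    t = np.linspace(0.0, duration, n_samples)
    tau = t / duration  # normalized time in [0, 1]
    delta = xf - x0
    # Fifth-order polynomial that minimizes the integral of squared jerk
    position = x0 + delta * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    # Analytical derivative: bell-shaped velocity, zero at both endpoints
    velocity = delta / duration * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    return t, position, velocity

# Hypothetical example: a 90-degree reach completed in 1 s
t, x, v = minimum_jerk(0.0, 90.0, 1.0)
print(round(float(x[50]), 3), round(float(v.max()), 3), int(np.argmax(v)))
```

The resulting velocity profile is zero at both ends and peaks at mid-movement at 1.875 times the average velocity, which is the "straight, regular" behaviour against which pathological movements are often compared.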
Approaches of this kind have been proposed for robot-mediated or robot-assisted rehabilitation exercises [21], to improve upper-limb [22] or lower-limb rehabilitation [23], after spinal cord injury [24], stroke [25], or PD [26], [27], also with the aim of proposing low-cost robot platforms for physiotherapy [28], [29] and intelligent data-driven approaches for gait training [12], motion reaching tasks [30], [31], [32], and motion prediction [33].
At the same time, several studies have aimed to exploit motion analysis data and to compare different instruments for diagnostic purposes [34], [35]. Despite the promising results, a key limitation of previous research is that biomechanical parameters still cannot fully serve as an effective, quantitative rehabilitation outcome, e.g., to monitor improvements in the motor functions of PD patients. This issue can also be ascribed to the lack of standardized rehabilitation protocols.
On the other hand, the growing availability of data from devices allows the use of data-driven models based on artificial intelligence to find hidden patterns in the data and, in turn, potentially to help clinicians in the decision-making process. Machine Learning (ML), in fact, is gaining popularity in several fields for different purposes, such as identifying potential digital biomarkers for the diagnosis of specific diseases [36], improving the management of care processes by predicting patient outcome variables [37], [38], [39], and classifying biomedical signals such as heart rate and respiration [40], [41], [42], [43], [44], [45], [46], and motion [47], [48], [49], with the aim of finding movement patterns in gait [50], [51] and distinguishing healthy from pathological (e.g., PD) subjects, even after a rehabilitation program [52]. In particular, ML has also been used to investigate the lower limbs of PD patients; previous research has indeed documented that several PD symptoms, for example mild cognitive impairment and freezing of gait, have been correlated with gait patterns through ML [53], [54]. Therefore, in this context, ML has proved to be a useful technique for analyzing motion analysis data, since it is often possible to extract quantitative features describing movement that can be given as input to the algorithms.
However, although the findings suggest that an exercise program is beneficial to PD patients, evidence in this area still lacks rigorous validation. Indeed, there is a need not only to standardize the rehabilitation protocol, but also to define measurable, repeatable, and reliable indicators to monitor the effectiveness of the training program over time and to provide a quantitative measure of the rehabilitation outcome. This is a necessary step towards the design of personalized treatment programs, in which the rehabilitation exercises are centered on the patients' characteristics and motion impairment, or on the specific pathology affecting them.
In our previous paper [55], we extracted from the available motion signals a set of features that a statistical analysis showed to be useful for distinguishing PD patients from healthy controls. Building on [55], the main contribution of this paper is to assess whether these features can serve as the basis of a predictive model to diagnose PD.
To this end, we first build a binary logistic regression (BLR) model, checking the correlation among the features beforehand to avoid including correlated variables; moreover, both Cook's distance and the Center Leverage value are used to check for outliers. Finally, the goodness of fit of the BLR model is checked through the Hosmer-Lemeshow test.
Then, our investigation is reinforced by comparing several ML algorithms, in order to find the subset of the above-mentioned features that is most meaningful for monitoring the patients' status and, eventually, a rehabilitation treatment. In particular, the ML models are implemented twice. The first analysis includes all the features from [55] and validates the models through leave-one-out cross-validation. In the second phase, the dataset is split into two parts, as per hold-out cross-validation, and the wrapper method is exploited to find the subset of features that maximizes model performance. We shall show that the results obtained by the ML approach are extremely positive since, on the one hand, they confirm the outcome of the BLR approach and, on the other hand, almost all algorithms achieve a perfect score on all the evaluation metrics.

A. Data and Signal Processing
Retrospective data and signals were collected for 12 subjects: six were healthy (age = 40.0 ± 5.7 years; Body Mass Index (BMI) = 26.0 ± 4.0 kg/m²) and six were Parkinsonian (age = 51.7 ± 13 years; BMI = 27.6 ± 4.6 kg/m²).
The signals consisted of angular displacements, acquired through goniometer sensors during a kinematic task made up of four upper-limb movements, as also reported elsewhere in the literature [56], [57], [58]. The movements, a horizontal reaching task and a vertical one, were performed along two axes in a two-dimensional plane, starting from a reference position. During the acquisition, the subject stands upright, with straight trunk and neck and the gaze fixed on the central point of the plane on which the reaching movements are performed. Subjects reach four positions: top, bottom, right, and left; these movements can be further divided into eight distinct kinematic phases, four of elevation and lowering in the sagittal plane and four of extension and flexion in the horizontal plane.
The signals were processed and analyzed to extract a set of kinematic parameters, which were then used to distinguish the two groups through ML algorithms. Each subject performed the task twice, for a total of 24 signals.
Signal processing was carried out in Matlab (MathWorks, R2021a, Natick, MA, USA). In accordance with other methodological approaches to processing motion signals proposed in the literature [55], [59], [60], [61], [62], after signal acquisition, the velocity, acceleration, and jerk profiles were estimated from the discrete position-versus-time signal by numerical differentiation. Based on the velocity profile, eight submovements were detected by an iterative algorithm that identifies the onset and offset of each submovement from the local minima and maxima of the velocity curve above and below a predetermined threshold. After segmenting the signal into eight submovements, the following kinematic and statistical descriptors were computed for each submovement:
• amplitude and duration;
• mean velocity;
• maximum velocity, acceleration, and jerk;
• the coefficient of symmetry of the submovement curve;
• mean and root mean square values of the position profile;
• variance, skewness, and kurtosis of the velocity profile;
• the smoothness factor.
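The processing chain above (differentiation, velocity-based segmentation, per-submovement descriptors) can be sketched in Python as follows. This is a simplified illustration, not the authors' Matlab code: derivatives use `numpy.gradient`; segmentation uses a single threshold crossing on |velocity| (a hypothetical 5% of peak speed) instead of the iterative minima/maxima search; the coefficient of symmetry is assumed to be time-to-peak-velocity divided by duration; skewness and kurtosis are the moment-based (non-excess) definitions; and the smoothness factor is assumed to be the dimensionless normalized jerk, since the exact definitions used in [55] are not given here. All function names and the synthetic signal are hypothetical.

```python
import numpy as np

def derivatives(position, fs):
    """Velocity, acceleration and jerk by repeated numerical differentiation."""
    dt = 1.0 / fs
    velocity = np.gradient(position, dt)
    acceleration = np.gradient(velocity, dt)
    jerk = np.gradient(acceleration, dt)
    return velocity, acceleration, jerk

def segment(velocity, rel_threshold=0.05):
    """(onset, offset) pairs where |velocity| exceeds a fraction of its peak (offset exclusive)."""
    active = np.abs(velocity) > rel_threshold * np.abs(velocity).max()
    edges = np.diff(active.astype(int))
    onsets = list(np.flatnonzero(edges == 1) + 1)    # rising edges
    offsets = list(np.flatnonzero(edges == -1) + 1)  # falling edges
    if active[0]:
        onsets.insert(0, 0)
    if active[-1]:
        offsets.append(len(active))
    return list(zip(onsets, offsets))

def submovement_features(pos, vel, acc, jerk, fs):
    """Thirteen descriptors for one submovement (several definitions are assumptions)."""
    duration = (len(pos) - 1) / fs
    amplitude = abs(pos[-1] - pos[0])
    speed = np.abs(vel)
    centered = vel - vel.mean()
    m2 = np.mean(centered ** 2)
    # Assumed smoothness measure: dimensionless normalized jerk (lower = smoother)
    jerk_integral = np.sum(jerk ** 2) / fs
    smoothness = np.sqrt(0.5 * jerk_integral * duration ** 5 / amplitude ** 2)
    return {
        "amplitude": amplitude,
        "duration": duration,
        "mean_velocity": speed.mean(),
        "max_velocity": speed.max(),
        "max_acceleration": np.abs(acc).max(),
        "max_jerk": np.abs(jerk).max(),
        "symmetry": (np.argmax(speed) / fs) / duration,  # assumed: time to peak / duration
        "mean_position": pos.mean(),
        "rms_position": np.sqrt(np.mean(pos ** 2)),
        "velocity_variance": m2,
        "velocity_skewness": np.mean(centered ** 3) / m2 ** 1.5,
        "velocity_kurtosis": np.mean(centered ** 4) / m2 ** 2,
        "smoothness": smoothness,
    }

# Synthetic signal (hypothetical): rest, reach 0 -> 90 deg, hold, return, rest, at 100 Hz
fs = 100.0
tau = np.linspace(0.0, 1.0, 101)
reach = 90.0 * (10 * tau ** 3 - 15 * tau ** 4 + 6 * tau ** 5)
position = np.concatenate([np.zeros(50), reach, np.full(50, 90.0), reach[::-1], np.zeros(50)])

vel, acc, jerk = derivatives(position, fs)
segments = segment(vel)
feats = [submovement_features(position[a:b], vel[a:b], acc[a:b], jerk[a:b], fs)
         for a, b in segments]
print(len(segments), round(float(feats[0]["max_velocity"]), 1))
```

In the study, the eight per-submovement values of each descriptor were then averaged to obtain one feature vector per signal; the same averaging could be applied to `feats` here.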

B. Binary Logistic Regression
A BLR was performed on the dataset to measure how well a traditional statistical modeling technique can distinguish pathological signals from healthy ones. Before building the model, a few conditions were checked. The set of input features was reduced by studying the correlation coefficients among the variables; in particular, only features whose pairwise correlation coefficient was lower than 0.70 were kept. Then, to remove possible outliers, the Cook's distance of each observation $i$,

$$D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \cdot \mathrm{MSE}} \tag{1}$$

where
• $n$ is the number of observations;
• $\hat{y}_j$ is the $j$-th fitted response value;
• $\hat{y}_{j(i)}$ is the $j$-th fitted response value, where the fit does not include observation $i$;
• MSE is the mean squared error;
• $p$ is the number of coefficients in the regression model,

was computed together with the Center Leverage value [63]. Both are influence statistics: the former indicates how much influence a single case has on the regression model, while the latter measures the effect of a particular observation on the regression predictions due to the position of that observation in the input space.
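The correlation screening and the Cook's distance defined above (the sum of squared changes in the fitted values when observation i is left out, divided by p·MSE) can be illustrated with NumPy. This sketch makes several assumptions: the 0.70 screening is done greedily in column order (the procedure actually applied in SPSS is not specified), and Cook's distance is computed literally by refitting an ordinary least-squares model without each observation, whereas SPSS reports an analogous statistic for logistic regression. The centered leverage is taken as the hat-matrix diagonal minus 1/n. Function names and data are hypothetical.

```python
import numpy as np

def screen_correlated(X, names, threshold=0.70):
    """Greedily keep features whose |r| with every already-kept feature is below threshold."""
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept], X[:, kept]

def design(X):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def cooks_distance(X, y):
    """Cook's distance by brute-force refitting of an OLS model without each observation."""
    A = design(X)
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    mse = np.sum((y - fitted) ** 2) / (n - p)
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        # Sum over all n fitted values of the squared change caused by dropping case i
        d[i] = np.sum((fitted - A @ beta_i) ** 2) / (p * mse)
    return d

def centered_leverage(X):
    """Diagonal of the hat matrix minus 1/n (centered leverage)."""
    A = design(X)
    hat = A @ np.linalg.pinv(A.T @ A) @ A.T
    return np.diag(hat) - 1.0 / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(21, 2))
y = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.3, size=21)
y[3] += 4.0  # hypothetical outlier injected for illustration
d = cooks_distance(X, y)
lev = centered_leverage(X)
print(np.round(d, 3))
print(np.round(lev, 3))
```

For OLS, the refitting loop reproduces the well-known closed form e_i² h_ii / (p · MSE · (1 - h_ii)²), where e_i is the residual and h_ii the leverage of observation i, so the loop is shown only to mirror the definition term by term.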
Finally, the goodness of fit of the model (Hosmer-Lemeshow test) and the confusion matrix were computed.
SPSS Statistics (IBM, v. 25, Armonk, NY, USA) was used to perform this analysis.
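The Hosmer-Lemeshow statistic compares observed and expected event counts across groups of predicted probability. The sketch below assumes the common variant in which the sorted predicted probabilities are split into `groups` roughly equal-sized bins, with g - 2 degrees of freedom; SPSS's handling of ties and group boundaries may differ. To stay dependency-free, the p-value uses the closed-form chi-square survival function, which is valid only for even degrees of freedom (e.g., the usual 10 groups). Function names and the toy data are hypothetical.

```python
import math
import numpy as np

def chi2_sf_even_df(x, df):
    """Chi-square survival function; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, prob, groups=10):
    """Return (statistic, degrees of freedom, p-value) for binary labels y."""
    order = np.argsort(prob)
    statistic = 0.0
    for idx in np.array_split(order, groups):
        n_g = len(idx)
        observed = y[idx].sum()   # observed events in the group
        expected = prob[idx].sum()  # expected events in the group
        statistic += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    df = groups - 2
    return statistic, df, chi2_sf_even_df(statistic, df)

# Toy example with 8 records and 4 groups (hypothetical values)
prob = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
stat, df, p_value = hosmer_lemeshow(y, prob, groups=4)
print(round(stat, 4), df, round(p_value, 4))
```

A large p-value, such as the one reported for the BLR model, indicates no evidence of lack of fit rather than proof of a good fit.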

C. ML Tool and Algorithms
To overcome the limitations of the BLR, an ML analysis was conducted. It was performed with the Knime Analytics Platform (v. 4.2.1), an open-source platform for developing ML workflows [64], which has been recognized as one of the best choices in this context [65]. Knime has been widely employed in the literature for biomedical studies of various types, such as foetal well-being assessment [66] and motion analysis [35]. In this research, the ML models were implemented twice. In the first analysis, all the features were included and the models were validated through leave-one-out cross-validation. In the second analysis, the dataset was split into two parts, as per hold-out cross-validation: on the training set (70% of the total), the wrapper method was used to find the subset of features that maximized accuracy; the model built on the training set was then applied to the test set, and accuracy, sensitivity, specificity, and Area Under the Receiver Operating Characteristic Curve (AUCROC) were computed as evaluation metrics [67]:
• Specificity (equation 2): the ability to correctly classify subjects not belonging to the examined group,
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{2}$$
• Sensitivity (equation 3): the ability to correctly classify subjects belonging to the examined group,
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$
• Accuracy (equation 4): the overall percentage of correct predictions,
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
• AUCROC: an index ranging from 0 to 1 for binary classification, with 0.5 indicating a classification no better than random guessing,

where, as usual, TP, TN, FP, and FN stand for true positives, true negatives, false positives, and false negatives, respectively. Several algorithms were implemented to classify the data into healthy and Parkinsonian. Decision Tree (in the form of J48) and its ensemble form, Random Forest (RF), were used as tree-based algorithms; k-Nearest Neighbour (k-NN) was implemented as an instance-based algorithm and Support Vector Machine (SVM) as a kernel-based one; finally, Naïve Bayes (NB), which is based on Bayes' theorem, was the last algorithm implemented. As regards hyperparameter tuning, RF used 100 trees, no maximum depth, and information gain ratio as the split criterion; k-NN used k = 3 with no distance weighting; and SVM used a linear kernel.

Fig. 1. Cook's distance and Center Leverage Value used to detect the presence of outliers. Records with a center leverage value greater than 0.6 and a Cook's distance greater than 0.6 were flagged; three outliers were removed.
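Specificity, sensitivity, accuracy, and AUCROC can be computed directly from predicted labels and scores. A minimal sketch, assuming the PD class is coded as 1 (positive) and computing the AUCROC through its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, ties counting one half), which is one standard way to obtain it. The toy labels and scores are hypothetical.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """TP, TN, FP, FN with class 1 (e.g., PD) as the positive class."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    return (int(np.sum(t & p)), int(np.sum(~t & ~p)),
            int(np.sum(~t & p)), int(np.sum(t & ~p)))

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "specificity": tn / (tn + fp),                # TN / (TN + FP)
        "sensitivity": tp / (tp + fn),                # TP / (TP + FN)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # correct / all
    }

def auc_roc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    t = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[t], s[~t]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]              # hypothetical labels (1 = PD)
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]  # hypothetical predicted probabilities of PD
y_pred = [1, 1, 0, 0, 0, 1]              # the same scores thresholded at 0.5
print(metrics(y_true, y_pred), auc_roc(y_true, scores))
```

Here specificity, sensitivity, and accuracy are all 2/3, while the AUCROC is 8/9: the AUCROC summarizes the ranking over all thresholds, whereas the other three metrics depend on the single threshold chosen.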

III. RESULTS
As anticipated in the previous section, the first analysis was a BLR: the correlation among the variables was checked to avoid including correlated variables in the model, and both Cook's distance and the Center Leverage value were used to check for outliers; at the end of the process, three records were removed (Fig. 1). The results of this first analysis are shown in Table I, while the correlation matrix is shown in Table II. In summary, the accuracy of the BLR model was 90.5%, and the maximum jerk was statistically significant in the multivariate analysis (p-value = 0.026).
Finally, the Hosmer-Lemeshow test supported the goodness of fit of the BLR model (p-value = 0.408).
Secondly, an ML analysis was implemented on the whole dataset (i.e., 13 features and 24 records), with leave-one-out cross-validation. The features used in this analysis were the averages of each feature over the eight sub-movements (the results for each sub-movement are shown in Supplementary Tables S1-S8). Table III shows the evaluation metrics for each algorithm.
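The leave-one-out scheme used in this first ML analysis can be sketched with the k-NN configuration reported in the methods (k = 3, no distance weighting). Assumptions not stated in the paper: Euclidean distance, z-score normalization fitted on each training fold, and labels coded 0/1. The synthetic data (24 records, 13 features) only mimic the shape of the dataset and are not the study data; function names are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Unweighted majority vote among the k nearest training records (Euclidean)."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    return int(np.bincount(y_train[nearest], minlength=2).argmax())

def loocv_accuracy(X, y, k=3):
    """Leave-one-out accuracy: each record is predicted by a model trained on all others."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features
        pred = knn_predict((X[train] - mu) / sd, y[train], (X[i] - mu) / sd, k)
        correct += int(pred == y[i])
    return correct / n

rng = np.random.default_rng(1)
y = np.array([0] * 12 + [1] * 12)                  # synthetic HC-like / PD-like labels
X = rng.normal(size=(24, 13)) + 10.0 * y[:, None]  # 13 strongly separated synthetic features
acc = loocv_accuracy(X, y)
print(f"LOOCV accuracy: {acc:.3f}")
```

Fitting the normalization inside each fold keeps the held-out record from influencing the scaling of the training data.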
Following this analysis, feature importance was computed from the split criteria of the RF; as illustrated in Fig. 2, the most important features were maximum acceleration, smoothness, duration, maximum jerk, and kurtosis.
Thirdly, the ML analysis was repeated after feature selection. The wrapper method was used to build the models (J48, RF, k-NN, SVM, NB) on the training set, which were then applied to the test set to compute the evaluation metrics (Table IV).
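A wrapper method selects features by repeatedly training and scoring the classifier itself. The exact search strategy and inner validation used in Knime are not specified here, so this sketch assumes sequential forward selection, wraps a k-NN classifier (k = 3), scores candidate subsets by leave-one-out accuracy on the training set only, and uses a random 70/30 hold-out split. All data are synthetic and function names are hypothetical.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_eval, y_eval, k=3):
    """Accuracy of an unweighted Euclidean k-NN trained on (X_train, y_train)."""
    correct = 0
    for x, label in zip(X_eval, y_eval):
        nearest = np.argsort(np.linalg.norm(X_train - x, axis=1), kind="stable")[:k]
        vote = np.bincount(y_train[nearest], minlength=2).argmax()
        correct += int(vote == label)
    return correct / len(y_eval)

def loo_score(X, y, k=3):
    """Leave-one-out accuracy, used as the wrapper's selection criterion."""
    n = len(y)
    hits = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        hits += knn_accuracy(X[keep], y[keep], X[i:i + 1], y[i:i + 1], k)
    return hits / n

def forward_selection(X, y, k=3):
    """Greedily add the feature that most improves the wrapper score; stop when none does."""
    remaining = list(range(X.shape[1]))
    selected, best = [], -1.0
    while remaining:
        scored = [(loo_score(X[:, selected + [j]], y, k), -j) for j in remaining]
        score, neg_j = max(scored)  # ties broken in favour of the lowest feature index
        if score <= best:
            break
        selected.append(-neg_j)
        remaining.remove(-neg_j)
        best = score
    return selected, best

rng = np.random.default_rng(42)
y = np.array([0] * 12 + [1] * 12)
X = rng.normal(size=(24, 5))
X[:, 2] += 4.0 * y  # only feature 2 carries class information (synthetic)
perm = rng.permutation(24)
train, test = perm[:17], perm[17:]  # roughly 70% / 30% hold-out split
selected, inner_acc = forward_selection(X[train], y[train])
test_acc = knn_accuracy(X[train][:, selected], y[train], X[test][:, selected], y[test])
print("selected:", selected, "inner:", round(inner_acc, 3), "test:", round(test_acc, 3))
```

The test records never enter the selection loop, so the final test-set metrics are computed on data unseen by both the feature search and the model fit; in this synthetic example the split leaves 17 training and 7 test records.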
The results were extremely positive, since four algorithms out of five achieved a perfect score on all the evaluation metrics (accuracy, sensitivity, specificity, AUCROC). To better understand this pattern, the data were also inspected graphically. Fig. 3 shows that the data were almost linearly separable, which explains why so many algorithms were able to achieve a perfect classification score.
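Near-linear separability of the kind visible in Fig. 3 can also be checked numerically with a perceptron: if the perceptron completes an epoch with zero errors, a separating line exists, whereas failure to converge within the epoch limit is inconclusive. This sketch assumes two standardized features; the smoothness and duration values below are hypothetical synthetic numbers, not the study data, and the function name is hypothetical.

```python
import numpy as np

def perceptron_separates(X, y, max_epochs=1000):
    """Return (converged, w, b); w and b refer to the standardized feature space."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize for better conditioning
    t = np.where(y == 1, 1.0, -1.0)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, ti in zip(Xs, t):
            if ti * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += ti * xi
                b += ti
                errors += 1
        if errors == 0:
            return True, w, b  # every record is strictly on its correct side
    return False, w, b

rng = np.random.default_rng(7)
hc_group = rng.normal([1.0, 1.2], 0.1, size=(12, 2))  # hypothetical [smoothness, duration]
pd_group = rng.normal([2.0, 1.8], 0.1, size=(12, 2))
X = np.vstack([hc_group, pd_group])
y = np.array([0] * 12 + [1] * 12)
separable, w, b = perceptron_separates(X, y)
print("linearly separable:", separable)
```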

IV. DISCUSSION
In this paper, displacement signals from healthy subjects and PD patients were acquired and processed to obtain a set of kinematic parameters describing upper-limb movement. These parameters were used as input features to perform a BLR and then fed as input to five ML algorithms based on different operating principles.
The BLR model showed interesting results, reaching an accuracy of 90.5% and identifying the maximum jerk as an important feature; this is reasonable, since jerky movement can be considered a characteristic of PD patients. After the BLR, the ML algorithms were implemented to broaden the analysis, because they can be used without the assumptions of the BLR.
The models identified in this research performed well, as shown by the high evaluation metrics obtained (up to 100% accuracy, sensitivity, specificity, and AUCROC). With the wrapper technique, this finding was not unexpected, given the (almost) linear separability of the two classes for the selected features (Fig. 3). Moreover, the ML analysis allowed us to identify the most important features for distinguishing the groups. As Fig. 2 shows, both smoothness and duration proved informative. These results are consistent with those of the previous study [55], where the statistical analysis found both smoothness and duration to be statistically significant. It is worth noting that maximum jerk was identified both by the BLR and, among the top-five features, by the ML analysis.
To the best of the authors' knowledge, these promising findings represent a partially unexplored strategy in this field, which sets this study apart from other scientific contributions on these topics, discussed in more detail in the following.
Recently, Bai [16] demonstrated that combining drawing tests with inertial sensors is an effective method to acquire biomedical signals (namely, time-frequency spectra) of upper-limb movements; earlier, however, Kotsavasiloglou et al. had shown that a simple drawing-test setup (a commercial tablet connected to a PC) could be enough to extract features to be fed to ML algorithms in order to classify unknown subjects as healthy or PD based on their line-drawing performance [68]. The authors found that subsets of the extracted features (linked to the kinematics of hand motion during line-drawing tasks) had effective predictive power for distinguishing healthy and PD subjects; the ML algorithms used (like those used in this study) achieved high scores (greater than 85%) with the best feature subsets. In the same year, Butt et al. presented a different strategy that also achieved an effective classification of healthy and PD subjects.
In this case, the authors acquired time-domain information using a non-contact optical device (namely, the Leap Motion Controller) to extract biomechanical features related to four tasks, including the "forearm pronation/supination" task, which can be considered similar to the ones considered in this paper. Butt et al. showed that the 17 overall features extracted from the four tasks, when fed to different ML algorithms, allowed healthy subjects and PD patients to be classified fairly well (accuracy higher than 75%) [27].

Fig. 3. Scatterplot of smoothness versus duration, showing that the data are almost linearly separable when these two features are considered. This plot should only be considered as an explanation of the high ML scores.
Later, the same authors also investigated other strategies to acquire biomedical signals, focusing in particular on wearable sensors. For instance, in the recent work [69], a wireless device based on Inertial Measurement Units ("SensHand V1") was used to acquire acceleration and angular-rate data from the same (plus two new) biomechanical tasks. After post-processing (the authors extracted up to 23 features for the six biomechanical tasks involving the upper limbs), the extracted features were fed to both RF and SVM, which achieved promising scores both in distinguishing healthy and PD subjects and in classifying healthy subjects, PD patients, and patients with idiopathic hyposmia. These results were concurrently confirmed in another work by the same group, in which the authors examined potential differences between the limbs in more depth and increased the difficulty of the ML classification. The scores achieved (with NB as an additional algorithm) confirmed the promising findings of [52], in which RF proved to be the best classifier. Later, the same group published another paper [70] in which the number of patients (excluding those with idiopathic hyposmia) was more than doubled, NB was considered as a classifier in addition to RF, SVM was implemented with linear, Gaussian, and third-degree polynomial kernels, and different feature datasets were analyzed. Overall, the metrics achieved were promising, since all the classifiers showed an excellent ability to distinguish the two groups on the basis of the motor performance analysis; for all the datasets, the accuracies were higher than 90% [70].
Earlier, in 2018, Belgiovine et al. conducted similar work, both to detect and classify L-dopa-induced dyskinesia [71] and to detect dyskinesia in the upper or lower limbs separately [72]. In both works, 18 PD patients were enrolled, and each of them wore a commercial smartwatch, integrating an accelerometer and a gyroscope, on the most compromised arm while performing several tasks (e.g., writing). After a post-processing step to extract time- and frequency-domain features (28 and 168, respectively) from the acceleration and angular-velocity signals, the features were fed to both a decision tree and SVM (with linear and Gaussian kernels); on average, SVM performed better than the decision tree, and the linear- and Gaussian-kernel SVMs showed comparable performance. In further research, the authors enrolled three more patients, confirmed the previous results, and showed that the overall strategy allows unknown patients to be detected in real time [73].
Finally, a recent study by Monje et al. also demonstrated that findings based on data acquired with wearable sensors agree with data collected with other strategies (e.g., video analyses); in this context, the authors reported a significant negative correlation between the assessment of, e.g., the pronation-supination movement of the hand on the most affected side in a cohort of 22 PD patients and the corresponding results obtained with wearable sensors [74].
Based on the discussion above, we conclude that, with respect to the existing literature, the strength of our study lies not in the mere application of ML algorithms to distinguish PD patients from healthy controls, but in the use of the features listed in Section II-A to diagnose PD, and specifically of the subset of the most significant variables shown in Fig. 2. In particular, the almost linear separability of the two groups observed in Fig. 3, obtained thanks to the optimal choice of features, provides evidence that the tasks performed by healthy subjects and parkinsonian patients can be straightforwardly distinguished; to the best of our knowledge, such evidence is not found elsewhere in the literature. Moreover, unlike most previous studies, the motion signals exploited in our work relate to various upper-limb reaching tasks. In addition, the features extracted according to our methodology could also be used for other purposes, since they also have clinical relevance. In this regard, other researchers have recently shown that, after 10 weeks of rehabilitative therapy, smoothness increases while jerk decreases in PD patients, indicating that these features are sensitive enough to be used for rehabilitation purposes [75] and suggesting that research is already moving in this direction.
Of course, this study has some limitations. The most relevant is the small number of subjects, which does not allow further validation of our analysis. Acquiring a larger cohort would make it possible to investigate in more depth patients affected by PD, parkinsonism, and related symptoms.

V. CONCLUSION
In conclusion, this paper has shown the predictive power of features related to upper-limb kinematic tasks, fed to ML algorithms, for distinguishing healthy subjects from PD patients. The ML algorithms achieved promisingly high scores; in fact, several algorithms reached very high metrics (mainly after the wrapper feature selection). Once the limitations highlighted above are addressed, the approach could be applied to find other powerful feature subsets able not only to distinguish healthy subjects from patients with PD or, potentially, other pathologies, but also to define helpful rehabilitation outcome metrics for subjects whose upper-limb motor tasks show non-conventional behaviour.