On the Use of t-Distributed Stochastic Neighbor Embedding for Data Visualization and Classification of Individuals with Parkinson's Disease

Parkinson's disease (PD) is a neurodegenerative disorder that remains incurable. The available treatments for the disorder include pharmacologic therapies and deep brain stimulation (DBS). These approaches may cause distinct side effects and motor responses. This work presents the application of t-distributed stochastic neighbor embedding (t-SNE), a machine learning algorithm for nonlinear dimensionality reduction and data visualization, to the problem of discriminating neurologically healthy individuals from those suffering from PD (treated with levodopa and DBS). Furthermore, an assessment of classification methods is presented. Inertial and electromyographic data were collected while the subjects executed a sequence of four motor tasks. The analysis focused on comparing the classification performance of a support vector machine (SVM) when discriminating two-dimensional feature sets estimated with Principal Component Analysis (PCA), Sammon's mapping, and t-SNE. The results showed visual and statistical differences among all three investigated groups. Classification accuracy for PCA, Sammon's mapping, and t-SNE was, respectively, 73.5%, 78.6%, and 96.9% for the training set and 67.8%, 74.1%, and 76.6% for the test set. The possibility of discriminating healthy individuals from those with PD treated with levodopa and DBS highlights the fact that each treatment produces distinct motor behavior. The scatter plots resulting from t-SNE could be used in clinical practice as an objective tool for measuring the discrepancy between normal and abnormal motor behaviors, and would thus be useful for adjusting treatments and following up the disorder.


Introduction
Parkinson's disease (PD) is one of the most common neurodegenerative disorders; it remains incurable and affects approximately 3% of the population over 65 years of age [1]. Patients with PD may present resting tremor (oscillatory movement), bradykinesia (slowness of movement), rigidity (increased muscular tone), and impairment in their ability to initiate and sustain movements [1][2][3][4]. Because aging is an important risk factor for PD, its incidence is expected to increase as people live longer [5]. Diagnosis is a critical point: it is estimated that 20% of patients are currently not correctly diagnosed [6]. According to a review [7] that evaluated the accuracy of the clinical diagnosis of PD from 1988 to 2014, a correct diagnosis is crucial for prognostic and therapeutic reasons, as well as for clinical, pharmacologic, and epidemiologic studies. Despite advances in neuroimaging and genetics, the diagnosis of PD remains primarily clinical [7].
Epidemiology is the study of how often diseases occur in different groups of people and why [8]. The quantitative element of epidemiological studies depends directly on the diagnosis of the disease, in this case PD. Misdiagnosis in either direction (a subject wrongly diagnosed with PD, or a patient with PD left undiagnosed) affects the statistics of epidemiological studies, which are in turn used in many other types of research.
A number of rating scales are used to evaluate motor impairment and disability in patients with PD. The Unified Parkinson's Disease Rating Scale (UPDRS) is the most well-established subjective scale for assessing disability and impairment [9,10]. It is composed of four parts: Part I (nonmotor experiences of daily living), Part II (motor experiences of daily living), Part III (motor examination), and Part IV (motor complications). Several alternative rating scales exist, but they have not been fully evaluated for validity and reliability [2]. Because the methods currently in use are subjective, and because diagnosis and treatment efficacy need to improve, studies are needed that provide feedback to neurologists during the clinical evaluation of patients, reducing the time and effort required to achieve optimal outcomes and improving treatment.
Some PD symptoms can be reduced with pharmacological and/or surgical intervention, and the lifespan of patients can consequently be extended. Levodopa (LD) is one of the most effective and widely used drugs for PD treatment [11,12]. Surgical interventions, such as pallidotomy (ablation/lesioning) and deep brain stimulation (DBS), also have established efficacy in the treatment of PD [13].
DBS therapy delivers electrical stimulation to areas of the brain, alleviating PD motor symptoms. A patient is a candidate for this type of therapy if the symptoms do not respond effectively to levodopa [14].
Regarding the differences between DBS and medication-based treatments, several studies [14][15][16][17][18] report comparative results. Most of these studies assess PD patients treated with DBS versus medication by means of subjective scales. They found that DBS provided better motor outcomes. Furthermore, the authors highlighted that the group receiving neurostimulation was more susceptible to serious adverse effects, including fatal cerebral hemorrhage.
An extensive review suggests that the major surgery-related risk is intracranial hemorrhage: the overall incidence of hemorrhage was 5.0%, with symptomatic hemorrhage occurring in 2.1% of patients and hemorrhage resulting in permanent neurological deficit or death in 1.1% [19].
Additionally, objective approaches for evaluating DBS and medication-based treatments are not well explored. Machado et al. [20] conducted a study to compare three groups of subjects objectively (i.e., PD patients treated with DBS and levodopa, PD patients treated only with levodopa, and healthy subjects). Each subject performed a set of static and dynamic tasks. The aim of the study was to introduce a method for automatically classifying these groups in a high-dimensional space.
Although several studies have compared DBS and medication-based treatments by means of rating scales (e.g., UPDRS), only a few have used objective methods to compare and visualize the possible differences between patients treated in different ways. As reported in [14,18], subjects treated with DBS plus medication presented better motor behavior than those treated with medication alone. An automatic classification of these groups could therefore compare them and show whether patients treated with DBS present the expected improvements and/or whether their DBS parameters are correctly set.
A relevant area for data visualization is dimensionality reduction (DR). DR aims to preserve the relationships among data points when mapping them from a high-dimensional space (e.g., the original data) to a low-dimensional one (e.g., the reduced data). Data visualization, the study of the visual representation of data through graphical means, is an important application of DR: it simplifies the human evaluation of data and is effective in exploratory data analysis [21,22]. DR algorithms can be divided into categories according to different criteria, e.g., linear and nonlinear algorithms. Classically, dimension reduction and data representation have been approached by applying linear transformations such as the well-known principal component analysis (PCA) [23,24].
Those linear techniques focus on keeping the low-dimensional representations of dissimilar data points far apart. However, PCA cannot represent higher-order, nonlinear, and local structure in the data. Over the last decades, several nonlinear DR algorithms have been proposed to deal with complex nonlinear data.
Many linear and nonlinear DR methods are reported in the literature [25,26]. In this paper, three of them are assessed: PCA [23], Sammon's mapping [27], and t-distributed stochastic neighbor embedding (t-SNE) [28]. Each method is evaluated by the ability of its low-dimensional features to discriminate neurologically healthy individuals, individuals with PD treated with levodopa, and individuals with PD treated with DBS.

Participants and Data Collection.
This study was conducted at the Federal University of Uberlândia (UFU), Uberlândia, Brazil, and at the University of California, Los Angeles (UCLA), USA. Both institutions provided ethical approval for the experimental procedures (CAAE 07075413.6.0000.5152; UCLA IRB 14-001491). A complete description of the data collection procedure is available in [20]. The dataset consists of motor task measurements collected from 38 subjects, divided into the following groups: neurologically healthy individuals (S H = 10), individuals with PD treated with levodopa (S PD = 16), and individuals with PD treated with DBS (S DBS = 12). All subjects with PD who participated in this study were rated 2 (i.e., bilateral or midline involvement without impairment of balance) or 3 (i.e., bilateral disease: mild-to-moderate disability with impaired postural reflexes; physically independent) on the Hoehn and Yahr scale [29]. The dataset comprises four motor tasks performed by the volunteers and depicted in Figure 1: finger taps (Task 1, T1), finger to nose (Task 2, T2), supination and pronation (Task 3, T3), and rest (Task 4, T4).
Each subject executed the sequence of four tasks depicted in Figure 1 five times, with at least 30 s of rest after the end of each sequence (Tasks 1 to 4).
During the execution of the tasks, two sets of three-axial inertial sensors (i.e., accelerometer, gyroscope, and magnetometer), weighing 1 g each, were positioned on the dorsal surface of the hand and forearm. Two pairs of disposable electromyographic (EMG) sensors were placed on the flexor and extensor muscles of the forearm. Both the inertial signals and the envelope of the EMG signals were digitized at 50 Hz. Figure 2 illustrates typical waveforms of the resultant components (i.e., a combination of the x, y, and z coordinates) of the inertial sensors and the envelope of the electromyographic activity. The periods of the executed tasks (T1, T2, T3, and T4) are delimited by rectangular windows indicating the beginning and end of each task.
Since each subject repeated each task five times, the coefficient of variation (CV) [30], i.e., the ratio of the standard deviation to the mean across the repetitions, was computed. From a reproducibility perspective, the CV can serve as a reference for other studies attempting to reproduce these experimental results.
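A minimal sketch of the CV computation follows. It assumes that each repetition of a task is summarized by one scalar (e.g., a hypothetical per-repetition mean of the resultant acceleration) and that the population standard deviation is used; neither choice is stated in the text, so both are assumptions. The group value is taken as the mean of the per-subject CVs, which is one plausible reading of the averages reported in Table 1.

```python
import numpy as np

def coefficient_of_variation(repetitions):
    """CV = standard deviation / mean across the repetitions of one task."""
    x = np.asarray(repetitions, dtype=float)
    # Population standard deviation (ddof=0); use ddof=1 for the sample estimate.
    return float(x.std(ddof=0) / x.mean())

def group_mean_cv(per_subject_repetitions):
    """Average of the per-subject CVs within one group (assumed aggregation)."""
    return float(np.mean([coefficient_of_variation(r) for r in per_subject_repetitions]))

# Two hypothetical subjects, each summarized by one scalar per repetition
group_cv = group_mean_cv([[1.0, 3.0], [2.0, 2.0, 2.0, 2.0, 2.0]])
```

With the toy values above, the first subject has a CV of 0.5 and the second a CV of 0, so the group average is 0.25.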
Table 1 shows the average coefficients of variation of the subjects in each group. S H presented the lowest CV among the three groups, indicating that subjects in this group vary less in their motor patterns than subjects in the S PD and S DBS groups. The greater variability in the S PD and S DBS groups is expected, since these subjects suffer from PD and present different motor patterns according to their physiological condition (e.g., medication state and anxiety).

2.2. Steps for Data Processing.
Focusing on data visualization and on the discrimination between healthy subjects and those suffering from PD, the present study assesses features estimated with data projection techniques (PCA, Sammon's mapping, and t-SNE) and classified by a support vector machine (SVM) classifier. The main steps of this study are shown in Figure 3. The extracted features were standardized (step 2a in Figure 3) and then split into training and test sets (step 2b in Figure 3). The high-dimensional feature vectors of the training set were submitted to dimension reduction (step 3 in Figure 3). The corresponding low-dimensional map points for the test set were produced by an out-of-sample extension technique (step 4 in Figure 3), implemented with an artificial neural network (ANN).
Feature reduction was followed by supervised learning and classification with an SVM [31] (step 5 in Figure 3). These steps aim to evaluate the DR techniques for exploring the PD motor task data. Each method is described in detail in the following subsections.

Feature Extraction.
Feature extraction was performed on the filtered signals (FS) and on the instantaneous amplitude (IA) and instantaneous frequency (IF) estimated from the Hilbert transform [32] (step 1c in Figure 3). The extracted features are fully described in Table 1 of [33,34]. For each method (i.e., FS, IA, and IF), a feature matrix was created containing the features extracted from all sensors. In addition, combinations of features from the different methods were analyzed: FS-IA, FS-IF, IA-IF, and FS-IA-IF. The aim was to identify which combination provides the best discrimination. The preprocessing methods (step 1a in Figure 3) are fully described in [20].
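The IA and IF signals can be obtained from the analytic signal. The sketch below builds the analytic signal with NumPy's FFT, using the same frequency-domain construction as `scipy.signal.hilbert`, and assumes the 50 Hz sampling rate reported for the inertial and EMG-envelope signals. It is an illustration of the technique, not the authors' preprocessing code.

```python
import numpy as np

FS_HZ = 50.0  # sampling rate reported for the inertial and EMG-envelope signals

def analytic_signal(x):
    """Analytic signal x + j*H{x}: keep DC (and Nyquist), double positive, zero negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def instantaneous_amplitude_frequency(x, fs=FS_HZ):
    z = analytic_signal(x)
    ia = np.abs(z)                                # instantaneous amplitude (envelope)
    phase = np.unwrap(np.angle(z))                # continuous instantaneous phase (rad)
    inst_f = np.diff(phase) * fs / (2.0 * np.pi)  # instantaneous frequency (Hz), length n - 1
    return ia, inst_f

t = np.arange(500) / FS_HZ                        # 10 s of a synthetic 5 Hz oscillation
ia, inst_f = instantaneous_amplitude_frequency(np.sin(2 * np.pi * 5.0 * t))
```

For a pure sinusoid with an integer number of cycles in the window, the amplitude is constant at 1 and the frequency at 5 Hz; real tremor signals would show time-varying IA and IF.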

Data Standardization and Splitting.
Since the data come from different sensors (i.e., accelerometer, gyroscope, magnetometer, and electromyography) on different scales, the features were standardized with the z-score method (step 2a in Figure 3):

$$z = \frac{x - \mu}{\sigma}, \tag{1}$$

where $x$ is the feature to be standardized, $\mu$ is the mean of the feature over all samples, and $\sigma$ is the standard deviation of that feature. The standardized feature vectors were then randomly separated into training and test sets (step 2b in Figure 3) comprising 90% and 10%, respectively, of the data from each group of subjects (S H, S PD, and S DBS). A strict separation between training and test sets is crucial for a realistic and reliable evaluation of the automated classification task. This is an improvement over the study described in [20], in which the dimension reduction step was applied to the entire dataset prior to machine learning.

Computational and Mathematical Methods in Medicine
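The z-score standardization of Equation (1) and the per-group 90/10 split can be sketched as follows. The feature matrix is synthetic, and the split is done per sample; whether repetitions of one subject were allowed to fall in both sets is not stated in the text, so that is an assumption. With 10, 16, and 12 subjects times 5 repetitions, a 90% per-group split yields 171 training samples.

```python
import numpy as np

def zscore(features):
    """Standardize each column: z = (x - mu) / sigma, with mu and sigma over all samples."""
    X = np.asarray(features, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # population std; the ddof used in the study is not stated
    return (X - mu) / sigma

def stratified_split(labels, train_fraction=0.9, seed=0):
    """Random split that keeps `train_fraction` of each group in the training set."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for group in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == group))
        n_train = int(round(train_fraction * idx.size))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.sort(train_idx), np.sort(test_idx)

rng = np.random.default_rng(0)
labels = np.repeat(["H", "PD", "DBS"], [50, 80, 60])  # 10, 16, 12 subjects x 5 repetitions
X = rng.normal(size=(labels.size, 408))               # synthetic feature matrix
Z = zscore(X)
train_idx, test_idx = stratified_split(labels)
```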

Unsupervised Dimension Reduction Analysis.
In this work, three unsupervised DR methods were evaluated (step 3 in Figure 3). The first was PCA, a linear feature reduction method [23,24]. The second was Sammon's mapping, one of the first nonlinear mapping algorithms for the analysis of multivariate data [27]. The third, also a nonlinear mapping technique, was the t-SNE of van der Maaten and Hinton [28], an improved variation of stochastic neighbor embedding (SNE) [35]. t-SNE places points from a high-dimensional space in a low-dimensional one so as to preserve neighborhood identity. The SNE algorithm converts Euclidean distances between high-dimensional data points into conditional probabilities representing similarities; closer data points have higher similarity. The similarity of data point $x_j$ to data point $x_i$ is represented by the conditional probability $p_{j|i}$, the probability that $x_i$ would select $x_j$ as its neighbor. For the low-dimensional counterparts $y_i$ and $y_j$ of the high-dimensional data points $x_i$ and $x_j$, a similar conditional probability, denoted $q_{j|i}$, is computed.
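In SNE, $p_{j|i}$ is obtained from a Gaussian centered on $x_i$ with bandwidth $\sigma_i$, normalized over all other points. A minimal NumPy sketch, assuming the bandwidths are already known (in practice they are tuned through the perplexity parameter):

```python
import numpy as np

def conditional_p(X, sigmas):
    """p_{j|i} = exp(-||x_i - x_j||^2 / (2 sigma_i^2)) / sum_{k != i} exp(-||x_i - x_k||^2 / (2 sigma_i^2))."""
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq_dist / (2.0 * np.asarray(sigmas, dtype=float)[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)             # p_{i|i} is defined as 0
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)       # row i is the distribution P_i

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])  # toy points
P = conditional_p(X, np.ones(3))
```

Each row sums to one, and the nearer neighbor of $x_0$ receives the larger probability.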
Once the conditional probability distributions have been calculated for the data points in both the high- and low-dimensional representations, the goal of the algorithm is to minimize the mismatch between them. The cost function, minimized with a gradient descent method, is the sum of Kullback-Leibler (KL) divergences over all data points:

$$C = \sum_i \mathrm{KL}\left(P_i \,\|\, Q_i\right) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}, \tag{2}$$

in which $P_i$ represents the conditional probability distribution over all other data points given data point $x_i$, and $Q_i$ represents the conditional probability distribution over all other map points given map point $y_i$. t-SNE improves SNE in two ways [28]: (1) it uses a symmetrized version of the SNE cost function with simpler gradients, and (2) it uses a Student's t-distribution rather than a Gaussian to compute the similarity between two points in the low-dimensional space.

Figure 2: Results of the application of the windowing and filtering steps described in [20]. The distinct tasks (T1, T2, T3, and T4) are separated by pulses.
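The two t-SNE modifications to the cost in Equation (2) can be sketched as follows: conditional probabilities are symmetrized into joint probabilities $p_{ij} = (p_{j|i} + p_{i|j})/(2n)$, the low-dimensional affinities $q_{ij}$ use a Student-t kernel with one degree of freedom, and the cost becomes a single KL divergence between the joint distributions $P$ and $Q$. The map points and the uniform $p_{j|i}$ below are toy values.

```python
import numpy as np

def joint_p(P_cond):
    """Symmetrized t-SNE affinities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

def student_t_q(Y):
    """Low-dimensional affinities q_ij from a Student-t kernel (1 degree of freedom)."""
    Y = np.asarray(Y, dtype=float)
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(W, 0.0)          # q_ii is defined as 0
    return W / W.sum()                # normalized over all pairs k != l

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all pairs; terms with p_ij = 0 contribute nothing."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])  # toy 2-D map points
P_cond = (np.ones((4, 4)) - np.eye(4)) / 3.0                     # toy uniform p_{j|i}
P = joint_p(P_cond)
Q = student_t_q(Y)
cost = kl_divergence(P, Q)
```

The heavy tail of the Student-t kernel lets moderately distant points be placed far apart in the map, which is what mitigates the crowding of points seen with a Gaussian kernel.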
For each DR method employed, the high-dimensional data (i.e., all features estimated from the EMG, accelerometer, gyroscope, and magnetometer sensors) were reduced to a two-dimensional space. Data projections were carried out for each scenario or experiment (see Section 2.5.1 for more details), and a scatter plot of each projection was then generated (step 4c in Figure 3) so that possible differences among the studied groups could be visualized.
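As an illustration of how a two-dimensional projection can be scored, the sketch below computes a PCA projection (the usual initialization of Sammon's mapping) and Sammon's stress, $E = \frac{1}{\sum_{i<j} d^*_{ij}} \sum_{i<j} \frac{(d^*_{ij} - d_{ij})^2}{d^*_{ij}}$, where $d^*_{ij}$ and $d_{ij}$ are the high- and low-dimensional distances. It only evaluates the stress; the iterative optimization of Sammon's algorithm is not reproduced, and the data are synthetic.

```python
import numpy as np

def pairwise_dist(X):
    X = np.asarray(X, dtype=float)
    return np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))

def pca_2d(X):
    """Project centred data onto its first two principal components (via SVD)."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def sammon_stress(X_high, Y_low):
    """Sammon's stress over all pairs i < j."""
    D = pairwise_dist(X_high)
    d = pairwise_dist(Y_low)
    iu = np.triu_indices_from(D, k=1)
    D, d = D[iu], d[iu]
    keep = D > 0                      # skip duplicate points to avoid division by zero
    return float(np.sum((D[keep] - d[keep]) ** 2 / D[keep]) / np.sum(D[keep]))

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(30, 2)), np.zeros((30, 3))])  # synthetic 2-D data in 5-D
Y0 = pca_2d(X)
```

Because these synthetic points lie on a plane, PCA preserves all distances and the stress is essentially zero; collapsing every point onto the origin gives a stress of exactly one.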

Parameter Setting.
Sammon's mapping and t-SNE have several free parameters, such as the number of iterations of the cost function optimization and the learning rate of the gradient descent method. In addition, t-SNE has a perplexity parameter, which can be interpreted as a smooth measure of the effective number of neighbors.
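In the standard t-SNE formulation, perplexity is $\mathrm{Perp}(P_i) = 2^{H(P_i)}$, with $H(P_i)$ the Shannon entropy of $P_i$ in bits, and each bandwidth $\sigma_i$ is found by a binary search so that every row matches the chosen perplexity. A minimal sketch; the target of 10 and the synthetic distances are illustrative and are not the values of Figure 4.

```python
import numpy as np

def row_perplexity(sq_dist_row, beta):
    """Perplexity 2**H(P_i) of one row for precision beta = 1 / (2 sigma_i^2)."""
    logits = -beta * sq_dist_row
    logits -= logits.max()
    p = np.exp(logits)
    p /= p.sum()
    entropy_bits = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return 2.0 ** entropy_bits

def sigma_for_perplexity(sq_dist_row, target, tol=1e-5, max_iter=200):
    """Binary search on beta so that the row's perplexity matches `target`."""
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        perp = row_perplexity(sq_dist_row, beta)
        if abs(perp - target) < tol:
            break
        if perp > target:       # distribution too flat: increase precision
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:                   # distribution too peaked: decrease precision
            hi = beta
            beta = (beta + lo) / 2.0
    return float(np.sqrt(1.0 / (2.0 * beta)))

rng = np.random.default_rng(0)
sq_row = np.sum(rng.normal(size=(49, 5)) ** 2, axis=1)  # squared distances to 49 neighbours (synthetic)
sigma = sigma_for_perplexity(sq_row, target=10.0)
```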
In our experiments, we performed an exhaustive search to evaluate the influence of each DR parameter on the quality of the generated maps. All parameter settings are shown in Figure 4.
Each DR method was evaluated over a set of experiments, each with a distinct parameter setting (no combination was repeated), as shown in Figure 4. PCA experiments combine the preprocessing methods (v) and tasks (τ), resulting in 28 experiments. Sammon's mapping combines v, τ, the number of iterations (l), and the learning rate (η), for a total of 700 experiments. Finally, t-SNE experiments combine all parameters depicted in Figure 4, for a total of 3,500 experiments.
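The experiment counts follow from a Cartesian product of the parameter grids. The seven preprocessing combinations and four tasks come from the text (7 × 4 = 28); the iteration, learning-rate, and perplexity values below are hypothetical placeholders, since Figure 4 is not reproduced here, chosen only so that the products match the reported 700 and 3,500 (i.e., 25 combinations of l and η, and 5 perplexity values).

```python
from itertools import product

PREPROCESSING = ["FS", "IA", "IF", "FS-IA", "FS-IF", "IA-IF", "FS-IA-IF"]  # v = 7
TASKS = ["T1", "T2", "T3", "T4"]                                             # tau = 4
# Hypothetical grids: placeholders for the values listed in Figure 4.
ITERATIONS = [100, 250, 500, 750, 1000]
LEARNING_RATES = [10, 50, 100, 200, 500]
PERPLEXITIES = [5, 10, 20, 30, 50]

pca_runs = list(product(PREPROCESSING, TASKS))
sammon_runs = list(product(PREPROCESSING, TASKS, ITERATIONS, LEARNING_RATES))
tsne_runs = list(product(PREPROCESSING, TASKS, ITERATIONS, LEARNING_RATES, PERPLEXITIES))
```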
For each setup shown in Figure 4, the procedure was to (1) execute the DR method; (2) execute the out-of-sample process; (3) train and test the SVM classifier; and (4) compute performance indices to evaluate the parameter setup.

Out-of-Sample Extension.
Many nonlinear DR methods only map a given finite set of data points to a low-dimensional space and provide no built-in way to map new data points to their corresponding low-dimensional representation. Sammon's mapping and t-SNE fall into this category. Therefore, the high-dimensional training data $x_i$ and their corresponding low-dimensional representations $y_i$ were used to train a feedforward neural network with weights $w$, which acts as a mapping function $f: x_i \to y_i$; this function was then used to determine the low-dimensional representation of the test set (step 4b in Figure 3).
Before the ANN is applied, the high-dimensional training set is passed through PCA, retaining the principal components that preserve 90% of the total variance of the data (step 4a in Figure 3). This step mitigates the curse of dimensionality [36] and speeds up ANN training. Bayesian regularization backpropagation [37] was the training function used to update the weights w and bias values. The lower-dimensional data were then analyzed by evaluating the classification results.
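The out-of-sample extension can be sketched in NumPy: PCA keeps the fewest components explaining at least 90% of the variance, and a one-hidden-layer network is then fitted to map the PCA scores to the 2-D map points. Two substitutions are made and should be read as assumptions: Bayesian regularization [37], which adapts the penalty weight during training, is replaced by a fixed L2 weight decay with plain gradient descent, and the hidden-layer size is arbitrary. The data are synthetic, with `Y_train` standing in for t-SNE or Sammon coordinates.

```python
import numpy as np

def pca_fit(X, variance_kept=0.90):
    """Mean and leading principal axes that together explain >= variance_kept of the variance."""
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(explained, variance_kept)) + 1
    return mu, Vt[:k]

def train_mlp(Z, Y, hidden=10, lr=0.05, weight_decay=1e-3, epochs=2000, seed=0):
    """tanh network fitted by full-batch gradient descent on MSE + L2 penalty."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(Z.shape[1]), size=(Z.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)                  # hidden activations
        err = H @ W2 + b2 - Y                     # prediction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size              # d(MSE)/d(output)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh (before W2 update)
        W2 -= lr * (H.T @ g_out + weight_decay * W2)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (Z.T @ g_hid + weight_decay * W1)
        b1 -= lr * g_hid.sum(axis=0)

    def predict(Z_new):
        return np.tanh(Z_new @ W1 + b1) @ W2 + b2
    return predict, losses

rng = np.random.default_rng(1)
X_train = rng.normal(size=(171, 40))   # synthetic high-dimensional training features
Y_train = np.tanh(X_train[:, :2])      # synthetic stand-in for the DR map points
X_test = rng.normal(size=(19, 40))

mu, axes = pca_fit(X_train, 0.90)
Z_train = (X_train - mu) @ axes.T
scale = Z_train.std(axis=0)            # rescale PC scores to unit variance
predict, losses = train_mlp(Z_train / scale, Y_train)
Y_test = predict(((X_test - mu) @ axes.T) / scale)  # out-of-sample 2-D map points
```

The test set is projected with the mean, axes, and scale learned from the training set only, which keeps the training and test sets strictly separated.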

Classification Analysis.
To evaluate the DR techniques, a supervised machine learning classifier, the support vector machine (SVM), was employed for data classification (step 5 in Figure 3). Once trained, the model was cross-validated using the leave-one-out (LOO) method, and the cross-validation loss of the model was calculated. Through empirical tests, the best parameters for our SVM classifier were found to be a Gaussian kernel function with a kernel scale of 0.35.
Classification accuracy was defined as the percentage of correctly classified samples. The success rate was defined in terms of $R_{TP}$, the true positive rate, over the $v$ preprocessing methods and $\tau$ tasks. Cross validation is a statistical method for assessing how the resulting models will generalize to an unknown dataset [38]. This research used LOO cross validation, in which the number of folds equals the number of samples in the dataset. Thus, the SVM algorithm was applied once for each sample, using all other samples as the training set and the selected sample as a single-item test set. Since there are three classes (i.e., S H, S PD, and S DBS), a multiclass classification [39] with a one-versus-all strategy was employed, in which each binary classifier treats one class as positive and the rest as negative.
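A sketch of LOO cross validation of a one-versus-all Gaussian-kernel SVM on 2-D map points, written with scikit-learn (the study does not state its software). The conversion of the kernel scale to scikit-learn's `gamma` is an assumption: if the kernel scale s divides the inputs before applying exp(-||a - b||²), then gamma = 1/s². The three clusters are synthetic.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

KERNEL_SCALE = 0.35
GAMMA = 1.0 / KERNEL_SCALE ** 2  # assumed kernel-scale -> gamma conversion

def loo_predictions(Y2d, labels):
    """Leave-one-out predictions of a one-vs-rest RBF SVM on 2-D map points."""
    model = OneVsRestClassifier(SVC(kernel="rbf", gamma=GAMMA))
    return cross_val_predict(model, Y2d, labels, cv=LeaveOneOut())

rng = np.random.default_rng(0)
centres = {"H": (0.0, 0.0), "PD": (3.0, 0.0), "DBS": (0.0, 3.0)}   # synthetic map clusters
labels = np.repeat(list(centres), 20)
Y2d = np.vstack([rng.normal(centres[c], 0.3, size=(20, 2)) for c in centres])
pred = loo_predictions(Y2d, labels)
accuracy = float(np.mean(pred == labels))
```

Each of the 60 folds trains on 59 points and predicts the held-out one, so every sample receives exactly one out-of-fold prediction.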

Results
The experimental results of the assessed classification methods are presented in this section.
The training set comprised 171 samples from the 38 subjects, each composed of 408 to 1,224 features; these were reduced to two-dimensional features and evaluated with leave-one-out cross validation (LOO CV). The remaining samples (10%, as described in Section 2.4) composed the test set. Each test sample was submitted to the out-of-sample extension to be mapped into a two-dimensional space, and the resulting 2D points were then labeled by the SVM model. In Figures 5-8, we show some of the results of our experiments with PCA, Sammon's mapping, and t-SNE on the datasets built from the tasks depicted in Figure 1. The visualizations are scatter plots of dimensionless scores of the projected high-dimensional feature vectors. Additionally, the decision boundary generated by a multilayer feedforward network was drawn to aid visual analysis.

Visual Representation of Mappings.
Each setup, as depicted in Figure 4, creates one scatter plot. The scatter plots shown in Figures 5-8 were selected using a quality ratio (QR) computed from the overall success ratio (OSR), defined as

$$\mathrm{OSR} = \frac{TP}{TNS} \times 100\%,$$

where TP is the number of true positives of all classes and TNS is the total number of samples. Since OSR is given as a percentage ranging from 0 to 100%, QR also follows this interval. This ratio guides the selection of the scatter plots that reach the best classification results for each DR method and each task. Figures 5-8 therefore represent the scenarios that achieved the highest quality ratio. Table 2 summarizes the parameters and performance values for each selected scenario.
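The OSR and the row-normalized percentages used for confusion matrices can be computed as below. The labels are toy values, and the QR formula itself is not reproduced because it depends on more than the OSR.

```python
import numpy as np

def confusion_matrix(true, pred, classes):
    """Rows are true classes, columns are predicted classes."""
    idx = {c: k for k, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true, pred):
        cm[idx[t], idx[p]] += 1
    return cm

def overall_success_ratio(cm):
    """OSR = TP / TNS * 100, with TP the correct predictions (diagonal) of all classes."""
    return 100.0 * np.trace(cm) / cm.sum()

def row_normalized(cm):
    """Percentage of each true class assigned to each predicted class."""
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

classes = ["H", "PD", "DBS"]
true = ["H", "H", "PD", "PD", "DBS", "DBS"]  # toy labels
pred = ["H", "PD", "PD", "PD", "DBS", "H"]
cm = confusion_matrix(true, pred, classes)
osr = overall_success_ratio(cm)              # 4 correct of 6 samples
```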
According to Table 2, t-SNE achieved the best performance in all scenarios, with a mean QR of 99.42%, followed by Sammon's mapping (mean QR of 90.72%) and PCA (mean QR of 81.36%). Finger to nose (T2) was the task with the highest QR considering all DR methods, and rest (T4) was the task with the lowest performance.

Classification Performance of Projected Data.
Figures 9 and 10 present boxplots of the success rate (normalized between 0 and 1, where 1 means 100%) for the training set and test set, respectively. In Figure 9, the distribution of the true positive success rate is similar for all three classes, except for PCA in the S PD class. In Figure 10, the true positive success rates of Sammon's mapping and t-SNE were similar and higher than that of PCA for the S H class. For the S PD and S DBS classes, t-SNE yielded superior performance.
The boxplots of Figure 9 show a clear difference among all DR methods. In Figure 10, there is also a difference among DR methods for the S DBS group, but for the S H and S PD groups the difference is not clear.
To confirm the analysis of the boxplots, a statistical test was conducted. Only Sammon's mapping and t-SNE were considered for the statistical analysis, since PCA yields a single value in the boxplots. The normality assumption, verified with the one-sample Kolmogorov-Smirnov test, was not satisfied for any of the distributions. Table 3 presents the p values of the two-sample Kolmogorov-Smirnov test between the success ratios achieved by Sammon's mapping and t-SNE. A statistically significant difference at the 95% confidence level was confirmed for all cases except the S H group from the test set.
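The two tests can be run with SciPy's `stats.kstest` and `stats.ks_2samp`. In this sketch the success-rate samples are synthetic placeholders, not the study's data. Standardizing a sample with its own mean and standard deviation before a one-sample KS test makes the p value conservative (the Lilliefors correction addresses this), so the normality check shown is an approximation.

```python
import numpy as np
from scipy import stats

def normality_ks(sample):
    """One-sample KS test of the standardized sample against N(0, 1)."""
    x = np.asarray(sample, dtype=float)
    return stats.kstest((x - x.mean()) / x.std(ddof=1), "norm")

def compare_methods(success_a, success_b):
    """Two-sample KS test between the success-rate distributions of two DR methods."""
    return stats.ks_2samp(success_a, success_b)

rng = np.random.default_rng(0)
sammon_sr = rng.uniform(0.60, 0.90, size=700)   # synthetic placeholder success rates
tsne_sr = rng.uniform(0.85, 1.00, size=3500)    # synthetic placeholder success rates
normality = normality_ks(sammon_sr)
result = compare_methods(sammon_sr, tsne_sr)
```

The two-sample KS test makes no normality assumption, which is why it suits distributions that fail the normality check.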
Overall, these findings show that when t-SNE is combined with the SVM algorithm, a notable improvement is seen over the other investigated DR methods. Comparing the means of the distributions in Figure 9, t-SNE improved classification over Sammon's mapping by 18.1%, 18.4%, and 18.8% for classes S H, S PD, and S DBS, respectively. For the distributions in Figure 10, t-SNE improved over Sammon's mapping by 2% and 6% for classes S PD and S DBS, respectively, but decreased by 0.6% for class S H. Table 4 shows the grand average confusion matrix of the SVM classifier for all studied DR methods, including data from the training set (LOOCV) and the test set. In this table, the diagonal cells in bold show the normalized percentage of correct classifications by the SVM. For example, 70 samples of the S PD group were correctly classified when the t-SNE DR method was employed.
This corresponds to 98% of all training set samples of the S PD group. Similarly, 6 samples of the same group were correctly classified when, again, the t-SNE DR method was employed. This corresponds to 78% of the test set samples of the S PD group.
Overall, using the PCA DR method, 73.5% of the training set and 67.8% of the test set were correctly classified; the corresponding values were 78.6% and 74.1% for Sammon's mapping and 96.9% and 76.6% for t-SNE. Figures 11-13 show the ROC curves of the LOOCV of the training set and of the test set validation for each class, along with the mean area under the curve (AUC), for each DR method employed as a step before classification. For the LOOCV, 95% confidence bounds were computed for the ROC curves by bootstrapping with 1,000 replicas.
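A simplified sketch of the one-versus-all AUC with a percentile bootstrap over 1,000 resamples. It bounds only the AUC, whereas the study computed bounds for whole ROC curves, and the scores are synthetic. The AUC is computed as the Mann-Whitney probability that a positive sample scores higher than a negative one, counting ties as one half.

```python
import numpy as np

def roc_auc(scores, is_positive):
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie)."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

def bootstrap_auc_ci(scores, is_positive, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (cases resampled with replacement)."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, scores.size, size=scores.size)
        if is_positive[idx].all() or not is_positive[idx].any():
            continue  # a resample needs both classes to define an AUC
        aucs.append(roc_auc(scores[idx], is_positive[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

rng = np.random.default_rng(0)
is_h = np.r_[np.ones(30, bool), np.zeros(60, bool)]   # one-vs-rest labels (synthetic)
scores = np.r_[rng.normal(1.0, 1.0, 30), rng.normal(0.0, 1.0, 60)]
auc = roc_auc(scores, is_h)
ci = bootstrap_auc_ci(scores, is_h)
```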
For the S H class, t-SNE achieved remarkable performance in LOOCV, with the highest mean AUC (0.99) and the lowest deviation from the mean. Sammon's mapping and PCA reached mean AUCs of 0.91 and 0.85, respectively, with similar deviations from the mean. For the test set, t-SNE and Sammon's mapping show similar responses in terms of curve shape, mean AUC, and balance point (i.e., the point where the ROC curve reaches equality between specificity and sensitivity; diagonal dashed line in Figures 11-13). PCA, on the other hand, had the lowest performance.

The ROC curves of Figure 12 show the discrimination ability of the SVM classifier for the S PD class for both the training (LOOCV) and test validation sets. In Figure 12(a), t-SNE obtained results similar to those obtained for the S H group, whereas Sammon's mapping and PCA performed worse. Note that for the S PD class, as for the S H class, these two methods present overlapping ROC curves and confidence bounds; however, for the S PD class the confidence bounds are narrower. Figure 12(b) shows the ROC curves for the test set. The curves of all DR methods behaved similarly: t-SNE reached the best AUC (0.86), followed by PCA (0.84) and Sammon's mapping (0.83). In terms of the balance point, t-SNE was the best method and PCA the worst. Compared with the S H class, PCA improved in classifying S PD samples from the test set, whereas Sammon's mapping and t-SNE decreased in performance for the S PD class.

The classification performance for the S DBS class is shown in the ROC curves of Figure 13. Figure 13(a) presents the training set performance curves for the S DBS class; again, t-SNE achieved the best performance in terms of AUC and balance point. It was followed by Sammon's mapping and PCA, with mean AUCs of 0.87 and 0.84, respectively, whose confidence bounds largely overlap.
For the S DBS class, t-SNE showed wider confidence bounds than for the S H and S PD classes. Figure 13(b), in turn, shows that, in terms of mean AUC, the three DR methods yielded the same results for the S PD and S DBS classes.

Discussion
This kind of study is not often found in the literature. The reasons could be related to the complexity of recruiting volunteers since, in this study, three distinct groups (i.e., S H, S PD, and S DBS) were evaluated. This type of data is expensive, and its acquisition demands specialized professionals.
The literature contains many studies that propose and evaluate methods for discriminating individuals with PD from neurologically healthy ones. However, some studies show that there are key points to be overcome to realize the full potential of this technology in PD research and practice [40,41], for instance: (1) machine learning methods are challenging to evaluate and apply without a basic understanding of the underlying logic on which they are based; and (2) the ability to algorithmically analyze and synthetically display clinically and disease-relevant information to physicians and patients remains limited. This study compares three DR methods with the aim of addressing these two points.

Figure 9: Boxplots of the grand average of the true positive success rate achieved by SVM using LOOCV for the PCA, Sammon's, and t-SNE DR techniques for participants of the S H (left), S PD (center), and S DBS (right) groups. As PCA has no parameters to be varied in this study (Figure 4), it is depicted by a single value, which represents all possible combinations for the PCA DR method.
PD treatment is another extensively discussed topic. The two fields in this area related to our study are the investigation of motor behavior under medication-based treatments and under surgical ones. According to [40], an objective way to adjust drug (e.g., levodopa) release to the patient's needs is lacking. Besides that, DBS treatment has several points for improvement, one of which concerns the implementation of closed-loop (i.e., self-adjusting) DBS. The present study moves in these directions by comparing these groups of subjects and characterizing their motor behavior.
As reported in the literature [14,20,42-44], our results demonstrated differences between the movement patterns of the three groups. In addition, we introduce a comparison of visualization and classification tools, which allows an objective evaluation of subjects. Based on our review, only a few studies have approached the challenge of visualizing and classifying the motor activities of the three classes evaluated here, and those that explored this area did not go as far as our study. The visual representations of the mappings presented in Figures 5-8 show the ability of each DR technique to deal with high-dimensional data, since these figures show the scenarios that achieved the highest quality ratio. Considering the visual aspect (i.e., clustering and class boundaries), t-SNE produces the best visualizations, followed by Sammon's mapping and then PCA. In fact, the ability of t-SNE to preserve both global and local structure leads to better visualizations, as stated in [25]. Sammon's mapping, in turn, improves on PCA by adding the ability to handle nonlinear data. In every figure mentioned, the map built by Sammon's mapping has a shape similar to the PCA map. This occurs because of the PCA initialization strategy of Sammon's algorithm [45].
Classification accuracy for PCA, Sammon's mapping, and t-SNE was, respectively, 73.5%, 78.6%, and 96.9% for the training set and 67.8%, 74.1%, and 76.6% for the test set. According to [38], the training set is used to fit the models, and the test set is used to assess the generalization error of the final chosen model. The results differ between the training and test sets for the following reasons: (1) motor behavior differs both between and within groups; (2) the training and test sets are built randomly; (3) the out-of-sample step introduces an error related to mapping high-dimensional information onto a two-dimensional space; and (4) the generalization ability of the classifier varies, which directly affects prediction accuracy, especially when new samples are presented.
The visual representations presented in Figures 5-8 could be used as a tool for the follow-up of PD treatments by defining a control zone, so that the closer a subject lies to this zone, the better their motor behavior. Furthermore, an individual analysis of each patient could help refine the use of this zone.
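The text does not specify how the control zone would be defined. One hypothetical option, sketched below in NumPy, summarizes the healthy subjects' points in a 2-D map by their centroid and a radius enclosing 95% of them, and scores each patient by the distance to the border of that zone. The centroid-plus-radius rule, the 95% quantile, and the function names are all assumptions.

```python
import numpy as np


def control_zone(Y_control, quantile=0.95):
    """Hypothetical control zone: centroid of the healthy subjects' 2-D points
    and a radius covering the given quantile of their distances to it."""
    center = Y_control.mean(axis=0)
    radius = np.quantile(np.linalg.norm(Y_control - center, axis=1), quantile)
    return center, radius


def distance_to_zone(y_subject, center, radius):
    """Distance from a subject's map point to the zone border (0 if inside)."""
    return max(0.0, float(np.linalg.norm(y_subject - center)) - radius)
```

Tracking this distance across visits would give one number per patient to follow over time. New visits would have to be placed on the existing map with an out-of-sample projection, and a per-patient analysis could replace the fixed quantile.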
Our results address the differentiation between PD treatments and a healthy control group without considering disease subtypes. The variability found in some methods may be due to this factor, since tremor, bradykinesia, and rigidity present different movement patterns. A further study applying our system and protocol to new groups of participants, separated by PD subtype, could address this limitation. The tasks performed in this study are well established; they are described in the UPDRS [46] and used in clinical evaluation [47-52]. In Figure 5, the t-SNE projection of the finger-tapping task (Task 1) reached a quality ratio of 99.42% ± 0.8 and gave the clearest visual representation among all mappings shown in Figures 5-8. Sammon's mapping, in turn, produced a spherical projection, which is characteristic of this method, and achieved a QR of 88.60% ± 8.6, about 10.8 percentage points below t-SNE.
Finger-to-nose (T2) and pronation-supination (T3) were the tasks with the highest mean QR considering all DR methods, 93.47% and 92.50%, respectively. Both movements are more complex than the other two tasks performed, finger taps (T1) and rest (T4). The higher motor-pattern complexity of T2 and T3 is reflected in a higher success rate in discriminating the three classes (S_H, S_PD, and S_DBS). The finger-to-nose task shares its dominant kinematic pattern with a variety of activities of daily living (ADL), such as eating, drinking, and answering a phone. The pronation-supination task, in turn, is commonly used to assess bradykinesia [53,54].
Regarding discrimination among groups, t-SNE showed the highest success rates in leave-one-out cross-validation (LOOCV), followed by Sammon's mapping. t-SNE also performed best when it was applied before classifying the test set. However, although its test-set success rate was superior to those of the other methods, it was clearly lower than its own LOOCV performance. This drop is caused by the step required to project new data points, called out-of-sample (step 4 in Figure 3). The out-of-sample (OOS) process was carried out by means of PCA combined with an artificial neural network (ANN), as explained in Section 2.6. Our OOS approach reached an overall mean squared error of 17.9 ± 10.5 and 3.6 ± 2.7 for Sammon's mapping and t-SNE, respectively, and an overall R value of 0.97 ± 0.02 and 0.95 ± 0.03, also for Sammon's mapping and t-SNE, respectively.
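A minimal NumPy sketch of the out-of-sample idea: learn a mapping from the high-dimensional features of the training subjects to their 2-D embedding coordinates, then apply it to new subjects and score it with MSE and R. PCA is used as the first stage, as described above, but an ordinary least-squares linear map is swapped in for the ANN to keep the example short. R is assumed to be the Pearson correlation pooled over both map coordinates, and all function names are hypothetical.

```python
import numpy as np


def fit_pca(X, k):
    """Mean and top-k principal axes of the training features."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]


def pca_transform(X, mu, W):
    return (X - mu) @ W.T


def fit_oos_map(X_train, Y_train, k=5):
    """Learn features -> 2-D map coordinates (least squares stands in for the ANN)."""
    mu, W = fit_pca(X_train, k)
    Z = pca_transform(X_train, mu, W)
    Z1 = np.hstack([Z, np.ones((len(Z), 1))])      # add a bias column
    B, *_ = np.linalg.lstsq(Z1, Y_train, rcond=None)
    return mu, W, B


def oos_project(X_new, mu, W, B):
    """Place unseen subjects on the existing 2-D map."""
    Z = pca_transform(X_new, mu, W)
    return np.hstack([Z, np.ones((len(Z), 1))]) @ B


def mse_and_r(Y_true, Y_pred):
    """Mean squared error and Pearson R pooled over both coordinates."""
    mse = np.mean((Y_true - Y_pred) ** 2)
    r = np.corrcoef(Y_true.ravel(), Y_pred.ravel())[0, 1]
    return mse, r
```

The study uses an ANN, which can capture nonlinear relations between features and map coordinates that this linear stand-in cannot; the sketch mainly shows how the MSE and R figures are obtained on projected points.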
Despite the good results of the OOS step, the high intragroup variability of motor patterns, mainly in S_PD and S_DBS, makes OOS projection a difficult process in many cases. Other methods for dealing with OOS exist in the literature [55]; these methods could improve the results presented in this study.
Three preprocessing methods were employed in this study. The first (FS) was based on the filtered signal, which yields data more correlated with the original data; the second (IF) captures changes in the signal frequency over time; and the third (IA) captures changes in the signal amplitude over time.
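This section does not spell out how IF and IA are computed. Assuming they denote the instantaneous frequency and instantaneous amplitude of the filtered signal, derived from its analytic signal (a common approach, but an assumption here), the two quantities can be sketched in NumPy as follows. The analytic signal is built with the same FFT construction that `scipy.signal.hilbert` uses; the function names are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (zero negative frequencies, double positive ones)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def instantaneous_amplitude_frequency(x, fs):
    """Return (IA, IF): the envelope of x and its instantaneous frequency in Hz."""
    z = analytic_signal(x)
    ia = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_f = np.diff(phase) * fs / (2.0 * np.pi)   # one sample shorter than x
    return ia, inst_f
```

For a pure 6 Hz cosine of amplitude 2 sampled at 100 Hz, IA is constant at 2 and IF is constant at 6 Hz; for real sensor signals both vary over time, which is the information these features are meant to capture.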
Concerning the preprocessing methods, our results show that combining the features extracted with FS and IF yielded the best overall success rate (86.14% ± 4.3), in accordance with [20]. The success of this combination may be related to tremor, a cardinal symptom that induces oscillatory movements in individuals with PD. These oscillatory movements may occur at frequencies around 6 Hz [52].
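To check whether a recorded signal is dominated by oscillations in this range, one can locate the largest peak of its amplitude spectrum. The NumPy sketch below does this; the 0.5 Hz lower cut-off and the function name are assumptions, and it is not the feature extraction used in this study.

```python
import numpy as np


def dominant_frequency(x, fs, f_min=0.5):
    """Frequency (Hz) of the largest amplitude-spectrum peak at or above f_min."""
    x = np.asarray(x, dtype=float) - np.mean(x)     # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = freqs >= f_min                           # ignore very slow drift
    return freqs[band][np.argmax(spectrum[band])]
```

On a synthetic 4 s mixture of a slow 1 Hz movement and a stronger 6 Hz oscillation sampled at 100 Hz, it returns 6 Hz.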
Turning to the classification analysis, Table 4 summarizes the classification results in confusion-matrix form. Machado et al. [20] employed a partly similar analysis, using only Sammon's mapping. They reported an overall mean success rate as given below: In our experiments using t-SNE, we achieved an overall mean success rate as given below (from Table 4):

Conclusion
This study investigated the motor behavior of three distinct groups of individuals: neurologically healthy individuals, individuals with PD treated with levodopa, and individuals with PD treated with DBS. In order to analyze the motor behavior of each group, the subjects performed four motor tasks, which were recorded using inertial and EMG sensors. Although a wide range of sensors can be used to collect data that quantify PD symptoms, comparable progress has not been made in handling large and complex data such as those collected in this study. The assessment of the classification methods showed that the visualization provided by t-SNE enhanced the visual discrimination of the groups, so that they could be clearly identified in all investigated tasks. For automatic discrimination among groups, an SVM was applied after the dimensionality reduction step. SVM performance was higher in almost all scenarios when t-SNE was employed. Furthermore, this improvement was observed irrespective of group, task, or preprocessing method, amounting to around 18 percentage points on the training set for t-SNE versus Sammon's mapping and around 23 percentage points for t-SNE versus PCA.
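The improvements quoted above can be reproduced from the training-set accuracies reported for the three mappings (73.5%, 78.6%, and 96.9%). The short Python computation below shows that they correspond to absolute differences in percentage points; expressed as relative gains, the same differences are larger.

```python
train_acc = {"PCA": 73.5, "Sammon": 78.6, "t-SNE": 96.9}  # training accuracy (%)

gains = {}
for baseline in ("Sammon", "PCA"):
    points = train_acc["t-SNE"] - train_acc[baseline]   # absolute difference
    relative = 100 * points / train_acc[baseline]       # relative gain (%)
    gains[baseline] = (round(points, 1), round(relative, 1))
    print(f"t-SNE vs {baseline}: {points:.1f} points, {relative:.1f}% relative")
```

This prints 18.3 points (23.3% relative) against Sammon's mapping and 23.4 points (31.8% relative) against PCA, matching the "around 18" and "around 23" figures when read as percentage points.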
Data Availability
The set of features used to support the findings of this study is included in the supplementary information as a Microsoft Excel file.

Conflicts of Interest
The authors declare that they have no conflicts of interest.