Classification of Parkinson’s disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples

Background The use of speech-based data in the classification of Parkinson's disease (PD) has been shown to provide an effective, non-invasive mode of classification in recent years. Thus, there has been increased interest in speech pattern analysis methods applicable to parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classification is reducing noise within the collected speech samples, thereby ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. Methods In this study, a PD classification algorithm was proposed and examined that combines a multi-edit nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied to select optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is trained on the selected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined using a recently deposited public dataset and compared against other currently used algorithms for validation. Results Experimental results showed that the proposed algorithm improved classification accuracy by as much as 29.44% compared with the other examined algorithms. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm exhibited higher stability, particularly when combining the MENN and RF algorithms.
Conclusions This study showed that the proposed method can improve PD classification when using speech data and can be applied in future studies seeking to improve PD classification methods.


Background
Parkinson's disease (PD) is a neurodegenerative disorder of the central nervous system that is characterized by a partial or full loss of motor reflexes, speech, behavior and other vital functions. It is generally observed in elderly people and causes disorders in patients' speech and motor abilities (writing, balance, walking, etc.) [1]. PD is one of the most common neurodegenerative disorders, with an incidence rate of 20/100,000, affecting approximately 5 million people worldwide, with half of PD cases found in China [2]. Moreover, these statistics likely underestimate the incidence rate due to difficulties in diagnosing PD. As populations grow, the number of diagnosed PD patients will continue to increase, amplifying the burden of this disease in the future [2].
While the causes of PD onset remain elusive, with PD often referred to as an idiopathic disorder, genetic and environmental factors have been implicated [1]. Unfortunately, a reliable PD biomarker has yet to be identified, but symptoms can often reflect PD occurrence and progression, such as tremor, rigidity, loss of muscle control, slowness of movement, poor balance and, especially, voice problems [1][2][3][4][5][6]. Therefore, diagnosing PD based on symptoms is reasonable and effective [7][8][9][10][11][12][13][14]. Among these symptoms, speech has been shown to be a useful signal for discriminating PWP (people with Parkinson's) from healthy controls, with clinical evidence suggesting that the vast majority of PWP typically exhibit some form of vocal disorder [3][7][14][15][16]. In fact, vocal impairment may be among the earliest prodromal PD symptoms and can be detectable up to five years prior to clinical diagnosis [2]. Thus, utilizing speech data can aid in the development of a noninvasive early PD diagnostic method. Speech alterations include reduced loudness, increased vocal tremor and breathiness (noise), while PD-associated vocal impairment is characterized by dysphonia (inability to produce normal vocal sounds) and dysarthria (difficulty in pronouncing words). Dysphonia can be measured with acoustic tools that detect voice abnormalities, such as aperiodic vibrations, non-Gaussian randomness and abnormal phonation of the vowel "a" [2], with certain special words and sentences, such as Arabic numerals and special words, used for detection.
While a reliable PD diagnosis is difficult, the effectiveness of speech-based diagnostic approaches has inspired researchers to develop decision support tools able to extract dysphonic speech features and design classification algorithms that distinguish PD patients from healthy ones; such approaches typically involve speech feature extraction, feature selection/transformation and classifier design [17][18][19][20][21][22][23][24].
While the methods mentioned above have been effective to some extent, obtaining optimal speech samples is difficult, with low-quality samples being prone to misleading the classifiers and negatively impacting accuracy. Based on this potential pitfall, this study examined a speech sample selection algorithm to improve outcomes. First, a multi-edit nearest-neighbor (MENN) algorithm was utilized to select the optimal speech samples iteratively, thereby enhancing the separability of the training samples. The MENN algorithm effectively optimizes training samples by removing samples that could mislead the classifier [50,52,54]. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), was employed for classification on the selected training samples. These two ensembles were chosen as points of comparison, with RF being a classical ensemble learning algorithm with proven stability [2,34,41], while DNNE is a newer ensemble learning algorithm able to effectively maximize differences between sub-learners [51]. Lastly, classification was conducted on the test samples using the trained ensemble learning algorithms.

Data descriptions
The data utilized in this study was obtained from the Parkinson speech dataset with multiple types of sound recordings [1], deposited by Sakar et al. [1] and available on the University of California, Irvine (UCI) machine learning dataset repository website. The deposited data (Table 1) included two datasets entitled "Training_Data" and "Test_Data". The "Training_Data" set included 40 subjects, comprising 20 PD (6 women, 14 men) and 20 healthy subjects (10 women, 10 men), with 26 speech samples collected per subject. The samples contained a variety of speech segments with varying pronunciations, including continuous vowel sounds, digit pronunciation, word pronunciation and phrase/sentence pronunciation. To set up a feature vector, linear and nonlinear feature parameters were extracted for each speech sample in 26 dimensions. The "Test_Data" set included 28 subjects (all PD) with six speech samples collected per subject, half containing pronunciation of the continuous vowel 'a' and the other half pronunciation of the continuous vowel 'o'. Feature vectors were then constructed from the same 26 dimensions.
In addition to the dataset deposited by Sakar et al. [1], another broadly studied dataset exists, deposited by Little et al. [2,14]. The major reasons the Sakar et al. dataset was chosen are as follows: (1) the Little et al. dataset has been more broadly studied, with classification accuracies over 95%, and in some cases close to 100%, already achievable; (2) the Sakar et al. dataset contains more samples, thus providing more statistically meaningful results; (3) only a few studies have examined the Sakar et al. dataset, making new findings more insightful; and (4) the classification accuracies reported for the Sakar et al. dataset are not yet promising. The aim of this study is to show that this type of data collection can lead to high PD classification accuracy just by altering the classification method. Therefore, in the experimental section, most of the experiments are based on the PD data from Sakar et al. [1]. Since the PD data from Max Little has been investigated frequently, the proposed algorithm is also verified on that data.

Flowchart of the proposed algorithm (PD_MEdit_EL)
For simplicity, the proposed algorithm is called PD_MEdit_EL, and a flowchart can be seen below (Fig. 1). The PD_MEdit_EL algorithm includes two major parts: one optimizes the speech samples via MENN, while the other performs the classification using an ensemble learning algorithm, either RF or DNNE. Chronologically, the training speech samples are first optimized using the MENN algorithm; these training samples are then processed with the ensemble learning (EL) algorithm to obtain a trained learning model. Finally, the trained model is applied to classify the test samples.

Multi-edit-nearest-neighbor algorithm (MENN)
The MENN algorithm serves as a prototype selection algorithm. When samples from different classes have overlapping distributions, samples in the overlapping regions are more likely to be misclassified. Presumably, if the overlapping regions are accurately removed, the 'noise', i.e., the misleading data, can be greatly suppressed. The remaining samples will then better represent the different classes, thereby improving classification accuracy [52].
The two-edit nearest-neighbor algorithm mainly includes editing and classification. First, samples are divided into training and validation samples; the validation samples are classified and the misclassified samples removed. Generally, if the sample size is large enough, MENN should be applied. The main process of the algorithm is as follows: Step 1: The original samples X = {x_i = (x_i1, x_i2, ..., x_id) | i = 1, 2, ..., N} are randomly divided into s subsets X_1, X_2, ..., X_s (s > 3), containing M_1, M_2, ..., M_s samples respectively, where d is the dimension of a sample and N is the number of samples.
Step 2: Editing. X_{i+1} is used as the training set and KNN is used to classify the validation samples in X_i, with misclassified samples deleted. This step is repeated for i = 1 to i = s − 1; when i = s, X_1 is used as the training set.
Step 3: The samples retained after step 2 are merged to form a new sample set (X New ).
Step 4: Repeat the above steps until all misclassified samples are removed.
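Steps 1–4 above can be sketched in a few lines. The following is a minimal, illustrative implementation in plain Python (not the authors' code): it uses 1-NN editing by default, and the helper names `knn_predict` and `menn_edit` are ours.

```python
import random

def knn_predict(train, k, x):
    """Classify x by majority vote among its k nearest training samples."""
    # train is a list of (features, label) pairs; squared Euclidean distance
    neighbors = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:k]
    labels = [lab for _, lab in neighbors]
    return max(set(labels), key=labels.count)

def menn_edit(samples, s=4, k=1, seed=0):
    """Multi-edit: repeat the editing pass until a pass removes no sample."""
    rng = random.Random(seed)
    samples = list(samples)
    while True:
        rng.shuffle(samples)
        subsets = [samples[i::s] for i in range(s)]      # Step 1: s random subsets
        kept = []
        for i in range(s):                               # Step 2: edit each subset
            train = subsets[(i + 1) % s]                 # X_{i+1} trains, X_i is validated
            if not train:                                # guard for tiny subsets
                kept += subsets[i]
                continue
            kept += [(x, y) for x, y in subsets[i] if knn_predict(train, k, x) == y]
        if len(kept) == len(samples):                    # Step 4: stop when nothing is removed
            return kept
        samples = kept                                   # Step 3: merged edited set X_New
```

With two well-separated clusters and an injected mislabeled sample, the edited set is a subset of the input in which most noise points near the wrong cluster are removed.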
In Fig. 2, the results of editing the training samples with the MENN algorithm can be seen, with the separability of the training samples improved when comparing pre-MENN editing (Fig. 2a) and post-MENN editing (Fig. 2b).
Previous studies have shown that the nearest-neighbor algorithm can achieve a classification error rate close to the Bayesian error rate P* when a large enough sample size is utilized [53]. For the two-edit nearest-neighbor algorithm, the asymptotic conditional error rate is already reduced relative to the plain nearest-neighbor rule; when the MENN algorithm is applied, it is reduced further. As the sample size M increases, the asymptotic conditional error rate P_M(e|x) is greatly reduced, and in the limit, lim_{M→+∞} P_M(e|x) = min{P(ω_1|x), P(ω_2|x)}, i.e., the Bayes conditional error rate.

Random forest
When the RF algorithm is utilized as the classifier, the corresponding training is conducted as follows [54]: Step 1: The bootstrap method is used to re-sample the data and randomly generate T training sets. Step 2: A decision tree is grown for each set, yielding C_1, C_2, ..., C_T; at each node, m attributes are randomly selected from the M available attributes, and the optimal split among them is used to split the node.
Step 3: Each tree is allowed to grow fully, without being pruned.
Step 4: Each decision tree classifies the samples in the test set X, generating a predicted class per tree. Step 5: The voting method is applied: the outputs of the T decision trees are tallied, and the category with the maximum votes becomes the assigned category.
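The bootstrap-and-vote procedure in Steps 1–5 can be illustrated with a deliberately simplified sketch in which each "tree" is reduced to a single random-feature split (a decision stump). A real RF grows full unpruned trees (Step 3), but the resampling, random attribute selection and voting logic shown here are the same; all function names are ours.

```python
import random
from collections import Counter

def train_stump(data, m, rng):
    """Fit a one-split 'tree': among m randomly chosen features, pick the
    (feature, threshold, orientation) with the fewest misclassifications."""
    n_features = len(data[0][0])
    best = None
    for f in rng.sample(range(n_features), m):           # random attribute subset
        for thresh in sorted({x[f] for x, _ in data}):   # candidate split points
            for left, right in ((0, 1), (1, 0)):         # both orientations
                errors = sum((left if x[f] <= thresh else right) != y
                             for x, y in data)
                if best is None or errors < best[0]:
                    best = (errors, f, thresh, left, right)
    _, f, t, left, right = best
    return lambda x: left if x[f] <= t else right

def random_forest_train(data, n_trees=25, m=1, seed=0):
    """Steps 1-3: bootstrap-resample T training sets, grow one (simplified) tree each."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]          # bootstrap sample, same size
        forest.append(train_stump(boot, m, rng))
    return forest

def random_forest_predict(forest, x):
    """Steps 4-5: every tree votes; the majority class wins."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]
```

On well-separated data, the majority vote corrects the occasional degenerate stump trained on a one-class bootstrap sample.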

DNNE
The main principle of the DNNE algorithm [51] is described as follows.
Initialization: training set D_t; base function G (option: the sigmoid function); regularizing factor λ ∈ [0, 1]; number of sub-classifiers M; number of hidden neurons (base functions) per network L. Output: a trained DNNE model. 1: Construct M single-layer feed-forward neural networks and the corresponding ensemble model. 2: Randomly initialize the weights w_j and biases b_j. 3: Input the training samples and calculate the outputs g_ij(x_n) of all the sub-classifiers, where each random vector functional link (RVFL) network is trained on the N samples D_t = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, with (x_n, y_n) ∈ R^d × R being pairs of observations; g_ij(·) denotes the jth hidden neuron of the ith sub-classifier. 4: Calculate the constants C_1 and C_2 as in Eq. (4), where Φ_{i,j,k,l} represents the correlation between the jth hidden neuron of the ith independent RVFL network and the lth hidden neuron of the kth independent RVFL network, and Φ_{i,j} signifies the correlation between the jth hidden neuron of the ith base network and the target function ψ(x); C_1 and C_2 can then be represented as in Eqs. (5) and (6). From these, the error e_i, the output weights β_ij and an (M × L)-dimensional linear system can be obtained.
The linear system can then be solved for the β_ij and the RVFL ensemble model obtained. To simplify the calculation, the matrix form of the linear system is H_corr B_ens = T_h (Eq. (7)), where H_corr is the hidden correlation matrix, B_ens is the global output weights matrix and T_h is the hidden-target matrix. H_corr is defined in Eqs. (8) and (9), where p, q = 1, ..., M × L; m = ⌈p/L⌉ and n = ((p − 1) mod L) + 1; k = ⌈q/L⌉ and l = ((q − 1) mod L) + 1; and mod denotes the modulo operation. B_ens and T_h are defined in Eq. (10). The β_ij can further be refined using gradient-descent optimization, as formulated in Eq. (11). The DNNE algorithm can thus be programmed and realized directly.
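As a rough illustration of the RVFL building block underlying DNNE — not the full ensemble with the decorrelation penalty of Eqs. (4)–(11) — the following sketch fixes random hidden weights w_j and biases b_j and solves only the output weights β by least squares. All names are ours and the setup is an assumption, not the authors' implementation.

```python
import numpy as np

def train_rvfl(X, y, L=20, seed=0):
    """Fit one RVFL base network: random hidden weights stay fixed,
    only the output weights beta are solved by least squares."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, L))    # random input-to-hidden weights w_j
    b = rng.uniform(-1.0, 1.0, size=L)         # random biases b_j
    G = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # sigmoid base functions g_j(x)
    H = np.hstack([G, X])                      # RVFL keeps direct input-output links
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def rvfl_predict(model, X):
    W, b, beta = model
    G = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.hstack([G, X]) @ beta
```

Because the hidden layer is never trained, fitting reduces to one linear solve, which is what makes building and decorrelating an ensemble of such networks cheap.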

Experimental condition
One thousand and forty training samples collected from the 40 subjects in the "Training_Data" set, each composed of 26 dimensional feature parameters, were evaluated with leave-one-out cross validation (LOO CV). If the majority of a subject's samples were classified as patient, the subject was deemed a patient; otherwise, the subject was deemed healthy. To reduce the effect of outlier samples from the same subject, since not all samples reflect speech characteristics equally, a cross validation method called leave-one-subject-out (LOSO) was also employed. For the DNNE algorithm, a linear exhaustive search was used to find the optimal parameter values, with the search range of M within [2,15], the number of base functions L of the RVFL network within [5,50] and the penalty coefficient γ and boosting threshold value ф within [0,1], with step sizes of 0.1 and 0.01, respectively. Additionally, 60% of the training samples were used for bagging and boosting the ensemble, as is common. For RF, the statistically optimal number of decision trees was 500. The SVM was based on libsvm.
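The subject-level decision rule described above (a subject is labeled by the majority vote over its samples) combined with leave-one-subject-out validation might be organized as follows. This is a hypothetical sketch in plain Python; `classify` is a placeholder for any model builder (MENN + RF, DNNE, SVM, ...), and all names are ours.

```python
from collections import Counter

def loso_subject_accuracy(samples, subjects, labels, classify):
    """Leave-one-subject-out: hold out all samples of one subject, train on
    the rest, then label the subject by majority vote over its samples."""
    correct = 0
    subject_ids = sorted(set(subjects))
    for held in subject_ids:
        train = [(x, y) for x, s, y in zip(samples, subjects, labels) if s != held]
        test = [(x, y) for x, s, y in zip(samples, subjects, labels) if s == held]
        model = classify(train)                # returns a predict function
        votes = Counter(model(x) for x, _ in test)
        predicted = votes.most_common(1)[0][0]
        if predicted == test[0][1]:            # all samples share the subject's label
            correct += 1
    return correct / len(subject_ids)
```

Holding out every sample of the test subject is what prevents the within-subject dependence discussed later from inflating the accuracy.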
For the MENN algorithm, the parameter s was set based on statistical experiments; here, s = 4. The weights of the classification model were set by supervised training. The experimental platform was the Windows 7 32-bit operating system with 4 GB of memory, and the data processing was completed in MATLAB, version 2014a. The SVM, RF and relief algorithms are from toolboxes in the MATLAB environment, and the DNNE algorithm is from the official MATLAB website. The original MENN algorithm is from a file-exchange website (http://www.pudn.com), but we modified it and combined it with the DNNE and RF; the other parts were designed by us.

Performance evaluation criteria
To verify the effectiveness of the proposed algorithm, the classification accuracy, sensitivity and specificity were used as evaluation standards, defined in terms of TP (true positive), TN (true negative), FP (false positive) and FN (false negative). The specific formulas are as follows. Accuracy: ACC = (TP + TN) / (TP + TN + FP + FN). Sensitivity, also called the true positive rate, determines the percentage of accurately identified diseased subjects: SEN = TP / (TP + FN). Specificity, also called the true negative rate, determines the percentage of accurately identified healthy subjects: SPE = TN / (TN + FP).
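With the TP/TN/FP/FN counts defined above, the three criteria are straightforward to compute; a minimal sketch (labeling PD as the positive class 1):

```python
def classification_metrics(y_true, y_pred):
    """Compute ACC, SEN and SPE from TP/TN/FP/FN counts (positive = PD = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)          # true positive rate
    spe = tn / (tn + fp)          # true negative rate
    return acc, sen, spe
```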

Classification performance of the PD_MEdit_EL algorithm
In the present study, the PD_MEdit_EL algorithm, which utilizes the ensemble learning algorithms RF and DNNE, was examined. Until now, only four studies have examined the Sakar et al. [1] dataset, with two only reporting classification accuracy on the validation set [42,49]. Therefore, only two studies [1,27] remain that are comparable to the PD_MEdit_EL algorithm examined herein. The SVM, which is currently widely applied in this field, was also included for comparison (Table 2). These results showed that the PD_MEdit_EL algorithm provided the most accurate classification in terms of classification accuracy (ACC), sensitivity (SEN) and specificity (SPE), regardless of LOO or LOSO. Across all classification criteria, the RF (with MENN) was the best overall classification algorithm, with the highest mean SPE under LOSO seen with the DNNE (with MENN) algorithm. When comparing the LOO and LOSO methods, the classification performance of LOSO was higher than that of LOO, possibly because LOSO effectively reduces outliers within a subject.
Overall, these findings show that when MENN is combined with either ensemble learning algorithm, a notable improvement is seen over the other classification methods. (In Table 2, "Classification accuracy for the 'Training_Data' set", DNNE (with MENN) and RF (with MENN) denote the proposed PD_MEdit_EL algorithm; SVM (linear) is the SVM with the linear kernel function; SVM (RBF) is the SVM with the radial basis function kernel; and Method in [1] and Method in [27] are the classification algorithms from Refs. [1] and [27], respectively.) When examining ACC for the RF (with MENN) method, classification improved by 16.5% compared to SVM (linear), by 14% compared to SVM (RBF), by 29.44% compared to the method in [1] and by 0.3% compared to the method in [27]. When examining ACC for the DNNE (with MENN), classification improved by 8.37% relative to SVM (linear), 5.87% relative to SVM (RBF) and 21.31% relative to the method in [1], but decreased by 2.5% relative to the method in [27]. As for the SEN of the RF (with MENN) method, classification improved by 27.5% relative to SVM (linear), 12.5% relative to SVM (RBF), 37.58% relative to the method in [1] and 7% relative to the method in [27]. However, for the SEN of the DNNE (with MENN) method, improvement was only seen relative to SVM (linear) (6%) and the method in [1] (16.08%). These results showed that the RF method was more robust than the DNNE algorithm in terms of classification performance. Next, the classification accuracy of several classification algorithms was examined for the "Test_Data" set (Table 3). Since this set only contained patient subjects, the sensitivity and specificity could not be calculated. Furthermore, the method in Ref. [27] did not report the classification accuracy for the "Test_Data", so it was omitted.
The results when using this dataset further pointed out the strength of the proposed PD_MEdit_EL algorithm, with significantly higher classification accuracies seen in the RF (with MENN) and DNNE (with MENN). When comparing the DNNE and RF algorithms, no significant difference in terms of classification accuracy was noted. The results obtained in Tables 2 and 3 are also graphically displayed in Figs. 3 and 4.
To further examine classification performance, differences between algorithms across repeated runs were examined; both the "Training_Data" and "Test_Data" sets were evaluated 10 times with each algorithm. These findings were consistent with the data presented in Tables 2 and 3. When examining the "Training_Data" set, both the RF (with MENN) and DNNE (with MENN) showed the highest classification accuracies. However, the RF (with MENN) showed a greater degree of stability than the DNNE (with MENN), possibly due to the complexity of the DNNE. While the DNNE (with MENN) method showed an overall improved classification accuracy, its low stability makes it a less ideal candidate. The method referenced in [1] was not included in this run-wise comparison. When examining the "Training_Data" set (Fig. 5), the RF (with MENN) again showed the highest accuracy with high stability, while both the RF (with MENN) and DNNE (with MENN) showed higher classification accuracy than the SVM, which is currently used in the classification of Parkinson's disease. It is worth noting that on this dataset the RF becomes more stable, suggesting a higher compatibility with this set.
When examining the "Test_Data" set (Fig. 6), the RF (with MENN) again showed the highest accuracy with high stability, while both the RF (with MENN) and DNNE (with MENN) showed higher classification accuracy than the SVM. It is worth noting that on this dataset the DNNE becomes more stable, suggesting a higher compatibility with this set.
The proposed algorithm was also verified on the PD data from [2]. Little et al. applied several machine learning algorithms to their dataset [2]; according to their results, the SVM with RBF kernel combined with the relief algorithm was best. Therefore, it was implemented here for comparison with the proposed algorithm. For a fair comparison, the proposed algorithm was also based on the relief algorithm, and the CV method was tenfold CV, as in [2].
As seen in Table 4, the classification accuracy rates are better than those in Table 2. For the proposed algorithm with LOSO, the classification accuracy improved from 81.5 to 87.8% and the sensitivity from 92.5 to 95.4%; with LOO, the classification accuracy improved from 70.93 to 87.8% and the sensitivity from 73.27 to 95.4%. A possible reason is that this dataset contains fewer samples than the dataset in Table 2. It is worth noting that the 1040 samples come from 40 subjects; therefore, samples belonging to the same subject are dependent on each other to some extent. If the samples in the training set and the test set are required to be independent, the classification accuracy rates will likely worsen. However, the verification sections of the relevant papers did not consider this point, except for the paper the data originates from. To study this problem further, an experiment was conducted in which, when a sample is classified, the other 25 samples belonging to the same subject are not used for building the classification model. This guarantees that the samples in the training set and the test set are independent of each other. Table 5 shows the differences between considering and ignoring this dependency. In Table 5, RF_MENN denotes the algorithm that does not consider the dependence of the samples; RF_MENN_inDe denotes the algorithm that considers it; SVM_RBF denotes the SVM with RBF kernel that does not consider the dependence; and SVM_RBF_inDe denotes the SVM with RBF kernel that considers it. As seen in Table 5, when the dependence of the samples is considered, the classification accuracy rates worsen substantially for both the RF and SVM algorithms.
However, even when the dependence of samples is considered, the proposed algorithm is still better than that of [1]. Likewise, when comparing the proposed algorithm and the SVM with RBF kernel under sample dependence, the former is still better. This shows that the proposed algorithm is valuable.

Effect of the MENN algorithm
To examine the effects of the MENN algorithm, the "Training_Data" set, which included 20 PD (6 women, 14 men) and 20 healthy (10 women, 10 men) subjects with 26 speech samples per subject, was examined. In total, 1040 speech samples were utilized and examined pre- and post-MENN, with 1039 used for training under LOO (Table 6). With the MENN algorithm, the total number of training samples was reduced from 1039 to 731, with the number of healthy samples reduced from 519 to 364 and the number of patient samples reduced from 520 to 367. Furthermore, after applying the MENN algorithm, the number of training subjects did not change despite the reduced sample number. Thus, the MENN algorithm still meets the training-sample requirements of the subsequent machine learning.
The observed effect of the MENN algorithm was further visualized (Fig. 7), with the three plotted dimensions being the first three features in the datasets. Prior to applying the MENN algorithm, the patient samples (labeled PWP) and the healthy samples (labeled normal) were highly mixed and difficult to separate (Fig. 7). After applying the MENN algorithm, the sample mixing was greatly reduced, thus enabling better subsequent classifications. Next, the ensemble learning algorithms with and without the MENN algorithm were examined using the "Training_Data" set (Table 7), with the same abbreviations as in Tables 2 and 3. The ensemble learning algorithms with and without the MENN algorithm were also compared using the "Test_Data" set (Table 8), again with the same abbreviations. These results were in accordance with those presented in Table 7. When examining the ACC, the DNNE improved by 14.88% for samples and 10.72% for subjects, while the RF improved by 43.03% for samples and 45.72% for subjects. The highest ACC obtained was 100%, and the largest increases were 43.03% for samples and 45.72% for subjects. This high degree of improvement may be because this dataset only contains patient samples, making the classification less difficult.

Level of significance of PD_MEdit_EL algorithm
To establish that a significant difference is present between the PD_MEdit_EL algorithm and the other examined algorithms, the p values of the ACCs, SENs and SPEs between the algorithms, based on ten repeated experiments, were calculated (Table 9). In terms of LOO and LOSO, the differences between the RF (with MENN) or DNNE (with MENN) and the SVM (linear) or SVM (RBF) included some highly significant ones. While these results show a significant difference between the PD_MEdit_EL algorithm and the examined algorithms, no significant difference was noted when comparing the RF (with MENN) and DNNE (with MENN), suggesting that the two algorithms perform comparably. Statistically significant differences were also examined using the "Test_Data" set (Table 10). The same trend was seen with this dataset, with some highly significant differences noted between the RF (with MENN) or DNNE (with MENN) and the SVM (linear) or SVM (RBF). However, when comparing the RF (with MENN) and DNNE (with MENN), no significant difference was noted for subjects, but the two were statistically significantly different for samples.
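The paper does not state which statistical test produced the p values in Tables 9 and 10. As one plausible choice for matched repeated runs, a paired t statistic over the ten experiments could be computed as follows; the p value would then be read from a t distribution with n − 1 degrees of freedom. This is an assumption about the analysis, not a description of the authors' procedure.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic for a paired comparison of two algorithms over matched runs.

    a and b are per-run scores (e.g. ACC over ten repetitions) for the two
    algorithms; the statistic is mean(diff) / (stdev(diff) / sqrt(n)).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

Pairing by run matters here because both algorithms see the same cross-validation splits, so their scores are correlated.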

Discussion
Herein, a classification algorithm for PD diagnosis, termed PD_MEdit_EL, was generated by combining a multi-edit nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. While research pertaining to PD classification based on speech samples has been performed, few studies have utilized the Sakar et al. dataset [1]. Since this is a larger and more recently deposited dataset, it was chosen to examine the effectiveness of this new classification strategy. While present classification algorithms examine feature extraction, feature selection/transformation and classifier design, they do not consider sample optimization via selection. According to machine learning theory, instance selection is crucial to the quality of learning and to accurately obtaining a final classification, with outliers adding 'noise' that can mislead the classifier. In the present study, a sample selection algorithm, MENN, was utilized to reduce the impact of these outliers. Subsequent experimentation showed that the proposed algorithm can provide accurate classifications. While the RF method has been utilized in previous studies, a significant improvement was noted when it was combined with the MENN algorithm (Tables 9, 10). Additionally, two ensemble learning algorithms, RF and DNNE, were utilized to enhance the accuracy and stability of PD classification. While the RF algorithm has been more extensively examined, the DNNE is a newer classification algorithm. The parameters for the two algorithms were based on prior knowledge and statistical experiments, and the results showed that the two algorithms performed better than the algorithm from Ref. [1], even without the MENN algorithm. When combining the ensemble learning algorithms and the MENN algorithm, the classification performance was further improved in both the "Training_Data" (~30%) and "Test_Data" (~25%) sets. The two Sakar et al. [1] datasets were studied systematically in terms of ACC, SEN and SPE, including both samples and subjects. To establish the statistical effectiveness of the proposed PD_MEdit_EL algorithm, significant differences between algorithms were examined, with some highly significant differences noted between the PD_MEdit_EL algorithm and the other examined algorithms.
Overall, the PD_MEdit_EL algorithm achieved a higher classification performance than the examined algorithms currently utilized for classification. This could in part be attributed to the data processing performed by the PD_MEdit_EL method, which the existing examined algorithms lack. Such processing techniques include feature selection, nonlinear feature transformation, multiple classifiers and so on. Thus, combining these methods makes PD_MEdit_EL an effective method for classifying PD.
The main contributions and innovations of this paper can be described as follows: 1. Speech samples were optimized utilizing the MENN algorithm, thus improving PD classification accuracy. 2. The ensemble learning algorithms RF and DNNE were examined in conjunction with MENN, resulting in further optimized PD classifications. 3. The "Training_Data" and "Test_Data" sets were studied and verified systematically, whereas the related studies did not involve the "Test_Data" set. 4. The statistical significance of the proposed PD_MEdit_EL algorithm was confirmed relative to the other examined algorithms.

Conclusions
While speech based PD classifications have been shown to be effective, the existing methods lack the ability to optimize speech samples, which is crucial for improving PD classification performance. In this study, a new classification algorithm (PD_MEdit_EL) was proposed that combines the MENN algorithm and an ensemble learning algorithm.
To evaluate the performance of the proposed algorithm, a public dataset provided by Sakar et al. was utilized, and the proposed algorithm was compared to existing methods. Experimental results showed that the PD_MEdit_EL algorithm provides improved classification when compared with the examined algorithms. Furthermore, the noted improvement was apparent irrespective of the LOO or LOSO method or of the dataset utilized, with an improvement near 30% seen for the "Training_Data" set and 25% for the "Test_Data" set. Overall, this study provides new insight that can be applied to subsequent research on PD classification utilizing speech data. In the near future, we will consider examining compressed speech feature data to further verify and possibly modify the PD_MEdit_EL algorithm.