Parkinson’s Disease Detection from Voice Recordings Using Associative Memories

Parkinson’s disease (PD) is a chronic neurological condition that worsens over time, which makes its diagnosis challenging. An accurate diagnosis is required to distinguish PD patients from healthy individuals. Diagnosing PD at an early stage can reduce the severity of the disorder and improve the patient’s quality of life. Algorithms based on associative memories (AM) have been applied to PD diagnosis using voice samples of patients with this condition. Although AM models have achieved competitive results in PD classification, they lack an embedded component that can identify and remove irrelevant features, which would otherwise improve classification performance. In this paper, we present an improvement to the smallest normalized difference associative memory (SNDAM) algorithm by means of a learning reinforcement phase that improves its classification performance when applied to PD diagnosis. For the experimental phase, two datasets that have been widely used for PD diagnosis were employed. Both datasets were gathered from voice samples of healthy people and of patients suffering from an early stage of this condition. These datasets are publicly accessible in the UCI Machine Learning Repository. The efficiency of the ISNDAM model was contrasted with that of seventy other models implemented in the WEKA workbench and compared with the performance reported in previous studies. A statistical significance analysis was performed to verify that the performance differences between the compared models were statistically significant. The experimental findings allow us to affirm that the proposed improvement to the SNDAM algorithm, called ISNDAM, effectively increases classification performance compared with well-known algorithms. On Dataset 1, ISNDAM achieves a classification accuracy of 99.48%, followed by ANN Levenberg–Marquardt with 95.89% and SVM RBF kernel with 88.21%.
On Dataset 2, ISNDAM achieves a classification accuracy of 99.66%, followed by SVM IMF1 with 96.54% and RF IMF1 with 94.89%. These findings show that ISNDAM achieves competitive performance on both datasets, and the statistical significance tests confirm that its classification performance is superior to that of models published in previous studies.


Introduction
Parkinson's disease (PD) is a chronic neurological condition that worsens over time. It affects between 2 and 3 percent of the world's population over the age of 65 [1,2], and it presents a challenging diagnosis [3,4]. An accurate diagnosis is required to differentiate healthy individuals from PD patients. Studies have shown that PD can be diagnosed at early stages and that an early diagnosis of this neurodegenerative condition can lessen the impact of PD and improve patients' living conditions [5,6]. At an early stage of this disease, a patient's face may show no expression, and speech may become incomprehensible [7]. The first symptom may be a barely noticeable rhythmic shaking in only one hand. The symptoms of PD worsen as the condition progresses. Over the past few years, the use of computational intelligence (CI) techniques to diagnose PD has experienced rapid growth [8].
Almeida et al. [9] used feature extraction techniques on voice signals and machine learning methods for PD detection. They concluded that phonation analysis is a more efficient alternative to detect this progressive neurodegenerative disorder than speech tasks.
Polat and Nour [10] analyzed informative features from voice signals using a data-preprocessing method for PD recognition. They concluded that the proposed approach could be applied as a data-preprocessing method for computational intelligence algorithms for PD recognition, using informative features.
Pereira et al. [11] presented a methodical study of current enabling technologies that can be used to diagnose PD as well as to ameliorate the life quality of individuals with this condition. The authors also included a closer look into innovative and future approaches to achieve this goal.
Sakar et al. [12] presented a methodical study of voice signal processing for PD detection. The authors extracted features from voice signals of PD patients using the tunable Q-factor wavelet transform (TQWT). They concluded that the performance of TQWT is superior to that of other cutting-edge speech signal processing approaches utilized in PD classification.
Pahuja and Nagabhushan [13] presented an analysis of the currently available supervised machine learning methods applied to Parkinson's Disease recognition. They compared three types of supervised machine learning approaches: SVM, K-NN, and ANN. They concluded that a feed-forward ANN combined with the Levenberg-Marquardt method improves its performance and allows the highest classification rates to be achieved when applied to the voice dataset.
Alzubaidi et al. [14] investigated and provided a summary of neural networks applied to Parkinson's disease diagnosis. The authors analyzed 91 studies to identify the function that a feed-forward artificial neural network plays in the process of PD diagnosis. The authors concluded that the early detection of PD through voice sample analysis occurs in a significant proportion of the 91 works analyzed. The authors also argue that ANNs are a viable option for performing this task.
Sechidis et al. [15] introduced a cross-domain transfer-learning model for speech emotion recognition and applied it to a Parkinsonian speech corpus. They evaluated distinctive voice patterns of patients with PD and concluded that there is a relationship between PD and emotional scores, where patients with this disease are often perceived as expressing an emotion of sadness.
Ma et al. [16] proposed an ensemble model for Parkinson speech recognition based on a dual-side learning ensemble that applies sample selection to remove useless samples and deep feature transformation to generate high-quality features for PD recognition. They concluded that the weighted fusion mechanism that merges classification algorithms is the crucial component that helps to achieve higher performance than state-of-the-art relevant algorithms.
Haq et al. [17] presented a methodical review of the literature of deep learning methods frequently used in Parkinson detection. They explored current reviews with their advantages and limitations. They also presented numerical results comparison for current relevant methods.
Zhang et al. [18] introduced a feature called EDF-EMD and assessed its effectiveness on two datasets. The authors analyzed the frequency behavior of voice signals and concluded that the high-frequency segment of the speech signal provides relevant components related to the diagnosis and recognition of PD. They concluded that classification performance results are outstanding, using feature extraction from IMF1. The experiments suggest that the EDF-EMD feature can be utilized to recognize this medical condition effectively.
Mei et al. [19] presented a thorough analysis of the literature to study and enhance global understanding of supervised machine learning algorithms of practical application to Parkinson's disorder recognition. The authors reviewed datasets, mathematical methods, data types, machine learning algorithms, data sources and the related results of 209 studies. The authors concluded that there is a strong potential of machine learning approaches in clinical decision making, which will lead to a more informed and methodical diagnosis of PD.
Ngo et al. [20] presented a methodical review of the literature of the last decade to investigate signal features and Parkinson's disease recognition methods. They reported on the signal recording protocols, the different datasets used by supervised learning algorithms and signal analysis methods frequently used in Parkinson speech recognition. Based on classification and correlation results, they concluded that voice signal features as well as information extracted from speech recordings have a potential utility for Parkinson's disorder recognition and assessment.
Quan et al. [21] presented a convolutional neural network (CNN) approach for Parkinson's disorder recognition using voice samples. This approach is based on the successive application of CNNs to voice samples. First, a 2D CNN is applied to obtain time series dynamic features; then, a 1D CNN is used to obtain the dependencies between these time series. The authors concluded that higher frequencies of the Mel-spectrogram are less significant than low-frequency regions.
Madruga et al. [22] analyzed the impact of the variability that exists between different voice signal acquisition devices and the negative effects on voice signal recordings specifically oriented for Parkinson's recognition. They proposed a methodology to increase the robustness against the variability between different devices. They concluded that this approach improves the capacity of supervised learning models to detect PD patients from healthy individuals, even using different voice signal acquisition devices.
Coelho et al. [23] analyzed the performance effects of Hjorth parameters extracted from electroencephalogram (EEG) signals for Parkinson's disease detection. They evaluated the differences between the cerebral cortices of healthy individuals and PD patients. They concluded that there are substantial distinctions between the brain lobes of healthy individuals and those of Parkinson's disease patients; thus, these can be used as Parkinson's disease biomarkers.
Dixit et al. [24] presented a comprehensive overview of many AI-based machine learning and deep learning approaches to diagnose PD, as well as their impact on the development of new research areas. This review additionally provides an in-depth look at the current state of PD diagnostics as well as forthcoming applications of data-driven AI technology.
As evidenced by recent research, PD diagnosis continues to be a current topic of interest, and furthermore, its detection through voice samples using computational intelligence techniques is still a major challenge.
This paper presents an improvement to the smallest normalized difference associative memory (SNDAM) algorithm and shows performance measurement results achieved by this new proposal, called improved smallest normalized difference associative memory (ISNDAM), when it is applied to PD diagnosis. The experimental findings confirm that the proposed improvement in the SNDAM algorithm, called ISNDAM, is efficient and effectively increases the classification performance compared against well-known algorithms.

Materials and Methods
The learning matrix concept was conceived more than six decades ago [25-27], and since then, major research groups worldwide have shifted their focus to this promising research area [28-32]. The concept of an array that stores the relationship between patterns has evolved into what is known today as associative memories (AM).
Associative memories are used to simulate the basic learning activity of the human brain. Thus, this learning paradigm associates input and output patterns.

Definition 1. Let x^µ be the µ-th input pattern and y^µ be the µ-th output pattern; then the learning set is represented as follows:

{(x^1, y^1), (x^2, y^2), ..., (x^µ, y^µ), ..., (x^p, y^p)} (1)

In this way, an associative memory M is created through the association of output patterns with input patterns. This stage is known as the training phase.
Once the associative memory M has been built, a test pattern x̃ is presented to it to obtain an output pattern y^ϑ that indicates the class label corresponding to x̃. This stage is known as the operation phase.

Smallest Normalized Difference Associative Memory
This model was proposed in [33] to overcome the shortcomings of the ABAM model introduced in [34,35]. The ABAM model relies on two operators, α and β, which were conceived in a binary space. This entails certain drawbacks, such as the fact that all patterns have to be binary encoded [36], which increases the processing complexity but does not necessarily increase the recall capacity of the ABAM model [37].
Briefly stated, the main contribution of the SNDAM model is that it extended the α and β operators to the real domain, as α_R and β_R, which removed the need for binary encoding of the fundamental set of patterns and reduced the time consumed for model training, while preserving the robustness against subtractive, additive or combined alterations in input patterns.

Definition 2. The following describes the α_R operation:

Definition 3. The following describes the β_R operator:

The second contribution of this model is the use of the smallest normalized difference to evaluate the similarity between an unknown recalled pattern and each pattern present in the training phase. This removes the ambiguity in the class label assignment; as a consequence, classification performance is improved.
The central core of SNDAM is based on operations between vectors to produce pattern associations, stored as a two-dimensional array (learning matrix) [33]. One crucial advantage of the SNDAM model is that it can fully recover the learning set when the learning phase is carried out in an autoassociative mode. Another aspect that distinguishes the SNDAM model is the robustness against additive or subtractive noise in training patterns. This turns out to be an advantage since an unknown pattern x ω can be considered as a training pattern x µ altered by noise (additive or subtractive).

Definition 4. Let M be the SNDAM MAX type, which is obtained as follows (Equation (6)):

Definition 5. Let W be the SNDAM min type, which is obtained as follows:

Definition 6. Let A = R, let ⋁ be the maximum operator, and let x_MAX ∈ A^n be an n-dimensional vector that contains the highest value of each of the n components of all the p instances in the learning set (Equation (8)):

Definition 7. Let y^ϑ be the pattern recalled, using M, from a given input instance x̃ that was not present in the learning set (Equation (9)):

Definition 8. Let y^ϑ be the pattern recalled, using W, from a given input instance x̃ that was not present in the learning set:

Definition 9. Let δ^{ϑµ} represent the magnitude of the similarity between each of the p instances x^µ in the learning set and the recalled pattern y^ϑ (Equation (11)):
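Since the operator equations are not reproduced in the text, the mechanics of a MAX-type autoassociative memory can be sketched under an assumed real-valued choice of operators, α_R(a, b) = a − b and β_R(a, b) = a + b, as used in related real-valued memory models; the exact SNDAM operators may differ. Under this assumption, the perfect-recall property of the learning set in autoassociative mode can be verified directly:

```python
import numpy as np

def train_max_memory(X):
    """Build an autoassociative MAX-type memory from the p x n learning set X.
    M[i, j] = max over patterns of (x_i - x_j)  (assumed alpha_R(a, b) = a - b)."""
    # Outer difference per pattern, then element-wise maximum across patterns.
    diffs = X[:, :, None] - X[:, None, :]          # shape (p, n, n)
    return diffs.max(axis=0)                       # shape (n, n)

def recall(M, x):
    """Recall a pattern: y_i = min over j of (M[i, j] + x_j)  (assumed beta_R(a, b) = a + b)."""
    return (M + x[None, :]).min(axis=1)

# Tiny learning set: p = 3 patterns, n = 4 features.
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4]])
M = train_max_memory(X)
# Every training pattern is recovered exactly in autoassociative mode.
for x in X:
    assert np.allclose(recall(M, x), x)
```

The perfect recall follows because M[i, i] = 0, so the minimum over j of M[i, j] + x_j is attained at j = i for any training pattern.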

Improved Smallest Normalized Difference Associative Memory
Although the SNDAM model [33] overcame some of the shortcomings of ABAM [34,35], there is still room for improvement. In this section, an improvement to the SNDAM model is proposed: the improved smallest normalized difference associative memory (ISNDAM), which is based on the incorporation of a relevance identification phase executed before the testing phase with unknown patterns, as is shown in Figure 1.

ISNDAM Algorithm
The operation of ISNDAM requires three phases, namely: training, relevance identification and testing. As a result of the first phase, a learning matrix is obtained using those patterns present in the learning set. After that, an iterative search process is carried out in order to identify those characteristics that provide more information for the purposes of classification, as is shown in Figure 1. Those features that are identified as relevant are binary encoded and are stored in a reinforcement vector, as is shown in Figure 2. Finally, the smallest normalized difference is applied to test the unknown patterns; thus, the performance indicators of the proposed model are obtained.

Figure 2.
Example of a binary encoded relevance identification vector. The value 1 is assigned to features that contribute to improving classification performance, while the value 0 is assigned to those that do not.
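The relevance identification vector of Figure 2 behaves as a boolean mask over the feature axis. A minimal sketch, with a hypothetical vector e_r and pattern x for illustration:

```python
import numpy as np

# Hypothetical relevance identification vector for n = 6 features (Figure 2 style):
# 1 keeps a feature, 0 discards it.
e_r = np.array([1, 0, 1, 1, 0, 1], dtype=bool)

x = np.array([5.1, 3.5, 1.4, 0.2, 6.3, 2.9])
x_relevant = x[e_r]          # only the features marked relevant survive
print(x_relevant)            # -> [5.1 1.4 0.2 2.9]
```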

Training Phase
At first, a learning matrix M is built by associating all p instances in the learning set. After classifying these instances, a performance measure on the learning set is obtained. This first indicator sets the classification performance lower bound. Algorithm 1 describes the steps required to implement this phase.
Step 1: Generate p matrices, one for each fundamental pattern association (x^µ, x^µ), using Equation (6).
Step 2: Compare the previously generated p matrices, and keep the maximum value (⋁). The result is a learning matrix M that contains the maximum value of all p generated matrices.
Step 3: Compare the n components of all the p instances in the learning set, and keep the maximum value (⋁), using Equation (8). The result is an n-dimensional vector x_MAX that contains the highest value of each of the n components of all the p instances x^µ in the learning set.
Step 4: Recall all the p instances x^µ in the learning set, using Equation (9).
Step 5: Obtain the magnitude of the similarity δ^{ϑµ} between each of the p instances x^µ in the learning set and the recalled pattern y^ϑ, using Equation (11).
Step 7: Assign the x^µ class label to the recalled pattern y^ϑ. End of Training Phase

Relevance Identification Phase
The relevance identification phase consists of an iterative search process to identify and select those features that improve ISNDAM classification performance. Algorithm 2 describes the steps required to implement this phase. The selected subset of features retains relevant information that sufficiently describes the problem to be solved effectively. This subset of features is represented by a binary encoded n-dimensional vector. Those features that improve classifier performance are coded with a value of 1, while those features that negatively affect the performance of the classifier are coded with a value of 0, as is shown in Figure 2.
Step 1: Let e_r be a binary coded n-dimensional column vector, with components in {0, 1}, that represents the identification of those relevant features that improve the performance of the model, with the r value represented in binary form.
Step 2: Obtain the magnitude of the similarity δ^{ϑµ} between each of the p instances x^µ in the learning set and the recalled pattern y^ϑ, using Equation (11).
Step 4: Assign the x^µ class label to the recalled pattern y^ϑ.
Step 5: Obtain the r-th performance indicator, considering both successes and errors in the classification process.
Step 6: Keep track of both the performance indicator and the relevance identification vector in each of the r iterations.
Step 7: Contrast the (r − 1)-th and r-th performance indicators. The highest performance value is preserved.
Step 8: Increment r until r > rmax. End of Relevance Identification Phase
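Algorithm 2 amounts to a wrapper search: every binary vector e_r is a candidate feature subset, each is scored by classifying the learning set, and the mask with the best indicator is kept. A minimal sketch, using a hypothetical leave-one-out 1-NN scorer in place of the SNDAM classification indicator:

```python
import numpy as np
from itertools import product

def relevance_search(X, y, score_fn):
    """Exhaustive wrapper search over all non-empty binary feature masks e_r.
    score_fn(X_masked, y) plays the role of the classification performance indicator."""
    n = X.shape[1]
    best_score, best_mask = -np.inf, None
    for bits in product([0, 1], repeat=n):        # enumerate all 2**n candidate masks
        mask = np.array(bits, dtype=bool)
        if not mask.any():
            continue                               # skip the empty feature subset
        s = score_fn(X[:, mask], y)
        if s > best_score:                         # keep the highest indicator (Step 7)
            best_score, best_mask = s, mask
    return best_mask, best_score

# Toy data: only feature 0 separates the classes; feature 1 is noise.
X = np.array([[0.0, 9.0], [0.0, 1.0], [1.0, 5.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])

def loo_1nn(Xm, y):
    """Leave-one-out 1-NN accuracy; a hypothetical stand-in for the SNDAM scorer."""
    correct = 0
    for i in range(len(y)):
        d = np.abs(Xm - Xm[i]).sum(axis=1)
        d[i] = np.inf                              # exclude the sample itself
        correct += int(y[d.argmin()] == y[i])
    return correct / len(y)

mask, acc = relevance_search(X, y, loo_1nn)
print(mask.tolist(), acc)   # -> [True, False] 1.0
```

The exhaustive enumeration is exponential in n, so in practice rmax bounds the number of masks examined, as in Step 8.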

Testing Phase
In this third phase, those features that benefit the performance of ISNDAM have already been identified. The relevance identification vector e_r ignores those features that negatively affect the classification performance; thus, the testing phase is executed considering only relevant features. In this phase, test instances are recalled, and the smallest normalized difference with respect to all training patterns is computed to assign a class label. Algorithm 3 describes the steps required to implement this phase.

Algorithm 3: Testing Phase
Data: M, p, n, e_r, x̃. Result: y^ϑ.
Step 1: Use the memory M, previously generated in the training phase, to recall y^ϑ from a given input instance x̃ that was not present in the learning set, using Equation (9).
Step 2: Obtain the magnitude of the similarity δ^{ϑµ} between each of the p instances x^µ in the learning set and the recalled pattern y^ϑ, using Equation (11).
Step 4: Assign the x^µ class label to the recalled pattern y^ϑ. End of Testing Phase

ISNDAM Numerical Example
In order to clarify the implementation of the ISNDAM model, its operation is illustrated by way of an example. The numeric values in our example correspond to patterns taken from the widely known iris plants dataset [38]. Each instance has four features, and there are three instances in each of the two classes.
Example. Given six instances in the learning set, each described by n = 4 features, where x^1, x^2, x^3 belong to class 1 and x^4, x^5, x^6 belong to class 2, recall all instances and obtain a classification performance measure.
Generate p matrices, one for each fundamental pattern association (x^µ, x^µ), using Equation (6). Thus, each matrix element can be obtained using the following expression: Once all p matrices have been obtained, the maximum operator is applied according to Equation (6); each m_ij component of each matrix is compared, and the maximum value is stored, so that a single matrix containing the maximum values is finally obtained. In our example, the autoassociative MAX-type SNDAM obtained is as follows:

Use the memory M, previously generated in the training phase, to recall y^ϑ from a given input instance x̃ that was not present in the learning set, using Equation (9). Then, obtain the magnitude of the similarity δ^{ϑµ} between each of the p instances x^µ in the learning set and the recalled pattern y^ϑ, using Equation (11). Thus, each component of the recalled pattern y^ϑ is obtained using the following expression: First, the matrix M is taken, and the beta operator β_R is applied to the first instance to be tested; in our case, we take the first instance of the numerical example, x^1. Subsequently, the minimum operator is applied, and as a result, we obtain vector y^ϑ.
Once vector y^ϑ has been recalled, what follows is to obtain the magnitude of the similarity δ^{ϑµ} between each of the p instances x^µ in the learning set and the recalled pattern y^ϑ, using Equation (11). For this example, y^ϑ is compared with x^1, x^2, ..., x^6, and the smallest normalized difference is obtained, according to Equation (11). As can be seen, the smallest normalized difference δ^{ϑµ} between y^ϑ and each of the patterns present in the training phase occurs in the first position, δ^{11}. This result is intuitive because the recalled values of y^ϑ are present in the x^1 pattern. Therefore, since y^ϑ has the greatest similarity with x^1, the class label of x^1 is assigned to y^ϑ; this implies that y^ϑ belongs to class 1. It can be verified that all patterns x^1, x^2, ..., x^6 present in the training phase are correctly recalled; that is, recalled patterns y^1, y^2, y^3 belong to class 1, and y^4, y^5, y^6 belong to class 2.
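Since Equation (11) is not reproduced in the text, the class assignment step can be sketched with one plausible normalization, δ^{ϑµ} = Σ_i |y_i^ϑ − x_i^µ| / Σ_i |x_i^µ|, chosen purely for illustration; the exact SNDAM formula may differ:

```python
import numpy as np

def smallest_normalized_difference(y_rec, X):
    """Similarity between a recalled pattern y_rec and each training pattern in X.
    Assumed form: delta_mu = sum_i |y_i - x_i^mu| / sum_i |x_i^mu|; the normalization
    is an illustrative choice, as Equation (11) is not reproduced in the text."""
    deltas = np.abs(X - y_rec).sum(axis=1) / np.abs(X).sum(axis=1)
    return deltas, deltas.argmin()

# Iris-like patterns: x1..x3 in class 1, x4..x6 in class 2.
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2],
              [7.0, 3.2, 4.7, 1.4],
              [6.4, 3.2, 4.5, 1.5],
              [6.9, 3.1, 4.9, 1.5]])
labels = np.array([1, 1, 1, 2, 2, 2])

# Perfect autoassociative recall of x1 yields y = x1, so the minimum occurs at mu = 1.
deltas, mu = smallest_normalized_difference(X[0], X)
print(labels[mu])   # -> 1
```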
What follows is to test the ISNDAM model with unknown patterns. At this stage, we are interested in obtaining the class label that corresponds to the test pattern x̃, which is achieved by measuring the similarity between each of the patterns present in the training phase and the recalled pattern y^ϑ.
The test pattern x̃ is the following: First, the matrix M is taken, and the beta operator β_R is applied to the unknown pattern x̃ to be tested.
Subsequently, the minimum operator is applied, and as a result, we obtain vector y^ϑ.
Once vector y^ϑ has been recalled, what follows is to obtain the smallest normalized difference δ^{ϑµ} between each pattern in the learning set and the recalled pattern y^ϑ. For these purposes, y^ϑ is compared with x^1, x^2, ..., x^6, and the smallest normalized difference is obtained, according to Equation (11). As can be seen, the smallest normalized difference δ^{ϑµ} between y^ϑ and each of the patterns present in the training phase occurs in the sixth position, δ^{16}. Therefore, since y^ϑ has the greatest similarity with x^6, the class label of x^6 is assigned to y^ϑ; this implies that y^ϑ belongs to class 2.
In summary, the ISNDAM model takes advantage of the properties of the learning matrices trained in autoassociative mode and uses the smallest normalized difference δ^{ϑµ} to measure the similarity between the patterns present in the training phase and the pattern y^ϑ recalled from an unknown pattern x̃.

• One of the contributions of the ISNDAM model is the simplification of the beta operator β_R to a single case, which eliminates ambiguity in class assignment and consequently increases classification performance.
• The second contribution of this model is the relevance identification phase, which identifies all the characteristics relevant for classification purposes through a wrapper-based feature selection approach applied to SNDAM, as is shown in Figure 1.

Datasets
In this section, information about the datasets used for the validation of the performance measurements of the ISNDAM model is provided. Table 1 summarizes their most relevant characteristics. More information about these datasets can be found in public data repositories [39,40].

• The first dataset was created at Oxford University to differentiate PD patients from healthy individuals. It was generated from voice signal analysis measurements of thirty-one individuals, twenty-three of whom have PD. This dataset consists of 195 instances with 23 attributes. More details on how the recordings were gathered, as well as on the feature extraction process, can be found in [41].
• The second dataset was created at Istanbul University from voice recordings of forty individuals, twenty of whom are healthy and twenty of whom have PD. This dataset consists of 1040 instances with 27 attributes. More details on how the recordings were gathered, as well as on the feature extraction process, can be found in [12].

Performance Metrics
In this section, a succinct description of each performance metric is presented. First, let P be the number of individuals who have a certain condition, and let N be the number of individuals who do not. Let TP be the true positive count, i.e., test outcomes that accurately indicate the existence of the condition in a patient; let FP be the false positive count, i.e., test outcomes that wrongly indicate the existence of the condition; let TN be the true negative count, i.e., test outcomes that accurately indicate the non-existence of the condition; and let FN be the false negative count, i.e., test outcomes that wrongly indicate the non-existence of the condition.
A confusion matrix contrasts predicted and actual classes to visually show the performance of a model [42]. A particular case of the confusion matrix arises in problems with only two classes (negative and positive). The metrics derived from it are the following:
• Sensitivity (also known as recall) represents a test's capacity to accurately identify all individuals who have a certain condition: TP/(TP + FN).
• Accuracy refers to a model's overall performance. It is computed as the proportion of correct predictions among all predictions: (TP + TN)/(P + N).
• Specificity indicates a test's capacity to correctly detect every individual who does not have a certain condition: TN/(TN + FP).
• False positive rate (FPR) indicates a test's tendency to incorrectly flag healthy individuals who do not have a certain condition: FP/(FP + TN).
• Precision refers to the reliability of a model in predicting a positive test result. It represents the proportion of tests accurately predicted as positive among all tests predicted as positive: TP/(TP + FP).
• Area under the ROC curve (AUC) represents how well a binary classification algorithm is able to distinguish between the two classes [43].
• Geometric mean (G-Mean) estimates the balance of classification performance between the minority and majority classes: the square root of the product of sensitivity and specificity.
• F1-score is the harmonic mean of sensitivity and precision, combining both into a single, symmetric statistic: 2 × (precision × sensitivity)/(precision + sensitivity).
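These metrics can be computed directly from the four confusion-matrix counts. A compact implementation, with hypothetical counts for the example:

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                      # recall: TP / P
    specificity = tn / (tn + fp)                      # TN / N
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": fp / (fp + tn),
        "precision": precision,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "g_mean": (sensitivity * specificity) ** 0.5,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical confusion-matrix counts for illustration:
m = binary_metrics(tp=45, fp=5, tn=40, fn=10)
print(round(m["accuracy"], 2))   # -> 0.85
```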

Statistical Hypothesis Tests
In statistical hypothesis testing, the objective is to provide evidence that the performance metric findings are representative of the overall behavior of classifiers. In addition to performance assessments obtained from the metrics described in Section 2.4, there are two additional aspects to consider: the first is whether the reported findings can be attributed to the properties of the classifiers or whether they could have occurred by chance, and the second is whether the apparent superiority in performance of one algorithm over another is not due to random selection of training or testing samples. To address these two factors, a statistical significance analysis is required [33].
As suggested in [44,45], the two-matched-samples t test can be useful in determining whether the difference between two means is meaningful [46], which is applicable in the case of two matched samples [47]. Hypothesis testing consists of assuming two hypotheses: an alternative hypothesis, H_1, and a null hypothesis, H_0, which is generally the opposite of what we want to prove. Rejecting the null hypothesis implies, based on evidence, that the alternative hypothesis is accepted, which gives us confidence that the observations did not occur by chance. Algorithm 4 outlines the steps necessary to conduct a statistical significance test.

Definition 11.
Let H_0 be a null hypothesis; it is assumed to be true until evidence indicates otherwise, and it states in particular: classifiers A and B have the same performance.

Algorithm 4: Statistical Significance Test
Data: performance evaluation metric. Result: accept/reject the null hypothesis H_0.
Step 1: Establish the null hypothesis H_0.
Step 2: Define the two-matched-samples t test and the statistic used to reject the null hypothesis H_0.
Step 3: Choose a critical region where the statistic should fall (confidence intervals).
Step 4: Calculate the t-statistic and check whether it falls in the critical region. If the statistic is in the critical region, then the null hypothesis is rejected and the alternative hypothesis is accepted; otherwise, the null hypothesis is not rejected. End of statistical significance test
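The steps of Algorithm 4 can be sketched as follows, with hypothetical per-fold accuracies for two classifiers A and B; the critical value 2.262 corresponds to a two-sided Student's t test with 9 degrees of freedom at the 95% confidence level:

```python
from math import sqrt

def paired_t_statistic(a, b):
    """Two-matched-samples t statistic for per-fold accuracies of classifiers A and B.
    t = mean(d) / (s_d / sqrt(k)), where d are the k paired differences."""
    d = [x - y for x, y in zip(a, b)]
    k = len(d)
    mean = sum(d) / k
    var = sum((x - mean) ** 2 for x in d) / (k - 1)   # unbiased sample variance
    return mean / sqrt(var / k)

# Hypothetical per-fold accuracies from 5 x 2 cross-validation (k = 10 folds):
acc_A = [0.99, 0.98, 1.00, 0.99, 0.99, 0.98, 1.00, 0.99, 0.99, 0.98]
acc_B = [0.95, 0.94, 0.96, 0.95, 0.96, 0.94, 0.95, 0.96, 0.95, 0.94]
t = paired_t_statistic(acc_A, acc_B)
# Compare |t| against the critical value of Student's t with k - 1 = 9 degrees
# of freedom at the 95% confidence level (about 2.262, two-sided).
print(abs(t) > 2.262)   # -> True
```

When |t| exceeds the critical value, the null hypothesis of equal performance is rejected, which is the decision rule of Step 4.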

Results
The performance metrics, briefly described in Section 2.4, were computed for each compared algorithm using two datasets: Dataset 1 [41] and Dataset 2 [12]. The comparison of the obtained results is divided into two sections.

• First: The efficiency of ISNDAM was evaluated and compared to the efficiency of seventy different models that are included in the WEKA workbench [48,49]. All experiments were performed using 5 × 2 cross-validation, as suggested in [50,51]. Additionally, a statistical significance test was conducted to verify whether there existed statistically significant differences in the performance of each compared algorithm included in the WEKA workbench. Performance outcomes using Dataset 1 are shown in Table 2, while Table 3 shows the results of the statistical significance analysis for Dataset 1. Similarly, performance outcomes using Dataset 2 are shown in Table 4, while the statistical significance analysis for Dataset 2 is presented in Table 5.
• Second: The performance of ISNDAM was compared to that of previous studies [13,19,24,52] using Dataset 1, as well as to that of previous studies [11,12,18] using Dataset 2. The performance results of ISNDAM compared to those of previous studies [13,24] using Dataset 1 are shown in Table 6. In the same way, the performance results of ISNDAM compared to those of previous studies [19,52] using Dataset 1 are shown in Table 7. Similarly, the performance results of ISNDAM compared to those of previous studies [11,12,18] using Dataset 2 are shown in Table 8.

As is shown in Tables 2 and 4, several performance metrics, briefly described in Section 2.4, were used for comparison purposes: accuracy, G-Mean, specificity, F1-score, precision, AUC and sensitivity. Statistical significance test results for the paired measurements between the ISNDAM classification accuracy results and the performance outcomes of all of the compared algorithms are presented in Tables 3 and 5. According to Table 2, it is clear that ISNDAM achieves the best performance, followed by SNDAM and IBk, using Dataset 1. The first two models belong to the associative model-based classifier family, while IBk belongs to the lazy classifier family.
The default number of nearest neighbors used by IBk is k = 1, and the default distance function is the Euclidean distance. As suggested in [44], the two-matched-samples t test can be useful for determining whether the difference between two means is meaningful [46]. Table 3 shows the statistical significance test results using Dataset 1 for the paired measurements between the ISNDAM classification accuracy and the classification assessment results of each compared algorithm, with a 95% confidence level, where p < 0.05 establishes the statistical significance threshold. Given the conditions of the two-matched-samples t test, pairwise comparisons of the classification accuracy results have to be made [45]; therefore, seven p values have to be evaluated. A p value smaller than the statistical significance threshold implies that little evidence exists to support the null hypothesis. In each pairwise comparison of Table 3, the alternative hypothesis is accepted. This allows us to affirm that a statistically significant difference exists between the ISNDAM classification accuracy and the performance achieved by all the other compared models. This implies that even when the performance differences are small, they are not due to the random selection of training or testing samples; that is, the performance of the ISNDAM model is superior to that of the other models listed in Table 2 using Dataset 1 and 5 × 2 cross-validation.

Following the same analysis scheme, according to Table 4, ISNDAM achieves the best performance, followed by SNDAM and random forest, using Dataset 2. The first two models belong to the associative model-based classifier family, while random forest belongs to the decision tree classifier family.
Table 5 shows the statistical significance test results using Dataset 2 for the paired measurements between the ISNDAM classification accuracy and the classification assessment results of each compared algorithm, with a 95% confidence level, where p < 0.05 establishes the statistical significance threshold. Again, seven p values have to be evaluated. In each pairwise comparison of Table 5, the alternative hypothesis is accepted. This allows us to affirm that a statistically significant difference exists between the ISNDAM classification accuracy and the performance achieved by all the other compared models, and that the performance of the ISNDAM model is superior to that of the other models listed in Table 4 using Dataset 2 and 5 × 2 cross-validation.
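The decision rule behind these comparisons can be sketched as a paired t test on the ten matched accuracy values that 5 × 2 cross-validation produces per model. The sketch below is our simplified illustration, not the exact procedure of [44,45]; the accuracy values are invented for demonstration, and 2.262 is the standard two-tailed critical value of Student's t with 9 degrees of freedom at α = 0.05:

```python
import statistics

def paired_t_statistic(acc_a, acc_b):
    """Two-matched-samples t statistic for paired accuracy results."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation
    return mean_d / (sd_d / len(diffs) ** 0.5)

# Ten paired accuracies from 5 x 2 cross-validation (illustrative values).
model_a = [0.995, 0.990, 1.000, 0.995, 0.990, 1.000, 0.995, 0.990, 0.995, 1.000]
model_b = [0.960, 0.955, 0.965, 0.950, 0.958, 0.962, 0.949, 0.957, 0.960, 0.955]

t = paired_t_statistic(model_a, model_b)
# Two-tailed critical value of Student's t, df = 9, alpha = 0.05.
T_CRIT = 2.262
print("reject H0" if abs(t) > T_CRIT else "fail to reject H0")
```

When |t| exceeds the critical value, the corresponding p value falls below 0.05 and the null hypothesis of equal mean accuracies is rejected, which is the situation reported in Tables 3 and 5.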
To broaden the interpretation of the experimental outcomes, the performance of ISNDAM was compared to that of previous studies [13,19,24,52] using Dataset 1, as well as to that of previous studies [11,12,18] using Dataset 2.
As shown in Table 6, the performance of ISNDAM was compared to that of previous studies [13,24]. ISNDAM achieves the best performance, with a classification accuracy of 99.48%, followed by ANN Levenberg–Marquardt with 95.89% and SVM with RBF kernel with 88.21%, using Dataset 1.
Similarly, Table 7 shows the performance of ISNDAM compared to that of previous studies [19,52]. ISNDAM achieves the best performance, with a classification accuracy of 99.48%, using the raw features of Dataset 1. However, if the weighted-features approach [52] is applied to Dataset 1, the highest classification accuracy is achieved by LS-SVM, PNN and GRNN.
In the same way, Table 8 shows the classification accuracy of ISNDAM compared to that of previous studies [11,12,18]. ISNDAM achieves the best performance, with a classification accuracy of 99.66%, followed by SVM IMF1 with 96.54% and RF IMF1 with 94.89%, using Dataset 2.

Discussion
One of the primary goals of this study is the development of a machine learning model that assists the Parkinson's disease detection process with competitive performance. To achieve this, it was necessary to show, through frequently used performance metrics, that a slight modification to the SNDAM model can improve PD detection from voice samples.
Another objective was to determine whether the performance of the ISNDAM model differed significantly from that of other machine learning models. This objective was accomplished using two computational platforms, namely the WEKA workbench and the IBM SPSS software platform. It was possible to verify that the proposed model achieves superior classification performance compared to the other machine learning models implemented in the WEKA workbench. This statement is based on a statistical significance analysis in which the null hypothesis is rejected in all pairwise comparisons.
Another objective was to compare the performance of the ISNDAM model with that of classification models published in previous studies. For this purpose, and to ensure a coherent comparison, the performance analysis was restricted to classification algorithms that used the same datasets and cross-validation schemes; therefore, the classification performance of ISNDAM was compared to that of previous studies [13,19,24,52] using Dataset 1, as well as to that of previous studies [11,12,18] using Dataset 2.
Using Dataset 1, the performance of the ISNDAM model was superior to that reported in the studies by Pahuja and Nagabhushan [13] and Dixit et al. [24].
Using the raw features of Dataset 1, the performance of the ISNDAM model was superior in all of the cases reported by Hariharan et al. [52] and Mei et al. [19]. However, using the weighted features of Dataset 1, the highest classification accuracy was achieved by LS-SVM, PNN and GRNN.
Using Dataset 2, the performance of the ISNDAM model was superior to that reported in the studies by Pereira et al. [11], Sakar et al. [12] and Zhang et al. [18].
There are several reasons why the proposed model offers competitive experimental performance. The first is its great learning capacity, which stems from being trained in autoassociative mode. The second is its high tolerance to noise in test patterns. The third is the ability of the proposed model to retrieve patterns when the associative memory has been trained in autoassociative mode. The last, and perhaps the most important, is the use of the smallest normalized distance, which retrieves the class label of a pattern seen in the training phase and assigns it to an unknown (test) pattern that has the highest similarity to some training instance. In essence, the model assigns a class label according to the maximum similarity between the unknown instance and the training instances.
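The class-assignment idea described above can be sketched as a nearest-instance rule under a normalized per-feature difference. This is our simplified reading of the mechanism, not the exact SNDAM recall operator, and the feature values below are invented for illustration:

```python
def smallest_normalized_difference(train_x, train_y, test_pattern):
    """Assign the label of the training instance with the smallest
    normalized difference (i.e., highest similarity) to the test pattern.

    train_x: feature vectors seen in the training phase
    train_y: their class labels (e.g., 0 = healthy, 1 = PD)
    """
    best_label, best_score = None, float("inf")
    for x, y in zip(train_x, train_y):
        # Normalized absolute difference per feature, averaged; the
        # max() guard avoids division by zero on zero-valued features.
        score = sum(abs(a - b) / max(abs(a), abs(b), 1e-12)
                    for a, b in zip(x, test_pattern)) / len(x)
        if score < best_score:
            best_label, best_score = y, score
    return best_label

# Toy voice-feature vectors (illustrative values, not dataset features).
X = [[119.9, 0.007, 21.0], [174.2, 0.010, 19.1], [237.3, 0.003, 24.9]]
y = [1, 1, 0]
print(smallest_normalized_difference(X, y, [230.0, 0.004, 24.0]))  # -> 0
```

Normalizing each feature difference by the feature magnitude keeps large-scale features (e.g., fundamental frequency) from dominating small-scale ones (e.g., jitter), which is one plausible reason a normalized difference outperforms a raw distance on voice features.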

Conclusions
This paper introduces an efficient method for performing Parkinson's disease detection. This model, known as the improved smallest normalized difference associative memory (ISNDAM) algorithm, is the outcome of an improvement to the original SNDAM model: a feature selection stage was added, which enhances classification performance.
After performing an analysis on the outcomes of the experimental phase, it can be affirmed that the ISNDAM algorithm is an efficient and effective option for detecting Parkinson's disease from voice samples.