Automatic and Early Detection of Parkinson’s Disease by Analyzing Acoustic Signals Using Classification Algorithms Based on Recursive Feature Elimination Method

Parkinson’s disease (PD) is a neurodegenerative condition caused by the dysfunction of brain cells: when 60–80% of the dopamine-producing cells are lost, the brain can no longer produce enough dopamine, the organic chemical responsible for controlling a person’s movement, and the symptoms of PD appear. Diagnosis involves many physical and psychological tests and specialist examinations of the patient’s nervous system, which causes several issues. A promising method for early diagnosis of PD is based on analysing voice disorders: a set of features is extracted from a recording of the person’s voice, and machine-learning (ML) methods are then used to analyse the recording and distinguish Parkinson’s cases from healthy ones. This paper proposes novel techniques to optimise the early diagnosis of PD through feature evaluation and hyperparameter tuning of ML algorithms that diagnose PD on the basis of voice disorders. The dataset was balanced by the synthetic minority oversampling technique (SMOTE), and the features were ranked according to their contribution to the target characteristic by the recursive feature elimination (RFE) algorithm. We applied two algorithms, t-distributed stochastic neighbour embedding (t-SNE) and principal component analysis (PCA), to reduce the dimensions of the dataset. The features produced by t-SNE and PCA were then fed into the classifiers: support-vector machine (SVM), K-nearest neighbours (KNN), decision tree (DT), random forest (RF), and multilayer perceptron (MLP). Experimental results proved that the proposed techniques were superior to existing studies: RF with the t-SNE algorithm yielded an accuracy of 97%, precision of 96.50%, recall of 94%, and F1-score of 95%, while MLP with the PCA algorithm yielded an accuracy of 98%, precision of 97.66%, recall of 96%, and F1-score of 96.66%.


Introduction
Parkinson's disease (PD) is a neurodegenerative disease caused by the death of the dopamine-generating neurons of the substantia nigra [1]. Dopamine is an organic chemical of the catecholamine and phenethylamine families that controls physical movement by transmitting messages between the brain and the substantia nigra, thereby enabling coordinated movement [2]. When 60–80% of the cells that produce dopamine are lost, the amount of dopamine is not enough to control a person's movements and, thus, symptoms of PD appear [3]. The lack of dopamine production leads to a loss of control over movement.
The main contributions of this paper are as follows:
• Proposing a new approach for early detection and diagnosis of PD based on acoustic signals to help doctors with early diagnosis and timely medical interventions;
• Proposing and implementing the SMOTE technique to balance the dataset;
• Applying Pearson's coefficient to analyze the correlation between all features and remove attributes with a very high correlation;
• Applying the RFE algorithm to give each feature a percentage of its contribution to diagnosing PD;
• Applying the t-SNE and PCA algorithms to reduce the number of features in the dataset and select the features correlated with the target characteristic.
The rest of this paper is organised as follows. Section 2 provides related work. Section 3 describes the process of PD detection by dysphonia and the materials and methods applied in the study, with subsections on processing the features, finding correlations between them, and removing outliers. Section 4 describes the experiment setup. Section 5 presents the results of the analysis and compares them with those of the literature. Section 6 provides the conclusion.

Related Work
This study is distinguished from current studies by developing diagnostic systems with various methodologies and tools that can effectively analyze audio data and distinguish between Parkinson's patients and healthy people with high precision.
Hui et al. [22] proposed a CNN for analyzing the EEG recordings of 16 healthy subjects and 15 Parkinson's patients. A Gabor transform converted the EEG signals into spectrograms to train the CNN. Majid et al. [23] developed an approach based on spatial patterns for PD diagnosis in patients taking medication and those not taking medication. The EEG signals were processed to remove noise by a common spatial pattern. Features were extracted from the optimized signals and fed into machine-learning classifiers. The classifier achieved the best results with features extracted from the beta and alpha ranges, with an accuracy of 95%. Luigi et al. [24] extracted frequency features from velocity and angle signals, selected the features, and optimized machine-learning classifiers to detect freezing of gait (FOG) episodes and the period preceding them. The FOG detection network achieved good results and was validated on patient data, achieving a sensitivity of 84.1%, a specificity of 85.9%, and an accuracy of 86.1%. Thus, the network can predict FOG before it happens. Nalini et al. [25] utilized three deep-learning networks, RNN, LSTM, and MLP, for diagnosing the voice characteristics of PD patients. They provided loss-function curves for PD detection; the LSTM network achieved better performance than the other networks. Arti et al. [26] utilized three ML and ANN algorithms to diagnose a speech dataset. Data collection and feature selection were improved based on wrapper and filtering methods. SVM and KNN achieved an accuracy of 87.17%, while naïve Bayes had an accuracy of 74.11%. Hajer et al. [27] also used three ML algorithms for diagnosing a PD dataset to distinguish PD patients from healthy controls. The data were analyzed by linear discriminant analysis (LDA) and PCA algorithms. K-means and DBSCAN models were built based on the feature-reduction algorithms. LDA performed better than PCA; thus, its output was fed into the clustering algorithms.
DBSCAN achieved an accuracy of 64%, a sensitivity of 78.13%, and a specificity of 38.89%. Moumita et al. [28] designed three schemes based on decision-forest and SysFor algorithms through ForestPA features for PD diagnosis. The approach requires minimal decision trees to achieve good accuracy. Increasing the density of decision trees for dynamic training and testing of new samples proved to be the best method for PD detection. The decision tree with ForestPA features reached an accuracy of 94.12%. Sarkar et al. [29] collected and analysed acoustic data from 40 people: 20 with PD and 20 healthy controls. The researchers used support-vector machine (SVM) and K-nearest-neighbour (KNN) classifiers to analyse and diagnose the samples. To diagnose speech disorders in PD patients, Little et al. [30] proposed an algorithm to measure dysphonia and analyse speech. The main objective was to distinguish between PD patients and healthy subjects through two features, namely recurrence and fractal scaling, which distinguish distorted sounds from normal sounds. They applied the pitch period entropy (PPE) method to a dataset consisting of 23 patients with PD and 8 healthy subjects through the extraction of dysphonia features; the method achieved an accuracy of 91.4%. Canturk et al. [31] presented four methods for selecting features and classified the selected features into six categories. The system achieved an accuracy of 57.5% with LOSO CV and an accuracy of 68.94% with k-fold CV. Li et al. [32] extracted hybrid features and then diagnosed these features through SVM; the algorithm achieved 82.5% accuracy. Benba et al. [33] extracted features by Mel-frequency cepstral coefficients and classified them by SVM; the algorithm achieved 82.5% accuracy with LOSO CV. They also applied human-factor cepstral coefficients to extract features from vowels, achieving an accuracy of 87.5% with LOSO CV. Almeida et al.
[34] extracted the features of phonemic pronunciation using several methods, and PD was detected on the basis of several classifiers. Das et al. [35] applied partial least squares to reduce dimensions and applied a self-organising map (SOM) for clustering. Finally, PD was detected by the unified PD rating scale. Yuvaraj et al. [36] studied emotional information such as happiness, anger, fear, sadness, and disgust to diagnose PD and distinguish it from normal states through the use of EEG signals. Spectral decomposition was also applied with KNN and SVM classifiers, and the researchers observed that emotions were reduced in PD patients [37,38]. Yuvaraj et al. [39] applied higher-order spectra to extract features from electroencephalography signals to diagnose PD from normal cases. All classifiers achieved promising results. Sivaranjini et al. [40] used the AlexNet model to diagnose MR images to distinguish PD from normal cases; the model reached an accuracy of 88.9%. Ali et al. [41] extracted features and selected the most important ones, ranking them by the chi-square statistical method for PD diagnosis. Senturk et al. [42] applied methods to choose features and remove unessential ones, and the selected features were diagnosed through ML methods for early detection of PD. Gupta et al. [43] employed an optimised version of the crow search algorithm for early detection of PD, in which the system achieved superior accuracy.
Previous studies thus reveal limitations in the techniques used and a failure to achieve satisfactory results for the automated and early diagnosis of PD.

Proposed Approach for PD Detection
This section, as shown in Figure 1, discusses the development of an automated technique for analyzing the acoustic signals of a PD dataset for early diagnosis of PD. The dataset was improved by processing outliers and replacing missing values, the coefficient of variation was applied to measure the relative dispersion of the data points, and the dataset was balanced by the SMOTE method. The association of all features with the target feature was evaluated through the correlation coefficient. The relationship between features and the proportion of positive and negative correlation for each feature was measured by the RFE algorithm. To select the most critical features, the t-SNE and PCA algorithms were applied. Finally, the selected features were classified by five classifiers.

Description of Dataset
The PD dataset used in this paper for the early detection of PD is based on voice signals; it was created and donated by Max Little of Oxford University to the UCI Machine Learning Repository [44]. The dataset is one of the most carefully collected, prepared, and evaluated datasets, assessed by many physicians, and many researchers have developed automated techniques and evaluated them on it. It remains a reference point for researchers interested in the early detection of PD. The dataset contains 195 biomedical voice recordings, divided into 147 from PD patients and 48 from healthy people [45]. Table 1 shows the 23 features extracted from the voice signals, the voice measure each describes, and the interpretation of each feature.


Data Processing
Data processing is the process of converting raw data into a useful and understandable form, and it is one of the most significant steps in ensuring the success of subsequent analysis. Data processing consists of two steps: (1) the imputation of data, which involves replacing missing values, removing outliers, and deleting duplicated values, and (2) the validation of data to ensure completeness and consistency [46]. In this paper, we observed that the dataset contains no duplicate values, as the number of rows equals the number of unique column values. We also note that all features are continuous numerical variables except for the "status" feature, which is of a binary categorical type; its data must therefore be converted to an object data type, as shown in Table 1. If discrepancies are discovered during the data-processing steps, appropriate actions are taken based on their specific nature and extent. This may involve imputing missing values, handling outliers, removing duplicates, or investigating and resolving data-validation issues. The goal is to ensure the integrity and quality of the data for accurate and reliable analysis.

Detecting Outliers
The PD dataset contains 23 features of voice samples. The skewness method of statistical analysis was applied to measure the symmetry of distribution in the dataset. When feature values are on the left or right side of a median, the feature values are described as skewed. The data are symmetric when the mean, median, and mode are at the same point. The data show positive skewness when the distribution of the tail to the right side is longer or fatter, which means that the mean and median are greater than the mode [47].
The data show negative skewness when the tail distribution to the left side is longer or fatter than the right side, which means that the mean and median are less than the mode. In this paper, we divided the dataset into seven groups. The features of group 1 (MDVP: Fo (Hz), MDVP: Fhi (Hz), and MDVP: Flo (Hz)), group 2 (MDVP: Jitter (%), MDVP: Jitter (Abs), MDVP: RAP, MDVP: PPQ, and Jitter: DDP), and group 3 (MDVP: Shimmer, MDVP: Shimmer (dB), Shimmer: APQ3, Shimmer: APQ5, MDVP: APQ, and Shimmer: DDA) all have a right skewness, meaning that their mean values are greater than their median values. Of the group 4 features, NHR and HNR, the NHR feature has a right skewness (mean greater than median), whereas the HNR feature has a left skewness (mean less than median). For the group 5 features, RPDE and D2, the median and mean are close, indicating minimal skewness. For the group 6 feature, DFA, the median and mean are also close, indicating negligible skewness. For the group 7 features, spread1, spread2, and PPE, the mean appears to be slightly greater than the median, and, therefore, these features have positive skewness. Table 2 describes the skewness value for each feature.
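The skewness check described above can be sketched with pandas; a minimal example on synthetic data, since the UCI dataset file is not bundled here, and the column names below are illustrative stand-ins for the real voice features:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins for two voice features; the real columns
# (e.g. "MDVP:Fo(Hz)") would be loaded from the UCI dataset file.
df = pd.DataFrame({
    "right_skewed": rng.lognormal(mean=0.0, sigma=0.8, size=500),  # long right tail
    "symmetric":    rng.normal(loc=0.0, scale=1.0, size=500),
})

skew = df.skew()  # Fisher-Pearson skewness for each column
print(skew)

# A positively skewed feature has mean > median; near-zero skew means they agree.
assert skew["right_skewed"] > 0.5
assert df["right_skewed"].mean() > df["right_skewed"].median()
```

The same `df.skew()` call, applied to the real dataset, would reproduce a table of per-feature skewness values such as Table 2.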

Coefficient of Variation
The coefficient of variation (CV) is a statistical measurement of the relative dispersion of data points in a dataset about the mean: the larger the CV, the greater the variability with respect to the mean. The purpose of applying the CV to the features of a dataset is to assess the precision of the technique [48]. The CV is applicable when the standard deviation is proportional to the mean, and as a relative (unitless) measure it is often more informative than the standard deviation alone. A CV of less than 1 indicates low variability, and a CV higher than 1 indicates high variability. Equation (1) shows the mathematical formula for the CV. Table 3 describes the CV values for each feature in the dataset for healthy subjects and patients with PD.
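Equation (1), CV = standard deviation divided by the mean, is a one-liner; a small sketch with two illustrative arrays (not values from Table 3):

```python
import numpy as np

def coefficient_of_variation(x):
    """CV = standard deviation / mean (Equation (1)); unitless relative dispersion."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

low_dispersion  = np.array([10.0, 10.5, 9.8, 10.2, 10.1])  # CV well below 1
high_dispersion = np.array([0.1, 5.0, 0.2, 9.0, 0.3])      # CV above 1

assert coefficient_of_variation(low_dispersion) < 1
assert coefficient_of_variation(high_dispersion) > 1
```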

Balance of Dataset
The PD dataset consists of 195 records divided into two unbalanced classes: healthy (class 0, 24.62%) and Parkinson's (class 1, 75.38%). Because the dataset is unbalanced, the diagnostic process would tend toward the majority class and ignore the minority class; balancing the dataset is therefore necessary. If undersampling techniques were applied, the majority class would lose important information. The oversampling method overcomes this challenge by increasing the number of samples in the minority class.
To overcome this problem, SMOTE was proposed and implemented [49]. It works by adding new samples to the minority class during the training phase only. The method searches for samples of the minority class and finds the nearest neighbours of each point to generate new samples between them [50]. It continues until the dataset is balanced and the minority class equals the majority class. Table 4 shows the dataset before and after the use of SMOTE; the minority class (healthy) became almost equal to the majority class (Parkinson's).
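The core SMOTE idea, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration, not the paper's implementation; in practice the `SMOTE` class from the imbalanced-learn library would be used:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE-style oversampling sketch: each synthetic sample lies on
    the segment between a minority point and one of its k nearest minority
    neighbours (the full method is imbalanced-learn's SMOTE)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # pick a random true neighbour
        gap = rng.random()                   # position along the segment
        new.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(new)

# 48 synthetic "healthy" records topped up to match the 147 PD records.
X_healthy = np.random.default_rng(1).normal(size=(48, 3))
X_new = smote_sketch(X_healthy, n_new=147 - 48)
assert X_new.shape == (99, 3)
```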

Correlation Features
Statistical methods are used for the processing and interpretation of raw data. The correlation coefficient is a statistical indicator of relationships between features or between expected and actual values, showing the correlation of each feature with the others. Its value varies from −1 to +1. The relationship between two features is positive when the values of both increase or decrease together, negative when the value of one increases as the other decreases, and zero when one feature does not affect the other [51]. The correlation between all the features was analysed using Pearson's coefficient, which is given in Equation (2).
The correlation-coefficient thresholds were set at 0.80 and −0.80 for positive and negative correlations, respectively. Moderate correlations between features are beneficial, but the dataset contains highly correlated features, which must be removed before the dataset can be classified. The imbalance of the dataset contributed to the high correlation; thus, the oversampling process was used to balance the dataset, and the correlation coefficients between the dataset's features were computed again. After outliers were detected, the z-score method was applied to remove them by normalising the dataset. Applying the z-score method removed 14 rows from the dataset, which then consisted of 181 rows and 23 features. Equation (3) describes the z-score method.
After the outliers were removed, only a few highly correlated features remained, showing that the correlation coefficients were significantly affected by the outliers. Figure 2 shows the correlation between the dataset's features after the outliers were removed. The features were reduced to 13, so the dataset then consisted of 181 rows and 13 features.
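The two steps above, dropping features whose pairwise Pearson correlation exceeds the ±0.80 threshold (Equation (2)) and removing z-score outliers (Equation (3)), can be sketched with pandas on synthetic data. The feature names and the |z| > 3 cut-off are illustrative assumptions; the paper does not state its z-score threshold:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
a = rng.normal(size=200)
df = pd.DataFrame({
    "feat_a": a,
    "feat_b": a * 0.98 + rng.normal(scale=0.05, size=200),  # near-duplicate of feat_a
    "feat_c": rng.normal(size=200),
})

# Drop one feature from each pair whose |Pearson r| exceeds 0.80.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.80).any()]
df_reduced = df.drop(columns=to_drop)

# z-score outlier removal (Equation (3)); |z| > 3 is a common cut-off.
z = (df_reduced - df_reduced.mean()) / df_reduced.std()
df_clean = df_reduced[(z.abs() <= 3).all(axis=1)]

assert "feat_b" in to_drop
assert df_clean.shape[1] == 2
```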


Standardisation of Continuous Variables
Finally, the dataset obtained from the previous steps contains continuous variables; thus, the standardisation method was applied to ensure that all data had a standardised format [52]. Table 5 shows the dataset after applying the standardisation method, where the data share the same format, giving the dataset a deeper and more effective meaning. The dataset was standardised by Equation (4), where each feature has the mean subtracted from its value, and the result is divided by the standard deviation of the dataset. In Table 5:
• The rows are numbered from zero to four, indicating different instances or observations within the dataset;
• Each cell represents a value corresponding to a specific variable and observation. The values are standardised, i.e., scaled to a common scale with a mean of zero and a standard deviation of one. Standardisation is performed to facilitate comparisons and analysis of variables with different scales or units.
The table shows that the values of all eight measures of voice quality differ across data points, suggesting a wide range of voice quality in the PD dataset. The standardisation process makes the values in the table directly comparable, which simplifies the subsequent analysis.
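Equation (4) is exactly what scikit-learn's `StandardScaler` applies; a minimal sketch with illustrative values (not rows of Table 5):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[119.992, 157.302],
              [122.400, 148.650],
              [116.682, 131.111]])   # illustrative values, not rows of Table 5

scaler = StandardScaler()            # applies Equation (4): (x - mean) / std
X_std = scaler.fit_transform(X)

# Each column now has mean ~0 and unit variance.
assert np.allclose(X_std.mean(axis=0), 0.0, atol=1e-9)
assert np.allclose(X_std.std(axis=0), 1.0, atol=1e-9)
```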

Recursive-Feature Elimination Algorithm
After the preprocessing, it is necessary to identify the correlation between the features and the percentage of positive and negative correlation of all features. Therefore, finding the correlation of all features with the target feature (status) is necessary to quantify the contribution of each feature to diagnosing Parkinson's. In this paper, we applied the RFE algorithm, which finds the correlation of all features with the target feature. The RFE algorithm is easy to use, effective, and efficient at selecting the most important features for predicting the target feature while eliminating the features that have a weak correlation with the status feature [53]. Table 6 shows the correlation of all dataset features with the target feature. There are both positive and negative correlations between the features and the status feature; the features most positively correlated with status include spread1, MDVP: Fo(Hz), and MDVP: Flo(Hz). It is worth noting that we applied this algorithm to the dataset after removing the features that contained many outliers.
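RFE is available in scikit-learn; the sketch below ranks features on a synthetic stand-in for the 181-row, 13-feature dataset. The choice of logistic regression as the base estimator and of six retained features is illustrative, not taken from the paper:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Stand-in for the 181-row, 13-feature voice dataset.
X, y = make_classification(n_samples=181, n_features=13, n_informative=6,
                           random_state=0)

# RFE repeatedly fits the estimator and drops the weakest feature
# until the requested number of features remains.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=6)
rfe.fit(X, y)

print(rfe.ranking_)   # rank 1 = selected; larger ranks were eliminated earlier
assert rfe.support_.sum() == 6
assert rfe.ranking_.min() == 1
```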

t-SNE Algorithm
Dimensionality-reduction algorithms select the most important features, those strongly associated with the target characteristic, so that a compact and highly representative feature set is obtained and high accuracy can be achieved. Dimensionality reduction refers to reducing the number of variables in a dataset; the reduced dataset can have better predictive performance than the original [54].
In this paper, the t-SNE algorithm was applied to reduce the dimensions of nonlinear data: it maps the data from a high-dimensional space P to a low-dimensional space Q so that the data can be visualised. As its name suggests, the algorithm is stochastic: it is concerned with the variance of neighbouring points and with embedding the data in a low-dimensional space. Because of this stochasticity, t-SNE can generate a different embedding each time for the same dataset, but it focuses on keeping adjacent data points close together. The algorithm converts the distances between pairs Xi and Xj into probabilities, assigning similar pairs a higher probability and dissimilar pairs a lower probability. Equation (5) describes the pairwise similarity in the high-dimensional data space as a conditional probability over a point's neighbours. Equation (6) gives the corresponding representation of data points in the low-dimensional space. t-SNE then iteratively fits the probability distribution in the smaller space to minimise the Kullback–Leibler (KL) divergence, as shown in Equation (7). The algorithm reduced the features of the new dataset from 12 to 10.
where p(x_i | x_j) is the conditional probability in the high-dimensional space P, with (x_i, x_j) pairs of points in P, and q(y_i | y_j) is the conditional probability in the low-dimensional space Q, with (y_i, y_j) pairs of points in Q.
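A minimal t-SNE sketch using scikit-learn on synthetic data; the 2-D target here is the common visualisation setting, not the paper's 10-dimensional reduction, which (as noted in the comment) would require the exact method since the default Barnes-Hut solver only supports fewer than four output dimensions:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(181, 12))   # stand-in for the 181-row, 12-feature data

# Embed into 2-D; the KL divergence (Equation (7)) is minimised iteratively.
# Reducing to 10 dimensions, as in the paper, would need
# TSNE(n_components=10, method="exact"), since barnes_hut supports < 4 dims.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_emb = tsne.fit_transform(X)

assert X_emb.shape == (181, 2)
```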

PCA Algorithm
PCA, one of the most popular dimensionality-reduction algorithms, is an unsupervised statistical algorithm that converts the values of correlated features into linearly uncorrelated features called principal components. The algorithm depends on the mathematical concepts of variance, covariance, eigenvalues, and eigenvectors. The dimensions refer to the number of features in the dataset, and correlation refers to the relationship between two features; when components are orthogonal, the correlation between them is zero [55]. The algorithm first standardizes the dataset so that features with larger scales do not dominate the variance: when the variance is independent of the significance of the features, each value is divided by the standard deviation of its feature. The covariance matrix Z contains the variance between every pair of features [56]. Eigenvectors represent axes of high variance, and the amount of variance along each axis is given by its eigenvalue. The algorithm arranges the eigenvalues in descending order, together with their eigenvectors, in the matrix P. The standardized data are then projected onto P to obtain the new features. Finally, the essential and relevant features are preserved and the less critical ones are removed to produce a new dataset. In this paper, the features were reduced from 12 to 9 while retaining the most vital information.
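The standardise-then-project pipeline described above maps directly onto scikit-learn; a sketch on synthetic data reproducing the paper's 12-to-9 reduction:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(181, 12))             # stand-in for the reduced dataset

X_std = StandardScaler().fit_transform(X)  # PCA assumes comparable scales
pca = PCA(n_components=9, random_state=0)  # 12 -> 9 components, as in the paper
X_pca = pca.fit_transform(X_std)

# Components are orthogonal directions of decreasing variance (eigenvectors of
# the covariance matrix, ordered by eigenvalue).
assert X_pca.shape == (181, 9)
assert np.all(np.diff(pca.explained_variance_) <= 1e-9)
```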

Classification Algorithms
The aim of classification is to find a model that describes and, at the same time, distinguishes classes of data, and then to use it to predict the class to which an unclassified object belongs. Classification is a process that divides data into given classes based on their properties [57]. The classification process takes place in the following steps:
• Training, in which the classification model is created by analysing the training set;
• Testing, in which the quality of the created model is evaluated using test data.
In this paper, five types of classification algorithms that are most widely used in the literature were applied for the early diagnosis of PD.

Support-Vector Machines
The SVM separates two classes, here healthy and Parkinson's cases, with a hyperplane chosen so that the minimum margin is maximised. This hyperplane is a decision boundary found by the SVM algorithm; it divides the data space into two halves, one for healthy cases and one for PD cases. The geometric margin is the distance from the decision boundary to the nearest data point. When the decision boundary (hyperplane) separates the classes and the training data are linearly separable, the geometric margin is positive. The goal is to find the hyperplane that maximises the margin. When the training data are linearly separable, a single linear decision boundary exists that separates the healthy data on one side of the hyperplane from the data of PD patients on the other [58].
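A small margin-maximisation sketch with scikit-learn's `SVC` on two synthetic clusters standing in for the healthy and Parkinson's classes; the linear kernel and `C` value are illustrative, not the paper's tuned hyperparameters:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters standing in for healthy vs. Parkinson's classes.
X, y = make_blobs(n_samples=120, centers=2, cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=1.0)   # maximises the margin around the hyperplane
clf.fit(X, y)

# The decision boundary is w.x + b = 0; support vectors lie closest to it.
assert clf.score(X, y) > 0.9
assert len(clf.support_vectors_) >= 2
```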

K-Nearest Neighbour
The KNN algorithm is a supervised ML method for solving regression and classification problems. KNN does not include a training stage: it simply detects similarities between a new data point and the stored data points [59]. The similarities indicate how close or far the new point (test data) is from the stored data (training data) based on the Euclidean distance. In other words, the closer the new data point is to the points of a training class, the more likely it belongs to that class.
The KNN algorithm is applied through the following steps:
• The value of the variable K is set;
• The Euclidean distance from the new data point to the training data points is calculated;
• The distances are sorted in ascending order;
• The labels of the K nearest points are collected;
• The new data point is assigned to the class that is most common among its K nearest neighbours.
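The steps above can be sketched with scikit-learn's `KNeighborsClassifier` on toy one-dimensional data (the data and K = 3 are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two classes clustered along one feature.
X_train = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)   # KNN has no real training: it stores the data

# A new point at 1.1 is nearest the class-0 cluster; at 4.9, the class-1 cluster.
assert knn.predict([[1.1]])[0] == 0
assert knn.predict([[4.9]])[0] == 1
```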

Decision Tree
DT is a diagram with a tree structure, where the uppermost node represents the root. Internal nodes represent feature testing, edges represent test results, and leaf nodes indicate the class into which it was classified. The key point in creating the DT is selecting the "highest decision" feature, which initially forms the root node of the tree. If we can find such a feature, then the process of determining the correct class for a given classification object becomes highly accurate and computationally less demanding [60]. The process moves from the root node according to certain rules to the leaf node where the decision is made.
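A decision-tree sketch with scikit-learn on synthetic data; the depth limit of 4 is an illustrative choice, not a setting reported in the paper:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the processed voice dataset.
X, y = make_classification(n_samples=181, n_features=9, random_state=0)

# The root node tests the single most discriminative feature; each internal
# node tests one feature, and the leaves carry the predicted class.
dt = DecisionTreeClassifier(max_depth=4, random_state=0)
dt.fit(X, y)

assert dt.tree_.max_depth <= 4
```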

Random Forest
RF is a generalisation of the bagging methodology, which generates many trees based on the CART algorithm and then collects the predictions or ratings from each tree. The intention is to reduce the high variance generated by each individual tree and thus improve diagnostic performance. Random forests, built on bootstrap sampling as an aggregation method, are attractive mainly because they are able to strengthen weak learners, thereby resulting in accurate diagnostic predictions [61]. Finally, the RF takes the vote from each tree, and the diagnosis is made according to the highest vote output from all the trees in the RF.
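The bootstrap-and-vote procedure maps onto scikit-learn's `RandomForestClassifier`; a sketch on synthetic data with an illustrative forest size of 100 trees (the paper's hyperparameters are not stated here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=181, n_features=9, random_state=0)

# Each tree is grown on a bootstrap sample of the training data; the forest's
# prediction is the majority vote across trees, which lowers the variance of
# any single tree.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

assert len(rf.estimators_) == 100
```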

Multilayer Perceptron
The MLP is a generalisation of the perceptron for solving nonlinear separable problems. The perceptron contains an input layer to receive the input data, hidden layers to address the problem, and an output layer to show the results.
The MLP architecture consists of the following parts:
• An input layer, which is only responsible for receiving the input signals, whether images or text, and passing them to the next layer;
• Hidden layers, which process the input data to find solutions to the given problem;
• An output layer, which provides the network response and shows the required diagnostic results;
• A feedforward structure, in which data move only in the forward direction, with each neuron connected to all the neurons in the next layer.
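The layer structure above can be sketched with scikit-learn's `MLPClassifier`; the two hidden layers of 32 and 16 units are illustrative sizes, not the paper's tuned architecture:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=181, n_features=9, random_state=0)

# One input layer (9 features), two hidden layers, one output layer; data flow
# strictly forward through fully connected layers.
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
mlp.fit(X, y)

assert mlp.n_layers_ == 4   # input + 2 hidden + output
```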

Experimental Results
The results of the system development are presented in this section.

Experiment Settings
The experiment was conducted on a computer running the Windows operating system with the hardware equipment that can be seen in Table 7. The experiments were coded using Python.

Splitting Dataset
The dataset generated by analyzing the audio signals consisted of 195 records divided into two unbalanced categories, healthy and Parkinson's. It was split into 69.75% for training and 30.25% for testing. Before balancing, the dataset contained 147 PD records (75.38%) and 48 healthy records (24.62%). After balancing, the training set contained an equal number of records, 103 for each class. Table 4 describes the distribution of the dataset samples during the training and testing phases.
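The balancing step uses SMOTE, whose core idea is interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, hand-rolled illustration of that idea on random placeholder data (not the imbalanced-learn implementation the authors may have used); the record and feature counts mirror those reported above:

```python
import numpy as np

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate synthetic minority samples by interpolating between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]      # skip the sample itself
        nb = minority[rng.choice(neighbours)]
        synthetic.append(x + rng.random() * (nb - x))  # point on the segment x -> nb
    return np.array(synthetic)

minority = np.random.default_rng(1).normal(size=(48, 22))  # 48 healthy records, 22 features
extra = smote_like_oversample(minority, n_new=55)          # 48 + 55 = 103 per class
print(extra.shape)  # (55, 22)
```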

Evaluation Metrics
The performance of the classification algorithms on the PD dataset was evaluated using four statistical measures: accuracy, precision, recall, and F1-score. These metrics are among the most effective measures of the effectiveness of classification models. Equations (8)-(11) describe how the statistical metrics are calculated, where TP (true positive) and TN (true negative) represent correctly classified instances, whereas FP (false positive) and FN (false negative) represent incorrectly classified instances [62]:
Accuracy = (TP + TN)/(TP + TN + FP + FN) (8)
Precision = TP/(TP + FP) (9)
Recall = TP/(TP + FN) (10)
F1-score = 2 × (Precision × Recall)/(Precision + Recall) (11)
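The four metrics can be computed directly from the confusion-matrix counts. The counts below are hypothetical, not the paper's actual confusion matrix:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical test-set counts: 43 PD and 13 healthy correct, 1 FP, 2 FN.
acc, prec, rec, f1 = classification_metrics(tp=43, tn=13, fp=1, fn=2)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```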
where TP refers to Parkinson's instances that are correctly classified, TN indicates normal instances that are correctly classified, FN represents PD instances that are classified as normal, and FP indicates normal instances that are classified as PD.

Leave-One-Subject-Out Cross-Validation
The leave-one-subject-out cross-validation (LOSO CV) method is a type of cross-validation that is commonly used to evaluate the performance of machine-learning models on the acoustic signals dataset for Parkinson's disease. In LOSO CV, the model is trained on all but one subject's data; then, the model's performance is evaluated on the held-out subject's data. This process is repeated for each subject in the dataset, and the final accuracy is calculated as the average accuracy across all subjects. LOSO CV is more robust than other cross-validation methods, such as k-fold cross-validation, because it accounts for the variability between subjects. This is important because the acoustic signals dataset contains a wide range of subjects with different vocal characteristics. LOSO CV can help ensure that the model is not overfitting to the training data and that it is able to generalize to new subjects.
Here is a step-by-step description of the LOSO CV method on the acoustic signals dataset for Parkinson's disease:

1. Dataset preparation: the acoustic signals dataset contains recordings of acoustic features extracted from the speech signals of individuals with and without Parkinson's disease. Each subject in the dataset contributes multiple recordings;
2. Subject split: the dataset is divided into subjects, where each subject corresponds to an individual participant. In LOSO CV, one subject is selected as the test subject, and the remaining subjects are used as the training set. This process is repeated for each subject in the dataset, ensuring that each subject serves as the test subject once;
3. Training phase: for each iteration of LOSO CV, the machine-learning model is trained on the training set, which consists of all subjects except the test subject. The model learns the patterns and relationships between the acoustic features and the presence of Parkinson's disease;
4. Testing phase: after training the model, the test subject is used to evaluate the model's performance. The model takes the acoustic features from the test subject's recordings as input and predicts whether or not the subject has Parkinson's disease;
5. Performance evaluation: the predictions made by the model are compared to the true labels (i.e., the presence or absence of Parkinson's disease) for the test subject. Common evaluation metrics such as accuracy, precision, recall, and F1 score can be calculated to assess the model's performance on the test subject;
6. Iteration: steps 2-5 are repeated for each subject in the dataset, with each subject serving as the test subject exactly once. The performance metrics obtained for each iteration can be aggregated to get an overall assessment of the model's performance on the acoustic signals dataset.
Overall, the LOSO CV method is a valuable tool for evaluating the performance of machine-learning models on the acoustic signals dataset for Parkinson's disease. It is more robust than other cross-validation methods, and it can help to ensure that the model is not overfitting to the training data and that it can generalize to new subjects.
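The LOSO procedure above can be sketched as follows, using a deliberately simple 1-nearest-neighbour classifier and synthetic subject-grouped data as placeholders for the paper's models and recordings:

```python
import numpy as np

def one_nn_predict(train_X, train_y, x):
    """Label of the nearest training sample (a deliberately simple classifier)."""
    i = np.argmin(np.linalg.norm(train_X - x, axis=1))
    return train_y[i]

def loso_cv(X, y, subjects):
    """Leave-one-subject-out CV: each fold holds out every recording of one subject."""
    fold_accuracies = []
    for s in np.unique(subjects):
        test = subjects == s             # all recordings of the held-out subject
        train = ~test                    # recordings of every other subject
        preds = np.array([one_nn_predict(X[train], y[train], x) for x in X[test]])
        fold_accuracies.append(np.mean(preds == y[test]))
    return float(np.mean(fold_accuracies))   # average accuracy across subjects

rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(6), 3)            # 6 hypothetical subjects, 3 recordings each
y = np.repeat([0, 1, 0, 1, 0, 1], 3)             # 0 = healthy, 1 = PD
X = rng.normal(size=(18, 4)) + 2.0 * y[:, None]  # 4 synthetic acoustic features
accuracy = loso_cv(X, y, subjects)
print(accuracy)
```

Note that the split is by subject, not by recording, so no subject's recordings ever appear in both the training and test sets of a fold.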

Results of Classifiers with the t-SNE Method
The dataset was divided into 69.75% for training and 30.25% for testing, and the SMOTE technique was applied to balance the dataset during the training phase. Each feature was ranked and given a percentage according to its relation to the target feature using the RFE algorithm. The t-SNE algorithm then reduced the dataset's dimensions to ten important features, which were fed to the SVM, KNN, DT, RF, and MLP classifiers. The loss function was also reduced by tuning the hyperparameters of the classifiers during the training phase. Figure 3 shows that all the classifiers were accurate and effective in categorising PD cases, achieving more accurate diagnostic results for PD cases than for healthy ones.
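The t-SNE reduction step can be sketched with scikit-learn's TSNE (the paper does not name its implementation), on random placeholder data standing in for the 195 × 22 voice-feature matrix. Two caveats: reducing to ten components requires the exact method, since the default Barnes-Hut approximation supports at most three components, and TSNE has no transform method for unseen samples, so the embedding must be computed on the data it is fitted to:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(195, 22))  # random placeholder for the 195 x 22 feature matrix

# method="exact" is needed because Barnes-Hut only allows n_components <= 3.
tsne = TSNE(n_components=10, method="exact", perplexity=30, random_state=0)
Z = tsne.fit_transform(X)       # embedded features fed to the classifiers
print(Z.shape)  # (195, 10)
```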

Results of Classifiers with the PCA Method
In this experiment, the dataset passed through the same processing stages, except for dimensionality reduction, where the high-dimensional data space was represented in a low-dimensional space by the PCA algorithm. The PCA algorithm reduced the dataset's features to nine important features for diagnosing PD. The loss function of the classification algorithms was also reduced by tuning the hyperparameters of the classifiers during the training phase. Table 9 describes the results achieved by the five algorithms. The RF and MLP algorithms achieved better results than the rest during the training and testing phases. During the training phase, RF achieved accuracy, precision, recall, and F1-score of 99%, 95.52%, 99.10%, and 97.28%, respectively, while during the testing phase, it reached 99%, 95%, 98%, and 96%. During the training phase, MLP achieved accuracy, precision, recall, and F1-score of 98.43%, 98%, 97.66%, and 98%, respectively, while during the testing phase, it reached 98%, 97.66%, 96%, and 96.66%. During the training phase, DT achieved accuracy, precision, recall, and F1-score of 96%, 91.33%, 93.66%, and 92.66%, respectively, while during the testing phase, it reached 95%, 94%, 90.33%, and 92%. During the training phase, SVM achieved accuracy, precision, recall, and F1-score of 94%, 90%, 91.66%, and 90.66%, respectively, while during the testing phase, it reached 93%, 84.70%, 91.20%, and 87.70%. Finally, KNN obtained accuracy, precision, recall, and F1-score during the training phase of 94%, 89.30%, 89%, and 89.03%, respectively, while during the testing phase, it reached 93%, 91%, 84%, and 87%. Figure 4 shows that all classifiers achieved accurate and effective diagnostic results for diagnosing PD.
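The PCA projection to nine components can be sketched with a plain SVD, a minimal stand-in for whatever PCA implementation the authors used; the data here are random placeholders with the dataset's dimensions:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project mean-centred data onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                        # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T, Vt[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(195, 22))    # placeholder: 195 records, 22 voice features
Z, components = pca_reduce(X, 9)  # keep the 9 leading components, as in the text
print(Z.shape)           # (195, 9)
print(components.shape)  # (9, 22): each row is a linear combination of features
```

Unlike t-SNE, the learned components can also project new test samples, and each row of `components` remains interpretable as a weighted combination of the original acoustic features.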

Discussion
PD is a health problem that threatens the elderly and is caused by neurodegeneration due to the death of the neurons that secrete dopamine. The manual diagnosis of PD is still lacking, doctors' opinions differ, and the number of doctors in developing countries is small. Thus, automated diagnosis by artificial intelligence addresses these challenges. In this study, several systems were developed that pass through many stages of data processing. The PD dataset went through data optimization and feature-correlation analysis. The RFE algorithm was applied to give the percentage contribution of each feature to the target feature. The features of the dataset were subjected to the t-SNE and PCA algorithms to select the most important features, which were fed to five classifiers. When analyzing acoustic signals for the diagnosis of Parkinson's disease, t-SNE and PCA are both dimensionality-reduction techniques that can be employed; however, they serve different purposes and have distinct advantages and limitations. Complementary information: t-SNE and PCA capture different aspects of the data. While t-SNE is effective at visualizing clusters and preserving local structure, it may not capture the overall variance or global patterns in the data. PCA, on the other hand, focuses on explaining the maximum variance in the data, which can provide valuable insights into the most significant features. By combining both techniques, one can benefit from their complementary information. Interpretability: PCA produces orthogonal components that are interpretable as linear combinations of the original features. These components represent the directions of maximum variance in the data. In contrast, t-SNE does not provide straightforward interpretations or explicit relationships with the original features. Therefore, PCA can help in understanding the underlying factors contributing to the acoustic signals related to Parkinson's disease.
This section discusses the evaluation of the algorithms developed on the PD dataset for the early diagnosis of PD at each category level. The systems achieved better results after applying the t-SNE and PCA algorithms, which indicates that some features are negatively associated with the target feature and affect the efficiency of the system.
The systems optimize the early diagnosis of PD by evaluating selected features and tuning the hyperparameters of ML algorithms for diagnosing PD based on voice disorders. The proposed techniques outperformed those of existing studies, which contributes to the evaluation and advancement of existing knowledge in the field of Parkinson's disease diagnosis.
Second, when the classifiers are fed the dataset after dimensionality reduction by the PCA algorithm, Table 11 and Figure 6 describe the diagnostic results for each class. In the healthy class, SVM, KNN, DT, RF, and MLP achieved precision of 80%, 88.50%, 94%, 100%, and 98%, respectively; recall of 97%, 84%, 96%, 99%, and 100%, respectively; and F1-scores of 88%, 85.50%, 95%, 99%, and 99%, respectively. For the PD class, SVM, KNN, DT, RF, and MLP achieved precision of 87%, 90%, 94%, 90%, and 97.50%, respectively; recall of 88%, 84%, 87.50%, 97%, and 95%, respectively; and F1-scores of 87.50%, 90%, 90.50%, 93%, and 95.50%, respectively. Figure 7 presents a comparison of the results of the proposed classification models with existing models discussed in the literature. Our proposed models provide better results than existing studies: the previous studies achieved accuracy scores between 78.23% and 95.43%, while the proposed systems reached accuracies of 98%, 96%, and 98% for the RF, DT, and MLP classifiers, respectively. Recall (sensitivity) in the previous systems was between 71% and 95.4%, while that of the proposed systems was 98%, 93.66%, and 96% for the RF, DT, and MLP classifiers, respectively.
Our findings suggest that acoustic signals can be used to detect Parkinson's disease automatically and early. This could lead to earlier diagnosis and treatment, which could improve the quality of life for people with Parkinson's disease.
Reliability refers to the consistency and stability of measurements or techniques, while validity refers to the accuracy and appropriateness of the measurements or techniques in assessing the intended construct or phenomenon. The accuracy of the reported results rests on the reliability and validity of the measures used, and the study showed that the proposed techniques were able to identify PD with an accuracy of up to 98%. Overall, the study provides promising evidence that acoustic signals can be used to detect PD automatically and early.
The implications of the findings are significant for the field of Parkinson's disease diagnosis and treatment. By leveraging machine-learning algorithms and analyzing acoustic signals, we have demonstrated the potential for automated and early detection of PD. This approach offers several advantages over traditional diagnosis methods, which often involve time-consuming physical and psychological tests and specialist examinations of the patient's nervous system. By extracting a set of features from recordings of a person's voice, we were able to train machine-learning models to distinguish between Parkinson's cases and healthy individuals. This noninvasive approach has the potential to revolutionize the early detection of PD, enabling timely interventions and improved patient outcomes. The study also introduces two techniques, t-SNE and PCA, for the dimensionality reduction of the dataset. By reducing the number of features while retaining the most informative ones, these techniques help improve the efficiency and performance of the classification algorithms. The experimental results presented in the paper demonstrate the effectiveness of the proposed techniques. These findings have practical implications for healthcare professionals involved in PD diagnosis. The proposed techniques can be applied to develop automated systems that assist in the early screening and diagnosis of PD based on voice analysis. Such systems could potentially be integrated into routine clinical practice, enabling cost-effective and widespread screening for PD, particularly in populations where access to specialized neurological examinations may be limited. Furthermore, the study contributes to the existing knowledge by demonstrating the efficacy of specific machine-learning algorithms (RF and MLP) and dimensionality-reduction techniques (t-SNE and PCA) in the context of PD diagnosis.
This knowledge can inform future studies and inspire further research into the development of advanced diagnostic tools and methodologies for Parkinson's disease.
Here are some potential limitations and biases to consider. The study's methodology relies on analyzing voice recordings to diagnose Parkinson's disease, so it is essential to discuss the specifics of the data-collection process, including the recording equipment used, the recording environment, and any potential limitations or sources of error introduced during data acquisition. The RFE algorithm selected relevant features from the voice recordings; the criteria used for feature selection and the potential impact of excluding certain features also deserve discussion, since biases may arise if certain features are overrepresented or if crucial features are unintentionally omitted.

Conclusions
PD is a disease caused by a lack of dopamine, which affects the elderly and disrupts their lives. This disease is difficult to diagnose because its symptoms are unclear and associated with other diseases. Extensive medical and scientific research has been conducted to diagnose PD early. ML techniques have contributed to early diagnosis by analysing a person's voice disorders. The present study contributes knowledge useful for the early diagnosis of PD by providing a voice dataset comprising 22 features. These features appeared highly correlated, thereby making them unsuitable for high-level diagnosis, and the features that contained outliers were removed. The RFE algorithm was applied to rank the features according to their importance; then, the dimensions were reduced by using two algorithms, t-SNE and PCA, to represent the data in a low-dimensional space. The SVM, KNN, DT, RF, and MLP classifiers were fed the features produced by both the t-SNE and PCA algorithms. All classifiers achieved superior results for diagnosing PD and normal cases. During the testing phase, RF with the t-SNE algorithm achieved an accuracy of 97%, precision of 96.50%, recall of 94%, and F1-score of 95%, while MLP with the PCA algorithm achieved an accuracy of 98%, precision of 97.66%, recall of 96%, and F1-score of 96.66%. We have shown that acoustic signals can be used to detect Parkinson's disease automatically and early. This is a significant finding, as it could lead to earlier diagnosis and treatment, which could improve the quality of life for people with Parkinson's disease. Our findings contribute to existing knowledge by providing a new method for detecting Parkinson's disease.
Funding: This research was funded by the Deanship of Scientific Research at Najran University, Kingdom of Saudi Arabia, under grant code NU/DRP/SERC/12/17.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data applied to this study to assess systems performance were obtained from the UCI Machine Learning Repository: Parkinsons dataset, which is publicly available at: https://archive.ics.uci.edu/ml/datasets/parkinsons (accessed on 23 February 2022).