Table 1 presents the means and standard deviations of the voice-related variables assessed in this study, comparing the control group with the Parkinson's group, along with the F-test and t-test results used to evaluate the mean differences between groups. Our statistical analysis revealed significant differences between controls and Parkinson's patients across several acoustic measures; of the 39 variables examined, 24 differed significantly. These include local perturbation measures such as local percentage jitter (locPctJitter), local absolute jitter (locAbsJitter), and relative average perturbation jitter (rapJitter); shimmer measures such as local decibel shimmer (locDbShimmer) and the three- and eleven-point amplitude perturbation quotient shimmer (apq3Shimmer and apq11Shimmer); long-term vocal stability metrics such as recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA), and pitch period entropy (PPE); and specific Mel-frequency cepstral coefficients (MFCCs) and their changes over time (delta coefficients), particularly MFCCs of orders 2 and 6-12 and delta MFCCs of orders 0-2, 7, 8, 11, and 12.
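The per-variable group comparison described above can be sketched as follows. This is an illustrative reconstruction with synthetic data, not the study's actual pipeline; the use of a variance test to choose between the pooled and Welch t-test variants is an assumption.

```python
# Illustrative per-feature group comparison: a variance test (F-type)
# followed by an independent-samples t-test. Data are synthetic stand-ins
# for one acoustic measure (e.g., locPctJitter).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.45, scale=0.10, size=40)    # hypothetical control values
parkinson = rng.normal(loc=0.70, scale=0.18, size=40)  # hypothetical PD values

# Variance-equality test decides which t-test variant applies
_, p_var = stats.levene(control, parkinson)
equal_var = p_var >= 0.05

t_stat, p_val = stats.ttest_ind(control, parkinson, equal_var=equal_var)
print(f"t = {t_stat:.3f}, p = {p_val:.4g}, significant: {p_val < 0.05}")
```

In practice this test would be repeated over all 39 variables, with the 24 significant ones retained.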
Following the statistical analysis, hyperparameter tuning of the Deep Neural Network (DNN) model was conducted to optimize its configuration. The optimal setup comprised 32 neurons in the first hidden layer, with tanh, ReLU, and sigmoid activation functions in successive layers. Training was guided by the Adamax optimizer, binary cross-entropy loss, a dropout rate of 0.1, and a learning rate of 0.01. We trained the DNN for 30 epochs with a batch size of 16. The model, comprising 1813 trainable parameters, exhibited an average accuracy of 72.10% ± 6.65% (CI 69.5% - 74.7%), sensitivity of 83.49% ± 9.22% (CI 79.7% - 86.9%), specificity of 60.79% ± 10.25% (CI 57% - 64.8%), precision of 68.34% ± 6.26% (CI 66% - 70.7%), F1 score of 74.89% ± 6.05% (CI 72.7% - 77.2%), and a ROC AUC of 79.94% ± 7.80% (CI 76.9% - 82.8%).
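A minimal Keras sketch of this configuration follows. The input width (24 selected features) and the sizes of the second and third hidden layers are assumptions, since only the first hidden layer's width is stated; the sketch therefore does not reproduce the reported 1813-parameter count.

```python
# Keras sketch of the reported DNN configuration. Layer widths after the
# first hidden layer are hypothetical; only the 32-neuron first layer,
# the activation sequence, and the training settings come from the text.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(24,)),            # 24 significant acoustic features (assumed)
    layers.Dense(32, activation="tanh"),  # first hidden layer: 32 neurons, tanh
    layers.Dropout(0.1),                  # dropout rate 0.1
    layers.Dense(16, activation="relu"),      # hypothetical width
    layers.Dense(8, activation="sigmoid"),    # hypothetical width
    layers.Dense(1, activation="sigmoid"),    # binary PD vs. control output
])
model.compile(
    optimizer=tf.keras.optimizers.Adamax(learning_rate=0.01),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(X_train, y_train, epochs=30, batch_size=16)
```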
The optimized RF model comprised 150 trees (n_estimators), each with a maximum depth of 6 (max_depth). The model was fine-tuned with min_samples_split set to 10, min_samples_leaf to 8, max_features to sqrt, and criterion to gini, ensuring that each leaf retained sufficient samples to make a reliable prediction. The model exhibited an average accuracy of 77.34% ± 7.61% (CI 74.2% - 80.3%), sensitivity of 76.58% ± 10.7% (CI 72.7% - 81.8%), specificity of 78.1% ± 9.46% (CI 74.6% - 81.8%), precision of 78.21% ± 8.1% (CI 75.1% - 81.1%), F1 score of 77% ± 8.01% (CI 73.8% - 80.1%), ROC AUC of 85.88% ± 6.38% (CI 83.5% - 90.3%), oob_error of 0.2323 ± 0.0134, and test-error of 0.2266 ± 0.761.
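The reported configuration maps directly onto scikit-learn's RandomForestClassifier; the sketch below uses a synthetic dataset in place of the acoustic feature matrix.

```python
# scikit-learn sketch of the reported Random Forest configuration.
# The dataset is synthetic; the 24-feature width is an assumption.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=24, random_state=42)

rf = RandomForestClassifier(
    n_estimators=150,
    max_depth=6,
    min_samples_split=10,
    min_samples_leaf=8,      # each leaf keeps enough samples for stable estimates
    max_features="sqrt",
    criterion="gini",
    oob_score=True,          # enables the out-of-bag error reported above
    random_state=42,
)
rf.fit(X, y)
print(f"OOB error: {1 - rf.oob_score_:.4f}")
```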
The GB model was configured with 200 estimators and a learning rate of 0.3. Each tree had a maximum depth of 5, a minimum sample split of 30, a minimum sample leaf of 10, max_features set to sqrt, and subsample set to 0.9. This configuration resulted in an average accuracy of 83.23% ± 6.23% (CI 80.8% - 85.6%), sensitivity of 81.74% ± 8.38% (CI 74.2% - 81.4%), specificity of 84.79% ± 7.75% (CI 81.8% - 88.0%), precision of 84.59% ± 7.26% (CI 81.6% - 87.3%), F1 score of 82.91% ± 6.57% (CI 80.3% - 85.4%), and a ROC AUC of 90.46% ± 5.22% (CI 88.0% - 93.7%).
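In scikit-learn terms, this corresponds to the GradientBoostingClassifier sketch below (synthetic data, assumed feature width):

```python
# scikit-learn sketch of the reported GB configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=24, random_state=42)

gb = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.3,
    max_depth=5,
    min_samples_split=30,
    min_samples_leaf=10,
    max_features="sqrt",
    subsample=0.9,       # <1.0 gives stochastic gradient boosting
    random_state=42,
)
gb.fit(X, y)
print(f"training accuracy: {gb.score(X, y):.3f}")
```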
Our SVM model utilized a radial basis function (RBF) kernel with a regularization parameter (C) of 0.75 and a gamma value of 0.1. The SVM exhibited an average accuracy of 83.75% ± 5.39% (CI 81.6% - 86.0%), sensitivity of 89.07% ± 6.21% (CI 86.3% - 90.9%), specificity of 78.44% ± 9.05% (CI 74.7% - 81.9%), precision of 80.98% ± 6.65% (CI 78.5% - 83.4%), F1 score of 84.62% ± 4.94% (CI 82.7% - 86.6%), and a ROC AUC of 91.31% ± 4.62% (CI 89.5% - 93.1%).
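This corresponds to scikit-learn's SVC; the standardization step and synthetic data in the sketch below are assumptions added to make it runnable (RBF kernels are sensitive to feature scale):

```python
# scikit-learn sketch of the reported SVM configuration.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=24, random_state=42)

svm = make_pipeline(
    StandardScaler(),  # assumed preprocessing; RBF distances depend on scale
    SVC(kernel="rbf", C=0.75, gamma=0.1, probability=True, random_state=42),
)
svm.fit(X, y)
print(f"training accuracy: {svm.score(X, y):.3f}")
```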
The ESM achieved an average accuracy of 84.49% ± 6.08% (CI 82.1% - 86.8%), sensitivity of 85.74% ± 7.53% (CI 85.7% - 90.5%), specificity of 83.30% ± 9.36% (CI 79.8% - 87.0%), precision of 84.29% ± 7.96% (CI 81.0% - 87.2%), F1 score of 84.70% ± 5.95% (CI 82.4% - 87.0%), and a ROC AUC of 92.08% ± 4.94% (CI 90.0% - 95.2%).
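The composition of the ESM is not restated in this section. Assuming it denotes a stacked ensemble of the tuned base learners with a logistic-regression meta-learner (both assumptions), scikit-learn's StackingClassifier expresses the idea:

```python
# Hypothetical stacking sketch: base learners from the sections above,
# combined by a logistic-regression meta-learner trained on out-of-fold
# predictions. Dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=24, random_state=42)

esm = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=150, max_depth=6, random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=200, learning_rate=0.3, random_state=42)),
        ("svm", SVC(kernel="rbf", C=0.75, gamma=0.1, probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # hypothetical meta-learner
    cv=5,                                  # out-of-fold predictions train the meta-learner
)
esm.fit(X, y)
print(f"training accuracy: {esm.score(X, y):.3f}")
```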
Lastly, the Ensemble Voting Model (EVM) obtained an average accuracy of 82.19% ± 6.59% (CI 79.1% - 86.0%), sensitivity of 81.02% ± 8.60% (CI 76.2% - 86.4%), specificity of 83.36% ± 9.10% (CI 77.3% - 90.5%), precision of 83.46% ± 8.00% (CI 80.5% - 86.5%), F1 score of 81.92% ± 6.72% (CI 77.3% - 86.4%), and a ROC AUC of 90.46% ± 4.08% (CI 88.9% - 92.1%).
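A voting ensemble over the same base learners can be sketched with scikit-learn's VotingClassifier; soft (probability-averaged) voting is an assumption, as the voting scheme is not restated here.

```python
# Hypothetical voting-ensemble sketch over the tuned base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=24, random_state=42)

evm = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=150, max_depth=6, random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=200, learning_rate=0.3, random_state=42)),
        ("svm", SVC(kernel="rbf", C=0.75, gamma=0.1, probability=True, random_state=42)),
    ],
    voting="soft",  # average predicted class probabilities across models
)
evm.fit(X, y)
print(f"training accuracy: {evm.score(X, y):.3f}")
```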
These results illustrate each model's capacity to effectively differentiate between PD patients and healthy controls, underscoring the utility of integrating advanced machine learning techniques into the analysis of complex voice data.
3.1 Comparison Across Models
Our model performance comparison revealed statistically significant differences. The ESM, SVM, and GB models consistently outperformed the others. ESM and SVM significantly outperformed the ANN model in accuracy (p<0.001), with no significant difference between them (p=0.993 for ESM vs. SVM). Similarly, GB's accuracy was comparable to that of ESM and SVM (p=0.9292 for ESM vs. GB; p=0.9988 for GB vs. SVM). In sensitivity, SVM significantly surpassed RF and ANN (p<0.001), while ESM and SVM were comparable (p=0.384), as were ESM and GB (p=0.1906). In specificity, ESM, GB, and SVM all showed substantial improvements over ANN (p<0.001), with no significant difference between GB and ESM (p=0.967); the GB vs. SVM comparison yielded p=0.0093.
GB and ESM were significantly better in precision than ANN (p<0.001). For F1 scores, ESM, SVM, and GB were superior to ANN (p<0.001), with no significant differences between ESM and SVM (p=1.0) or between GB and SVM (p=0.783). The ROC AUC values likewise favored ESM, SVM, and GB, with no significant differences among them (p>0.7145 for all pairwise comparisons). Conversely, the EVM was statistically inferior to GB in specificity (p=0.05) and to RF in F1 score (p=0.003). These results demonstrate the robust performance of ESM, SVM, and GB in PD diagnosis, which outclassed the other models on most metrics.
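The pairwise comparisons above presumably operate on per-fold metric scores. Since the exact test is not restated in this section, the sketch below uses paired t-tests with a Bonferroni correction on hypothetical fold accuracies; the fold scores and the choice of test are both assumptions.

```python
# Illustrative pairwise model comparison on synthetic per-fold accuracies,
# using paired t-tests with a Bonferroni-adjusted significance threshold.
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
fold_acc = {                       # hypothetical per-fold accuracies
    "ESM": rng.normal(0.845, 0.06, 30),
    "SVM": rng.normal(0.838, 0.05, 30),
    "GB":  rng.normal(0.832, 0.06, 30),
    "ANN": rng.normal(0.721, 0.07, 30),
}
pairs = list(combinations(fold_acc, 2))
alpha = 0.05 / len(pairs)          # Bonferroni correction over all pairs
for a, b in pairs:
    t, p = stats.ttest_rel(fold_acc[a], fold_acc[b])
    print(f"{a} vs {b}: t={t:+.2f}, p={p:.4f}, significant={p < alpha}")
```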
The results from our ML analysis are pivotal for clinical application, particularly for speech-language pathologists who focus on voice disorders in Parkinson's Disease. The enhanced diagnostic accuracy demonstrated by our models, particularly the SVM and Ensemble Methods, indicates that these tools can reliably identify early signs of Parkinson's Disease through routine voice assessments. This capability to detect subtle vocal changes before they become overtly apparent offers a significant advantage in early disease management, potentially allowing for earlier interventions that can alter the disease's progression and improve patient outcomes.