Abstract
Voice disorder, or dysphonia, has drawn the attention of audio signal processing engineers and researchers, who have investigated the efficiency of several feature extraction and classification techniques in identifying voice abnormalities. The Mel-Frequency Cepstral Coefficient (MFCC) has been used extensively as a feature extractor. This paper adopts a comparative review method to assess the effectiveness of feature extraction and classification methods in detecting voice disorders. By examining the pairing of MFCC with various classifiers, including Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree (DT), and other online or commercial classifiers, the study reviews the robustness of MFCC in this context. The study also recognizes the importance of choosing the right database, given the varied aetiologies of voice pathologies, and the possible influence of that choice on the efficacy of voice disorder detection.
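To make the role of MFCC as a feature extractor concrete, the following is a minimal sketch of how the coefficients for a single frame might be computed before being passed to a classifier such as an SVM. It is not taken from the reviewed studies: the 16 kHz sample rate, 256-sample frame, 20 mel filters, 13 coefficients, Hamming window, and the specific mel formula are all illustrative assumptions, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def hz_to_mel(hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    # Hamming window followed by a naive DFT; returns bins 0..N/2.
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # peak and falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=16000, n_filters=20, n_coeffs=13):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log filterbank energies; the small floor avoids log(0).
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in bank]
    # Unnormalised DCT-II of the log energies; keep the first n_coeffs.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)]


# Hypothetical input: one 256-sample frame of a 440 Hz tone at 16 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(256)]
coeffs = mfcc(frame)
```

In a detection pipeline of the kind the review compares, vectors like `coeffs` would be computed per frame (and often summarised per recording) and then used as the input features for the chosen classifier.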
Data availability
Not applicable.
Abbreviations
- MFCC: Mel-Frequency Cepstral Coefficient
- SVM: Support Vector Machine
- ANN: Artificial Neural Network
- DT: Decision Tree
- FT: Fourier Transform
- DWT: Discrete Wavelet Transform
- DFT: Discrete Fourier Transform
- OSLEM: Online Sequential Extreme Learning Machine
- MDVP: Multi-Dimensional Voice Parameter
- HNR: Harmonic-to-Noise Ratio
- LDA: Linear Discriminant Analysis
- KNN: K-Nearest Neighbour
- DNN: Deep Neural Network
- DCT: Discrete Cosine Transform
- HTK: Hidden Markov Model Toolkit
- ZCR: Zero Crossing Rate
- SVD: Saarbruecken Voice Database
- AVPD: Arabic Voice Pathology Database
- MEEI: Massachusetts Eye and Ear Infirmary Voice Disorders Database
- M.S.S.: Manjit Singh Sidhu
- N.A.A.L.: Nur Atiqah Abdul Latib
- K.K.S.: Kirandeep Kaur Sidhu
Acknowledgements
This work is supported by Yayasan Canselor Uniten (YCU) Seeding Fund No. 202210016YCU and Innovation & Research Management Centre (iRMC).
Funding
This work received funding from Yayasan Canselor Uniten (YCU) Seeding Fund No. 202210016YCU and Innovation & Research Management Centre (iRMC).
Author information
Contributions
Conceptualization: N.A.A.L. and M.S.S.; Literature search: N.A.A.L.; Data analysis: N.A.A.L. and K.K.S.; Writing - original draft preparation: N.A.A.L.; Writing - review and editing: M.S.S. and K.K.S.; Writing - final draft: N.A.A.L. and K.K.S.; Supervision: M.S.S. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
About this article
Cite this article
Sidhu, M.S., Latib, N.A.A. & Sidhu, K.K. MFCC in audio signal processing for voice disorder: a review. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19253-1
DOI: https://doi.org/10.1007/s11042-024-19253-1