MFCC in audio signal processing for voice disorder: a review

Multimedia Tools and Applications

Abstract

Voice disorder, or dysphonia, has attracted the attention of audio signal processing engineers and researchers. Studies have investigated how well various feature extraction techniques and classifiers identify voice abnormalities, and the Mel-Frequency Cepstral Coefficient (MFCC) has been used extensively as a feature extractor. This paper adopts a comparative review method to assess the effectiveness of feature extraction and classification methods in detecting voice disorders. By examining the pairing of MFCC with various classifiers, including Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree (DT), and other online or commercial classifiers, the study reviews the robustness of MFCC in this context. The study also recognizes the importance of choosing an appropriate database, given the varied aetiologies of voice pathologies and the possible influence of database choice on the efficacy of voice disorder detection.
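As an illustration of the MFCC feature extractor named in the abstract, the following is a minimal sketch of the commonly described pipeline for a single frame: Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT. It is not taken from the paper. The function names, the 2595 log10(1 + f/700) mel formula, the 20 filters, the 13 coefficients, the 8 kHz sample rate, and the synthetic 440 Hz tone are all assumptions chosen for illustration, and the naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert Hz to mel (one common convention; an assumption here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT; a real system would use an FFT library.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope, peak of 1 at centre
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Return the first n_coeffs MFCCs of a single frame (hypothetical helper)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the filter energies so the logarithm is always defined.
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    # Unnormalised DCT-II of the log energies; keep the low-order coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]


# Synthetic 440 Hz tone standing in for one 256-sample voice frame.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))  # 13
```

In a detection system of the kind the review surveys, vectors like `coeffs` would be computed for every frame of a recording and then passed to a classifier such as an SVM. The frame length, filter count, and number of coefficients are tunable choices rather than fixed values.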

Data availability

Not applicable.

Abbreviations

MFCC: Mel-Frequency Cepstral Coefficient
SVM: Support Vector Machine
ANN: Artificial Neural Network
DT: Decision Tree
FT: Fourier Transform
DWT: Discrete Wavelet Transform
DFT: Discrete Fourier Transform
OSLEM: Online Sequential Extreme Learning Machine
MDVP: Multi-Dimensional Voice Parameter
HNR: Harmonic-to-Noise Ratio
LDA: Linear Discriminant Analysis
KNN: K-Nearest Neighbour
DNN: Deep Neural Network
DCT: Discrete Cosine Transform
HTK: Hidden Markov Model Toolkit
ZCR: Zero Crossing Rate
SVD: Saarbruecken Voice Database
AVPD: Arabic Voice Pathology Database
MEEI: Massachusetts Eye and Ear Infirmary Voice Disorders Database
M.S.S.: Manjit Singh Sidhu
N.A.A.L.: Nur Atiqah Abdul Latib
K.K.S.: Kirandeep Kaur Sidhu

Acknowledgements

This work is supported by Yayasan Canselor Uniten (YCU) Seeding Fund No. 202210016YCU and Innovation & Research Management Centre (iRMC).

Funding

This work received funding from Yayasan Canselor Uniten (YCU) Seeding Fund No. 202210016YCU and Innovation & Research Management Centre (iRMC).

Author information

Contributions

Conceptualization: N.A.A.L. and M.S.S.; Literature search: N.A.A.L.; Data analysis: N.A.A.L. and K.K.S.; Writing (original draft preparation): N.A.A.L.; Writing (review and editing): M.S.S. and K.K.S.; Writing (final draft): N.A.A.L. and K.K.S.; Supervision: M.S.S. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Manjit Singh Sidhu.

Ethics declarations

Ethics approval and consent to participate

Not applicable. 

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sidhu, M.S., Latib, N.A.A. & Sidhu, K.K. MFCC in audio signal processing for voice disorder: a review. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19253-1

  • DOI: https://doi.org/10.1007/s11042-024-19253-1
