MFCC in audio signal processing for voice disorder: a review

Sidhu, Manjit Singh; Latib, Nur Atiqah Abdul; Sidhu, Kirandeep Kaur

doi:10.1007/s11042-024-19253-1

Published: 27 April 2024

Multimedia Tools and Applications

Abstract

Voice disorder, or dysphonia, has caught the attention of audio signal processing engineers and researchers. The efficiency of several feature extraction and classification techniques in identifying voice abnormalities has been investigated, and the Mel-Frequency Cepstral Coefficient (MFCC) has been used extensively as a feature extractor. This paper adopts a comparative review method to assess the effectiveness of feature extraction and classifier methods in detecting voice disorders. By examining the pairing of MFCC with various classifiers, including Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree (DT), and other online or commercial classifiers, the study reviews the robustness of MFCC in this context. The study also recognizes the significance of choosing the right database, given the various aetiologies of pathological illnesses and the possible influence of that choice on the efficacy of voice disorder detection.

Data availability

Not applicable.

Abbreviations

MFCC: Mel-Frequency Cepstral Coefficient
SVM: Support Vector Machine
ANN: Artificial Neural Network
DT: Decision Tree
FT: Fourier Transform
DWT: Discrete Wavelet Transform
DFT: Discrete Fourier Transform
OSLEM: Online Sequential Extreme Learning Machine
MDVP: Multi-Dimensional Voice Parameter
HNR: Harmonic-to-Noise Ratio
LDA: Linear Discriminant Analysis
KNN: K-Nearest Neighbour
DNN: Deep Neural Network
DCT: Discrete Cosine Transform
HTK: Hidden Markov Model Toolkit
ZCR: Zero Crossing Rate
SVD: Saarbruecken Voice Database
AVPD: Arabic Voice Pathology Database
MEEI: Massachusetts Eye and Ear Infirmary Voice Disorders Database
M.S.S.: Manjit Singh Sidhu
N.A.A.L.: Nur Atiqah Abdul Latib
K.K.S.: Kirandeep Kaur Sidhu

Acknowledgements

This work is supported by Yayasan Canselor Uniten (YCU) Seeding Fund No. 202210016YCU and Innovation & Research Management Centre (iRMC).

Funding

This work received funding from Yayasan Canselor Uniten (YCU) Seeding Fund No. 202210016YCU and Innovation & Research Management Centre (iRMC).

Author information

Authors and Affiliations

Department of Graphics and Multimedia, College of Computing and Informatics, University Tenaga Nasional (UNITEN), Putrajaya Campus, Jalan Kajang - Puchong, Kajang, Selangor, 43000, Malaysia
Manjit Singh Sidhu & Nur Atiqah Abdul Latib
MK-Faculty of Medicine and Health Sciences, Universiti Tunku Abdul Rahman (UTAR), Kajang, Selangor, Malaysia
Kirandeep Kaur Sidhu

Contributions

Conceptualization: N.A.A.L. and M.S.S.; Literature search: N.A.A.L.; Data analysis: N.A.A.L. and K.K.S.; Writing (original draft preparation): N.A.A.L.; Writing (review and editing): M.S.S. and K.K.S.; Writing (final draft): N.A.A.L. and K.K.S.; Supervision: M.S.S. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Manjit Singh Sidhu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sidhu, M.S., Latib, N.A.A. & Sidhu, K.K. MFCC in audio signal processing for voice disorder: a review. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19253-1

Received: 27 August 2023
Revised: 24 February 2024
Accepted: 14 April 2024
Published: 27 April 2024
DOI: https://doi.org/10.1007/s11042-024-19253-1
