research-article

The Study of Voice Pathology Detection based on MFCC and SVM

Authors:
Yipeng Niu, The High School Affiliated to Renmin University of China, China
Jiaming Cao, Xishan Branch High School affiliated to Renmin University of China, China
Fei Shen, School of Biological Science and Medical Engineering, Beihang University, China
Pengling Ren, Beijing Friendship Hospital, Capital Medical University, China

ICBBE '20: Proceedings of the 2020 7th International Conference on Biomedical and Bioinformatics Engineering, November 2020, Pages 27–30. https://doi.org/10.1145/3444884.3444890

Published: 31 March 2021

ABSTRACT

Subjective auditory-perceptual evaluation of voice is the simplest and most direct way to judge the severity of voice lesions and the effect of treatment, but its outcome depends heavily on the clinical experience of the doctor. Recently, automatic voice diagnosis methods based on voice feature parameters and classification algorithms have been proposed. The Mel Frequency Cepstral Coefficient (MFCC) is the most commonly used feature parameter. However, the role of dynamic MFCC features in improving diagnostic results remains unclear. This study used three feature sets (MFCC, MFCC + ΔMFCC, and MFCC + ΔMFCC + ΔΔMFCC), each combined with a Support Vector Machine (SVM) classifier, to determine whether adding dynamic MFCC features improves the accuracy of pathological voice detection. The results showed that accuracy and specificity did not change significantly whether or not dynamic features were added. This suggests that the MFCC parameters change only slightly over time, at least for vowel vocalization. This study may provide useful information for pathological voice diagnosis based on vowel vocalization.
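The three feature sets compared in the abstract can be illustrated with a minimal sketch. The abstract does not give the extraction settings, so everything specific here is an assumption: the regression-style delta formula (a common convention in speech processing), the window half-width of 2, the edge handling, and the hypothetical names `delta`, `stack`, and `feature_sets`. MFCC extraction and the SVM classifier are left out; the input is assumed to be a list of per-frame MFCC vectors.

```python
from typing import Dict, List

Frame = List[float]


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based delta over time (assumed formula, not taken from the paper):
    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2),
    with frame indices clamped at the first and last frame.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack(*streams: List[Frame]) -> List[Frame]:
    """Concatenate several feature streams frame by frame."""
    return [
        [value for stream in streams for value in stream[t]]
        for t in range(len(streams[0]))
    ]


def feature_sets(mfcc: List[Frame]) -> Dict[str, List[Frame]]:
    """Build the static, static + delta, and static + delta + delta-delta sets."""
    d1 = delta(mfcc)
    d2 = delta(d1)  # delta-delta is the delta of the delta stream
    return {
        "mfcc": mfcc,
        "mfcc+d": stack(mfcc, d1),
        "mfcc+d+dd": stack(mfcc, d1, d2),
    }


if __name__ == "__main__":
    # A perfectly steady two-coefficient trajectory, standing in for a sustained vowel.
    steady = [[1.0, -2.0] for _ in range(5)]
    for name, feats in feature_sets(steady).items():
        print(name, feats[2])
```

With a perfectly steady trajectory, every delta and delta-delta entry is zero, so the extended sets add no information beyond the static coefficients. This is consistent with the abstract's interpretation that MFCC parameters change little during vowel vocalization, although real recordings would show small nonzero deltas.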

Published in

ICBBE '20: Proceedings of the 2020 7th International Conference on Biomedical and Bioinformatics Engineering
November 2020
197 pages
ISBN:9781450388221
DOI:10.1145/3444884

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Mel frequency cepstral coefficient
Voice pathology
automatic diagnosis
support vector machine
Qualifiers
- research-article
- Research
- Refereed limited