ABSTRACT
Subjective auditory-perceptual evaluation of the voice is the simplest and most direct way to judge the severity of voice lesions and the effect of treatment, but its results depend heavily on the clinical experience of the doctor. Recently, several automatic voice diagnosis methods based on voice feature parameters and classification algorithms have been proposed. The Mel Frequency Cepstral Coefficient (MFCC) is the most commonly used feature parameter. However, the role of dynamic MFCC features in improving diagnostic results remains unclear. This study used three feature sets (MFCC, MFCC + ΔMFCC, and MFCC + ΔMFCC + ΔΔMFCC), each combined with a Support Vector Machine (SVM) classifier, to determine whether adding dynamic MFCC features improves the accuracy of pathological voice detection. The results showed that accuracy and specificity did not change significantly whether or not dynamic features were added. This suggests that the MFCC parameters change only slightly over time, at least for vowel vocalization. This study may provide useful information for pathological voice diagnosis based on vowel vocalization.
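The three feature sets compared above differ only in whether first-order (Δ) and second-order (ΔΔ) time derivatives are appended to each MFCC frame. The sketch below illustrates one common way to compute them, the regression formula Δ_t = Σ n·(c_{t+n} − c_{t−n}) / (2·Σ n²). It is an illustration under stated assumptions, not the study's implementation: the abstract does not state the delta window, the edge handling, or the toolkit used, and the names `delta`, `stack_features`, and `width` are hypothetical.

```python
from typing import List

Frame = List[float]  # one frame of MFCC coefficients


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based time derivative of a feature sequence.

    Assumption: edges are padded by repeating the first/last frame,
    and width=2 is an illustrative default, not a value from the study.
    """
    if not frames:
        return []
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: List[Frame] = []
    for t in range(n_frames):
        row: Frame = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc: List[Frame]) -> List[Frame]:
    """Build MFCC + delta + delta-delta vectors, frame by frame."""
    d1 = delta(mfcc)   # first-order dynamics
    d2 = delta(d1)     # second-order dynamics
    return [c + a + b for c, a, b in zip(mfcc, d1, d2)]
```

For a perfectly steady signal every delta is zero, so the stacked vector carries no more information than the static MFCCs alone, which is consistent with the small effect reported for vowel vocalization.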