Abstract
Speech recognition from voice signals has long been a core component of computer-aided services (CAS). Automatic speech recognition (ASR) draws on several techniques, including a variety of classification and speech-analysis methods, to extract information from the signals. In this work we introduce a novel method for automatically identifying speakers and their response tones from continuous speech. The speech corpus used in the experiments is built from Assamese, a language of Northeast India spoken alongside many other languages and cultures of the region. Suppressing the noise mixed with speakers' voices is a crucial issue in contact centres, and we propose a GMM-based approach to address it. Initial estimates of the speech and noise power spectra are obtained by forming systems of equations from the GMM mean vectors. In this first stage the noise category is resolved and the input SNR is assessed; from the initial estimate of the noise power spectrum, the selected noise model is then constructed. A Wiener filter is applied to the refined estimate to attenuate background noise and enhance the noisy speech. Assuming the uncertainty parameters are known, we then evaluate classification on both synthetic data and the enhanced speech data. Finally, clustering is performed on the Gaussian data to identify three clusters corresponding to high-, medium-, and low-tone voices, enabling identification of three common human emotional states.
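The enhancement and clustering stages described in the abstract can be sketched as follows. This is a minimal illustration on synthetic signals, not the paper's implementation: the GMM-derived noise power-spectrum estimate is replaced by an oracle noise spectrum, and the Gaussian tone clustering by a simple 1-D k-means stand-in; the signal parameters and pitch values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: a clean 440 Hz tone buried in white noise.
fs = 8000
t = np.arange(fs) / fs  # 1 second of samples
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(t.size)
noisy = clean + noise

# Step 1: noise power spectrum estimate. The paper derives this from the
# GMM mean vectors; here we use the true noise spectrum as a stand-in.
noise_psd = np.abs(np.fft.rfft(noise)) ** 2

# Step 2: Wiener filter — per-bin gain = speech PSD / (speech PSD + noise PSD).
noisy_spec = np.fft.rfft(noisy)
noisy_psd = np.abs(noisy_spec) ** 2
speech_psd = np.maximum(noisy_psd - noise_psd, 1e-10)  # floor at a small value
gain = speech_psd / (speech_psd + noise_psd)
enhanced = np.fft.irfft(gain * noisy_spec, n=noisy.size)

# The enhanced signal should be closer to the clean one than the noisy input.
err_noisy = np.mean((noisy - clean) ** 2)
err_enh = np.mean((enhanced - clean) ** 2)

# Step 3: group per-utterance pitch estimates into three tone clusters
# (low / medium / high) via 1-D k-means on hypothetical pitch values (Hz).
pitches = np.concatenate([rng.normal(m, 5.0, 50) for m in (120.0, 200.0, 280.0)])
centers = np.array([pitches.min(), np.median(pitches), pitches.max()])
for _ in range(20):
    labels = np.argmin(np.abs(pitches[:, None] - centers[None, :]), axis=1)
    centers = np.array([pitches[labels == k].mean() for k in range(3)])
```

With well-separated tone distributions, the three cluster centres converge near the low, medium, and high pitch means, mirroring the three-cluster tone identification described above.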
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Devi, M., Sarma, M.K., Talukdar, J. (2023). Speech Recognition Via Machine Learning in Recording Studio. In: Singh, S.N., Mahanta, S., Singh, Y.J. (eds) Proceedings of the NIELIT's International Conference on Communication, Electronics and Digital Technology. NICE-DT 2023. Lecture Notes in Networks and Systems, vol 676. Springer, Singapore. https://doi.org/10.1007/978-981-99-1699-3_4
Print ISBN: 978-981-99-1698-6
Online ISBN: 978-981-99-1699-3
eBook Packages: Intelligent Technologies and Robotics