Abstract

Speech recognition from voice signals has so far been the most needed component of computer-aided services (CAS). Several techniques, including various classification and speech analysis methods, have been employed in automatic speech recognition (ASR) to extract information from the signals. In this research, we introduce a novel method for automatically identifying speakers and their response tones from continuous speech. The speech corpus used in the experiments is built from the Assamese language, which is intertwined with other cultures of Northeast India. Controlling noise mixed with speakers' voices is a crucial issue in contact centres, and we propose a Gaussian mixture model (GMM) approach to address it. Initial estimates of the speech and noise power spectra are obtained by forming systems of equations from the GMM mean vectors. In this first stage, the noise source category is determined and the input SNR is estimated; the chosen noise model is then constructed from the initial estimate of the noise power spectrum. A Wiener filter is then applied using the improved estimate to suppress background noise and enhance the noisy speech. Assuming the uncertainty parameters are known, we then assess classification on synthetic data and on the enhanced speech data. Finally, clustering is performed on the Gaussian data to identify three clusters, corresponding to high-, medium-, and low-tone voices, so that three common human emotional states can be identified.
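The Wiener filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `wiener_gain` and `apply_gain`, the gain floor, and the use of a simple clipped subtraction to estimate the speech power are all assumptions made for illustration. In the paper, the speech and noise power spectra come from the GMM-based estimates instead.

```python
def wiener_gain(noisy_power, noise_power, floor=1e-3):
    """Per-bin Wiener gain from noisy-speech and noise power spectra.

    Hypothetical helper: the speech power is approximated here by clipped
    subtraction (noisy minus noise), which is an assumption of this sketch.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        # Estimated clean-speech power, never allowed to go negative.
        p_s = max(p_y - p_n, 0.0)
        denom = p_s + p_n
        # Classic Wiener gain: speech power over speech-plus-noise power.
        g = p_s / denom if denom > 0 else 0.0
        # A small floor (assumed value) avoids zeroing bins completely.
        gains.append(max(g, floor))
    return gains


def apply_gain(noisy_power, gains):
    """Scale each power bin; the gain acts on amplitude, so power scales by gain squared."""
    return [p * g * g for p, g in zip(noisy_power, gains)]


if __name__ == "__main__":
    noisy = [4.0, 1.0]   # toy power spectrum of noisy speech
    noise = [1.0, 2.0]   # toy noise power estimate
    gains = wiener_gain(noisy, noise)
    print(gains)
    print(apply_gain(noisy, gains))
```

Bins where the noise estimate exceeds the observed power are attenuated to the floor, while bins dominated by speech pass through with a gain close to one.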

Corresponding author

Correspondence to Mampi Devi.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Devi, M., Sarma, M.K., Talukdar, J. (2023). Speech Recognition Via Machine Learning in Recording Studio. In: Singh, S.N., Mahanta, S., Singh, Y.J. (eds) Proceedings of the NIELIT's International Conference on Communication, Electronics and Digital Technology. NICE-DT 2023. Lecture Notes in Networks and Systems, vol 676. Springer, Singapore. https://doi.org/10.1007/978-981-99-1699-3_4
