
Robust Feature Extraction and Recognition Model for Automatic Speech Recognition System on News Report Dataset

  • Conference paper
Information and Communication Technology for Competitive Strategies (ICTCS 2021)

Abstract

Information processing has become ubiquitous. The process of deriving a transcription from speech is known as automatic speech recognition (ASR). In recent years, many real-time applications, such as home computer systems, mobile telephones, and various public and private telephony services, have been deployed with ASR systems. Inspired by commercial speech recognition technologies, the study of ASR systems has attracted immense interest among researchers. This paper enhances convolutional neural networks (CNNs) with a robust feature extraction model for an intelligent recognition system. First, a news report dataset is collected from a public repository. The collected data are subject to different noises and are preprocessed with min–max normalization, which linearly transforms the data into an understandable form. Then, the best sequence of words corresponding to the audio, based on the acoustic and language models, undergoes feature extraction using Mel-frequency cepstral coefficients (MFCCs). The extracted features are fed into the CNN, whose hidden layers perform a limited number of iterations to yield a robust recognition system. Experimental results show an accuracy of 96.17%, outperforming an existing ANN baseline.
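As a rough sketch of the pipeline the abstract describes (min–max normalization, MFCC feature extraction, and a CNN recognizer), the following Python code illustrates one possible realization. The use of librosa and TensorFlow/Keras, the 16 kHz sample rate, the 13 MFCC coefficients, the layer sizes, and the file name news_clip.wav are all assumptions for illustration, not the authors' actual configuration.

    import numpy as np
    import librosa
    import tensorflow as tf

    def min_max_normalize(x, eps=1e-8):
        # Linearly rescale the signal to [0, 1] (min-max normalization).
        return (x - x.min()) / (x.max() - x.min() + eps)

    def extract_mfcc(path, sr=16000, n_mfcc=13):
        # Load the clip, normalize it, and compute an (n_mfcc x frames) MFCC matrix.
        y, sr = librosa.load(path, sr=sr)
        y = min_max_normalize(y)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    def build_cnn(input_shape, n_classes):
        # Small 2-D CNN over the MFCC matrix; layer sizes are illustrative guesses.
        return tf.keras.Sequential([
            tf.keras.layers.Input(shape=input_shape),
            tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
            tf.keras.layers.MaxPooling2D((2, 2)),
            tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(n_classes, activation="softmax"),
        ])

    # Hypothetical usage on a single clip:
    # feats = extract_mfcc("news_clip.wav")            # shape (13, T)
    # x = feats[np.newaxis, ..., np.newaxis]           # shape (1, 13, T, 1)
    # model = build_cnn(x.shape[1:], n_classes=10)
    # model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")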

Author information

Correspondence to Sunanda Mendiratta.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mendiratta, S., Turk, N., Bansal, D. (2023). Robust Feature Extraction and Recognition Model for Automatic Speech Recognition System on News Report Dataset. In: Joshi, A., Mahmud, M., Ragel, R.G. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2021). Lecture Notes in Networks and Systems, vol 400. Springer, Singapore. https://doi.org/10.1007/978-981-19-0095-2_56

  • DOI: https://doi.org/10.1007/978-981-19-0095-2_56

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0094-5

  • Online ISBN: 978-981-19-0095-2

  • eBook Packages: Engineering, Engineering (R0)
