Abstract
Information processing has become ubiquitous. The process of deriving a transcription from speech is known as automatic speech recognition (ASR). Many real-time applications, such as home computer systems, mobile telephones, and various public and private telephony services, now deploy ASR systems. Inspired by commercial speech recognition technologies, research on ASR has attracted immense interest. This paper enhances convolutional neural networks (CNNs) with a robust feature extraction model for an intelligent recognition system. First, a news report dataset is collected from a public repository. The collected data, which is subject to various noises, is preprocessed with min–max normalization, a technique that linearly transforms the data into a common range. Then, the best sequence of words corresponding to the audio, determined by the acoustic and language models, undergoes feature extraction using Mel-frequency cepstral coefficients (MFCCs). The transformed features are fed into a convolutional neural network, whose hidden layers are trained over a limited number of iterations to yield a robust recognition system. Experimental results show an accuracy of 96.17%, higher than that of an existing ANN baseline.
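The min–max normalization step described above can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation; the function name and the [0, 1] target range are assumptions, since the paper does not publish its preprocessing code.

```python
import numpy as np

def min_max_normalize(x, lo=0.0, hi=1.0):
    """Linearly rescale a feature vector into [lo, hi] (min-max normalization)."""
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    # Map xmin -> lo and xmax -> hi; all other values scale linearly in between.
    return lo + (x - xmin) * (hi - lo) / (xmax - xmin)

# Example: a noisy raw feature vector mapped into [0, 1]
signal = np.array([-3.0, 0.0, 1.0, 5.0])
print(min_max_normalize(signal))
```

In a full pipeline, each normalized frame would then be passed to an MFCC extractor before reaching the CNN.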
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Mendiratta, S., Turk, N., Bansal, D. (2023). Robust Feature Extraction and Recognition Model for Automatic Speech Recognition System on News Report Dataset. In: Joshi, A., Mahmud, M., Ragel, R.G. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2021). Lecture Notes in Networks and Systems, vol 400. Springer, Singapore. https://doi.org/10.1007/978-981-19-0095-2_56
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0094-5
Online ISBN: 978-981-19-0095-2
eBook Packages: Engineering (R0)