Abstract
Information processing has become ubiquitous. The process of deriving a transcription from speech is known as automatic speech recognition (ASR). Many real-time applications, such as home computer systems, mobile telephones, and various public and private telephony services, now deploy ASR systems. Inspired by commercial speech recognition technologies, research on ASR has attracted immense interest. This paper enhances convolutional neural networks (CNNs) with a robust feature extraction model for an intelligent recognition system. First, a news report dataset is collected from a public repository. The collected data, which is subject to various noises, is preprocessed with min–max normalization, a technique that linearly transforms the data into a common range. Then, the best sequence of words corresponding to the audio, determined by the acoustic and language models, undergoes feature extraction using Mel-frequency cepstral coefficients (MFCCs). The transformed features are fed into a convolutional neural network, whose hidden layers are trained over a limited number of iterations to yield a robust recognition system. Experimental results show an accuracy of 96.17%, higher than that of an existing ANN baseline.
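The min–max normalization step described above can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation; the function name and the [0, 1] target range are assumptions, since the paper does not publish its preprocessing code.

```python
import numpy as np

def min_max_normalize(x, lo=0.0, hi=1.0):
    """Linearly rescale a feature vector into [lo, hi] (min-max normalization)."""
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    # Map xmin -> lo and xmax -> hi; all other values scale linearly in between.
    return lo + (x - xmin) * (hi - lo) / (xmax - xmin)

# Example: a noisy raw feature vector mapped into [0, 1]
signal = np.array([-3.0, 0.0, 1.0, 5.0])
print(min_max_normalize(signal))
```

In a full pipeline, each normalized frame would then be passed to an MFCC extractor before reaching the CNN.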
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Mendiratta, S., Turk, N., Bansal, D. (2023). Robust Feature Extraction and Recognition Model for Automatic Speech Recognition System on News Report Dataset. In: Joshi, A., Mahmud, M., Ragel, R.G. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2021). Lecture Notes in Networks and Systems, vol 400. Springer, Singapore. https://doi.org/10.1007/978-981-19-0095-2_56
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0094-5
Online ISBN: 978-981-19-0095-2
eBook Packages: Engineering (R0)