
Enhancing Performance of Noise-Robust Gujarati Language ASR Utilizing the Hybrid Acoustic Model and Combined MFCC + GTCC Feature

  • Conference paper
  • First Online:
Machine Intelligence for Research and Innovations (MAiTRI 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 832)


Abstract

The study introduces an enhanced method for improving the accuracy and performance of end-to-end Automatic Speech Recognition (ASR) systems. It combines Gammatone Frequency Cepstral Coefficient (GTCC) and Mel Frequency Cepstral Coefficient (MFCC) features with a hybrid CNN-BiGRU acoustic model. The combined MFCC and GTCC features capture complementary spectral and temporal aspects of speech, while the hybrid architecture enables effective modelling of both local and global context. The proposed approach is evaluated on a low-resource Gujarati multi-speaker speech dataset under both clean and noisy conditions, the latter created by adding white noise. Compared with a baseline using MFCC features and greedy decoding, the results show a 4.6% reduction in Word Error Rate (WER) on clean speech and a 7.83% reduction on noisy speech. The method therefore shows potential for making ASR systems more reliable and accurate in real-world applications that require precise speech-to-text conversion.
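The feature pipeline described in the abstract (frame-level MFCC and GTCC features concatenated into one vector per frame) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the MFCC branch is computed from scratch with NumPy, and since a true GTCC branch would replace the mel filterbank with an ERB-spaced gammatone filterbank (not implemented here), the mel bank is reused as a stand-in to show the shared cepstral recipe and the concatenation step. All function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Split a 1-D signal into overlapping windowed frames
    # (25 ms frames, 10 ms hop at 16 kHz).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hamming(frame_len)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Triangular filters with centres equally spaced on the mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def cepstral_coeffs(frames, fb, n_ceps=13, n_fft=512):
    # Power spectrum -> filterbank energies -> log -> DCT-II.
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    energies = np.log(spec @ fb.T + 1e-10)
    k = np.arange(fb.shape[0])
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * k + 1) / (2.0 * fb.shape[0])))
    return energies @ dct.T

# Demo: 1 s of synthetic noise at 16 kHz stands in for a Gujarati utterance.
x = np.random.default_rng(0).standard_normal(16000)
frames = frame_signal(x)
mfcc = cepstral_coeffs(frames, mel_filterbank())
# Hypothetical GTCC branch: a real one would swap in a gammatone
# filterbank; the mel bank is reused here purely for illustration.
gtcc_like = cepstral_coeffs(frames, mel_filterbank())
combined = np.concatenate([mfcc, gtcc_like], axis=1)  # (frames, 26)
print(combined.shape)
```

Each 26-dimensional frame vector would then feed the hybrid acoustic model, where CNN layers capture local spectral patterns and BiGRU layers model longer-range context before CTC decoding.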




Correspondence to Bhavesh Bhagat.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bhagat, B., Dua, M. (2024). Enhancing Performance of Noise-Robust Gujarati Language ASR Utilizing the Hybrid Acoustic Model and Combined MFCC + GTCC Feature. In: Verma, O.P., Wang, L., Kumar, R., Yadav, A. (eds) Machine Intelligence for Research and Innovations. MAiTRI 2023. Lecture Notes in Networks and Systems, vol 832. Springer, Singapore. https://doi.org/10.1007/978-981-99-8129-8_19

