“Allot?” is “A Lot!” Towards Developing More Generalized Speech Recognition System for Accessible Communication

Authors

  • Grisha Bandodkar, University of California, Davis
  • Shyam Agarwal, University of California, Davis
  • Athul Krishna Sughosh, University of California, Davis
  • Sahilbir Singh, University of California, Davis
  • Taeyeong Choi, Kennesaw State University

DOI:

https://doi.org/10.1609/aaai.v38i21.30381

Keywords:

Deep Learning, Machine Learning, Automatic Speech Recognition, Audio And Speech Processing, Wav2Vec 2.0, Sound, Computation And Language, Data Augmentation, Accented Speech

Abstract

The proliferation of Automatic Speech Recognition (ASR) systems has revolutionized translation and transcription. However, challenges persist in ensuring inclusive communication for non-native English speakers. This study quantifies the recognition gap between accented and native English speech using Wav2Vec 2.0, a state-of-the-art transformer model. Notably, we found that accented speech exhibits significantly higher word error rates of 30-50%, in contrast to the 2-8% observed for native speakers (Baevski et al. 2020). We further leverage accessible online datasets to demonstrate the potential of improving speech recognition by fine-tuning the Wav2Vec 2.0 model. Through experimentation and analysis, we highlight the challenges of training models on accented speech. By refining models and addressing data quality issues, our work presents a pipeline for future investigations aimed at developing an integrated system capable of effectively engaging with individuals from a broader range of backgrounds. Accurate recognition of accented speech is a pivotal step toward democratizing AI-driven communication products.
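The word-error-rate comparison the abstract describes can be reproduced in spirit with off-the-shelf tooling. Below is a minimal sketch, not the authors' actual pipeline, of scoring a single utterance with a publicly available Wav2Vec 2.0 checkpoint; it assumes the Hugging Face transformers library, librosa for audio loading, and jiwer for WER, and the audio file name and reference transcript are hypothetical placeholders.

import torch
import librosa
from jiwer import wer
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Publicly available English ASR checkpoint; the authors' exact model and setup may differ.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# Hypothetical accented-speech clip and its ground-truth transcript (uppercase, matching this checkpoint's vocabulary).
speech, _ = librosa.load("accented_sample.wav", sr=16000)
reference = "I SAID A LOT NOT ALLOT"

# Featurize the audio and run greedy CTC decoding to obtain a text hypothesis.
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
hypothesis = processor.batch_decode(predicted_ids)[0]

# WER = (substitutions + deletions + insertions) / number of reference words.
print(f"WER: {wer(reference, hypothesis):.2%}")

Running the same scoring loop over accented and native test sets and comparing the aggregate word error rates is the kind of comparison the abstract summarizes.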

Published

2024-03-24

How to Cite

Bandodkar, G., Agarwal, S., Sughosh, A. K., Singh, S., & Choi, T. (2024). “Allot?” is “A Lot!” Towards Developing More Generalized Speech Recognition System for Accessible Communication. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23327-23334. https://doi.org/10.1609/aaai.v38i21.30381

Issue

Vol. 38 No. 21 (2024)

Section

EAAI: Mentored Undergraduate Research Challenge: AI for Accessibility in Communication