“Allot?” is “A Lot!” Towards Developing More Generalized Speech Recognition System for Accessible Communication

Grisha Bandodkar; Shyam Agarwal; Athul Krishna Sughosh; Sahilbir Singh; Taeyeong Choi

doi:10.1609/aaai.v38i21.30381

Authors

Grisha Bandodkar, University of California, Davis
Shyam Agarwal, University of California, Davis
Athul Krishna Sughosh, University of California, Davis
Sahilbir Singh, University of California, Davis
Taeyeong Choi, Kennesaw State University

DOI:

https://doi.org/10.1609/aaai.v38i21.30381

Keywords:

Deep Learning, Machine Learning, Automatic Speech Recognition, Audio And Speech Processing, Wav2vec 2.0, Sound, Computation And Language, Data Augmentation, Accented Speech

Abstract

The proliferation of Automatic Speech Recognition (ASR) systems has revolutionized translation and transcription. However, challenges persist in ensuring inclusive communication for non-native English speakers. This study quantifies the gap between accented and native English speech using Wav2Vec 2.0, a state-of-the-art transformer model. Notably, we found that accented speech exhibits significantly higher word error rates of 30-50%, in contrast to native speakers’ 2-8% (Baevski et al. 2020). Our exploration extends to leveraging accessible online datasets to highlight the potential of enhancing speech recognition by fine-tuning the Wav2Vec 2.0 model. Through experimentation and analysis, we highlight the challenges with training models on accented speech. By refining models and addressing data quality issues, our work presents a pipeline for future investigations aimed at developing an integrated system capable of effectively engaging with a broader range of individuals with diverse backgrounds. Accurate recognition of accented speech is a pivotal step toward democratizing AI-driven communication products.
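The abstract reports its results as word error rates (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the function below may help make the numbers concrete. It is an illustration only, not the authors' evaluation code; the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are assumptions made here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration: text is lowercased and split on
    whitespace, which is an assumption, not the paper's normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# The confusion in the paper's title: "a lot" transcribed as "allot"
# counts as one substitution plus one deletion over two reference words.
print(word_error_rate("a lot", "allot"))
```

Under this definition a WER of 30-50% means roughly one word in every two or three is transcribed incorrectly, and the value can exceed 100% when the hypothesis contains many inserted words.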

Published

2024-03-24

How to Cite

Bandodkar, G., Agarwal, S., Sughosh, A. K., Singh, S., & Choi, T. (2024). “Allot?” is “A Lot!” Towards Developing More Generalized Speech Recognition System for Accessible Communication. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23327-23334. https://doi.org/10.1609/aaai.v38i21.30381

Issue

Vol. 38 No. 21: IAAI-24, EAAI-24, AAAI-24 Student Abstracts, Undergraduate Consortium and Demonstrations

Section

EAAI: Mentored Undergraduate Research Challenge: AI for Accessibility in Comm
