ISCA Archive Interspeech 2020

Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation

M.A. Tuğtekin Turan, Emmanuel Vincent, Denis Jouvet

Current automatic speech recognition (ASR) systems trained on native speech often perform poorly when applied to non-native or accented speech. In this work, we propose to compute x-vector-like accent embeddings and use them as auxiliary inputs to an acoustic model trained on native data only in order to improve the recognition of multi-accent data comprising native, non-native, and accented speech. In addition, we leverage untranscribed accented training data by means of semi-supervised learning. Our experiments show that acoustic models trained with the proposed accent embeddings outperform those trained with conventional i-vector or x-vector speaker embeddings, and achieve a 15% relative word error rate (WER) reduction on non-native and accented speech w.r.t. acoustic models trained with regular spectral features only. Semi-supervised training using just 1 hour of untranscribed speech per accent yields an additional 15% relative WER reduction w.r.t. models trained on native data only.
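As a rough illustration of the auxiliary-input idea described above, the sketch below (not the authors' implementation; all dimensions, layer sizes, and names are hypothetical assumptions) shows an acoustic model that concatenates an utterance-level accent embedding with the frame-level spectral features before its hidden layers.

# Minimal PyTorch sketch (illustrative only, not the paper's code): an acoustic
# model that takes an utterance-level accent embedding as an auxiliary input by
# appending it to every frame of spectral features. Dimensions are assumptions.
import torch
import torch.nn as nn

class AccentAwareAcousticModel(nn.Module):
    def __init__(self, n_feats=40, n_accent=100, n_hidden=512, n_senones=3000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_feats + n_accent, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_senones),  # frame-level senone logits
        )

    def forward(self, feats, accent_emb):
        # feats: (batch, frames, n_feats); accent_emb: (batch, n_accent)
        # Broadcast the single accent embedding to every frame of the utterance.
        emb = accent_emb.unsqueeze(1).expand(-1, feats.size(1), -1)
        return self.net(torch.cat([feats, emb], dim=-1))

# Usage example: 8 utterances, 200 frames, 40-dim features, 100-dim accent embeddings.
model = AccentAwareAcousticModel()
logits = model(torch.randn(8, 200, 40), torch.randn(8, 100))
print(logits.shape)  # torch.Size([8, 200, 3000])

The same interface would accept i-vector or x-vector speaker embeddings in place of the accent embedding, which is how the comparison in the abstract can be read.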


doi: 10.21437/Interspeech.2020-2742

Cite as: Turan, M.A.T., Vincent, E., Jouvet, D. (2020) Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation. Proc. Interspeech 2020, 1286-1290, doi: 10.21437/Interspeech.2020-2742

@inproceedings{turan20_interspeech,
  author={M.A. Tuğtekin Turan and Emmanuel Vincent and Denis Jouvet},
  title={{Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1286--1290},
  doi={10.21437/Interspeech.2020-2742}
}