ISCA Archive Interspeech 2022

BiCAPT: Bidirectional Computer-Assisted Pronunciation Training with Normalizing Flows

Zhan Zhang, Yuehai Wang, Jianyi Yang

Computer-Assisted Pronunciation Training (CAPT) plays an important role in language learning. So far, most existing CAPT methods are discriminative and focus on detecting where mispronunciations occur. Although learners receive feedback on their current pronunciation, they may still be unable to learn the correct one. Despite this, speech-based teaching has received little attention in CAPT. To fill this gap, we propose a novel bidirectional CAPT method that detects mispronunciations and generates the corrected pronunciations simultaneously. This correction-based feedback can better preserve the learner's speaking style, making the learning process more personalized. In addition, we propose to adopt normalizing flows to share the latent representation between these two mirrored discriminative-generative tasks, making the whole model more compact. Experiments show that our method is efficient for mispronunciation detection and can naturally correct the speech under different CAPT granularity requirements.
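
The property that lets a normalizing flow serve two mirrored tasks is exact invertibility: one set of parameters defines both the data-to-latent map and the latent-to-data map. The following is a minimal sketch of a single affine coupling layer, a common flow building block, and is not the BiCAPT architecture itself. The function names, the fixed `_scale_and_shift` conditioner, and the toy input are assumptions made for illustration only; in a trained model the conditioner would be a learned network.

```python
import math


def _scale_and_shift(x_a):
    # Hypothetical fixed conditioner (assumption): any function of x_a works,
    # because the conditioner itself is never inverted.
    s = [math.tanh(0.5 * v) for v in x_a]
    t = [0.1 * v + 1.0 for v in x_a]
    return s, t


def forward(x):
    """Map data x to latent z; also return log|det J| of the transform."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = _scale_and_shift(x_a)
    # The first half passes through unchanged; the second half is scaled and shifted.
    z_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]
    log_det = sum(s)  # the Jacobian is triangular, so log|det J| is the sum of s
    return x_a + z_b, log_det


def inverse(z):
    """Map latent z back to data x using the same conditioner."""
    if len(z) % 2 != 0:
        raise ValueError("this sketch assumes an even-length input")
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    s, t = _scale_and_shift(z_a)  # z_a equals x_a, so s and t are recovered exactly
    x_b = [(zb - ti) * math.exp(-si) for zb, si, ti in zip(z_b, s, t)]
    return z_a + x_b


x = [0.3, -1.2, 2.0, 0.7]  # toy feature vector (assumption)
z, log_det = forward(x)
x_rec = inverse(z)
```

Here `inverse(forward(x))` reproduces `x` up to floating-point error. In this sketch the forward direction stands in for the analysis path and the inverse for the generation path, which is the sense in which the two directions can share one latent representation.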


doi: 10.21437/Interspeech.2022-878

Cite as: Zhang, Z., Wang, Y., Yang, J. (2022) BiCAPT: Bidirectional Computer-Assisted Pronunciation Training with Normalizing Flows. Proc. Interspeech 2022, 4332-4336, doi: 10.21437/Interspeech.2022-878

@inproceedings{zhang22q_interspeech,
  author={Zhan Zhang and Yuehai Wang and Jianyi Yang},
  title={{BiCAPT: Bidirectional Computer-Assisted Pronunciation Training with Normalizing Flows}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={4332--4336},
  doi={10.21437/Interspeech.2022-878}
}