ASR Error Detection via Audio-Transcript entailment

Meripo, Nimshi Venkat; Konam, Sandeep

doi:10.21437/Interspeech.2022-11177

Despite the improved performance of the latest Automatic Speech Recognition (ASR) systems, transcription errors are still unavoidable. These errors can have a considerable impact in critical domains such as healthcare, where ASR is used to support clinical documentation. Detecting ASR errors is therefore a critical first step in preventing error propagation to downstream applications. To this end, we propose a novel end-to-end approach for ASR error detection using audio-transcript entailment. To the best of our knowledge, we are the first to frame this problem as an end-to-end entailment task between an audio segment and its corresponding transcript segment. Our intuition is that the audio and the transcript entail each other when there is no recognition error, and that this bidirectional entailment breaks down when an error is present. The proposed model uses an acoustic encoder and a linguistic encoder to model the speech and the transcript, respectively. The encoded representations of both modalities are fused to predict the entailment. Since doctor-patient conversations are used in our experiments, particular emphasis is placed on medical terms. Our proposed model achieves classification error rates (CER) of 26.2% on all transcription errors and 23% on medical errors specifically, improving upon the baseline by 12% and 15.4%, respectively.
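The two-encoder-plus-fusion idea described above can be illustrated with a toy sketch. This is not the authors' model: the abstract does not specify the encoders, the fusion method, or the decision threshold, so everything here is an assumption made for illustration. Mean pooling stands in for both encoders, concatenation stands in for fusion, a single logistic unit stands in for the entailment classifier, and the names `mean_pool` and `entailment_probability`, the random inputs, and the 0.5 threshold are all hypothetical.

```python
import math
import random


def mean_pool(vectors):
    """Average a sequence of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def entailment_probability(audio_frames, token_embeddings, weights, bias):
    """Toy audio-transcript entailment score in (0, 1).

    Assumption: mean pooling replaces the acoustic and linguistic encoders,
    and concatenation replaces the fusion step named in the abstract.
    """
    acoustic = mean_pool(audio_frames)        # stand-in acoustic encoder
    linguistic = mean_pool(token_embeddings)  # stand-in linguistic encoder
    fused = acoustic + linguistic             # fusion by list concatenation
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))     # logistic (sigmoid) output


# Hypothetical inputs: 50 audio frames of 4 features, 7 tokens of 3 features.
rng = random.Random(0)
audio_frames = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
token_embeddings = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(7)]
weights = [rng.gauss(0, 0.1) for _ in range(4 + 3)]  # untrained, random

p = entailment_probability(audio_frames, token_embeddings, weights, bias=0.0)
has_error = p < 0.5  # assumed rule: low entailment flags a recognition error
```

In a real system the pooled vectors would come from trained speech and text encoders and the weights would be learned from labeled correct and erroneous segments; the sketch only shows how the two modalities feed one binary decision.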

Cite as: Meripo, N.V., Konam, S. (2022) ASR Error Detection via Audio-Transcript entailment. Proc. Interspeech 2022, 3358-3362, doi: 10.21437/Interspeech.2022-11177

@inproceedings{meripo22_interspeech,
  author={Nimshi Venkat Meripo and Sandeep Konam},
  title={{ASR Error Detection via Audio-Transcript entailment}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3358--3362},
  doi={10.21437/Interspeech.2022-11177}
}