ISCA Archive Interspeech 2017

Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps

Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon

This paper proposes an automatic method to refine broadcast data collected every week for efficient acoustic model training. To train acoustic models, we use only the audio signals, subtitle texts, and subtitle timestamps that accompany recorded broadcast programs. However, the subtitle timestamps are often inaccurate due to inherent characteristics of closed captioning. In the proposed method, we remove subtitle texts with a low subtitle quality index, concatenate adjacent subtitle texts into a merged subtitle text, and correct the timestamp of the merged subtitle text by adding a margin. A speech recognizer is then used to obtain a hypothesis text from the speech segment corresponding to the merged subtitle text. Finally, the refined speech segments to be used for acoustic model training are generated by selecting the subparts of the merged subtitle text that match the hypothesis text. Acoustic models trained on the refined broadcast data give significantly higher speech recognition accuracy than those trained on the raw broadcast data. The proposed method can thus efficiently refine a large amount of broadcast data with inaccurate timestamps, taking about half the time of previous approaches.
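
The refinement steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Subtitle` record, the `quality` field standing in for the subtitle quality index, the thresholds `min_quality`, `max_gap`, and `margin`, and the use of `difflib.SequenceMatcher` on whitespace-separated words are all assumptions made for illustration. The hypothesis text is taken as given, since the abstract does not specify the recognizer.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Subtitle:
    start: float    # subtitle start time in seconds
    end: float      # subtitle end time in seconds
    text: str
    quality: float  # hypothetical stand-in for the subtitle quality index


def merge_subtitles(subs, min_quality=0.5, max_gap=1.0, margin=2.0):
    """Drop low-quality subtitles, merge close neighbours, then add a margin.

    All three thresholds are illustrative values, not taken from the paper.
    """
    kept = [s for s in subs if s.quality >= min_quality]
    merged = []
    for s in kept:
        if merged and s.start - merged[-1].end <= max_gap:
            last = merged[-1]
            # Concatenate adjacent subtitle texts into one merged subtitle.
            merged[-1] = Subtitle(last.start, s.end,
                                  last.text + " " + s.text,
                                  min(last.quality, s.quality))
        else:
            merged.append(s)
    # Widen each merged span to compensate for inaccurate timestamps.
    return [Subtitle(max(0.0, m.start - margin), m.end + margin,
                     m.text, m.quality)
            for m in merged]


def matching_subparts(subtitle_text, hypothesis_text, min_words=2):
    """Return word runs of the subtitle that also appear in the hypothesis."""
    ref = subtitle_text.split()
    hyp = hypothesis_text.split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [" ".join(ref[block.a:block.a + block.size])
            for block in matcher.get_matching_blocks()
            if block.size >= min_words]
```

In this sketch, each merged subtitle would be passed to a recognizer, and only the word runs returned by `matching_subparts` would be kept as training material. Mapping those runs back to audio boundaries requires word-level timing from the recognizer, which is omitted here.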


doi: 10.21437/Interspeech.2017-650

Cite as: Bang, J.-U., Choi, M.-Y., Kim, S.-H., Kwon, O.-W. (2017) Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps. Proc. Interspeech 2017, 2929-2933, doi: 10.21437/Interspeech.2017-650

@inproceedings{bang17_interspeech,
  author={Jeong-Uk Bang and Mu-Yeol Choi and Sang-Hun Kim and Oh-Wook Kwon},
  title={{Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2929--2933},
  doi={10.21437/Interspeech.2017-650}
}