Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera

Cardinal, Patrick; Ali, Ahmed; Dehak, Najim; Zhang, Yu; Hanai, Tuka Al; Zhang, Yifan; Glass, James R.; Vogel, Stephan

doi:10.21437/Interspeech.2014-474

This paper presents a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. All approaches were trained on 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario that combines the Minimum Phone Error (MPE) criterion with sequential Deep Neural Network (DNN) training. We report results on two types of test data: broadcast news reports, with a best word error rate (WER) of 17.86%, and broadcast conversations, with a best WER of 29.85%. The overall WER on the full test set is 25.6%.
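All results above are word error rates. The sketch below shows the standard definition: WER is the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. It also shows a corpus-level (pooled) WER, where errors and reference words are summed over all utterances before dividing. The overall 25.6% is higher than the simple mean of the two subset WERs (about 23.9%). That gap is what one would expect if errors are pooled over words and the conversation subset contributes more words, but the abstract does not state how the overall figure was computed. The function names `word_errors` and `pooled_wer` and the toy strings are hypothetical illustrations, not the authors' scoring tool or data.

```python
from typing import Iterable, Sequence, Tuple


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance: min substitutions + deletions + insertions."""
    # prev[j] holds the edit distance between the reference prefix processed so far
    # and hyp[:j]; initially the reference prefix is empty, so it costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)  # ref[:i] vs empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def pooled_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    pairs = list(pairs)
    errors = sum(word_errors(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / words


if __name__ == "__main__":
    # Toy (reference, hypothesis) pairs, purely illustrative.
    toy = [("a b c d", "a x c"), ("e f", "e f")]
    print(word_errors("a b c d".split(), "a x c".split()))  # one substitution + one deletion
    print(pooled_wer(toy))  # errors and words summed over both utterances
```

Because pooling weights each utterance by its number of reference words, a subset with more words pulls the overall WER toward its own error rate.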

Cite as: Cardinal, P., Ali, A., Dehak, N., Zhang, Y., Hanai, T.A., Zhang, Y., Glass, J.R., Vogel, S. (2014) Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera. Proc. Interspeech 2014, 2088-2092, doi: 10.21437/Interspeech.2014-474

@inproceedings{cardinal14_interspeech,
  author={Patrick Cardinal and Ahmed Ali and Najim Dehak and Yu Zhang and Tuka Al Hanai and Yifan Zhang and James R. Glass and Stephan Vogel},
  title={{Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2088--2092},
  doi={10.21437/Interspeech.2014-474}
}