A weighted combination of speech with text-based models for Arabic diacritization

Azim, Aisha S.; Wang, Xiaoxuan; Chai, Sim Khe

doi:10.21437/Interspeech.2012-612

Most studies on Arabic diacritization have relied on textually inferred features alone. This paper proposes a novel approach in which a speech-based model is combined, by weighting, with a text-based model, so that linguistically insensitive acoustic information can correct errors in the text model's diacritic predictions and complement them. The acoustic model is based on Hidden Markov Models and the textual model on Conditional Random Fields. The combination yields a significant reduction in error rates across all metrics, especially for case endings, which are the most difficult to predict. The results in this paper are the most accurate reported to date, with diacritic and word error rates of 1.5% and 4.9% inclusive of case endings, and 1.0% and 2.7% exclusive of them.
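The idea of weighting two models against each other can be illustrated with a small sketch. The abstract does not specify the combination rule, so the log-linear interpolation below is an assumption, as are the function names (`combine_scores`, `predict`), the `weight` parameter, and the toy probabilities; in the paper's setting the two distributions would come from the Conditional Random Field text model and the Hidden Markov Model acoustic model.

```python
import math


def combine_scores(text_probs, speech_probs, weight):
    """Log-linearly interpolate two distributions over diacritic labels.

    Assumption (not from the paper): weight=0 trusts only the text model,
    weight=1 trusts only the speech model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    eps = 1e-12  # floor to avoid log(0) for labels one model never proposes
    labels = set(text_probs) | set(speech_probs)
    combined = {}
    for label in labels:
        p_text = max(text_probs.get(label, 0.0), eps)
        p_speech = max(speech_probs.get(label, 0.0), eps)
        combined[label] = (
            (1.0 - weight) * math.log(p_text) + weight * math.log(p_speech)
        )
    return combined


def predict(text_probs, speech_probs, weight):
    """Return the diacritic label with the highest combined score."""
    scores = combine_scores(text_probs, speech_probs, weight)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical posteriors for one letter; the text model prefers fatha,
    # the acoustic model prefers damma.
    text_probs = {"fatha": 0.5, "damma": 0.4, "kasra": 0.1}
    speech_probs = {"fatha": 0.2, "damma": 0.7, "kasra": 0.1}
    for w in (0.0, 0.5, 1.0):
        print(w, predict(text_probs, speech_probs, w))
```

In a sketch like this, the weight would normally be tuned on held-out data; how the paper chooses its weights is not stated in the abstract.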

Index Terms: Arabic diacritization, case endings, multimodal systems

Cite as: Azim, A.S., Wang, X., Chai, S.K. (2012) A weighted combination of speech with text-based models for Arabic diacritization. Proc. Interspeech 2012, 2334-2337, doi: 10.21437/Interspeech.2012-612

@inproceedings{azim12_interspeech,
  author={Aisha S. Azim and Xiaoxuan Wang and Sim Khe Chai},
  title={{A weighted combination of speech with text-based models for Arabic diacritization}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={2334--2337},
  doi={10.21437/Interspeech.2012-612}
}