ISCA Archive Eurospeech 2003

Evaluation of the stochastic morphosyntactic language model on a one million word Hungarian dictation task

Mate Szarvas, Sadaoki Furui

In this article we evaluate our stochastic morphosyntactic language model (SMLM) on a Hungarian newspaper dictation task that requires modeling over 1 million different word forms. The proposed method uses morphemes as the basic recognition units and combines a morpheme N-gram model with a morphosyntactic language model. The recognition system is built on the weighted finite-state transducer (WFST) paradigm. Thanks to this flexible transducer-based architecture, the morphosyntactic component integrates seamlessly with the basic modules, with no need to modify the decoder itself. We compare the phoneme, morpheme, and word error rates, as well as the sizes of the recognition networks, in two configurations: one that uses only the N-gram model and one that uses the combined model. Compared to the baseline trigram system, the proposed stochastic morphosyntactic language model reduces the morpheme error rate by 1.7% to 7.2% relative. The morpheme error rate of the best configuration is 18%, and the best word error rate is 22.3%.
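The improvement is reported as a relative reduction, that is, as a fraction of the baseline error rate, not as a difference in percentage points. The sketch below shows that arithmetic. The function name and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative error-rate reduction, in percent of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - improved) / baseline


# Hypothetical error rates (in percent), NOT taken from the paper.
baseline_mer = 20.0
combined_mer = 19.0

absolute = baseline_mer - combined_mer                      # percentage points
relative = relative_reduction(baseline_mer, combined_mer)   # percent of baseline

print(f"absolute reduction: {absolute:.1f} points")
print(f"relative reduction: {relative:.1f}%")
```

With these illustrative inputs, a drop of one percentage point corresponds to a 5% relative reduction, which is why relative figures look larger than the absolute difference between two error rates.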


doi: 10.21437/Eurospeech.2003-641

Cite as: Szarvas, M., Furui, S. (2003) Evaluation of the stochastic morphosyntactic language model on a one million word Hungarian dictation task. Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), 2297-2300, doi: 10.21437/Eurospeech.2003-641

@inproceedings{szarvas03_eurospeech,
  author={Mate Szarvas and Sadaoki Furui},
  title={{Evaluation of the stochastic morphosyntactic language model on a one million word Hungarian dictation task}},
  year=2003,
  booktitle={Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003)},
  pages={2297--2300},
  doi={10.21437/Eurospeech.2003-641}
}