Automatically deriving categories for translation

Barrachina, Sergio; Vilar, Juan Miguel

doi:10.21437/Eurospeech.1999-530

An adequate approach to speech translation for small and medium-sized tasks is to use subsequential transducers, a type of finite-state model, as the language model of a speech recognizer. These transducers can be trained automatically from sample corpora. When the available data are scarce, manually defined categories improve the training of the subsequential transducers. These categories depend on both the source and the target language of the translation task. We introduce an automatic approach to deriving categories that can be used in training subsequential transducers. This approach extends monolingual word clustering methods to the bilingual case using alignments obtained from statistical models. Experimental results indicate that models trained with these categories produce fewer translation errors.
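To make the idea of bilingual categories concrete, the following is a minimal sketch under stated assumptions, not the method of the paper. The abstract does not specify the clustering criterion, so a toy shared-context heuristic stands in for it: aligned (source word, target word) pairs that occur between the same neighbouring source words are placed in one category. The function names, the corpus format, and the example sentences are all hypothetical.

```python
from collections import defaultdict


def context_signatures(corpus):
    """Map each aligned (source, target) word pair to the set of
    (previous, next) source-word contexts in which it occurs."""
    sigs = defaultdict(set)
    for src, tgt, alignment in corpus:
        for i, j in alignment:
            prev_w = src[i - 1] if i > 0 else "<s>"
            next_w = src[i + 1] if i + 1 < len(src) else "</s>"
            sigs[(src[i], tgt[j])].add((prev_w, next_w))
    return sigs


def derive_categories(corpus):
    """Toy criterion (an assumption, not the paper's): bilingual pairs
    that share a source-side context form one category."""
    by_context = defaultdict(set)
    for pair, contexts in context_signatures(corpus).items():
        for context in contexts:
            by_context[context].add(pair)
    # Keep only contexts that gather more than one pair, without duplicates.
    unique = {tuple(sorted(pairs)) for pairs in by_context.values() if len(pairs) > 1}
    return [list(category) for category in sorted(unique)]


# Hypothetical word-aligned corpus: (source tokens, target tokens, alignment),
# where each alignment entry is (source index, target index).
corpus = [
    (["una", "cama", "doble"], ["a", "double", "bed"], [(0, 0), (1, 2), (2, 1)]),
    (["una", "cama", "individual"], ["a", "single", "bed"], [(0, 0), (1, 2), (2, 1)]),
]

categories = derive_categories(corpus)
```

On this toy corpus the pairs doble/double and individual/single end up in the same category, because both follow "cama" at the end of a sentence. In a real setting the alignments would come from a statistical alignment model and the grouping from a proper clustering objective, as the abstract indicates.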

Cite as: Barrachina, S., Vilar, J.M. (1999) Automatically deriving categories for translation. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2415-2418, doi: 10.21437/Eurospeech.1999-530

@inproceedings{barrachina99_eurospeech,
  author={Sergio Barrachina and Juan Miguel Vilar},
  title={{Automatically deriving categories for translation}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2415--2418},
  doi={10.21437/Eurospeech.1999-530}
}