ISCA Archive Interspeech 2007

Using phonetic features in unsupervised word decompounding for ASR with application to a less-represented language

Thomas Pellegrini, Lori Lamel

In this paper, a data-driven word decompounding algorithm is described and applied to a broadcast news corpus in Amharic. Word decompounding increases phonetic confusability. To address this, the baseline algorithm has been enhanced by incorporating phonetic properties and some constraints on recognition units derived from prior forced-alignment experiments. Speech recognition experiments have been carried out to validate the approach. Out-of-vocabulary (OOV) word rates can be reduced by 30% to 40%, and an absolute Word Error Rate (WER) reduction of 0.4% has been achieved. The algorithm is relatively language-independent and requires minimal adaptation to be applied to other languages.
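The abstract does not spell out the decompounding algorithm itself. The sketch below only illustrates the general idea of how splitting words into smaller units can lower the OOV rate. It assumes a Harris-style successor-variety cutoff, a classic unsupervised segmentation heuristic, and it is not presented as the authors' method. It also does not model the phonetic features or forced-alignment constraints described above. The names `successor_variety`, `split_word`, and `oov_rate`, the `threshold` and `min_len` parameters, and the toy English word lists are all hypothetical. `min_len` is only a crude stand-in for a constraint on unit length, and the printed numbers are toy values, not results from the paper.

```python
from collections import defaultdict


def oov_rate(tokens, vocab):
    """Fraction of running tokens that are not covered by the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocab) / len(tokens)


def successor_variety(words):
    """Map each proper prefix to the set of distinct letters that follow it."""
    succ = defaultdict(set)
    for w in words:
        for i in range(1, len(w)):
            succ[w[:i]].add(w[i])
    return succ


def split_word(word, succ, threshold=2, min_len=2):
    """Cut after the first prefix whose successor variety reaches `threshold`.

    Hypothetical simplification: at most one cut per word, and both parts must
    be at least `min_len` characters long, so very short units are avoided.
    """
    for i in range(min_len, len(word) - min_len + 1):
        if len(succ.get(word[:i], ())) >= threshold:
            return [word[:i], word[i:]]
    return [word]


# Toy data (hypothetical): "jumping" never occurs as a whole word in training.
train = ["walked", "walking", "talked", "talking", "jumped", "jumps"]
test = ["jumping", "walked"]

succ = successor_variety(train)
word_vocab = set(train)
unit_vocab = {u for w in train for u in split_word(w, succ)}
test_units = [u for w in test for u in split_word(w, succ)]

print(oov_rate(test, word_vocab))        # word level: "jumping" is OOV
print(oov_rate(test_units, unit_vocab))  # unit level: "jump" + "ing" are known
```

Here "jumping" is unseen as a word, but after splitting it maps to the known units "jump" and "ing". This is the kind of coverage gain that decompounding targets. The shorter units it produces are also what raises the phonetic confusability concern mentioned in the abstract.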


doi: 10.21437/Interspeech.2007-502

Cite as: Pellegrini, T., Lamel, L. (2007) Using phonetic features in unsupervised word decompounding for ASR with application to a less-represented language. Proc. Interspeech 2007, 1797-1800, doi: 10.21437/Interspeech.2007-502

@inproceedings{pellegrini07_interspeech,
  author={Thomas Pellegrini and Lori Lamel},
  title={{Using phonetic features in unsupervised word decompounding for ASR with application to a less-represented language}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1797--1800},
  doi={10.21437/Interspeech.2007-502}
}