ISCA Archive ICSLP 1998

Reducing the OOV rate in broadcast news speech recognition

Thomas Kemp, Alex Waibel

To achieve the long-term goal of robust, real-time broadcast news transcription, several problems have to be overcome, e.g. the variety of acoustic conditions and the unlimited vocabulary. In this paper we address the problem of unlimited vocabulary. We show that this problem is more serious for German than it is for English. Using a speech recognition system with a large vocabulary, we dynamically adapt the active vocabulary to the topic of the current news segment. This is done by applying information retrieval (IR) techniques to a large collection of texts automatically gathered from the Internet. The same technique is also used to adapt the language model of the recognition system. The process of vocabulary adaptation and language model retraining is completely unsupervised. We show that dynamic vocabulary adaptation can significantly reduce the out-of-vocabulary (OOV) rate and the word error rate of our broadcast news transcription system, View4You.
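
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the retrieval model, the query, or how words are selected, so the TF-IDF cosine ranking, the use of a first-pass transcript as the query, and the names `oov_rate`, `adapt_vocabulary`, `top_docs` and `max_new` are all assumptions made for illustration, and the tiny corpus is invented toy data.

```python
import math
from collections import Counter


def oov_rate(tokens, vocabulary):
    """Fraction of running words in `tokens` that are missing from `vocabulary`."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for tokenized docs, plus the IDF table."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    # Smoothed IDF; this weighting is an assumption, not taken from the paper.
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vecs = [{w: tf * idf[w] for w, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def adapt_vocabulary(query, corpus, base_vocab, top_docs=2, max_new=3):
    """Add the most frequent unseen words of the best-matching documents.

    `query` stands in for whatever text describes the current news segment
    (hypothetically, a first-pass transcript).
    """
    vecs, idf = tfidf_vectors(corpus)
    q = {w: tf * idf.get(w, 0.0) for w, tf in Counter(query).items()}
    ranked = sorted(range(len(corpus)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    counts = Counter(w for i in ranked[:top_docs] for w in corpus[i])
    new_words = [w for w, _ in counts.most_common() if w not in base_vocab]
    return set(base_vocab) | set(new_words[:max_new])


# Toy data, invented for this sketch.
corpus = [
    ["election", "parliament", "vote", "coalition"],
    ["football", "goal", "league", "striker"],
    ["parliament", "coalition", "chancellor", "vote"],
]
base_vocab = {"the", "vote", "goal", "election"}
first_pass = ["parliament", "vote"]
reference = ["the", "chancellor", "vote", "coalition"]

adapted = adapt_vocabulary(first_pass, corpus, base_vocab)
print(oov_rate(reference, base_vocab), oov_rate(reference, adapted))
```

In this toy setting the retrieved political documents contribute words such as "chancellor" and "coalition", so the OOV rate on the reference drops, while the unrelated sports document contributes nothing. A real system would also have to keep the active vocabulary within the recognizer's size limit and retrain the language model on the retrieved texts, which this sketch omits.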


doi: 10.21437/ICSLP.1998-628

Cite as: Kemp, T., Waibel, A. (1998) Reducing the OOV rate in broadcast news speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0757, doi: 10.21437/ICSLP.1998-628

@inproceedings{kemp98_icslp,
  author={Thomas Kemp and Alex Waibel},
  title={{Reducing the OOV rate in broadcast news speech recognition}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0757},
  doi={10.21437/ICSLP.1998-628}
}