We examine the performance of three unsupervised language model adaptation schemes applied to speech recognition of spontaneous lecture presentations. Each adaptation scheme is based on a combination of word n-gram and class n-gram models and uses an initial transcription hypothesis to adapt the class model. The adapted class model is linearly interpolated with the baseline word n-gram model, and the combination is then applied in a subsequent recognition step. One scheme also contains an element of domain adaptation, in which the transcription hypothesis is additionally used to determine the interpolation weights of several class models, each of which is built on automatically derived clusters of presentations. We also investigate multi-pass adaptation for each scheme and show that it gives additional improvements in performance. Relative improvements in word error rate of up to 12.3% (2.9% absolute) are obtained on a held-out test set with the best adaptation scheme.
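As a rough illustration of the linear interpolation step described in the abstract, the sketch below combines a word n-gram probability with a class n-gram probability. It is not the authors' implementation: the bigram order, the weight name `lam`, the toy class map, and all probability values are assumptions chosen for illustration, and the step that adapts the class model from the transcription hypothesis is not shown. The last helper shows how a relative word error rate reduction relates to an absolute one, using made-up numbers rather than the paper's results.

```python
from typing import Dict, Tuple


def class_bigram_prob(
    word: str,
    prev_word: str,
    word_to_class: Dict[str, str],
    class_bigram: Dict[Tuple[str, str], float],
    word_given_class: Dict[Tuple[str, str], float],
) -> float:
    """Class bigram probability: P(c(w) | c(prev)) * P(w | c(w))."""
    c_prev = word_to_class[prev_word]
    c = word_to_class[word]
    return class_bigram[(c_prev, c)] * word_given_class[(word, c)]


def interpolate(p_word: float, p_class: float, lam: float) -> float:
    """Linear interpolation; lam is the (assumed) weight on the word model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_class


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction, as a fraction of the baseline WER."""
    return (baseline_wer - adapted_wer) / baseline_wer


# Toy, hypothetical model parameters (not taken from the paper).
word_to_class = {"the": "DET", "model": "NOUN"}
class_bigram = {("DET", "NOUN"): 0.5}
word_given_class = {("model", "NOUN"): 0.1}

p_class = class_bigram_prob(
    "model", "the", word_to_class, class_bigram, word_given_class
)
p_word = 0.02  # hypothetical baseline word bigram probability P(model | the)
p_mix = interpolate(p_word, p_class, lam=0.5)

# Hypothetical WERs in percent: 2.0 points absolute is 10% relative.
rel = relative_reduction(20.0, 18.0)
```

In an unsupervised setting such as the one the abstract describes, the weight would normally be estimated from data (for example on the first-pass hypothesis) rather than fixed by hand as it is here.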
Cite as: Lussier, L., Whittaker, E.W.D., Furui, S. (2004) Unsupervised language model adaptation methods for spontaneous speech. Proc. Interspeech 2004, 1981-1984, doi: 10.21437/Interspeech.2004-609
@inproceedings{lussier04_interspeech,
  author={Luc Lussier and Edward W.D. Whittaker and Sadaoki Furui},
  title={{Unsupervised language model adaptation methods for spontaneous speech}},
  year=2004,
  booktitle={Proc. Interspeech 2004},
  pages={1981--1984},
  doi={10.21437/Interspeech.2004-609}
}