The random forest language model (RFLM) has shown encouraging results in several automatic speech recognition (ASR) tasks but has been hindered by practical limitations, notably the space complexity of estimating an RFLM from large amounts of data. This paper addresses large-scale training and testing of the RFLM via an efficient disk-swapping strategy that exploits the recursive structure of a binary decision tree and the local access property of the tree-growing algorithm. This strategy redeems the full potential of the RFLM and opens avenues for further research, including meaningful comparisons with n-gram models. Its benefits are demonstrated by perplexity reduction and lattice-rescoring experiments using a state-of-the-art ASR system.
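The disk-swapping idea mentioned above can be illustrated with a minimal sketch: because depth-first tree growing only ever works on one root-to-leaf path at a time, a finished sibling subtree can be serialized to disk and restored later, bounding the in-memory footprint. The `Node` class, the toy split criterion, and the `pickle`-based swapping below are illustrative assumptions, not the paper's actual implementation.

```python
import os
import pickle
import tempfile

class Node:
    """A binary decision-tree node; children may be in memory or on disk."""
    def __init__(self, data):
        self.data = data   # items routed to this node
        self.left = None
        self.right = None

def swap_out(node, tmpdir):
    """Serialize a subtree to disk and return its file path."""
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".subtree")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(node, f)
    return path

def swap_in(path):
    """Load a previously swapped-out subtree back into memory."""
    with open(path, "rb") as f:
        node = pickle.load(f)
    os.remove(path)
    return node

def grow(node, depth, tmpdir):
    """Grow the tree depth-first. The tree-growing algorithm touches only
    the current path (the 'local access' property), so the completed left
    subtree can live on disk while the right subtree is being grown."""
    if depth == 0 or len(node.data) < 2:
        return node
    mid = len(node.data) // 2                  # toy split criterion
    node.left = grow(Node(node.data[:mid]), depth - 1, tmpdir)
    node.left = swap_out(node.left, tmpdir)    # park finished subtree on disk
    node.right = grow(Node(node.data[mid:]), depth - 1, tmpdir)
    node.left = swap_in(node.left)             # restore before returning
    return node

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        root = grow(Node(list(range(8))), depth=2, tmpdir=tmpdir)
        print(root.left.left.data)   # leftmost leaf holds [0, 1]
```

In a real RFLM, leaves would hold histories and smoothed probability estimates rather than raw lists, and swapped-out subtrees would be reloaded on demand at test time as well.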
Cite as: Su, Y., Jelinek, F., Khudanpur, S. (2007) Large-scale random forest language models for speech recognition. Proc. Interspeech 2007, 598-601, doi: 10.21437/Interspeech.2007-259
@inproceedings{su07_interspeech,
  author={Yi Su and Frederick Jelinek and Sanjeev Khudanpur},
  title={{Large-scale random forest language models for speech recognition}},
  year={2007},
  booktitle={Proc. Interspeech 2007},
  pages={598--601},
  doi={10.21437/Interspeech.2007-259}
}