The random forest language model (RFLM) has shown encouraging results in several automatic speech recognition (ASR) tasks but has been hindered by practical limitations, notably the space complexity of estimating an RFLM from large amounts of data. This paper addresses large-scale training and testing of the RFLM via an efficient disk-swapping strategy that exploits the recursive structure of a binary decision tree and the local access property of the tree-growing algorithm. This strategy redeems the full potential of the RFLM and opens avenues for further research, including meaningful comparisons with n-gram models. Its benefits are demonstrated by perplexity reduction and lattice-rescoring experiments using a state-of-the-art ASR system.
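The disk-swapping idea mentioned above can be illustrated with a minimal sketch: because depth-first tree growing only ever works on one root-to-leaf path at a time, a finished sibling subtree can be serialized to disk and restored later, bounding the in-memory footprint. The `Node` class, the toy split criterion, and the `pickle`-based swapping below are illustrative assumptions, not the paper's actual implementation.

```python
import os
import pickle
import tempfile

class Node:
    """A binary decision-tree node; children may be in memory or on disk."""
    def __init__(self, data):
        self.data = data   # items routed to this node
        self.left = None
        self.right = None

def swap_out(node, tmpdir):
    """Serialize a subtree to disk and return its file path."""
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".subtree")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(node, f)
    return path

def swap_in(path):
    """Load a previously swapped-out subtree back into memory."""
    with open(path, "rb") as f:
        node = pickle.load(f)
    os.remove(path)
    return node

def grow(node, depth, tmpdir):
    """Grow the tree depth-first. The tree-growing algorithm touches only
    the current path (the 'local access' property), so the completed left
    subtree can live on disk while the right subtree is being grown."""
    if depth == 0 or len(node.data) < 2:
        return node
    mid = len(node.data) // 2                  # toy split criterion
    node.left = grow(Node(node.data[:mid]), depth - 1, tmpdir)
    node.left = swap_out(node.left, tmpdir)    # park finished subtree on disk
    node.right = grow(Node(node.data[mid:]), depth - 1, tmpdir)
    node.left = swap_in(node.left)             # restore before returning
    return node

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        root = grow(Node(list(range(8))), depth=2, tmpdir=tmpdir)
        print(root.left.left.data)   # leftmost leaf holds [0, 1]
```

In a real RFLM, leaves would hold histories and smoothed probability estimates rather than raw lists, and swapped-out subtrees would be reloaded on demand at test time as well.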
Cite as: Su, Y., Jelinek, F., Khudanpur, S. (2007) Large-scale random forest language models for speech recognition. Proc. Interspeech 2007, 598-601, doi: 10.21437/Interspeech.2007-259
@inproceedings{su07_interspeech,
  author={Yi Su and Frederick Jelinek and Sanjeev Khudanpur},
  title={{Large-scale random forest language models for speech recognition}},
  year={2007},
  booktitle={Proc. Interspeech 2007},
  pages={598--601},
  doi={10.21437/Interspeech.2007-259}
}