ISCA Archive ICSLP 2002

Improved Katz smoothing for language modeling in speech recognition

Genqing Wu, Fang Zheng, Wenhu Wu, Mingxing Xu, Ling Jin

In this paper, a new method is proposed to improve the canonical Katz back-off smoothing technique in language modeling. The Katz smoothing process is analyzed in detail, and global discounting parameters are selected for discounting. Furthermore, a modified formula for the discounting parameters is proposed, in which the parameters are determined not only by the occurrence counts of the n-gram units but also by the low-order history frequencies. This modification makes the smoothing more reasonable for n-gram units that have homophonic histories (histories with the same pronunciation). The new method is tested on a Chinese Pinyin-to-character conversion system (where Pinyin is the pronunciation string), and the results show that the improved method achieves a surprising reduction in both perplexity and Chinese character error rate.
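
For orientation, the canonical Katz discounting that the paper takes as its starting point can be sketched as follows. This is a minimal illustration of the standard textbook formula with global count-of-counts statistics, not the modified formula proposed in the paper, which the abstract does not state. The function name `katz_discounts`, the cutoff `k`, and the toy counts are assumptions made for this example.

```python
from collections import Counter


def katz_discounts(ngram_counts, k=5):
    """Return {r: d_r} for 1 <= r <= k using the standard Katz formula.

    ngram_counts maps each n-gram to its observed count. N-grams seen
    more than k times are considered reliable and are not discounted.
    """
    # n[r] = number of distinct n-grams observed exactly r times.
    n = Counter(ngram_counts.values())
    if n[1] == 0:
        raise ValueError("no n-grams with count 1; discounts are undefined")

    # Term shared by every d_r: (k + 1) * n_{k+1} / n_1.
    common = (k + 1) * n[k + 1] / n[1]
    if common >= 1:
        raise ValueError("count-of-counts too sparse for this cutoff k")

    discounts = {}
    for r in range(1, k + 1):
        if n[r] == 0:
            raise ValueError(f"no n-grams with count {r}")
        # Good-Turing adjusted count: r* = (r + 1) * n_{r+1} / n_r.
        r_star = (r + 1) * n[r + 1] / n[r]
        discounts[r] = (r_star / r - common) / (1 - common)
    return discounts


# Toy data (hypothetical): 8 bigrams seen once, 3 seen twice, 1 seen three times.
toy_counts = {("w", f"a{i}"): 1 for i in range(8)}
toy_counts.update({("w", f"b{i}"): 2 for i in range(3)})
toy_counts[("w", "c0")] = 3

toy_discounts = katz_discounts(toy_counts, k=2)
```

In this sketch each discount depends only on the count r of the n-gram itself. According to the abstract, the proposed method additionally makes the discounting parameters depend on the low-order history frequencies.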


doi: 10.21437/ICSLP.2002-309

Cite as: Wu, G., Zheng, F., Wu, W., Xu, M., Jin, L. (2002) Improved Katz smoothing for language modeling in speech recognition. Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002), 925-928, doi: 10.21437/ICSLP.2002-309

@inproceedings{wu02d_icslp,
  author={Genqing Wu and Fang Zheng and Wenhu Wu and Mingxing Xu and Ling Jin},
  title={{Improved Katz smoothing for language modeling in speech recognition}},
  year=2002,
  booktitle={Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002)},
  pages={925--928},
  doi={10.21437/ICSLP.2002-309}
}