Pronunciation and silence probability modeling for ASR

Chen, Guoguo; Xu, Hainan; Wu, Minhua; Povey, Daniel; Khudanpur, Sanjeev

doi:10.21437/Interspeech.2015-198

In this paper, we evaluate the word error rate (WER) improvement from modeling pronunciation probabilities and word-specific silence probabilities in speech recognition. We do this in the context of Finite State Transducer (FST)-based decoding, where pronunciation and silence probabilities are encoded in the lexicon (L) transducer. We describe a novel way to model word-dependent silence probabilities: in addition to modeling the probability of silence following each individual word, we also model the probability of each word appearing after silence. All of these probabilities are estimated from aligned training data, with suitable smoothing. We conduct our experiments on four commonly used automatic speech recognition datasets, namely Wall Street Journal, Switchboard, TED-LIUM, and Librispeech. The improvement from modeling pronunciation and silence probabilities is small but fairly consistent across datasets.
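As an illustration of estimating a word-dependent silence probability from aligned data, the sketch below counts how often silence follows each word and smooths the estimate toward the corpus-wide silence rate. It is a minimal sketch under stated assumptions, not the paper's exact formulation: the abstract does not specify the smoothing scheme, so the additive smoothing used here, the `<sil>` token, the `smoothing` parameter, and the function name `estimate_silence_after_word` are all hypothetical choices made for this example.

```python
from collections import Counter

SIL = "<sil>"  # hypothetical silence marker used in the aligned transcripts


def estimate_silence_after_word(alignments, smoothing=2.0):
    """Estimate P(silence follows w) for every word w.

    alignments: list of utterances, each a list of tokens in which
                SIL marks an aligned silence segment.
    smoothing:  assumed pseudo-count that pulls rare words toward
                the overall silence rate.
    Returns (per-word probabilities, overall silence rate).
    """
    word_count = Counter()
    sil_after = Counter()
    for utt in alignments:
        for i, tok in enumerate(utt):
            if tok == SIL:
                continue
            word_count[tok] += 1
            # Count the word as "followed by silence" if the next token is SIL.
            if i + 1 < len(utt) and utt[i + 1] == SIL:
                sil_after[tok] += 1

    total_words = sum(word_count.values())
    overall = sum(sil_after.values()) / total_words if total_words else 0.0

    # Additive smoothing toward the overall rate (an assumption of this sketch).
    probs = {
        w: (sil_after[w] + smoothing * overall) / (word_count[w] + smoothing)
        for w in word_count
    }
    return probs, overall


if __name__ == "__main__":
    toy = [["a", SIL, "b"], ["a", SIL, "b"], ["b", "a"]]
    probs, overall = estimate_silence_after_word(toy)
    for word in sorted(probs):
        print(word, round(probs[word], 4))
```

With a larger `smoothing` value, words seen only a few times stay close to the overall rate, while frequent words are dominated by their own counts. The probability of a word appearing after silence, which the paper also models, would need a separate estimate and is not covered here.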

Cite as: Chen, G., Xu, H., Wu, M., Povey, D., Khudanpur, S. (2015) Pronunciation and silence probability modeling for ASR. Proc. Interspeech 2015, 533-537, doi: 10.21437/Interspeech.2015-198

@inproceedings{chen15g_interspeech,
  author={Guoguo Chen and Hainan Xu and Minhua Wu and Daniel Povey and Sanjeev Khudanpur},
  title={{Pronunciation and silence probability modeling for ASR}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={533--537},
  doi={10.21437/Interspeech.2015-198}
}