Exploiting spatial-temporal feature distribution characteristics for robust speech recognition

Chen, Wei-Hau; Lin, Shih-Hsiang; Chen, Berlin

doi:10.21437/Interspeech.2008-319

Noise robustness is one of the primary challenges facing most automatic speech recognition (ASR) systems. A number of speech feature histogram equalization (HEQ) methods have been developed to compensate for nonlinear noise distortions. However, most current HEQ methods operate only in a dimension-wise manner and do not take into account the contextual relationships between consecutive speech frames. In this paper, we present a novel HEQ approach that exploits spatial-temporal feature distribution characteristics for speech feature normalization. All experiments were carried out on the Aurora-2 database and task. The performance of the presented approach is tested and verified by comparison with other HEQ methods. The experimental results show that, for clean-condition training, our method yields a significant word error rate reduction over the baseline system and also considerably outperforms the other HEQ methods compared in this paper.
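For readers unfamiliar with the dimension-wise HEQ that the abstract contrasts against, the following is a minimal sketch of that conventional baseline idea, not the spatial-temporal method proposed in the paper. It assumes a standard normal reference distribution and a rank-based empirical CDF; the function name `heq_dimension_wise` and both of those choices are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def heq_dimension_wise(features):
    """Equalize each feature dimension independently (hypothetical helper).

    features: array of shape (num_frames, num_dims), e.g. cepstral features
    for one utterance. Returns an array of the same shape.
    """
    features = np.asarray(features, dtype=float)
    num_frames, num_dims = features.shape
    # Inverse CDF of the assumed reference distribution (standard normal).
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    equalized = np.empty_like(features)
    for d in range(num_dims):
        # Rank of each frame within dimension d (0 = smallest value).
        ranks = np.argsort(np.argsort(features[:, d]))
        # Empirical CDF estimate, kept strictly inside (0, 1).
        cdf = (ranks + 0.5) / num_frames
        # Map each value to the reference quantile with the same CDF value.
        equalized[:, d] = inv_cdf(cdf)
    return equalized
```

Each dimension is processed in isolation and each frame is mapped without reference to its neighbours, which is the limitation the abstract points out: neither the relationships across dimensions nor the context of consecutive frames enters the transformation.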

Cite as: Chen, W.-H., Lin, S.-H., Chen, B. (2008) Exploiting spatial-temporal feature distribution characteristics for robust speech recognition. Proc. Interspeech 2008, 2004-2007, doi: 10.21437/Interspeech.2008-319

@inproceedings{chen08d_interspeech,
  author={Wei-Hau Chen and Shih-Hsiang Lin and Berlin Chen},
  title={{Exploiting spatial-temporal feature distribution characteristics for robust speech recognition}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2004--2007},
  doi={10.21437/Interspeech.2008-319}
}