ISCA Archive Interspeech 2010

Long short-term memory networks for noise robust speech recognition

Martin Wöllmer, Yang Sun, Florian Eyben, Björn Schuller

In this paper we introduce a novel hybrid model architecture for speech recognition and investigate its noise robustness on the Aurora 2 database. Our model is composed of a bidirectional Long Short-Term Memory (BLSTM) recurrent neural net exploiting long-range context information for phoneme prediction and a Dynamic Bayesian Network (DBN) for decoding. The DBN is able to learn pronunciation variants as well as typical phoneme confusions of the BLSTM predictor in order to compensate for signal disturbances. Unlike conventional Hidden Markov Model (HMM) systems, the proposed architecture is not based on Gaussian mixture modeling. Even without any feature enhancement, our BLSTM-DBN system outperforms a baseline HMM recognizer by up to 18%.
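The following is a minimal, untrained sketch of the BLSTM half of the architecture described above. A forward and a backward LSTM read the same feature sequence in opposite directions, and their hidden states are concatenated at every frame before a softmax over phoneme classes, so each frame-wise phoneme prediction can draw on both past and future context. It assumes a standard LSTM cell without peephole connections, random weights, and hypothetical sizes (13 features per frame, 8 hidden units, 5 phoneme classes). The class and function names are illustrative and not taken from the paper, and the DBN decoding stage is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain-Python matrix-vector product.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


class LSTMLayer:
    """Hypothetical single LSTM layer (input, forget, output gates; no peepholes)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        size = n_in + n_hidden + 1  # gates see [x_t, h_{t-1}, 1] (1 acts as bias)
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(size)]
                      for _ in range(n_hidden)]
                  for g in ("i", "f", "o", "c")}

    def run(self, xs):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            z = list(x) + h + [1.0]
            i = [sigmoid(a) for a in matvec(self.W["i"], z)]
            f = [sigmoid(a) for a in matvec(self.W["f"], z)]
            o = [sigmoid(a) for a in matvec(self.W["o"], z)]
            g = [math.tanh(a) for a in matvec(self.W["c"], z)]
            # Memory cell: keep part of the old state, add the gated new input.
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class BLSTMPhonemePredictor:
    """Hypothetical frame-wise phoneme predictor built from two LSTM layers."""

    def __init__(self, n_features, n_hidden, n_phonemes, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMLayer(n_features, n_hidden, rng)
        self.bwd = LSTMLayer(n_features, n_hidden, rng)
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_hidden + 1)]
                      for _ in range(n_phonemes)]

    def predict(self, frames):
        h_fwd = self.fwd.run(frames)
        # Run the backward layer on the reversed sequence, then re-align in time.
        h_bwd = self.bwd.run(frames[::-1])[::-1]
        return [softmax(matvec(self.W_out, a + b + [1.0]))
                for a, b in zip(h_fwd, h_bwd)]


# Usage: 20 synthetic frames of 13 features each (stand-ins for real acoustic features).
frames = [[math.sin(0.3 * t + k) for k in range(13)] for t in range(20)]
model = BLSTMPhonemePredictor(n_features=13, n_hidden=8, n_phonemes=5)
post = model.predict(frames)
best = [max(range(len(p)), key=p.__getitem__) for p in post]
print(best)
```

Because the backward layer has already seen the whole utterance when it produces the state for frame 0, changing only the last frame changes the prediction for the first frame; a unidirectional LSTM would not have this property.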


doi: 10.21437/Interspeech.2010-30

Cite as: Wöllmer, M., Sun, Y., Eyben, F., Schuller, B. (2010) Long short-term memory networks for noise robust speech recognition. Proc. Interspeech 2010, 2966-2969, doi: 10.21437/Interspeech.2010-30

@inproceedings{wollmer10_interspeech,
  author={Martin Wöllmer and Yang Sun and Florian Eyben and Björn Schuller},
  title={{Long short-term memory networks for noise robust speech recognition}},
  year=2010,
  booktitle={Proc. Interspeech 2010},
  pages={2966--2969},
  doi={10.21437/Interspeech.2010-30}
}