ISCA Archive Interspeech 2014

Improving language-universal feature extraction with deep maxout and convolutional neural networks

Yajie Miao, Florian Metze

When deployed in automated speech recognition (ASR), deep neural networks (DNNs) can be treated as a complex feature extractor plus a simple linear classifier. Previous work has investigated the utility of multilingual DNNs acting as language-universal feature extractors (LUFEs). In this paper, we explore two strategies to further improve LUFEs. First, we replace the standard sigmoid nonlinearity with the recently proposed maxout units. The resulting maxout LUFEs have the useful property of generating sparse feature representations. Second, we apply the convolutional neural network (CNN) architecture to obtain a more invariant feature space. We evaluate the performance of LUFEs on a cross-language ASR task. Each of the proposed techniques yields a word error rate reduction compared with the existing DNN-based LUFEs. Combining the two methods brings additional improvement on the target language.
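
For readers unfamiliar with maxout units, the following is a minimal NumPy sketch of a generic maxout layer: the linear outputs are split into groups, and each unit emits the maximum over its group. The function name `maxout`, the `pieces` parameter, and the toy weights are illustrative assumptions. The abstract does not give the authors' layer sizes or configuration, and this sketch does not attempt to reproduce the sparsity property or the CNN component.

```python
import numpy as np

def maxout(x, W, b, pieces):
    """Generic maxout layer (illustrative, not the authors' implementation).

    x: (batch, in_dim) inputs
    W: (in_dim, units * pieces) weights
    b: (units * pieces,) biases
    pieces: number of linear pieces per maxout unit (hypothetical name)
    """
    z = x @ W + b                      # linear pre-activations
    batch = z.shape[0]
    units = z.shape[1] // pieces
    # Group consecutive pre-activations and keep the maximum of each group.
    return z.reshape(batch, units, pieces).max(axis=2)

# Toy example: two units, two pieces each. With pieces (w, -w) a maxout
# unit computes an absolute value of its input.
x = np.array([[1.0, -2.0]])
W = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
b = np.zeros(4)
out = maxout(x, W, b, pieces=2)
```

Unlike a sigmoid, the unit has no fixed nonlinearity: its shape is determined by the learned linear pieces within each group.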


doi: 10.21437/Interspeech.2014-205

Cite as: Miao, Y., Metze, F. (2014) Improving language-universal feature extraction with deep maxout and convolutional neural networks. Proc. Interspeech 2014, 800-804, doi: 10.21437/Interspeech.2014-205

@inproceedings{miao14_interspeech,
  author={Yajie Miao and Florian Metze},
  title={{Improving language-universal feature extraction with deep maxout and convolutional neural networks}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={800--804},
  doi={10.21437/Interspeech.2014-205}
}