ISCA Archive Interspeech 2013

Using an autoencoder with deformable templates to discover features for automated speech recognition

Navdeep Jaitly, Geoffrey E. Hinton

In this paper we show how to discover non-linear features of frames of spectrograms using a novel autoencoder. The autoencoder uses a neural network encoder that predicts how a set of prototypes, called templates, needs to be transformed to reconstruct the data, and a decoder that applies these transformations to the prototypes and reconstructs the input. We demonstrate this method on spectrograms from the TIMIT database. The features are used in a Deep Neural Network - Hidden Markov Model (DNN-HMM) hybrid system for automatic speech recognition. On the TIMIT monophone recognition task we were able to achieve gains of 0.5% over Mel log spectra by augmenting the traditional spectra with the predicted transformation parameters. Further, using the recently discovered "dropout" training, we were able to achieve a phone error rate (PER) of 17.9% on the dev set and 19.5% on the test set, which, to our knowledge, is the best reported number on this task using a hybrid system.
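
The encoder/decoder split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which transformations are used, so the sketch assumes each template is shifted along the frequency axis and scaled by a gain. The function names `encode` and `decode`, the one-layer encoder, the `max_shift` bound, and all sizes are hypothetical choices made only for illustration.

```python
import numpy as np


def encode(frame, weights, bias, max_shift=3.0):
    """Hypothetical one-layer encoder: maps a frame to one (shift, gain) pair per template."""
    out = weights @ frame + bias
    k = out.shape[0] // 2
    shifts = max_shift * np.tanh(out[:k])  # assumed: bounded shift in frequency bins
    gains = np.exp(out[k:])                # assumed: positive amplitude scaling
    return shifts, gains


def decode(templates, shifts, gains):
    """Fixed (non-learned) decoder: shift each template, scale it, and sum the results."""
    bins = np.arange(templates.shape[1], dtype=float)
    frame = np.zeros(templates.shape[1])
    for template, shift, gain in zip(templates, shifts, gains):
        # Linear interpolation moves the template by a possibly fractional number of bins;
        # values shifted in from outside the template are treated as zero.
        frame += gain * np.interp(bins - shift, bins, template, left=0.0, right=0.0)
    return frame


# Toy data: random templates and an untrained encoder (illustration only).
rng = np.random.default_rng(0)
n_bins, n_templates = 40, 4
templates = rng.random((n_templates, n_bins))
weights = 0.01 * rng.standard_normal((2 * n_templates, n_bins))
bias = np.zeros(2 * n_templates)

frame = rng.random(n_bins)
shifts, gains = encode(frame, weights, bias)
reconstruction = decode(templates, shifts, gains)
error = float(np.mean((frame - reconstruction) ** 2))
```

In a setup like this, training would minimize `error` with respect to the encoder weights and the templates, and the predicted `shifts` and `gains` would be the transformation parameters appended to the spectral features.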


doi: 10.21437/Interspeech.2013-432

Cite as: Jaitly, N., Hinton, G.E. (2013) Using an autoencoder with deformable templates to discover features for automated speech recognition. Proc. Interspeech 2013, 1737-1740, doi: 10.21437/Interspeech.2013-432

@inproceedings{jaitly13_interspeech,
  author={Navdeep Jaitly and Geoffrey E. Hinton},
  title={{Using an autoencoder with deformable templates to discover features for automated speech recognition}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1737--1740},
  doi={10.21437/Interspeech.2013-432},
  issn={2308-457X}
}