Deep Neural Network (DNN) acoustic models have yielded many state-of-the-art results in Automatic Speech Recognition (ASR) tasks. More recently, Recurrent Neural Network (RNN) models have been shown to outperform their DNN counterparts. However, state-of-the-art DNN and RNN models tend to be impractical to deploy on embedded systems with limited computational capacity. Traditionally, the approach for embedded platforms is either to train a small DNN directly, or to train a small DNN that learns the output distribution of a large DNN. In this paper, we utilize a state-of-the-art RNN to transfer knowledge to a small DNN. We use the RNN model to generate soft alignments, and train the small DNN by minimizing the Kullback-Leibler divergence between the RNN's soft alignments and the small DNN's output distribution. The small DNN trained on the soft RNN alignments achieved a 3.9 WER on the Wall Street Journal (WSJ) eval92 task, compared to a baseline 4.6 WER, a relative improvement of more than 13%.
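The training objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the RNN's soft alignments are per-frame posterior distributions over acoustic states, that the small DNN emits per-frame logits, and that the divergence is taken with the RNN as the reference distribution, i.e. KL(RNN || DNN). The function names and the toy numbers are hypothetical; only the two WER figures come from the abstract.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same states."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def distillation_loss(teacher_posteriors, student_logits):
    """Mean per-frame KL(teacher || student).

    Assumption: teacher_posteriors holds the RNN's soft alignments
    (one distribution per frame) and student_logits holds the small
    DNN's unnormalized outputs for the same frames.
    """
    frame_losses = [
        kl_divergence(p, softmax(z))
        for p, z in zip(teacher_posteriors, student_logits)
    ]
    return sum(frame_losses) / len(frame_losses)


def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction with respect to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy example: two frames, three acoustic states (made-up values).
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
student = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]

loss = distillation_loss(teacher, student)
rel = relative_improvement(4.6, 3.9)  # WER figures quoted in the abstract
print(f"mean frame KL: {loss:.4f}, relative WER improvement: {rel:.1%}")
```

In this sketch the loss reaches zero only when the student's output distribution matches the teacher's on every frame. The last helper checks the reported gain: (4.6 - 3.9) / 4.6 is about 15%, consistent with the abstract's "more than 13%".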
Cite as: Chan, W., Ke, N.R., Lane, I. (2015) Transferring knowledge from a RNN to a DNN. Proc. Interspeech 2015, 3264-3268, doi: 10.21437/Interspeech.2015-657
@inproceedings{chan15_interspeech, author={William Chan and Nan Rosemary Ke and Ian Lane}, title={{Transferring knowledge from a RNN to a DNN}}, year=2015, booktitle={Proc. Interspeech 2015}, pages={3264--3268}, doi={10.21437/Interspeech.2015-657} }