Restructuring output layers of deep neural networks using minimum risk parameter clustering

Kubo, Yotaro; Suzuki, Jun; Hori, Takaaki; Nakamura, Atsushi

doi:10.21437/Interspeech.2014-276

This paper attempts to optimize the topology of hidden Markov models (HMMs) for automatic speech recognition (ASR). Current state-of-the-art acoustic models for ASR use HMMs with deep neural network (DNN)-based emission density functions. Even though DNN parameters are typically trained by optimizing a discriminative criterion, topology optimization of HMMs is usually performed by optimizing a generative criterion. Several approaches to discriminative state clustering have been studied, but they typically assume underlying Gaussian distributions of the acoustic features and are not compatible with DNN-based emission density functions. In this paper, we attempt to derive a discriminative restructuring method for an HMM topology by introducing discriminative optimization with discrete constraints on the parameters, which force the parameters of each state to be tied to those of other states. By applying this constrained optimization to the clustering of parameters of DNN-based acoustic models, we derive a discriminative HMM restructuring method that maintains the discriminative performance of the original HMMs with a large number of states.

Cite as: Kubo, Y., Suzuki, J., Hori, T., Nakamura, A. (2014) Restructuring output layers of deep neural networks using minimum risk parameter clustering. Proc. Interspeech 2014, 1068-1072, doi: 10.21437/Interspeech.2014-276

@inproceedings{kubo14_interspeech,
  author={Yotaro Kubo and Jun Suzuki and Takaaki Hori and Atsushi Nakamura},
  title={{Restructuring output layers of deep neural networks using minimum risk parameter clustering}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={1068--1072},
  doi={10.21437/Interspeech.2014-276}
}