Deep bottleneck network based i-vector representation for language identification

Song, Yan; Hong, Xinhai; Jiang, Bing; Cui, Ruilian; McLoughlin, Ian; Dai, Li-Rong

doi:10.21437/Interspeech.2015-163

This paper presents a unified i-vector framework for language identification (LID) based on deep bottleneck networks (DBN) trained for automatic speech recognition (ASR). The framework covers both front-end feature extraction and back-end modeling stages. The outputs from different layers of a DBN are exploited to improve the effectiveness of the i-vector representation by incorporating a mixture of acoustic and phonetic information. Furthermore, a universal model is derived from the DBN with an LID corpus. This is a somewhat inverse process to the GMM-UBM method, in which the GMM of each language is mapped from a GMM-UBM. Evaluations on specific dialect recognition tasks show that the DBN based i-vector can achieve significant and consistent performance gains over conventional GMM-UBM and DNN based i-vector methods [1][2]. The generalization capability of this framework is also evaluated using DBNs trained on Mandarin and English corpora.

Cite as: Song, Y., Hong, X., Jiang, B., Cui, R., McLoughlin, I., Dai, L.-R. (2015) Deep bottleneck network based i-vector representation for language identification. Proc. Interspeech 2015, 398-402, doi: 10.21437/Interspeech.2015-163

@inproceedings{song15_interspeech,
  author={Yan Song and Xinhai Hong and Bing Jiang and Ruilian Cui and Ian McLoughlin and Li-Rong Dai},
  title={{Deep bottleneck network based i-vector representation for language identification}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={398--402},
  doi={10.21437/Interspeech.2015-163}
}