Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study

Chen, Hongjie; Leung, Cheung-Chi; Xie, Lei; Ma, Bin; Li, Haizhou

doi:10.21437/Interspeech.2015-642

We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic modeling and represent speech frames as Gaussian posteriorgrams. The model clusters untranscribed data without supervision, and each Gaussian component can be regarded as a cluster of sounds from various speakers. The model infers its own complexity (i.e., the number of Gaussian components) from the data. For computational efficiency, we use a parallel sampler for model inference. Our experiments are conducted on the corpus provided by the Zero Resource Speech Challenge. Experimental results show that the unsupervised DPGMM posteriorgrams clearly outperform MFCCs and perform comparably to posteriorgrams derived from language-mismatched phoneme recognizers in terms of error rate on the ABX discrimination test. The error rates can be further reduced by fusing these two kinds of posteriorgrams.
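
To illustrate what a Gaussian posteriorgram is, the following minimal sketch computes, for each frame, the posterior probability of every Gaussian component given the mixture parameters. It is not the authors' implementation: the function name `gaussian_posteriorgram`, the diagonal-covariance assumption, and the toy parameters are all hypothetical, and in the paper the parameters would come from the DPGMM sampler rather than being set by hand.

```python
import numpy as np

def gaussian_posteriorgram(frames, weights, means, variances):
    """Return a (T, K) matrix of component posteriors for T frames.

    Assumes diagonal covariances (an assumption of this sketch).
    frames: (T, D), weights: (K,), means: (K, D), variances: (K, D).
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Per-frame, per-component log-likelihood under a diagonal Gaussian.
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    mahalanobis = np.sum(diff ** 2 / variances, axis=2)      # (T, K)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_lik = -0.5 * (mahalanobis + log_norm)

    # Add log mixture weights, then normalize per frame (log-sum-exp style).
    log_joint = np.log(weights) + log_lik
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

# Toy example: two well-separated 2-D components with hand-picked parameters.
weights = [0.5, 0.5]
means = [[-2.0, -2.0], [2.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[-2.0, -2.0], [2.0, 2.0], [0.0, 0.0]]

post = gaussian_posteriorgram(frames, weights, means, variances)
```

Each row of `post` sums to one and serves as the frame's feature vector. In this toy setting the first two frames are assigned almost entirely to their nearest component, while the frame at the origin is split evenly between the two.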

Cite as: Chen, H., Leung, C.-C., Xie, L., Ma, B., Li, H. (2015) Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study. Proc. Interspeech 2015, 3189-3193, doi: 10.21437/Interspeech.2015-642

@inproceedings{chen15n_interspeech,
  author={Hongjie Chen and Cheung-Chi Leung and Lei Xie and Bin Ma and Haizhou Li},
  title={{Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3189--3193},
  doi={10.21437/Interspeech.2015-642}
}