Unsupervised latent speaker language modeling

Tam, Yik-Cheung; Vozila, Paul

doi:10.21437/Interspeech.2011-261

In commercial speech applications, millions of utterances are collected in the field from millions of users, which raises the question of how best to leverage this user data to improve speech recognition performance. Motivated by the intuition that similar users may produce similar utterances, we propose a latent speaker model for unsupervised language modeling. Inspired by latent semantic analysis (LSA), an unsupervised method for extracting latent topics from a document corpus, we treat the accumulated unsupervised text from each user as a document in the corpus. We employ latent Dirichlet-Tree allocation, a tree-based LSA, to organize the latent speakers in a tree hierarchy in an unsupervised fashion. During speaker adaptation, the model for a new speaker is obtained by linear interpolation of the latent speaker models. On an in-house evaluation, the proposed method reduces the word error rate by 1.4% compared to a well-tuned baseline with speaker-independent and speaker-dependent adaptation. Compared to a competitive document clustering approach based on the exchange algorithm, our model yields slightly better recognition performance.
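
The adaptation step described in the abstract, a linear interpolation of latent speaker models, can be illustrated with a minimal sketch. The following is not the authors' implementation: it assumes toy unigram models in place of the models produced by latent Dirichlet-Tree allocation, and it assumes the interpolation weights are fitted by EM on the new speaker's text, which the abstract does not specify. The names `estimate_weights`, `interpolate`, and the example vocabulary are hypothetical.

```python
from collections import Counter


def estimate_weights(latent_models, counts, iterations=50):
    """EM estimate of interpolation weights for fixed component models.

    Assumption: weights are fitted to maximize the likelihood of the new
    speaker's word counts. The abstract does not say how they are obtained.
    """
    k = len(latent_models)
    weights = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iterations):
        expected = [0.0] * k
        for word, n in counts.items():
            joint = [weights[j] * latent_models[j].get(word, 0.0) for j in range(k)]
            z = sum(joint)
            if z == 0.0:
                continue  # word unseen by every component; skipped in this sketch
            for j in range(k):
                expected[j] += n * joint[j] / z  # posterior responsibility
        norm = sum(expected)
        if norm == 0.0:
            break  # no usable adaptation data; keep the current weights
        weights = [e / norm for e in expected]
    return weights


def interpolate(latent_models, weights):
    """Adapted model: p(w) = sum_k weights[k] * p_k(w)."""
    vocab = set().union(*latent_models)
    return {
        w: sum(lam * m.get(w, 0.0) for lam, m in zip(weights, latent_models))
        for w in vocab
    }


# Two hypothetical latent speaker models (toy unigram distributions).
latent_models = [
    {"call": 0.6, "mom": 0.3, "navigate": 0.05, "home": 0.05},
    {"call": 0.05, "mom": 0.05, "navigate": 0.5, "home": 0.4},
]

# Hypothetical unsupervised text accumulated from a new speaker.
counts = Counter("navigate home navigate home call".split())

weights = estimate_weights(latent_models, counts)
adapted = interpolate(latent_models, weights)
```

In this toy setting the new speaker's text is dominated by words that the second latent model favors, so the fitted weights lean toward that component and the adapted model shifts probability mass accordingly.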

Cite as: Tam, Y.-C., Vozila, P. (2011) Unsupervised latent speaker language modeling. Proc. Interspeech 2011, 1477-1480, doi: 10.21437/Interspeech.2011-261

@inproceedings{tam11_interspeech,
  author={Yik-Cheung Tam and Paul Vozila},
  title={{Unsupervised latent speaker language modeling}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={1477--1480},
  doi={10.21437/Interspeech.2011-261},
  issn={2308-457X}
}