In commercial speech applications, millions of speech utterances are collected in the field from millions of users, which raises the question of how best to leverage this user data to improve speech recognition performance. Motivated by the intuition that similar users may produce similar utterances, we propose a latent speaker model for unsupervised language modeling. Inspired by latent semantic analysis (LSA), an unsupervised method for extracting latent topics from a document corpus, we treat the accumulated unsupervised text from each user as a document in a corpus. We employ latent Dirichlet-Tree allocation, a tree-based LSA, to organize the latent speakers into a tree hierarchy in an unsupervised fashion. During speaker adaptation, the model for a new speaker is obtained as a linear interpolation of the latent speaker models. On an in-house evaluation, the proposed method reduces the word error rate by 1.4% compared to a well-tuned baseline with speaker-independent and speaker-dependent adaptation. Compared to a competitive document clustering approach based on the exchange algorithm, our model yields slightly better recognition performance.
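The adaptation step described above, a linear interpolation of latent speaker models, can be illustrated with a minimal sketch. It assumes each latent speaker model is a plain unigram word distribution and that the interpolation weights are already given; the abstract does not say how the weights are estimated or what order of n-gram is used, so the function name `interpolate_speaker_model` and the toy numbers are hypothetical.

```python
from typing import Dict, List


def interpolate_speaker_model(
    latent_models: List[Dict[str, float]],
    weights: List[float],
) -> Dict[str, float]:
    """Mix K latent speaker word distributions into one adapted distribution.

    Assumption: each model maps word -> probability (unigram only).
    Weights are renormalized so that they sum to 1.
    """
    if len(latent_models) != len(weights):
        raise ValueError("need exactly one weight per latent speaker model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]

    # Union of all words seen in any latent speaker model.
    vocab = set().union(*latent_models)

    # P_adapted(w) = sum_k weight_k * P_k(w)
    return {
        word: sum(w * m.get(word, 0.0) for w, m in zip(norm, latent_models))
        for word in vocab
    }


# Toy example with two hypothetical latent speakers.
latent = [
    {"call": 0.7, "play": 0.3},
    {"call": 0.2, "play": 0.8},
]
adapted = interpolate_speaker_model(latent, [0.25, 0.75])
```

Because every input distribution sums to 1 and the weights are normalized, the adapted distribution also sums to 1, so it can be used directly as a language model component.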
Cite as: Tam, Y.-C., Vozila, P. (2011) Unsupervised latent speaker language modeling. Proc. Interspeech 2011, 1477-1480, doi: 10.21437/Interspeech.2011-261
@inproceedings{tam11_interspeech,
  author={Yik-Cheung Tam and Paul Vozila},
  title={{Unsupervised latent speaker language modeling}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={1477--1480},
  doi={10.21437/Interspeech.2011-261},
  issn={2308-457X}
}