ISCA Archive Interspeech 2014

Language independent and unsupervised acoustic models for speech recognition and keyword spotting

Kate M. Knill, Mark J. F. Gales, Anton Ragni, Shakti P. Rath

Developing high-performance speech processing systems for low-resource languages is very challenging. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to train a multi-language bottleneck DNN. Language dependent and/or multi-language (all training languages) Tandem acoustic models (AMs) are then trained. This work considers a particular scenario where the target language is unseen in multi-language training and has limited language model training data, a limited lexicon, and acoustic training data without transcriptions. First, a zero acoustic resources case is described in which a multi-language AM is applied directly, as a language independent AM (LIAM), to an unseen language. Second, in an unsupervised approach, a LIAM is used to obtain hypothesised transcriptions of the target language acoustic data, which are then used to train a language dependent AM. Three languages from the IARPA Babel project are used for assessment: Vietnamese, Haitian Creole and Bengali. Performance of the zero acoustic resources system is found to be poor, with keyword spotting at best 60% of language dependent performance. Unsupervised language dependent training yields performance gains. For one language (Haitian Creole) the Babel target is achieved on the in-vocabulary data.


doi: 10.21437/Interspeech.2014-4

Cite as: Knill, K.M., Gales, M.J.F., Ragni, A., Rath, S.P. (2014) Language independent and unsupervised acoustic models for speech recognition and keyword spotting. Proc. Interspeech 2014, 16-20, doi: 10.21437/Interspeech.2014-4

@inproceedings{knill14_interspeech,
  author={Kate M. Knill and Mark J. F. Gales and Anton Ragni and Shakti P. Rath},
  title={{Language independent and unsupervised acoustic models for speech recognition and keyword spotting}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={16--20},
  doi={10.21437/Interspeech.2014-4}
}