Improving Children’s Speech Recognition by HMM Interpolation with an Adults’ Speech Recognizer

Steidl, Stefan; Stemmer, Georg; Hacker, Christian; Nöth, Elmar; Niemann, Heinrich

doi:10.1007/978-3-540-45243-0_76


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2781)

Included in the following conference series:

Joint Pattern Recognition Symposium


Abstract

In this paper we address the problem of building a good speech recognizer when only a small amount of training data is available. The acoustic models can be improved by interpolating them with the well-trained models of a second recognizer from a different application scenario. In our case, we interpolate a children’s speech recognizer with a recognizer for adults’ speech. Each hidden Markov model has its own set of interpolation partners; experiments were conducted with up to 50 partners. The interpolation weights are estimated automatically on a validation set using the EM algorithm. The word accuracy of the children’s speech recognizer improved from 74.6 % to 81.5 %. This is an absolute gain of 6.9 percentage points, a relative improvement of almost 10 %.
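The weight estimation step mentioned in the abstract can be illustrated with a generic EM procedure for linear interpolation weights. The sketch below is not taken from the paper: it assumes that each validation frame has already been scored by every interpolation partner, that one weight vector is estimated per model, and that all likelihoods are strictly positive. The function name `estimate_interpolation_weights` and its parameters are hypothetical.

```python
import math


def estimate_interpolation_weights(likelihoods, n_iter=100, tol=1e-10):
    """Estimate linear interpolation weights with EM (illustrative sketch).

    likelihoods[t][k] is the (assumed, strictly positive) likelihood of
    validation frame t under interpolation partner k. Returns one weight
    per partner; the weights are non-negative and sum to 1.
    """
    n_partners = len(likelihoods[0])
    weights = [1.0 / n_partners] * n_partners  # uniform starting point
    prev_ll = -math.inf
    for _ in range(n_iter):
        counts = [0.0] * n_partners
        ll = 0.0
        for row in likelihoods:
            # Likelihood of this frame under the current interpolated model.
            mix = sum(w * p for w, p in zip(weights, row))
            ll += math.log(mix)
            # E-step: posterior share of each partner for this frame.
            for k, (w, p) in enumerate(zip(weights, row)):
                counts[k] += w * p / mix
        # M-step: new weight = average posterior share over all frames.
        weights = [c / len(likelihoods) for c in counts]
        # Stop once the validation log-likelihood no longer improves.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights
```

For example, calling `estimate_interpolation_weights([[0.9, 0.1]] * 10)` moves nearly all of the weight to the first partner, because it explains every validation frame better than the second one. Estimating the weights on held-out data rather than on the training set matters here: on the training data the children's own model would always look best, and the well-trained adult models would receive little weight.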



Author information

Authors and Affiliations

Lehrstuhl für Mustererkennung, Universität Erlangen-Nürnberg, Martensstraße 3, D-91058, Erlangen, Germany
Stefan Steidl, Georg Stemmer, Christian Hacker, Elmar Nöth & Heinrich Niemann


Editor information

Editors and Affiliations

Otto-von-Guericke University, 39106, Magdeburg, Germany
Bernd Michaelis
Institute for Electronics, Signal Processing and Communications (IESK), Otto-von-Guericke-University Magdeburg, P.O. Box 4120, D-39016, Magdeburg, Germany
Gerald Krell


About this paper

Cite this paper

Steidl, S., Stemmer, G., Hacker, C., Nöth, E., Niemann, H. (2003). Improving Children’s Speech Recognition by HMM Interpolation with an Adults’ Speech Recognizer. In: Michaelis, B., Krell, G. (eds) Pattern Recognition. DAGM 2003. Lecture Notes in Computer Science, vol 2781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45243-0_76


DOI: https://doi.org/10.1007/978-3-540-45243-0_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40861-1
Online ISBN: 978-3-540-45243-0
eBook Packages: Springer Book Archive
