Robust Parallel Speech Recognition in Multiple Energy Bands

Maier, Andreas; Hacker, Christian; Steidl, Stefan; Nöth, Elmar; Niemann, Heinrich

doi:10.1007/11550518_17

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3663)

Included in the following conference series:

Joint Pattern Recognition Symposium

Abstract

In this paper we investigate the performance of TRAP features on clean and noisy data. Multiple feature sets are evaluated on a corpus that was recorded in both a clean and a noisy environment. In addition, the clean version was artificially reverberated. Each feature set is assembled from selected energy bands, so that multiple recognizers are trained on different energy bands. The outputs of all recognizers are combined with ROVER to obtain a single recognition result. This system is compared to a baseline recognizer that uses Mel frequency cepstral coefficients (MFCC). We show that artificial reverberation generally increases robustness to noise. Furthermore, most TRAP-based features excel in phone recognition. While MFCC features prove better in a matched training/test situation, TRAP features clearly outperform them in a mismatched one: when we train on clean data and evaluate on noisy data, the word accuracy (WA) is raised by 173 % relative (from 12.0 % to 32.8 % WA).
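
The relative gain quoted at the end of the abstract can be checked with a few lines of Python. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the word accuracy formula is the common definition based on substitution, deletion and insertion counts, which the abstract itself does not spell out.

```python
def word_accuracy(n_ref: int, subs: int, dels: int, ins: int) -> float:
    """Word accuracy in percent, using the common definition
    WA = 100 * (N - S - D - I) / N (assumed here, not quoted from the paper)."""
    if n_ref <= 0:
        raise ValueError("the reference must contain at least one word")
    return 100.0 * (n_ref - subs - dels - ins) / n_ref


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative change from `baseline` to `improved`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    # The two WA figures given in the abstract (clean training, noisy test).
    gain = relative_improvement(12.0, 32.8)
    print(f"{gain:.1f} % relative")  # about 173 %, matching the abstract
```

The absolute gain is 20.8 percentage points. Dividing it by the 12.0 % starting point gives the relative figure of roughly 173 %.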

Author information

Authors and Affiliations

Lehrstuhl für Mustererkennung, Universität Erlangen-Nürnberg, Germany
Andreas Maier, Christian Hacker, Stefan Steidl, Elmar Nöth & Heinrich Niemann

Editor information

Editors and Affiliations

PRIP, Vienna University of Technology, Austria
Walter G. Kropatsch
Vienna University of Technology, Vienna, Austria
Robert Sablatnig
Pattern Recognition and Image Processing Group, Institute of Computer-Aided Automation, Vienna University of Technology, Favoritenstraße 9/1832, A-1040, Vienna, Austria
Allan Hanbury

Cite this paper

Maier, A., Hacker, C., Steidl, S., Nöth, E., Niemann, H. (2005). Robust Parallel Speech Recognition in Multiple Energy Bands. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds) Pattern Recognition. DAGM 2005. Lecture Notes in Computer Science, vol 3663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550518_17

DOI: https://doi.org/10.1007/11550518_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28703-2
Online ISBN: 978-3-540-31942-9
eBook Packages: Computer Science, Computer Science (R0)
