Robust Parallel Speech Recognition in Multiple Energy Bands

  • Conference paper
Pattern Recognition (DAGM 2005)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3663)

Abstract

In this paper we investigate the performance of TRAP features on clean and noisy data. Multiple feature sets are evaluated on a corpus that was recorded in both a clean and a noisy environment. In addition, the clean version was artificially reverberated. The feature sets are assembled from selected energy bands, so that multiple recognizers are trained, each on different energy bands. The outputs of all recognizers are combined with ROVER to obtain a single recognition result. This system is compared to a baseline recognizer that uses Mel frequency cepstral coefficients (MFCC). We show that artificial reverberation generally increases robustness to noise. Furthermore, most TRAP-based features excel in phone recognition. While MFCC features prove better in a matched training/test situation, TRAP features clearly outperform them in a mismatched one: when training on clean data and evaluating on noisy data, the word accuracy (WA) is raised by 173 % relative (from 12.0 % to 32.8 % WA).
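
The relative gain quoted at the end of the abstract can be checked from the two word accuracies given there. The snippet below is only an arithmetic sketch: the function name `relative_improvement` and the variable names are illustrative and do not come from the paper.

```python
def relative_improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("the starting value must be non-zero")
    return (after - before) / before * 100.0


# Word accuracies (in percent) quoted in the abstract for training on
# clean data and evaluating on noisy data.
wa_before = 12.0
wa_after = 32.8

gain = relative_improvement(wa_before, wa_after)
print(f"{gain:.1f} % relative")  # prints "173.3 % relative"
```

The absolute gain is 20.8 percentage points; dividing by the starting value of 12.0 % gives roughly 173 % relative, which matches the figure in the abstract.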

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maier, A., Hacker, C., Steidl, S., Nöth, E., Niemann, H. (2005). Robust Parallel Speech Recognition in Multiple Energy Bands. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds) Pattern Recognition. DAGM 2005. Lecture Notes in Computer Science, vol 3663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550518_17

  • DOI: https://doi.org/10.1007/11550518_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28703-2

  • Online ISBN: 978-3-540-31942-9

  • eBook Packages: Computer Science, Computer Science (R0)
