Preliminaries

Abstract

Robust automatic speech recognition (ASR) technologies have evolved greatly with the emergence of deep learning. This chapter introduces the general background of robustness issues in deep neural network (DNN)-based ASR. It provides an overview of robust ASR research, including a brief history of studies from before the deep learning era and the basic formulations of ASR, signal processing, and neural networks. This chapter also introduces common notations for variables and equations, which are extended in later chapters to deal with more advanced topics. Finally, the chapter outlines the structure of the book by summarizing the contributions of the individual chapters and relating them to the different components of a robust ASR system.

Notes

  1. The reported WERs are from the Kaldi AMI recipe as of November 15, 2016: https://github.com/kaldi-asr/kaldi/blob/master/egs/ami/s5b.

  2. http://www.clsp.jhu.edu/workshops/15-workshop/.

  3. However, these concepts have inspired related techniques for DNN-based acoustic models, such as DNN parameter regularization based on the L2 norm or the Kullback–Leibler (KL) divergence, which can be regarded as variants of MAP adaptation for DNNs.

  4. This problem is discussed in Chap. 13.
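Note 3 relates L2-norm and KL-divergence regularization of DNN parameters to MAP adaptation. The sketch below illustrates both ideas on a toy model and is not the chapter's own formulation. It assumes a single softmax layer with weights `W` standing in for a DNN acoustic model, an L2 penalty with weight `lam` that pulls `W` toward the speaker-independent (SI) weights `W_si` (the analogue of a Gaussian prior centered on the SI model), and KL regularization with weight `rho`, implemented by interpolating one-hot labels with the SI model's posteriors. All function and parameter names here are hypothetical, and the data are random toy values.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def interpolated_targets(p_si, labels, rho):
    """(1 - rho) * one-hot labels + rho * SI posteriors; each row sums to 1."""
    onehot = np.eye(p_si.shape[1])[labels]
    return (1.0 - rho) * onehot + rho * p_si


def adaptation_loss(W, W_si, x, labels, lam=0.0, rho=0.0):
    """Cross-entropy against interpolated targets plus an L2 pull toward W_si.

    Up to a term that does not depend on W, the cross-entropy part equals
    (1 - rho) * CE(labels) + rho * KL(p_si || p), i.e. KL regularization.
    """
    p_si = softmax(x @ W_si)
    p = softmax(x @ W)
    targets = interpolated_targets(p_si, labels, rho)
    ce = -np.mean(np.sum(targets * np.log(p), axis=1))
    return ce + 0.5 * lam * np.sum((W - W_si) ** 2)


def adapt(W_si, x, labels, lam=0.0, rho=0.0, lr=0.1, steps=100):
    """Plain gradient descent on adaptation_loss, initialized at the SI weights."""
    W = W_si.copy()
    p_si = softmax(x @ W_si)
    targets = interpolated_targets(p_si, labels, rho)
    for _ in range(steps):
        p = softmax(x @ W)
        # Gradient of softmax cross-entropy w.r.t. the logits is (p - targets);
        # the L2 term adds lam * (W - W_si).
        grad = x.T @ (p - targets) / len(labels) + lam * (W - W_si)
        W -= lr * grad
    return W


# Toy usage with random data (hypothetical sizes: 20 frames, 4 features, 3 classes).
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 4))
labels = rng.integers(0, 3, size=20)
W_si = 0.1 * rng.normal(size=(4, 3))
W_adapted = adapt(W_si, x, labels, lam=1.0, rho=0.5)
```

Both knobs limit how far the adapted model drifts from the SI model on small adaptation sets: `lam` penalizes the parameter distance directly, while `rho` keeps the adapted output distribution close to the SI one. With `rho=1` the targets equal the SI posteriors, so the gradient is zero and the weights do not move at all.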

Author information

Correspondence to Shinji Watanabe.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Watanabe, S., Delcroix, M., Metze, F., Hershey, J.R. (2017). Preliminaries. In: Watanabe, S., Delcroix, M., Metze, F., Hershey, J. (eds) New Era for Robust Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-64680-0_1

  • DOI: https://doi.org/10.1007/978-3-319-64680-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64679-4

  • Online ISBN: 978-3-319-64680-0

  • eBook Packages: Computer Science, Computer Science (R0)