Hybrid HMM/ANN systems for speech recognition: Overview and new research directions

  • Chapter
Adaptive Processing of Sequences and Data Structures (NN 1997)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1387)

References

  1. Allen, J.B., “How do humans process and recognize speech?,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 567–577, 1994.

  2. Austin, S., Zavaliagkos, G., Makhoul, J., and Schwartz, R., “Improving state-of-the-art continuous speech recognition systems using the N-best paradigm with neural networks,” Proc. DARPA Speech and Natural Language Workshop (Harriman, NY), Morgan Kaufmann, pp. 180–184, Feb. 1992.

  3. Baum, L., “An inequality and associated maximization techniques in statistical estimation of probabilistic functions of Markov processes,” Inequalities, no. 3, pp. 1–8, 1972.

  4. Bengio, Y., De Mori, R., Flammia, G. and Kompe, R., “Global optimization of a neural network-Hidden Markov Model hybrid,” IEEE Trans. on Neural Networks, vol. 3, no. 2, pp. 252–259, 1992.

  5. Bilmes, J., Morgan, N., Wu, S., and Bourlard, H., “Stochastic perceptual speech models with durational dependence,” Intl. Conference on Spoken Language Processing, pp. 1301–1304, 1996.

  6. Bourlard, H. and Morgan, N., Connectionist Speech Recognition — A Hybrid Approach, Kluwer Academic Publishers, 1994.

  7. Bourlard, H., Konig, Y. and Morgan, N., “REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities in connectionist speech recognition,” Proc. EUROSPEECH'95 (Madrid, Spain), Sep. 1995.

  8. Bourlard, H. and Dupont, S., “A new ASR approach based on independent processing and recombination of partial frequency bands,” Proc. of Intl. Conf. on Spoken Language Processing (ICSLP) (Philadelphia), pp. 426–429, Oct. 3–6, 1996.

  9. Bridle, J.S., “Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition,” in Neurocomputing: Algorithms, Architectures and Applications, F. Fogelman Soulié and J. Hérault (Eds.), NATO ASI Series, pp. 227–236, 1990.

  10. Dupont, S. and Bourlard, H., “Using multiple time scales in a multi-stream speech recognition system,” to be published in Proc. EUROSPEECH'97 (Rhodes, Greece), Sep. 1997.

  11. Furui, S., “Speaker independent isolated word recognizer using dynamic features of speech spectrum,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 34, no. 1, pp. 52–59, 1986.

  12. Gish, H., “A probabilistic approach to the understanding and training of neural network classifiers,” in IEEE Proc. Intl. Conf. on Acoustics, Speech and Signal Processing (Albuquerque, NM), pp. 1361–1364, 1990.

  13. Haeb-Umbach, R., Geller, D., Ney, H., “Improvements in connected digit recognition using linear discriminant analysis and mixture densities,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Adelaide, Australia), pp. II-239–242, 1994.

  14. Hennebert, J., Ris, C., Bourlard, H., and Renals, S., “Estimation of global posteriors and forward-backward training of hybrid HMM/ANN systems,” to be published in Proc. EUROSPEECH'97 (Rhodes, Greece), Sep. 1997.

  15. Hermansky, H., “Perceptual Linear Predictive (PLP) analysis of speech,” Journal of the Acoust. Soc. Am., vol. 87, no. 4, 1990.

  16. Hochberg, M.M., Renals, S.J., Robinson, A.J., and Cook, G.D., “Recent improvements to the ABBOT large vocabulary CSR system,” Proc. of IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Detroit, MI), pp. 69–72, 1995.

  17. Huang, X.D., Lee, K.F. and Waibel, A., “Connectionist speaker normalization and its application to speech recognition,” Proc. of IEEE Workshop on Neural Networks for Signal Processing, pp. 357–366, IEEE Press, 1991.

  18. Katagiri, S., Lee, C., and Juang, B., “New Discriminative Training Algorithms Based on the Generalized Probabilistic Descent Method,” Proc. of the 1991 IEEE Workshop on Neural Networks for Signal Processing, pp. 299–308, 1991.

  19. Kohonen, T., “The ‘neural’ phonetic typewriter,” IEEE Computer, pp. 11–22, 1988.

  20. Levin, E., “Speech recognition using hidden control neural network architecture,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Albuquerque, NM), pp. 433–436, 1990.

  21. Lippmann, R.P., “Review of neural networks for speech recognition,” Neural Computation, vol. 1, no. 1, pp. 1–38, 1989.

  22. Lubensky, D.M., Asadi, A.O. and Naik, J.M., “Connected digit recognition using connectionist probability estimators and mixture-gaussian densities,” IEEE Proc. of the Intl. Conf. on Spoken Language Processing, pp. 295–298, Yokohama, Japan, 1994.

  23. Morgan, N. and Bourlard, H., “Generalization and parameter estimation in feed-forward nets: some experiments,” in Advances in Neural Information Processing Systems 2 (D.S. Touretzky, Ed.), San Mateo, CA: Morgan Kaufmann, pp. 630–637, 1990.

  24. Morgan, N., “Big Dumb Neural Nets (BDNN): a working brute force approach to speech recognition,” Proceedings of the ICNN, vol. VII, pp. 4462–4465, 1994.

  25. Morgan, N. and Bourlard, H., “Neural networks for statistical recognition of continuous speech,” Proceedings of the IEEE, vol. 83, no. 5, pp. 741–770, 1995.

  26. Ney, H., “The use of a one-stage dynamic programming algorithm for connected word recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, pp. 263–271, 1984.

  27. Poritz, A., “Linear predictive Hidden Markov Models and the speech signal,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, pp. 1291–1294, Paris, 1982.

  28. Poritz, A.B. and Richter, A.L., “On hidden Markov models in isolated word recognition,” IEEE Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, pp. 14.3.1–4, Tokyo, Japan, 1986.

  29. Rabiner, L.R., “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–285, 1989.

  30. Renals, S., Morgan, N., Bourlard, H., Cohen, M. and Franco, H., “Connectionist probability estimators in HMM speech recognition,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 1, pp. 161–174, 1994.

  31. Renals, S. and Hochberg, M., “Efficient search using posterior phone probability estimates,” Proc. of IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Detroit, MI), pp. 596–599, 1995.

  32. Richard, M.D. and Lippmann, R.P., “Neural network classifiers estimate Bayesian a posteriori probabilities,” Neural Computation, vol. 3, pp. 461–483, 1991.

  33. Robinson, T., Almeida, L., Boite, J.M., Bourlard, H., Fallside, F., Hochberg, M., Kershaw, D., Kohn, P., Konig, Y., Morgan, N., Neto, J.P., Renals, S., Saerens, M. and Wooters, C., “A neural network based, speaker independent, large vocabulary, continuous speech recognition system: The WERNICKE Project,” Proc. EUROSPEECH'93 (Berlin, Germany), pp. 1941–1944, 1993.

  34. Sorenson, H., “A cepstral noise reduction multi-layer network,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Toronto, Canada), pp. 933–936, 1991.

  35. Steeneken, J.M. and Van Leeuwen, D.A., “Multi-lingual assessment of speaker independent large vocabulary speech-recognition systems: the SQALE project (speech recognition quality assessment for language engineering),” Proc. EUROSPEECH'95 (Madrid, Spain), Sep. 1995.

  36. Tebelskis, J. and Waibel, A., “Large vocabulary recognition using linked predictive neural networks,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Albuquerque, NM), pp. 437–440, 1990.

  37. Tomlinson, M.J., Russell, M.J., Moore, R.K., Buckland, A.P., Fawley, M.A., “Modelling asynchrony in speech using elementary single-signal decomposition,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Munich, Germany), pp. 1247–1250, 1997.

  38. Varga, A. and Moore, R., “Hidden Markov model decomposition of speech and noise,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, pp. 845–848, 1990.

  39. Zavaliagkos, G., Zhao, Y., Schwartz, R. and Makhoul, J., “A hybrid segmental neural net/hidden Markov model system for continuous speech recognition,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 1, pp. 151–160, 1994.

Editor information

C. Lee Giles, Marco Gori

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bourlard, H., Morgan, N. (1998). Hybrid HMM/ANN systems for speech recognition: Overview and new research directions. In: Giles, C.L., Gori, M. (eds) Adaptive Processing of Sequences and Data Structures. NN 1997. Lecture Notes in Computer Science, vol 1387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054006

  • DOI: https://doi.org/10.1007/BFb0054006

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64341-8

  • Online ISBN: 978-3-540-69752-7

  • eBook Packages: Springer Book Archive
