Hybrid HMM/ANN systems for speech recognition: Overview and new research directions

Bourlard, Hervé; Morgan, Nelson

doi:10.1007/BFb0054006

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1387)

Included in the following conference series:

International School on Neural Networks, Initiated by IIASS and EMFCSC

References

Allen, J.B., “How do humans process and recognize speech?,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 567–577, 1994.
Austin, S., Zavaliagkos, G., Makhoul, J., and Schwartz, R., “Improving state-of-the-art continuous speech recognition systems using the N-best paradigm with neural networks,” Proc. DARPA Speech and Natural Language Workshop (Harriman, NY), Morgan Kaufmann, pp. 180–184, Feb. 1992.
Baum, L., “An inequality and associated maximization techniques in statistical estimation of probabilistic functions of Markov processes,” Inequalities, no. 3, pp. 1–8, 1972.
Bengio, Y., De Mori, R., Flammia, G., and Kompe, R., “Global optimization of a neural network-Hidden Markov Model hybrid,” IEEE Trans. on Neural Networks, vol. 3, no. 2, pp. 252–259, 1992.
Bilmes, J., Morgan, N., Wu, S., and Bourlard, H., “Stochastic perceptual speech models with durational dependence,” Proc. Intl. Conf. on Spoken Language Processing (ICSLP), pp. 1301–1304, 1996.
Bourlard, H. and Morgan, N., Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic Publishers, 1994.
Bourlard, H., Konig, Y., and Morgan, N., “REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities in connectionist speech recognition,” Proc. EUROSPEECH'95 (Madrid, Spain), Sep. 1995.
Bourlard, H. and Dupont, S., “A new ASR approach based on independent processing and recombination of partial frequency bands,” Proc. Intl. Conf. on Spoken Language Processing (ICSLP) (Philadelphia, PA), pp. 426–429, Oct. 3–6, 1996.
Bridle, J.S., “Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition,” in Neurocomputing: Algorithms, Architectures and Applications, F. Fogelman Soulié and J. Hérault (Eds.), NATO ASI Series, pp. 227–236, 1990.
Dupont, S. and Bourlard, H., “Using multiple time scales in a multi-stream speech recognition system,” to be published in Proc. EUROSPEECH'97 (Rhodes, Greece), Sep. 1997.
Furui, S., “Speaker independent isolated word recognizer using dynamic features of speech spectrum,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 34, no. 1, pp. 52–59, 1986.
Gish, H., “A probabilistic approach to the understanding and training of neural network classifiers,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Albuquerque, NM), pp. 1361–1364, 1990.
Haeb-Umbach, R., Geller, D., and Ney, H., “Improvements in connected digit recognition using linear discriminant analysis and mixture densities,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Adelaide, Australia), pp. II-239–242, 1994.
Hennebert, J., Ris, C., Bourlard, H., and Renals, S., “Estimation of global posteriors and forward-backward training of hybrid HMM/ANN systems,” to be published in Proc. EUROSPEECH'97 (Rhodes, Greece), Sep. 1997.
Hermansky, H., “Perceptual Linear Predictive (PLP) analysis of speech,” Journal of the Acoust. Soc. Am., vol. 87, no. 4, 1990.
Hochberg, M.M., Renals, S.J., Robinson, A.J., and Cook, G.D., “Recent improvements to the ABBOT large vocabulary CSR system,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Detroit, MI), pp. 69–72, 1995.
Huang, X.D., Lee, K.F., and Waibel, A., “Connectionist speaker normalization and its application to speech recognition,” Proc. IEEE Workshop on Neural Networks for Signal Processing, IEEE Press, pp. 357–366, 1991.
Katagiri, S., Lee, C., and Juang, B., “New discriminative training algorithms based on the Generalized Probabilistic Descent method,” Proc. 1991 IEEE Workshop on Neural Networks for Signal Processing, pp. 299–308, 1991.
Kohonen, T., “The ‘neural’ phonetic typewriter,” IEEE Computer, vol. 21, no. 3, pp. 11–22, 1988.
Levin, E., “Speech recognition using hidden control neural network architecture,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Albuquerque, NM), pp. 433–436, 1990.
Lippmann, R.P., “Review of neural networks for speech recognition,” Neural Computation, vol. 1, no. 1, pp. 1–38, 1989.
Lubensky, D.M., Asadi, A.O., and Naik, J.M., “Connected digit recognition using connectionist probability estimators and mixture-Gaussian densities,” Proc. Intl. Conf. on Spoken Language Processing (Yokohama, Japan), pp. 295–298, 1994.
Morgan, N. and Bourlard, H., “Generalization and parameter estimation in feed-forward nets: some experiments,” in Advances in Neural Information Processing Systems 2 (D.S. Touretzky, Ed.), San Mateo, CA: Morgan Kaufmann, pp. 630–637, 1990.
Morgan, N., “Big Dumb Neural Nets (BDNN): a working brute force approach to speech recognition,” Proc. ICNN, vol. VII, pp. 4462–4465, 1994.
Morgan, N. and Bourlard, H., “Neural networks for statistical recognition of continuous speech,” Proceedings of the IEEE, vol. 83, no. 5, pp. 741–770, 1995.
Ney, H., “The use of a one-stage dynamic programming algorithm for connected word recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, pp. 263–271, 1984.
Poritz, A., “Linear predictive Hidden Markov Models and the speech signal,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Paris, France), pp. 1291–1294, 1982.
Poritz, A.B. and Richter, A.L., “On hidden Markov models in isolated word recognition,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Tokyo, Japan), pp. 14.3.1–4, 1986.
Rabiner, L.R., “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–285, 1989.
Renals, S., Morgan, N., Bourlard, H., Cohen, M., and Franco, H., “Connectionist probability estimators in HMM speech recognition,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 1, pp. 161–174, 1994.
Renals, S. and Hochberg, M., “Efficient search using posterior phone probability estimates,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Detroit, MI), pp. 596–599, 1995.
Richard, M.D. and Lippmann, R.P., “Neural network classifiers estimate Bayesian a posteriori probabilities,” Neural Computation, no. 3, pp. 461–483, 1991.
Robinson, T., Almeida, L., Boite, J.M., Bourlard, H., Fallside, F., Hochberg, M., Kershaw, D., Kohn, P., Konig, Y., Morgan, N., Neto, J.P., Renals, S., Saerens, M., and Wooters, C., “A neural network based, speaker independent, large vocabulary, continuous speech recognition system: The WERNICKE Project,” Proc. EUROSPEECH'93 (Berlin, Germany), pp. 1941–1944, 1993.
Sorenson, H., “A cepstral noise reduction multi-layer network,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Toronto, Canada), pp. 933–936, 1991.
Steeneken, J.M. and Van Leeuwen, D.A., “Multi-lingual assessment of speaker independent large vocabulary speech-recognition systems: the SQALE project (speech recognition quality assessment for language engineering),” Proc. EUROSPEECH'95 (Madrid, Spain), Sep. 1995.
Tebelskis, J. and Waibel, A., “Large vocabulary recognition using linked predictive neural networks,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Albuquerque, NM), pp. 437–440, 1990.
Tomlinson, M.J., Russell, M.J., Moore, R.K., Buckland, A.P., and Fawley, M.A., “Modelling asynchrony in speech using elementary single-signal decomposition,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (Munich, Germany), pp. 1247–1250, 1997.
Varga, A. and Moore, R., “Hidden Markov model decomposition of speech and noise,” Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, pp. 845–848, 1990.
Zavaliagkos, G., Zhao, Y., Schwartz, R., and Makhoul, J., “A hybrid segmental neural net/hidden Markov model system for continuous speech recognition,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 1, pp. 151–160, 1994.

Author information

Authors and Affiliations

IDIAP, Martigny, Switzerland
Hervé Bourlard
Intl. Comp. Science Institute, Berkeley, CA
Hervé Bourlard & Nelson Morgan
UC Berkeley, Berkeley, CA
Nelson Morgan

Editor information

C. Lee Giles, Marco Gori

About this chapter

Cite this chapter

Bourlard, H., Morgan, N. (1998). Hybrid HMM/ANN systems for speech recognition: Overview and new research directions. In: Giles, C.L., Gori, M. (eds) Adaptive Processing of Sequences and Data Structures. NN 1997. Lecture Notes in Computer Science, vol 1387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054006

DOI: https://doi.org/10.1007/BFb0054006
Published: 25 May 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64341-8
Online ISBN: 978-3-540-69752-7
eBook Packages: Springer Book Archive