Application of a stereovision neural network to continuous speech recognition

Kitazoe, Tetsuro; Ichiki, Tomoyuki; Funamori, Makoto

doi:10.1007/BF02481464

Original Article
Published: September 2001

Volume 5, pages 165–170 (2001)

Artificial Life and Robotics

Abstract

Two- and three-layered neural networks (2LNN, 3LNN), which originated from stereovision neural networks, are applied to speech recognition. To accommodate the sequential flow of data, we consider a window through which new acoustic data enter and from which the final neural activities are output. Inside the window, a recurrent neural network develops its neural activity toward a stable point. This process is called winner-take-all (WTA) with cooperation and competition. The resulting neural activities clearly showed recognition of a continuously spoken word. The string of phonemes obtained is compared with reference words by a dynamic programming method. The resulting recognition rate was 96.7% for 100 words spoken by nine male speakers, compared with 97.9% for a hidden Markov model (HMM) with three states and a single Gaussian distribution. These results, which are close to those of the HMM, seem important because the architecture of the neural network is very simple and the number of parameters in the neural net equations is small and fixed.
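The abstract says only that the phoneme string produced by the network is compared with reference words "by a dynamic programming method"; it does not give the cost scheme. The sketch below assumes the simplest such method, a unit-cost edit distance (Levenshtein alignment) between phoneme strings, and recognizes the reference word with the smallest distance. The function names, the phoneme labels, and the three-word `LEXICON` are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict, List, Sequence


def phoneme_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Unit-cost edit distance between two phoneme strings.

    ASSUMPTION: insertion, deletion and substitution each cost 1; the
    paper's actual DP costs are not given in the abstract.
    """
    n, m = len(hyp), len(ref)
    # dp[i][j]: cheapest alignment of hyp[:i] with ref[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # delete the first i hypothesis phonemes
    for j in range(1, m + 1):
        dp[0][j] = j  # insert the first j reference phonemes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion: spurious phoneme in hyp
                dp[i][j - 1] + 1,        # insertion: phoneme missed in hyp
                dp[i - 1][j - 1] + sub,  # match or substitution
            )
    return dp[n][m]


def recognize(hyp: Sequence[str], lexicon: Dict[str, List[str]]) -> str:
    """Return the reference word whose phoneme string is closest to hyp."""
    return min(lexicon, key=lambda word: phoneme_edit_distance(hyp, lexicon[word]))


# Hypothetical lexicon of romanized Japanese phoneme strings (illustration only).
LEXICON = {
    "hashi": ["h", "a", "sh", "i"],
    "hana": ["h", "a", "n", "a"],
    "sakura": ["s", "a", "k", "u", "r", "a"],
}

if __name__ == "__main__":
    # A noisy network output in which "sh" was recognized as "s".
    print(recognize(["h", "a", "s", "i"], LEXICON))  # hashi
```

A real system would also have to decide how to turn frame-by-frame winners into a phoneme string (for example, by collapsing consecutive repeats before matching); the abstract does not say how the authors handle this.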

Author information

Authors and Affiliations

Department of Computer Science and Systems Engineering, Faculty of Engineering, Miyazaki University, 1-1 Gakuen Kibanadai Nishi, 889-2192, Miyazaki, Japan
Tetsuro Kitazoe, Tomoyuki Ichiki & Makoto Funamori

Corresponding author

Correspondence to Tetsuro Kitazoe.

Additional information

This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000.

About this article

Cite this article

Kitazoe, T., Ichiki, T. & Funamori, M. Application of a stereovision neural network to continuous speech recognition. Artif Life Robotics 5, 165–170 (2001). https://doi.org/10.1007/BF02481464

Received: 10 November 2000
Accepted: 30 May 2002
Issue Date: September 2001
DOI: https://doi.org/10.1007/BF02481464
