Abstract
Two- or three-layered neural networks (2LNN, 3LNN), which originated from stereovision neural networks, are applied to speech recognition. To accommodate sequential data flow, we consider a window through which new acoustic data enter and from which the final neural activities are output. Inside the window, a recurrent neural network develops its neural activity toward a stable point. This process is called winner-take-all (WTA) with cooperation and competition. The resulting neural activities clearly showed recognition of continuous speech of a word. The string of phonemes obtained is compared with reference words by a dynamic programming method. The resulting recognition rate was 96.7% for 100 words spoken by nine male speakers, compared with 97.9% for a hidden Markov model (HMM) with three states and a single Gaussian distribution. These results, which are close to those of the HMM, seem important because the architecture of the neural network is very simple, and the number of parameters in the neural net equations is small and fixed.
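The comparison of a recognized phoneme string with reference words can be illustrated with a small sketch. The abstract does not state the cost function used in the dynamic programming step, so the code below assumes a unit-cost edit distance (insertion, deletion, and substitution each cost 1). The function names `edit_distance` and `recognize`, the toy lexicon, and the phoneme symbols are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
def edit_distance(hyp, ref):
    """Unit-cost edit distance between two phoneme sequences (assumed cost model)."""
    # prev[j] holds the distance between the processed prefix of hyp and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to the empty reference
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # drop a phoneme from hyp
                curr[j - 1] + 1,         # insert a phoneme missing from hyp
                prev[j - 1] + (h != r),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def recognize(hyp, lexicon):
    """Return the reference word whose phoneme string is closest to hyp."""
    return min(lexicon, key=lambda word: edit_distance(hyp, lexicon[word]))


# Hypothetical reference words with hand-written phoneme strings.
lexicon = {
    "sakura": ["s", "a", "k", "u", "r", "a"],
    "sakana": ["s", "a", "k", "a", "n", "a"],
    "kuruma": ["k", "u", "r", "u", "m", "a"],
}

# A recognized string in which the network output missed the phoneme "r".
hyp = ["s", "a", "k", "u", "a"]
best = recognize(hyp, lexicon)
print(best, edit_distance(hyp, lexicon[best]))
```

In this toy case the string with one missing phoneme is still closest to the intended word, which is why a dynamic programming comparison tolerates small errors in the phoneme sequence.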
Additional information
This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000
About this article
Cite this article
Kitazoe, T., Ichiki, T. & Funamori, M. Application of a stereovision neural network to continuous speech recognition. Artif Life Robotics 5, 165–170 (2001). https://doi.org/10.1007/BF02481464
DOI: https://doi.org/10.1007/BF02481464