Application of a stereovision neural network to continuous speech recognition

  • Original Article
Artificial Life and Robotics

Abstract

Two- and three-layered neural networks (2LNN, 3LNN), which originated from stereovision neural networks, are applied to speech recognition. To accommodate the sequential data flow, we consider a window through which new acoustic data enter and from which the final neural activities are output. Inside the window, a recurrent neural network develops its neural activity toward a stable point, a process called winner-take-all (WTA) with cooperation and competition. The resulting neural activities clearly showed recognition of continuous speech of a word. The string of phonemes obtained is compared with reference words by a dynamic programming method. The recognition rate was 96.7% for 100 words spoken by nine male speakers, compared with 97.9% for a hidden Markov model (HMM) with three states and a single Gaussian distribution. These results, which are close to those of the HMM, seem important because the architecture of the neural network is very simple, and the number of parameters in the neural net equations is small and fixed.
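The two stages named in the abstract, a winner-take-all competition followed by dynamic programming matching against reference words, can be illustrated with a toy sketch. The abstract does not give the neural net equations or the matching costs, so everything below is an assumption: the competition is a generic MAXNET-style mutual inhibition (without the cooperation term), the matching is a plain edit distance with unit costs, and the phoneme labels, frame scores, and three-word lexicon are hypothetical.

```python
from typing import Dict, List, Sequence


def winner_take_all(scores: Sequence[float], max_iter: int = 1000) -> int:
    """Generic MAXNET-style competition (assumed; not the paper's equations).

    Every unit is inhibited by the summed activity of the other units until
    a single unit remains active. Returns the index of that unit.
    """
    x = [max(0.0, s) for s in scores]
    eps = 0.5 / len(x)  # inhibition strength, kept below 1/N
    for _ in range(max_iter):
        total = sum(x)
        new_x = [max(0.0, xi - eps * (total - xi)) for xi in x]
        active = [i for i, v in enumerate(new_x) if v > 0.0]
        if len(active) == 1:
            return active[0]
        if not active:  # exact tie: fall back to the previous state
            break
        x = new_x
    return max(range(len(x)), key=x.__getitem__)


def frames_to_phonemes(frames: Sequence[Sequence[float]],
                       labels: Sequence[str]) -> List[str]:
    """Pick one label per frame by WTA, then merge consecutive repeats."""
    out: List[str] = []
    for frame in frames:
        label = labels[winner_take_all(frame)]
        if not out or out[-1] != label:
            out.append(label)
    return out


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance by dynamic programming (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution or match
        prev = cur
    return prev[-1]


def recognize(phonemes: Sequence[str], lexicon: Dict[str, List[str]]) -> str:
    """Return the reference word whose phoneme string is closest."""
    return min(lexicon, key=lambda w: edit_distance(phonemes, lexicon[w]))


# Hypothetical data: three phoneme labels, four frames, three reference words.
LABELS = ["a", "k", "s"]
FRAMES = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.1, 0.7, 0.3], [0.7, 0.2, 0.2]]
LEXICON = {"aka": ["a", "k", "a"],
           "asa": ["a", "s", "a"],
           "kasa": ["k", "a", "s", "a"]}

phonemes = frames_to_phonemes(FRAMES, LABELS)
word = recognize(phonemes, LEXICON)
```

In this sketch the first two frames both select "a" and are merged, so the resulting phoneme string matches the hypothetical reference word "aka" exactly. A real system would replace the toy frame scores with acoustic similarities and would likely use weighted rather than unit matching costs.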

Author information

Corresponding author

Correspondence to Tetsuro Kitazoe.

Additional information

This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000.

About this article

Cite this article

Kitazoe, T., Ichiki, T. & Funamori, M. Application of a stereovision neural network to continuous speech recognition. Artif Life Robotics 5, 165–170 (2001). https://doi.org/10.1007/BF02481464

  • DOI: https://doi.org/10.1007/BF02481464
