Phone Recognition Using High Order Phonotactic Constraints

  • Conference paper
Speech Recognition and Understanding

Part of the book series: NATO ASI Series ((NATO ASI F,volume 75))

Abstract

A new phone recognizer has been implemented which extends the phonotactic decoding constraint to sequences of three phones. It is based on a structure similar to a second-order ergodic hidden Markov model (HMM). This kind of model assumes a direct correspondence between the model states and phones, so constraints on possible state sequences are equivalent to phonotactic constraints. Very high coverage by both left and right context-dependent phone models has been achieved using two methods. The first assumes that some contexts have the same or a very similar effect on the phone in question, so they are merged into the same contextual class. The outcome is a set of 19 left context classes and 18 right context classes. The second assumes that the left context mostly influences the beginning of a phone, whereas the right context influences its end. Each phone (a state in the ergodic HMM) is represented by a sequence of three probability density functions (pdfs), which is equivalent to a three-state left-to-right HMM. We generate acoustic models such that the first pdf in the model is conditioned on the left context, the middle pdf is context independent, and the last pdf is conditioned on the right context. A large number of such quasi-triphonic acoustic models can be generated, providing good triphone coverage for a given task while efficiently utilizing the available training data. The current implementation of the recognizer has been applied to the DARPA Resource Management Task. Since true phone sequences are not available, they are estimated from text using a phone realization regression tree trained on TIMIT database transcriptions. The estimates of the true phone sequences are used both in training the models and in generating reference phone sequences for scoring. The best phone recognition match between the most likely output of the regression tree and the phone recognizer for the DARPA February 89 test set was 75.5% correct with 79.5% accuracy.
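The quasi-triphone construction described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the context-class maps `LEFT_CLASS` and `RIGHT_CLASS`, the phone labels, and the function names are hypothetical, and the real system uses 19 left and 18 right classes whose membership is not given here. The sketch only shows how a three-pdf model is assembled from a left-class pdf, a context-independent pdf, and a right-class pdf, and why a small pdf inventory can cover many context combinations.

```python
from dataclasses import dataclass

# Hypothetical toy context-class maps (phone -> class id). The abstract reports
# 19 left and 18 right classes but does not list them; these are placeholders.
LEFT_CLASS = {"sil": 0, "k": 1, "g": 1, "ae": 2, "t": 3}
RIGHT_CLASS = {"sil": 0, "k": 1, "g": 1, "ae": 2, "t": 3}


@dataclass(frozen=True)
class QuasiTriphone:
    phone: str
    pdfs: tuple  # three pdf identifiers in left-to-right order


def quasi_triphone(left: str, phone: str, right: str) -> QuasiTriphone:
    """Assemble a three-pdf model for `phone` in the context left_phone_right."""
    first = ("L", phone, LEFT_CLASS[left])    # conditioned on left-context class
    middle = ("M", phone)                     # context independent
    last = ("R", phone, RIGHT_CLASS[right])   # conditioned on right-context class
    return QuasiTriphone(phone, (first, middle, last))


def pdfs_per_phone(n_left: int, n_right: int) -> int:
    """Distinct pdfs to train for one phone: left-class pdfs + 1 middle + right-class pdfs."""
    return n_left + 1 + n_right


def models_per_phone(n_left: int, n_right: int) -> int:
    """Distinct quasi-triphone models for one phone, assuming every class pair can occur."""
    return n_left * n_right


if __name__ == "__main__":
    # "k" and "g" share a left class in the toy map, so both contexts reuse the same model.
    print(quasi_triphone("k", "ae", "t") == quasi_triphone("g", "ae", "t"))
    print(pdfs_per_phone(19, 18), models_per_phone(19, 18))
```

Under the assumption that every left/right class pair can occur for a phone, 19 + 1 + 18 = 38 pdfs per phone would suffice to build 19 × 18 = 342 distinct models per phone, which is the sense in which the approach uses the training data efficiently.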

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ljolje, A. (1992). Phone Recognition Using High Order Phonotactic Constraints. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_23

  • DOI: https://doi.org/10.1007/978-3-642-76626-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-76628-2

  • Online ISBN: 978-3-642-76626-8

  • eBook Packages: Springer Book Archive
