State-Dependent Phoneme-Based Model Merging for Dialectal Chinese Speech Recognition

  • Conference paper
Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)


Abstract

To build a dialectal Chinese speech recognizer from a standard Chinese speech recognizer using only a small amount of dialectal Chinese speech, a novel, simple but effective acoustic modeling method, named state-dependent phoneme-based model merging (SDPBMM), is proposed and evaluated. In this method, a tied state of standard triphone(s) is merged with a state of the dialectal monophone that is identical to the central phoneme of the triphone(s). The proposed method performs well, but it introduces a Gaussian mixture expansion problem. To deal with this problem, an acoustic model distance measure, named the pseudo-divergence based distance measure, is proposed based on measuring the difference between Gaussian mixture models; it is then used to reduce the model size with almost no performance degradation for dialectal speech. With only 40 minutes of Shanghai-dialectal Chinese speech, the proposed SDPBMM achieves a significant absolute syllable error rate (SER) reduction of 5.9% for dialectal Chinese, with almost no performance degradation for standard Chinese. In combination with a certain existing adaptation method, a further absolute SER reduction of 1.9% can be achieved.
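
The merging step and the resulting growth in mixture count can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the two states are combined, so the linear interpolation weight `lam`, the `Gaussian` container, and the function name `merge_states` are all assumptions introduced here for illustration. The pseudo-divergence based distance measure is not defined in the abstract and is therefore not sketched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    """One diagonal-covariance mixture component (hypothetical container)."""
    weight: float
    mean: List[float]
    var: List[float]


def merge_states(standard: List[Gaussian],
                 dialectal: List[Gaussian],
                 lam: float = 0.5) -> List[Gaussian]:
    """Merge two GMM states by interpolating their mixture weights.

    Assumption: the merged density is lam * p_standard + (1 - lam) * p_dialectal.
    The interpolation weight `lam` is illustrative, not taken from the paper.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    merged = [Gaussian(lam * g.weight, list(g.mean), list(g.var))
              for g in standard]
    merged += [Gaussian((1.0 - lam) * g.weight, list(g.mean), list(g.var))
               for g in dialectal]
    return merged


# Toy one-dimensional example: a tied state of standard triphones with two
# components, and the matching state of a dialectal monophone with one.
standard_state = [Gaussian(0.6, [0.0], [1.0]), Gaussian(0.4, [1.0], [1.0])]
dialectal_state = [Gaussian(1.0, [0.5], [2.0])]

merged_state = merge_states(standard_state, dialectal_state, lam=0.7)

# The merged state keeps every component of both inputs, which is the
# mixture expansion the abstract refers to.
print(len(merged_state))
print(sum(g.weight for g in merged_state))
```

Under this assumption the merged state always has as many components as the two inputs combined, which is why a model-size reduction step becomes necessary once many tied states are merged.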

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, L., Zheng, T.F., Wu, W. (2006). State-Dependent Phoneme-Based Model Merging for Dialectal Chinese Speech Recognition. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_32

  • DOI: https://doi.org/10.1007/11939993_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science, Computer Science (R0)
