State-Dependent Phoneme-Based Model Merging for Dialectal Chinese Speech Recognition

Liu, Linquan; Zheng, Thomas Fang; Wu, Wenhu

doi:10.1007/11939993_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series:

International Symposium on Chinese Spoken Language Processing

Abstract

To build a dialectal Chinese speech recognizer from a standard Chinese speech recognizer using only a small amount of dialectal Chinese speech, a novel, simple but effective acoustic modeling method, named state-dependent phoneme-based model merging (SDPBMM), is proposed and evaluated. In this method, a tied state of one or more standard triphones is merged with a state of the dialectal monophone that is identical to the central phoneme of the triphone(s). The proposed method performs well, but it introduces a Gaussian mixture expansion problem. To deal with this, an acoustic model distance measure, named the pseudo-divergence based distance measure, is proposed based on measuring the difference between Gaussian mixture models, and is then applied to reduce the model size with almost no performance degradation for dialectal speech. With only 40 minutes of Shanghai-dialectal Chinese speech, the proposed SDPBMM achieves a significant absolute syllable error rate (SER) reduction of 5.9% for dialectal Chinese, with almost no performance degradation for standard Chinese. In combination with a certain existing adaptation method, a further absolute SER reduction of 1.9% is achieved.
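The merging step and the resulting growth in mixture size can be illustrated with a minimal sketch. The abstract gives neither the merging formula nor the definition of the pseudo-divergence, so everything below is an assumption made for illustration: the `Gaussian` container, the interpolation weight `lam`, the rule of scaling component weights by `lam` and `1 - lam`, and the use of a symmetric KL divergence between diagonal Gaussians as a stand-in for a component distance. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    """One diagonal-covariance mixture component (hypothetical container)."""
    weight: float
    mean: List[float]
    var: List[float]


def merge_states(standard: List[Gaussian], dialect: List[Gaussian],
                 lam: float = 0.7) -> List[Gaussian]:
    """Pool the components of a standard tied state and a dialectal
    monophone state. Weights are scaled by lam and (1 - lam) so the merged
    mixture still sums to 1. This scaling rule is an assumption."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    merged = [Gaussian(lam * g.weight, list(g.mean), list(g.var))
              for g in standard]
    merged += [Gaussian((1.0 - lam) * g.weight, list(g.mean), list(g.var))
               for g in dialect]
    return merged


def symmetric_kl(a: Gaussian, b: Gaussian) -> float:
    """Symmetric KL divergence, KL(a||b) + KL(b||a), between two diagonal
    Gaussians. A stand-in distance, NOT the paper's pseudo-divergence."""
    total = 0.0
    for ma, va, mb, vb in zip(a.mean, a.var, b.mean, b.var):
        d2 = (ma - mb) ** 2
        total += 0.5 * ((va + d2) / vb + (vb + d2) / va - 2.0)
    return total


# Toy two-dimensional states with made-up parameters.
standard_state = [Gaussian(0.6, [0.0, 0.0], [1.0, 1.0]),
                  Gaussian(0.4, [2.0, 1.0], [0.5, 0.5])]
dialect_state = [Gaussian(0.5, [0.1, 0.0], [1.0, 1.2]),
                 Gaussian(0.5, [3.0, 3.0], [0.8, 0.8])]

merged_state = merge_states(standard_state, dialect_state, lam=0.7)
print(len(merged_state))                    # component count after merging
print(sum(g.weight for g in merged_state))  # total mixture weight
print(symmetric_kl(merged_state[0], merged_state[2]))
```

Pooling doubles the component count in this toy case, which is the expansion problem the abstract mentions. A distance between components or mixtures is one plausible way to decide which of them are redundant enough to remove.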

Author information

Authors and Affiliations

Center for Speech Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, 100084, Beijing
Linquan Liu, Thomas Fang Zheng & Wenhu Wu

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

About this paper

Cite this paper

Liu, L., Zheng, T.F., Wu, W. (2006). State-Dependent Phoneme-Based Model Merging for Dialectal Chinese Speech Recognition. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_32

DOI: https://doi.org/10.1007/11939993_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
