Abstract
To build a dialectal Chinese speech recognizer from a standard Chinese speech recognizer using only a small amount of dialectal Chinese speech, this paper proposes and evaluates a simple but effective acoustic modeling method named state-dependent phoneme-based model merging (SDPBMM). In SDPBMM, a tied state of the standard triphone(s) is merged with a state of the dialectal monophone whose phoneme is identical to the central phoneme of the triphone(s). The method performs well, but it introduces a Gaussian mixture expansion problem, because each merged state carries the mixture components of both source models. To address this, an acoustic model distance measure derived from the difference measurement of Gaussian mixture models, named the pseudo-divergence based distance measure, is proposed and used to reduce the model size with almost no performance degradation on dialectal speech. Using only 40 minutes of Shanghai-dialectal Chinese speech, SDPBMM achieves a significant absolute syllable error rate (SER) reduction of 5.9% on dialectal Chinese with almost no performance degradation on standard Chinese. Combined with an existing adaptation method, it yields a further absolute SER reduction of 1.9%.
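The abstract describes two steps: pooling Gaussian components when a standard tied state is merged with a dialectal monophone state (the source of the mixture expansion), and then shrinking the pooled mixture with a distance between Gaussians. The sketch below illustrates both steps under stated assumptions: diagonal covariances, a hypothetical interpolation weight `lam`, and, in place of the paper's pseudo-divergence measure (whose exact definition is not given here), a symmetric Kullback-Leibler divergence between components combined with greedy moment-matching merges. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Component:
    """One diagonal-covariance Gaussian in a state's output mixture."""
    weight: float
    mean: List[float]
    var: List[float]


def merge_states(std: List[Component], dial: List[Component], lam: float) -> List[Component]:
    """Pool the components of a standard tied state and a dialectal monophone state.

    `lam` is a hypothetical interpolation weight for the standard-model components.
    The pooled mixture has len(std) + len(dial) components: the mixture expansion.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    merged = [Component(lam * c.weight, c.mean[:], c.var[:]) for c in std]
    merged += [Component((1.0 - lam) * c.weight, c.mean[:], c.var[:]) for c in dial]
    return merged


def sym_kl(a: Component, b: Component) -> float:
    """KL(a||b) + KL(b||a) for two diagonal Gaussians (a stand-in distance,
    not the paper's pseudo-divergence). The log-variance terms cancel."""
    d = 0.0
    for ma, va, mb, vb in zip(a.mean, a.var, b.mean, b.var):
        d += 0.5 * (va / vb + vb / va - 2.0) + 0.5 * (ma - mb) ** 2 * (1.0 / va + 1.0 / vb)
    return d


def moment_merge(a: Component, b: Component) -> Component:
    """Replace two components by one Gaussian preserving total weight, mean and variance."""
    w = a.weight + b.weight
    if w == 0.0:
        return Component(0.0, a.mean[:], a.var[:])
    mean = [(a.weight * ma + b.weight * mb) / w for ma, mb in zip(a.mean, b.mean)]
    var = [
        (a.weight * (va + (ma - m) ** 2) + b.weight * (vb + (mb - m) ** 2)) / w
        for ma, va, mb, vb, m in zip(a.mean, a.var, b.mean, b.var, mean)
    ]
    return Component(w, mean, var)


def reduce_mixture(comps: List[Component], target: int) -> List[Component]:
    """Greedily merge the closest pair of components until `target` remain."""
    if target < 1:
        raise ValueError("target must be at least 1")
    comps = list(comps)
    while len(comps) > target:
        best: Tuple[float, int, int] = (float("inf"), 0, 1)
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                d = sym_kl(comps[i], comps[j])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = moment_merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


if __name__ == "__main__":
    std = [Component(0.5, [0.0], [1.0]), Component(0.5, [4.0], [1.0])]
    dial = [Component(0.5, [0.1], [1.0]), Component(0.5, [8.0], [2.0])]
    pooled = merge_states(std, dial, lam=0.6)
    print(len(pooled))                            # 4
    print(len(reduce_mixture(pooled, target=3)))  # 3
```

The greedy pairwise search costs O(n^2) distance evaluations per merge, which is acceptable for the few components in a single HMM state; trying a different GMM distance only requires replacing `sym_kl`.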
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, L., Zheng, T.F., Wu, W. (2006). State-Dependent Phoneme-Based Model Merging for Dialectal Chinese Speech Recognition. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_32
DOI: https://doi.org/10.1007/11939993_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)