Abstract
One of the enduring problems in developing a high-quality TTS (text-to-speech) system is pitch contour generation. Taking language-specific knowledge into account, we introduce an adjusted Fujisaki model for a Korean TTS system, along with refined machine learning features. Quantitative and qualitative evaluations support the validity of our system: the accuracy of phrase command prediction is 0.8928; the correlations of the predicted amplitudes of phrase commands and accent commands are 0.6644 and 0.6002, respectively; and the generated F0 curves reach a “fair” level of naturalness (3.6) on the MOS (mean opinion score) scale.
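For readers unfamiliar with the model named above, the following is a minimal sketch of the standard Fujisaki formulation, in which log F0 is a baseline plus the summed responses to phrase commands (impulses) and accent commands (steps). It is not the adjusted Korean variant introduced in this paper, whose details the abstract does not give. The function names, the constants alpha = 2.0, beta = 20.0, gamma = 0.9, and the command values are illustrative assumptions.

```python
import math


def phrase_response(t, alpha=2.0):
    """Impulse response Gp(t) of the phrase control mechanism."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response Ga(t) of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrase_cmds, accent_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset_time, amplitude)
    accent_cmds -- list of (onset_time, offset_time, amplitude)
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        rise = accent_response(t - t1, beta, gamma)
        fall = accent_response(t - t2, beta, gamma)
        log_f0 += aa * (rise - fall)
    return math.exp(log_f0)


# Illustrative (assumed) commands: one phrase command, one accent command.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.3, 0.6, 0.4)]

# Sample the contour every 10 ms over 1.5 seconds.
contour = [fujisaki_f0(i * 0.01, 120.0, phrase_cmds, accent_cmds)
           for i in range(150)]
```

In a prediction setting such as the one the abstract describes, the command onsets and amplitudes would come from trained predictors rather than being set by hand as they are here.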
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, B., Lee, J., Lee, G.G. (2010). Fujisaki Model Based Intonation Modeling for Korean TTS System. In: Tomar, G.S., Grosky, W.I., Kim, Th., Mohammed, S., Saha, S.K. (eds) Ubiquitous Computing and Multimedia Applications. UCMA 2010. Communications in Computer and Information Science, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13467-8_10
DOI: https://doi.org/10.1007/978-3-642-13467-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13466-1
Online ISBN: 978-3-642-13467-8
eBook Packages: Computer Science, Computer Science (R0)