Fujisaki Model Based Intonation Modeling for Korean TTS System

Kim, Byeongchang; Lee, Jinsik; Lee, Gary Geunbae

doi:10.1007/978-3-642-13467-8_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 75)

Included in the following conference series:

International Conference on Ubiquitous Computing and Multimedia Applications

Abstract

One of the enduring problems in developing a high-quality text-to-speech (TTS) system is pitch contour generation. Taking language-specific knowledge into account, we introduce an adjusted Fujisaki model for a Korean TTS system, along with refined machine learning features. Quantitative and qualitative evaluations support the validity of our system: the accuracy of phrase command prediction is 0.8928; the correlations of the predicted amplitudes of phrase commands and accent commands are 0.6644 and 0.6002, respectively; and the generated F0 curves reached a “fair” level of naturalness (3.6) on a mean opinion score (MOS) scale.
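For readers unfamiliar with the terms "phrase command" and "accent command", the following is a minimal sketch of the standard Fujisaki model, in which log F0 is a baseline plus the responses to phrase impulses and accent step functions. It is not the adjusted Korean model described in the abstract, whose details are not given here. The function names, the default constants (alpha, beta, gamma), and the example command values are illustrative assumptions only.

```python
import math


def phrase_response(t, alpha=2.0):
    """Response of the phrase control mechanism to an impulse at t = 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Response of the accent control mechanism to a step at t = 0,
    clipped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrase_cmds, accent_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds) under the standard Fujisaki model.

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset_time, amplitude) impulses
    accent_cmds -- list of (onset_time, offset_time, amplitude) steps
    """
    log_f0 = math.log(fb)
    for onset, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset, alpha)
    for onset, offset, amp in accent_cmds:
        log_f0 += amp * (accent_response(t - onset, beta, gamma)
                         - accent_response(t - offset, beta, gamma))
    return math.exp(log_f0)


# Hypothetical example: one phrase command and one accent command,
# sampled every 10 ms over 1.5 s.
contour = [
    fujisaki_f0(i * 0.01, 100.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
    for i in range(150)
]
```

In a system of the kind the abstract describes, the command timings and amplitudes would be predicted from text features rather than set by hand as they are in this sketch.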

Author information

Authors and Affiliations

School of Computer and Information Communication Engineering, Catholic University of Daegu, Gyeongbuk, South Korea
Byeongchang Kim
Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), Pohang, South Korea
Jinsik Lee & Gary Geunbae Lee

Editor information

Editors and Affiliations

VITM, Indore, India
G. S. Tomar
Department of Computer and Information Science, University of Michigan – Dearborn, 4901 Evergreen Road, 48128, Dearborn, MI, USA
William I. Grosky
Hannam University, 306-791, Daejeon, South Korea
Tai-hoon Kim
Department of Computer Science, Lakehead University, P7B 5E1, Thunder Bay, Ontario, Canada
Sabah Mohammed
Computer Science and Engineering Department, Jadavpur University, Kolkata, India
Sanjoy Kumar Saha

About this paper

Cite this paper

Kim, B., Lee, J., Lee, G.G. (2010). Fujisaki Model Based Intonation Modeling for Korean TTS System. In: Tomar, G.S., Grosky, W.I., Kim, Th., Mohammed, S., Saha, S.K. (eds) Ubiquitous Computing and Multimedia Applications. UCMA 2010. Communications in Computer and Information Science, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13467-8_10

DOI: https://doi.org/10.1007/978-3-642-13467-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13466-1
Online ISBN: 978-3-642-13467-8
eBook Packages: Computer Science (R0)
