Simple designing methods of corpus-based visual speech synthesis

Shiraishi, Tatsuya; Toda, Tomoki; Kawanami, Hiromichi; Saruwatari, Hiroshi; Shikano, Kiyohiro

doi:10.21437/Eurospeech.2003-627

This paper describes simple methods for designing corpus-based visual speech synthesis. Our approach requires only a database of synchronously recorded real images and speech. Visual speech is synthesized by concatenating real image segments and speech segments selected from the database. So that all processes (feature extraction, segment selection, and segment concatenation) can be performed automatically, we design two simple types of visual speech synthesis. The first synthesizes visual speech from synchronous real image and speech segments that are selected using only speech information. The second uses both speech segment selection and image segment selection, with features extracted from the database without any manual processing. We performed objective and subjective experiments to evaluate the two methods. The results show that the second method synthesizes more natural visual speech than the first.
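The contrast between the two designs can be illustrated with a minimal sketch. The abstract does not specify the cost functions or the search procedure, so everything here is an assumption made for illustration: the dynamic-programming search, the `Segment` class, the scalar `speech` and `image` features, the toy values, and the weight of 0.1 are all hypothetical. The first call selects segments using speech information only, and the synchronous image segments simply follow the selected speech segments; the second call adds an image-based concatenation cost, as one possible reading of the second design.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    speech: float  # hypothetical scalar speech feature
    image: float   # hypothetical scalar image feature (e.g. mouth opening)


def select_segments(
    candidates: List[List[Segment]],
    target_cost: Callable[[int, Segment], float],
    join_cost: Callable[[Segment, Segment], float],
) -> List[int]:
    """Pick one candidate per position, minimizing summed target and join costs."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")
    # best[i][j]: minimal cost of any path ending at candidate j of position i
    best = [[target_cost(0, c) for c in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for cur in candidates[i]:
            costs = [
                best[i - 1][k] + join_cost(prev, cur)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(i, cur))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path


# Toy database: two target positions, two candidate segments each.
candidates = [
    [Segment(speech=1.0, image=0.0), Segment(speech=1.2, image=5.0)],
    [Segment(speech=2.0, image=5.1), Segment(speech=2.5, image=0.1)],
]
targets = [1.0, 2.0]  # hypothetical target speech features


def speech_target(i: int, seg: Segment) -> float:
    return abs(seg.speech - targets[i])


# Design 1 (assumed form): speech information only; images follow the speech.
speech_only_path = select_segments(
    candidates, speech_target, lambda p, c: 0.1 * abs(c.speech - p.speech)
)

# Design 2 (assumed form): the join cost also looks at image continuity.
image_aware_path = select_segments(
    candidates, speech_target, lambda p, c: abs(c.image - p.image)
)

print("speech-only selection:", speech_only_path)
print("image-aware selection:", image_aware_path)
```

In this toy setup the speech-only selection can join two image segments whose image features differ widely, while the image-aware selection accepts a slightly worse speech match to keep the image sequence smooth. Whether the paper's methods behave this way depends on its actual features and costs, which the abstract does not state.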

Cite as: Shiraishi, T., Toda, T., Kawanami, H., Saruwatari, H., Shikano, K. (2003) Simple designing methods of corpus-based visual speech synthesis. Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), 2241-2244, doi: 10.21437/Eurospeech.2003-627

@inproceedings{shiraishi03_eurospeech,
  author={Tatsuya Shiraishi and Tomoki Toda and Hiromichi Kawanami and Hiroshi Saruwatari and Kiyohiro Shikano},
  title={{Simple designing methods of corpus-based visual speech synthesis}},
  year=2003,
  booktitle={Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003)},
  pages={2241--2244},
  doi={10.21437/Eurospeech.2003-627}
}