ISCA Archive Interspeech 2011

Speech synthesis based on articulatory-movement HMMs with voice-source codebooks

Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada

Speech synthesis based on a single model of articulatory-movement HMMs, applied commonly to both speech recognition (SR) and speech synthesis (SS), is described. In the SS module, speaker-invariant HMMs generate an articulatory feature (AF) sequence; the AFs are then converted into vocal-tract parameters with a multilayer neural network (MLN), and a speech signal is synthesized through an LSP digital filter. The CELP coding technique is applied to improve the voice sources, which are generated from codes embedded in the corresponding HMM states. The proposed SS module separates phonetic information from speaker individuality, so the target speaker's voice can be synthesized from a small amount of speech data. In the experiments, we carried out listening tests with ten subjects and evaluated both the sound quality and the speaker individuality of the synthesized speech. The results confirmed that the proposed SS module can produce good-quality speech of the target speaker even when trained on a data set of only two sentences.
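As a rough illustration of the synthesis path described in the abstract (AF sequence -> MLN -> LSP parameters -> LSP filter, with a CELP-style codebook excitation per HMM state), the following Python sketch shows one possible frame-level implementation. It is not the authors' code: the MLN weights, the LSF ordering convention, the codebook lookup, and the helper names (mln_af_to_lsf, lsf_to_lpc, synthesize_frame) are all assumptions made for the example.

    import numpy as np
    from scipy.signal import lfilter

    def mln_af_to_lsf(af, w1, b1, w2, b2):
        # Hypothetical trained MLN: articulatory-feature vector -> line spectral frequencies.
        h = np.tanh(af @ w1 + b1)
        # Squash outputs into (0, pi) and sort so the LSFs are valid and ascending.
        return np.sort(np.pi / (1.0 + np.exp(-(h @ w2 + b2))))

    def lsf_to_lpc(lsf):
        # Convert ascending LSFs (radians) to the LPC coefficients of A(z); even order assumed.
        p = len(lsf)
        def poly(freqs, first):
            c = np.array(first, dtype=float)
            for w in freqs:
                c = np.convolve(c, [1.0, -2.0 * np.cos(w), 1.0])
            return c
        P = poly(lsf[0::2], [1.0, 1.0])    # symmetric polynomial, trivial root at z = -1
        Q = poly(lsf[1::2], [1.0, -1.0])   # antisymmetric polynomial, trivial root at z = +1
        return 0.5 * (P + Q)[:p + 1]       # A(z) = (P(z) + Q(z)) / 2

    def synthesize_frame(af, codebook, code_index, gain, w1, b1, w2, b2):
        # One frame: map AFs to LSFs, pick the voice-source vector indexed by the
        # code embedded in the current HMM state, and filter it through 1/A(z).
        a = lsf_to_lpc(mln_af_to_lsf(af, w1, b1, w2, b2))
        excitation = gain * codebook[code_index]
        return lfilter([1.0], a, excitation)

Frames produced this way would be overlap-added (or simply concatenated) along the HMM state sequence to obtain the output waveform.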


doi: 10.21437/Interspeech.2011-43

Cite as: Nitta, T., Onoda, T., Kimura, M., Iribe, Y., Katsurada, K. (2011) Speech synthesis based on articulatory-movement HMMs with voice-source codebooks. Proc. Interspeech 2011, 1841-1844, doi: 10.21437/Interspeech.2011-43

@inproceedings{nitta11_interspeech,
  author={Tsuneo Nitta and Takayuki Onoda and Masashi Kimura and Yurie Iribe and Kouichi Katsurada},
  title={{Speech synthesis based on articulatory-movement HMMs with voice-source codebooks}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={1841--1844},
  doi={10.21437/Interspeech.2011-43}
}