ISCA Archive Interspeech 2011

A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM

Takashi Nose, Takao Kobayashi

This paper describes a technique for modeling and controlling the emotional expressivity of speech in HMM-based speech synthesis. A problem with conventional HMM-based emotional speech synthesis is that the intensity of an emotional expression appearing in synthetic speech depends entirely on the database used for model training. To take into account the emotional expressivity that listeners actually perceive, perceptual expressivity scores are introduced into a style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). The objective and subjective evaluation results show that the proposed technique works well when there is a large bias of emotional expressivity in the training data.


doi: 10.21437/Interspeech.2011-28

Cite as: Nose, T., Kobayashi, T. (2011) A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM. Proc. Interspeech 2011, 109-112, doi: 10.21437/Interspeech.2011-28

@inproceedings{nose11_interspeech,
  author={Takashi Nose and Takao Kobayashi},
  title={{A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={109--112},
  doi={10.21437/Interspeech.2011-28}
}