Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis

Nakano, Yuji; Tachibana, Makoto; Yamagishi, Junichi; Kobayashi, Takao

doi:10.21437/Interspeech.2006-587

This paper proposes a constrained structural maximum a posteriori linear regression (CSMAPLR) algorithm to further improve speaker adaptation performance in HMM-based speech synthesis. The algorithm applies the concept of structural maximum a posteriori (SMAP) adaptation to the estimation of the transformation matrices of constrained MLLR (CMLLR): the transformation matrices are estimated recursively with a MAP criterion, from the root node of the context decision tree down to its lower nodes. We incorporate the algorithm into an HSMM-based speech synthesis system, and the results of an objective evaluation test show that CSMAPLR adaptation combines the advantages of both CMLLR and SMAPLR adaptation. The results of a subjective evaluation test also show that CSMAPLR adaptation produces synthetic speech that is more similar to the target speaker than that produced by CMLLR or SMAPLR adaptation.
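The root-to-leaf recursion described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's estimation procedure. The `Node` class, the `interpolate` and `propagate` functions, the hyperparameter `tau`, and the identity prior at the root are all hypothetical names and assumptions introduced here. The MAP step is replaced by a simple occupancy-weighted interpolation between a node's own estimate and the transform inherited from its parent, and the bias term of the transform is omitted. The sketch only shows how each node's estimate becomes the prior for its children.

```python
from dataclasses import dataclass, field
from typing import List, Optional

Matrix = List[List[float]]


@dataclass
class Node:
    """One node of a hypothetical context decision tree."""
    ml_transform: Matrix   # per-node data-driven estimate (assumed to be given)
    occupancy: float       # amount of adaptation data assigned to this node
    children: List["Node"] = field(default_factory=list)
    map_transform: Optional[Matrix] = None


def interpolate(prior: Matrix, ml: Matrix, occupancy: float, tau: float) -> Matrix:
    """Simplified stand-in for a MAP update (an assumption, not the paper's formula).

    With little data the result stays close to the prior; with plenty of
    data it approaches the node's own estimate.
    """
    w = occupancy / (occupancy + tau)
    return [
        [w * m + (1.0 - w) * p for p, m in zip(prior_row, ml_row)]
        for prior_row, ml_row in zip(prior, ml)
    ]


def propagate(node: Node, prior: Matrix, tau: float = 1.0) -> None:
    """Estimate this node's transform, then pass it down as the children's prior."""
    node.map_transform = interpolate(prior, node.ml_transform, node.occupancy, tau)
    for child in node.children:
        propagate(child, node.map_transform, tau)


# Toy tree: one root and two leaves holding different amounts of data.
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed prior at the root
leaf_sparse = Node([[3.0, 0.0], [0.0, 3.0]], occupancy=0.0)
leaf_rich = Node([[3.0, 0.0], [0.0, 3.0]], occupancy=99.0)
root = Node([[2.0, 0.0], [0.0, 2.0]], occupancy=1.0,
            children=[leaf_sparse, leaf_rich])
propagate(root, identity, tau=1.0)
```

In this toy tree the leaf with no adaptation data simply inherits its parent's transform, while the leaf with plenty of data stays close to its own estimate. This is the kind of behavior a structural prior is meant to provide when adaptation data is unevenly distributed over the tree.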

Cite as: Nakano, Y., Tachibana, M., Yamagishi, J., Kobayashi, T. (2006) Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis. Proc. Interspeech 2006, paper 1784-Thu1BuP.10, doi: 10.21437/Interspeech.2006-587

@inproceedings{nakano06b_interspeech,
  author={Yuji Nakano and Makoto Tachibana and Junichi Yamagishi and Takao Kobayashi},
  title={{Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1784-Thu1BuP.10},
  doi={10.21437/Interspeech.2006-587}
}