An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis

Hu, Qiong; Stylianou, Yannis; Maia, Ranniery; Richmond, Korin; Yamagishi, Junichi; Latorre, Javier

doi:10.21437/Interspeech.2014-180

This paper applies a dynamic sinusoidal synthesis model to statistical parametric speech synthesis (HTS). For this, we utilise regularised cepstral coefficients to represent both the static amplitude and dynamic slope of selected sinusoids for statistical modelling. During synthesis, a dynamic sinusoidal model is used to reconstruct speech. A preference test is conducted to compare the selection of different sinusoids for cepstral representation. Our results show that when integrated with HTS, a relatively small number of sinusoids selected according to a perceptual criterion can produce quality comparable to using all harmonics. A Mean Opinion Score (MOS) test shows that our proposed statistical system is preferred to one using mel-cepstra from pitch synchronous spectral analysis.
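
As an illustration of the reconstruction step, the following is a minimal sketch of frame-level synthesis with a static amplitude and a dynamic slope per sinusoid. The abstract does not state the model equation, so the form used here, the real part of sum_k (a_k + n b_k) exp(j 2 pi f_k n / fs) with n centred on the frame, is an assumption, and the function name `synthesize_frame` and all parameter names are hypothetical. The recovery of amplitudes and slopes from regularised cepstral coefficients and the perceptual selection of sinusoids are not shown.

```python
import cmath
import math


def synthesize_frame(freqs_hz, static_amps, dynamic_slopes, fs, half_len):
    """Return one frame of Re{ sum_k (a_k + n*b_k) * exp(j*2*pi*f_k*n/fs) }.

    ASSUMPTION: this model form is illustrative, not taken from the abstract.
    n runs over [-half_len, half_len], so a_k is the complex amplitude at the
    frame centre and b_k is its complex slope per sample.
    """
    if not (len(freqs_hz) == len(static_amps) == len(dynamic_slopes)):
        raise ValueError("need one static amplitude and one slope per sinusoid")
    frame = []
    for n in range(-half_len, half_len + 1):
        sample = 0j
        for f, a, b in zip(freqs_hz, static_amps, dynamic_slopes):
            # Time-varying complex amplitude multiplied by the carrier.
            sample += (a + n * b) * cmath.exp(2j * math.pi * f * n / fs)
        frame.append(sample.real)
    return frame


# Toy input (made-up values): three harmonics of a 200 Hz fundamental at 16 kHz.
fs = 16000
freqs = [200.0, 400.0, 600.0]
a = [0.5 + 0j, 0.25 + 0j, 0.125 + 0j]
b = [0j, 0j, 0j]  # zero slopes reduce this to a stationary sinusoidal model
frame = synthesize_frame(freqs, a, b, fs, half_len=80)
```

In a full synthesiser, successive frames produced this way would typically be windowed and overlap-added; that step is omitted here.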

Cite as: Hu, Q., Stylianou, Y., Maia, R., Richmond, K., Yamagishi, J., Latorre, J. (2014) An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis. Proc. Interspeech 2014, 780-784, doi: 10.21437/Interspeech.2014-180

@inproceedings{hu14b_interspeech,
  author={Qiong Hu and Yannis Stylianou and Ranniery Maia and Korin Richmond and Junichi Yamagishi and Javier Latorre},
  title={{An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={780--784},
  doi={10.21437/Interspeech.2014-180}
}