Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort

Raitio, Tuomo; Suni, Antti; Juvela, Lauri; Vainio, Martti; Alku, Paavo

doi:10.21437/Interspeech.2014-444

This paper studies a deep neural network (DNN) based voice source modelling method for the synthesis of speech with varying vocal effort. The new trainable voice source model uses a DNN to learn a mapping from acoustic features to the time-domain, pitch-synchronous glottal flow waveform. The model is trained on speech material that includes breathy, normal, and Lombard speech. In synthesis, a normal voice is first adapted to a desired style, and the flexible DNN-based voice source model then automatically generates a style-specific excitation waveform from the adapted acoustic features. The proposed model is compared to a robust, high-quality baseline excitation modelling method that uses a manually selected mean glottal flow pulse for each vocal effort level, together with a spectral matching filter that matches the voice source spectrum to the desired style. Subjective evaluations show that the proposed DNN-based method is rated comparable to the baseline, while avoiding manual pulse selection and being computationally faster than a system that uses a spectral matching filter.
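The feature-to-waveform mapping described in the abstract can be illustrated with a minimal sketch, assuming a small feedforward network, since the abstract does not specify the architecture, input features, or pulse representation. Everything concrete below is hypothetical: the sizes `N_FEATURES`, `N_HIDDEN`, and `PULSE_LEN`, the tanh hidden layer, and the helper names `predict_pulse`, `resample_pulse`, and `synthesize_excitation`. The weights are random placeholders rather than a trained model, so the code only demonstrates the data flow: one acoustic feature frame in, one fixed-length pulse out, the pulse stretched to two pitch periods, Hann-windowed, and overlap-added at one-period hops to form an excitation signal.

```python
import numpy as np

# Hypothetical sizes chosen for illustration; not taken from the paper.
N_FEATURES = 47   # assumed length of the per-frame acoustic feature vector
N_HIDDEN = 100    # assumed width of the single hidden layer
PULSE_LEN = 400   # assumed fixed length of the network's output pulse

rng = np.random.default_rng(0)


def init_layer(n_in, n_out):
    """Random placeholder weights standing in for a trained layer."""
    W = rng.standard_normal((n_in, n_out)) * np.sqrt(1.0 / n_in)
    return W, np.zeros(n_out)


W1, b1 = init_layer(N_FEATURES, N_HIDDEN)
W2, b2 = init_layer(N_HIDDEN, PULSE_LEN)


def predict_pulse(features):
    """Map one frame of acoustic features to a fixed-length glottal pulse."""
    h = np.tanh(features @ W1 + b1)  # nonlinear hidden layer (assumed tanh)
    return h @ W2 + b2               # linear output layer: PULSE_LEN samples


def resample_pulse(pulse, length):
    """Stretch or compress a pulse to `length` samples by linear interpolation."""
    x_old = np.linspace(0.0, 1.0, len(pulse))
    x_new = np.linspace(0.0, 1.0, length)
    return np.interp(x_new, x_old, pulse)


def synthesize_excitation(feature_frames, f0_values, fs=16000):
    """Overlap-add one two-period, Hann-windowed pulse per (frame, F0) pair."""
    periods = [int(round(fs / f0)) for f0 in f0_values]
    out = np.zeros(sum(periods) + 2 * max(periods))  # room for the last pulse
    pos, end = 0, 0
    for feats, T in zip(feature_frames, periods):
        pulse = resample_pulse(predict_pulse(feats), 2 * T)
        pulse *= np.hanning(2 * T)   # taper so neighbouring pulses cross-fade
        out[pos:pos + 2 * T] += pulse
        end = pos + 2 * T
        pos += T                     # next pulse starts one period later
    return out[:end]


# Usage: five random feature frames with F0 values in Hz.
frames = np.random.default_rng(1).standard_normal((5, N_FEATURES))
excitation = synthesize_excitation(frames, [100, 100, 120, 120, 110])
print(len(excitation))  # 876
```

In a trained system the weights would instead be fitted to glottal flow pulses extracted from the breathy, normal, and Lombard training material. The two-period windowed pulse with overlap-add is one common way to build a pitch-synchronous excitation and is not necessarily the scheme used in the paper.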

Cite as: Raitio, T., Suni, A., Juvela, L., Vainio, M., Alku, P. (2014) Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort. Proc. Interspeech 2014, 1969-1973, doi: 10.21437/Interspeech.2014-444

@inproceedings{raitio14_interspeech,
  author={Tuomo Raitio and Antti Suni and Lauri Juvela and Martti Vainio and Paavo Alku},
  title={{Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={1969--1973},
  doi={10.21437/Interspeech.2014-444}
}