A perceptual study for modelling speaker-dependent intonation in TTS and dialog systems

Mersdorf, Joachim J.; Domhover, Thomas

doi:10.21437/Eurospeech.1997-291

Most existing prosody and intonation models have been obtained from statistical analysis of F0 curves and resynthesis by TTS. There is, however, another way to improve quality and naturalness: useful results can also be obtained by analysing listeners' shared intuitions about natural intonational behavior. To this end, we use a digital process that generates signals representing only the melody of the original speech signal. This makes comprehensive listening experiments possible, in which the perception of natural and synthetic intonation can be analysed and compared. Based on the results of listening experiments, a statistical analysis of the F0 curves was carried out, given that a speaker-individual intonation model needs more quantitative F0 information than traditional descriptions provide. The aim is a speaker-dependent prosodic model for synthetic speech and dialog systems. Furthermore, this flexible approach should not be limited to speaker-individual intonation.

Cite as: Mersdorf, J.J., Domhover, T. (1997) A perceptual study for modelling speaker-dependent intonation in TTS and dialog systems. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 867-870, doi: 10.21437/Eurospeech.1997-291

@inproceedings{mersdorf97_eurospeech,
  author={Joachim J. Mersdorf and Thomas Domhover},
  title={{A perceptual study for modelling speaker-dependent intonation in TTS and dialog systems}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={867--870},
  doi={10.21437/Eurospeech.1997-291}
}