ISCA Archive Eurospeech 1999

User attitudes to concatenated natural speech and text-to-speech synthesis in an automated information service

F. R. McInnes, D. J. Attwater, Michael D. Edgington, Mark S. Schmidt, Mervyn A. Jack

Today’s automated telephone services generally use recorded speech from one speaker for all output. In applications with large and varying output vocabularies, such as place names, it may be necessary to employ a second speaker to provide new vocabulary items if the original speaker is not available, or to use text-to-speech (TTS) synthesis for the whole or parts of the output. This paper reports a comparison of 10 schemes for the generation of spoken output in a travel information service, ranging from natural speech from a single speaker, through combinations of different voices and of natural and synthetic speech, to TTS synthesis throughout. The results show strong preferences for concatenated speech over TTS and for quality recordings over amateur ones, and a weaker preference for a single speaker over two speakers.


doi: 10.21437/Eurospeech.1999-202

Cite as: McInnes, F.R., Attwater, D.J., Edgington, M.D., Schmidt, M.S., Jack, M.A. (1999) User attitudes to concatenated natural speech and text-to-speech synthesis in an automated information service. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 831-834, doi: 10.21437/Eurospeech.1999-202

@inproceedings{mcinnes99_eurospeech,
  author={F. R. McInnes and D. J. Attwater and Michael D. Edgington and Mark S. Schmidt and Mervyn A. Jack},
  title={{User attitudes to concatenated natural speech and text-to-speech synthesis in an automated information service}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={831--834},
  doi={10.21437/Eurospeech.1999-202}
}