Text-to-speech conversion with neural networks: a recurrent TDNN approach

Karaali, Orhan; Corrigan, Gerald; Gerson, Ira; Massey, Noel

doi:10.21437/Eurospeech.1997-209

This paper describes the design of a neural network that performs the phonetic-to-acoustic mapping in a speech synthesis system. The use of a time-domain neural network architecture limits discontinuities that occur at phone boundaries. Recurrent data input also helps smooth the output parameter tracks. Independent testing has demonstrated that the voice quality produced by this system compares favorably with speech from existing commercial text-to-speech systems.

Cite as: Karaali, O., Corrigan, G., Gerson, I., Massey, N. (1997) Text-to-speech conversion with neural networks: a recurrent TDNN approach. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 561-564, doi: 10.21437/Eurospeech.1997-209

@inproceedings{karaali97_eurospeech,
  author={Orhan Karaali and Gerald Corrigan and Ira Gerson and Noel Massey},
  title={{Text-to-speech conversion with neural networks: a recurrent TDNN approach}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={561--564},
  doi={10.21437/Eurospeech.1997-209}
}