Duration modeling is a fundamental task in prosody generation for Text-to-Speech (TTS) systems. Its objective is to predict the duration of a speech unit from its phonological representation. Duration modeling strongly influences the intelligibility and naturalness of synthesized speech. This paper presents a Neural Network (NN) based approach to predicting the duration of Arabic phonemes. The model uses a neural network to learn the mapping from phonological features to duration values.
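As a rough illustration of the general idea (not the authors' architecture, feature set, or data), the following minimal sketch trains a one-hidden-layer network to map encoded phonological features to a duration value. The phoneme inventory, the position and stress features, the `encode` helper, the `DurationMLP` class, and the toy durations are all hypothetical and chosen only to make the example runnable.

```python
import numpy as np

# Hypothetical feature inventory; the feature set used in the paper is not stated here.
PHONEMES = ["a", "i", "u", "b", "t", "s"]
POSITIONS = ["initial", "medial", "final"]


def encode(phoneme, position, stressed):
    """One-hot encode a phoneme and its syllable position, plus a stress flag."""
    x = np.zeros(len(PHONEMES) + len(POSITIONS) + 1)
    x[PHONEMES.index(phoneme)] = 1.0
    x[len(PHONEMES) + POSITIONS.index(position)] = 1.0
    x[-1] = 1.0 if stressed else 0.0
    return x


class DurationMLP:
    """One tanh hidden layer and a linear output, trained by gradient descent."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def predict(self, X):
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def train_step(self, X, y, lr=0.05):
        """Take one full-batch step on mean squared error; return the loss before the update."""
        h = np.tanh(X @ self.W1 + self.b1)
        err = (h @ self.W2 + self.b2) - y
        loss = float(np.mean(err ** 2))
        g_pred = 2.0 * err / len(y)                       # gradient w.r.t. predictions
        g_h = np.outer(g_pred, self.W2) * (1.0 - h ** 2)  # backpropagate through tanh
        g_W2, g_b2 = h.T @ g_pred, g_pred.sum()
        g_W1, g_b1 = X.T @ g_h, g_h.sum(axis=0)
        self.W1 -= lr * g_W1
        self.b1 -= lr * g_b1
        self.W2 -= lr * g_W2
        self.b2 -= lr * g_b2
        return loss


# Made-up training samples: (phoneme, position, stressed, duration in units of 100 ms).
samples = [
    ("a", "medial", True, 1.2),
    ("a", "final", False, 1.5),
    ("i", "initial", False, 0.8),
    ("b", "initial", False, 0.6),
    ("t", "medial", False, 0.5),
    ("s", "final", True, 1.1),
]
X = np.array([encode(p, pos, s) for p, pos, s, _ in samples])
y = np.array([d for *_, d in samples])

model = DurationMLP(n_in=X.shape[1])
losses = [model.train_step(X, y) for _ in range(500)]
predicted = model.predict(X)  # one predicted duration per training sample
```

In a real system the inputs would come from the front end's phonological analysis and the targets from a segmented speech corpus; the one-hot encoding and squared-error loss here are simply common defaults for this kind of regression.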
Cite as: Hifny, Y., Rashwan, M. (2002) Duration modeling for Arabic text to speech synthesis. Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002), 1773-1776, doi: 10.21437/ICSLP.2002-527
@inproceedings{hifny02_icslp,
  author={Yasser Hifny and Mohsen Rashwan},
  title={{Duration modeling for Arabic text to speech synthesis}},
  year=2002,
  booktitle={Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002)},
  pages={1773--1776},
  doi={10.21437/ICSLP.2002-527}
}