It has been shown that in natural speech filled pauses can be beneficial to a listener. In this paper, we attempt to discover whether listeners react in a similar way to filled pauses in synthetic and vocoded speech compared to natural speech. We present two experiments focusing on reaction time to a target word. In the first, we replicate earlier work in natural speech, namely that listeners respond faster to a target word following a filled pause than following a silent pause. This is replicated in vocoded but not in synthetic speech. Our second experiment investigates the effect of speaking rate on reaction times as this was potentially a confounding factor in the first experiment. Evidence suggests that slower speech rates lead to slower reaction times in synthetic and in natural speech. Moreover, in synthetic speech the response to a target word after a filled pause is slower than after a silent pause. This finding, combined with an overall slower reaction time, demonstrates a shortfall in current synthesis techniques. Remedying this could help make synthesis less demanding and more pleasant for the listener, and reaction time experiments could thus provide a measure of improvement in synthesis techniques.
Cite as: Dall, R., Wester, M., Corley, M. (2014) The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech. Proc. Interspeech 2014, 56-60, doi: 10.21437/Interspeech.2014-12
@inproceedings{dall14b_interspeech,
  author={Rasmus Dall and Mirjam Wester and Martin Corley},
  title={{The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={56--60},
  doi={10.21437/Interspeech.2014-12}
}