In this paper, we propose a solution for reconstructing stress and accent contextual factors at the receiver of a very low bit-rate speech codec built on a recognition/synthesis architecture. In speech synthesis, accent and stress symbols are predicted from the text, which is not available at the receiver side of the speech codec. Therefore, signal-based symbols, generated from syllable-level log average F0 and energy acoustic measures and quantized with a scalar quantizer, are used instead of accent and stress symbols for HMM-based speech synthesis. Results from incremental real-time speech synthesis confirmed that a combination of F0 and energy signal-based symbols can replace the text-based binary accent and stress symbols developed for text-to-speech systems. The estimated transmission bit-rate overhead is about 14 bits/second per acoustic measure.
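As an illustration of the signal-based symbols described in the abstract, the following minimal sketch computes a syllable-level log average of frame values (F0 or energy), maps it to a discrete symbol with a uniform scalar quantizer, and estimates the resulting overhead in bits per second. It is not the authors' implementation. The function names, the choice of taking the log of the mean over positive (voiced) frames, the uniform quantizer design, the quantizer range, the number of levels, and the syllable rate are all assumptions made for illustration and are not taken from the paper.

```python
import math


def syllable_log_mean(values):
    """Log of the mean of the positive frame values in one syllable.

    Assumption: non-positive frames (e.g. unvoiced frames with F0 = 0)
    are skipped. Returns None if the syllable has no positive frame.
    """
    positive = [v for v in values if v > 0]
    if not positive:
        return None
    return math.log(sum(positive) / len(positive))


def quantize_uniform(x, lo, hi, levels):
    """Uniform scalar quantizer (assumed design): symbol in 0..levels-1."""
    x = min(max(x, lo), hi)            # clip to the quantizer range
    step = (hi - lo) / levels          # width of one quantization cell
    return min(int((x - lo) / step), levels - 1)


def overhead_bits_per_second(syllables_per_second, levels):
    """One symbol per syllable, log2(levels) bits per symbol."""
    return syllables_per_second * math.log2(levels)


# Hypothetical F0 frames (Hz) for one syllable; 0.0 marks an unvoiced frame.
f0_frames = [0.0, 100.0, 120.0, 140.0]
log_f0 = syllable_log_mean(f0_frames)

# Hypothetical quantizer range of 60-400 Hz with 4 levels.
symbol = quantize_uniform(log_f0, math.log(60.0), math.log(400.0), 4)

# Illustrative values only: 4 syllables/s and 8 levels.
rate = overhead_bits_per_second(4.0, 8)
```

Under these assumptions the overhead scales linearly with the syllable rate and logarithmically with the number of quantization levels, so each additional acoustic measure adds one such symbol stream to the transmitted bit-rate.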
Cite as: Cernak, M., Lazaridis, A., Garner, P.N., Motlicek, P. (2014) Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding. Proc. Interspeech 2014, 2799-2803, doi: 10.21437/Interspeech.2014-587
@inproceedings{cernak14_interspeech,
  author={Milos Cernak and Alexandros Lazaridis and Philip N. Garner and Petr Motlicek},
  title={{Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2799--2803},
  doi={10.21437/Interspeech.2014-587}
}