Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding

Cernak, Milos; Lazaridis, Alexandros; Garner, Philip N.; Motlicek, Petr

doi:10.21437/Interspeech.2014-587

In this paper, we propose a solution for reconstructing stress and accent contextual factors at the receiver of a very low bit rate speech codec built on a recognition/synthesis architecture. In speech synthesis, accent and stress symbols are predicted from text, which is not available at the receiver side of the speech codec. Therefore, signal-based symbols, generated from syllable-level log average F0 and energy acoustic measures and quantized with a scalar quantizer, are used in place of accent and stress symbols for HMM-based speech synthesis. Results from incremental real-time speech synthesis confirmed that a combination of F0 and energy signal-based symbols can replace the text-based binary accent and stress symbols developed for text-to-speech systems. The estimated transmission bit rate overhead is about 14 bits/second per acoustic measure.
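
As a rough illustration of the idea in the abstract, the following Python sketch derives one signal-based symbol per syllable from frame-level F0 (or energy) values and estimates the resulting side-information rate. It is not the authors' implementation: the function names, the reading of "log average" as the log of the mean, the uniform quantizer, the quantizer range, the 3-bit resolution, and the syllable rate are all assumptions made for illustration only.

```python
import math


def syllable_log_mean(frames):
    """Log of the mean of the positive frame values in one syllable.

    Assumption: unvoiced F0 frames are marked with 0 and are skipped.
    Returns None when the syllable has no usable frames.
    """
    usable = [v for v in frames if v > 0]
    if not usable:
        return None
    return math.log(sum(usable) / len(usable))


def scalar_quantize(x, lo, hi, bits):
    """Uniform scalar quantizer (an assumed design, not from the paper).

    Clamps x to [lo, hi] and returns an integer index in [0, 2**bits - 1].
    """
    levels = 2 ** bits
    x = min(max(x, lo), hi)
    step = (hi - lo) / levels
    return min(int((x - lo) / step), levels - 1)


def overhead_bits_per_second(bits_per_symbol, syllables_per_second):
    """Side-information rate for one acoustic measure sent once per syllable."""
    return bits_per_symbol * syllables_per_second


# Hypothetical example: one syllable with five F0 frames (Hz), two unvoiced.
f0_frames = [0.0, 110.0, 120.0, 130.0, 0.0]
log_f0 = syllable_log_mean(f0_frames)

# Hypothetical quantizer range of 60-400 Hz in the log domain, 3 bits.
symbol = scalar_quantize(log_f0, math.log(60.0), math.log(400.0), bits=3)

# Hypothetical speaking rate of 4 syllables per second.
rate = overhead_bits_per_second(3, 4.0)
```

The same two steps apply to energy by passing per-frame energy values instead of F0. The overhead grows linearly with both the quantizer resolution and the syllable rate, which is why a per-syllable symbol stays cheap compared with frame-level prosody transmission.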

Cite as: Cernak, M., Lazaridis, A., Garner, P.N., Motlicek, P. (2014) Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding. Proc. Interspeech 2014, 2799-2803, doi: 10.21437/Interspeech.2014-587

@inproceedings{cernak14_interspeech,
  author={Milos Cernak and Alexandros Lazaridis and Philip N. Garner and Petr Motlicek},
  title={{Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2799--2803},
  doi={10.21437/Interspeech.2014-587}
}