ISCA Archive SpeechProsody 2018

Consistency of base frequency labelling for the F0 contour generation model using expressive emotional speech corpora

Yoshiko Arimoto, Yasuo Horiuchi, Sumio Ohno

To investigate the consistency of base frequency (Fb) labelling in the F0 contour generation model for expressive and/or authentic emotional speech, an Fb labelling experiment was conducted with three trained labellers using a parallel corpus of emotional speech, the Online-gaming voice chat corpus with emotional labelling (OGVC). Twenty-four utterances from the spontaneous dialog speech and the emotion-acted speech in the OGVC were labelled with the Fb, phrase commands, and accent commands by the three labellers. A repeated-measures analysis of variance was performed on the Fb value of each utterance, with corpus type, gender, speaker, emotion, and labeller as factors. The results show significant main effects of gender, speaker, and emotion, and a significant interaction between speaker and emotion. The results also indicate that the Fb value varied when different emotions were expressed, even by the same speaker. Moreover, a close inspection of the Fb of each utterance suggests that the Fb also varied with the linguistic content of the utterances, even when the same emotion was expressed.


doi: 10.21437/SpeechProsody.2018-81

Cite as: Arimoto, Y., Horiuchi, Y., Ohno, S. (2018) Consistency of base frequency labelling for the F0 contour generation model using expressive emotional speech corpora. Proc. Speech Prosody 2018, 398-402, doi: 10.21437/SpeechProsody.2018-81

@inproceedings{arimoto18_speechprosody,
  author={Yoshiko Arimoto and Yasuo Horiuchi and Sumio Ohno},
  title={{Consistency of base frequency labelling for the F0 contour generation model using expressive emotional speech corpora}},
  year=2018,
  booktitle={Proc. Speech Prosody 2018},
  pages={398--402},
  doi={10.21437/SpeechProsody.2018-81}
}