Recognising interest in conversational speech - comparing bag of frames and supra-segmental features

Schuller, Björn; Rigoll, Gerhard

doi:10.21437/Interspeech.2009-484

It is widely accepted that affective and emotion-related states are acoustically well modelled on a supra-segmental level. Nonetheless, successes are reported for frame-level processing, either by means of dynamic classification or by multi-instance learning techniques. In this work, a quantitative, feature-type-wise comparison between frame-level and supra-segmental analysis is carried out for the recognition of interest in human conversational speech. To shed light on the respective differences, the same classifier, namely Support Vector Machines, is used in both cases: once by clustering a bag of frames of unknown sequence length using Multi-Instance Learning techniques, and once by applying statistical functionals that project the time series onto a static feature vector. The Audiovisual Interest Corpus of naturalistic interest serves as the database.
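The two input representations contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of functionals (mean, standard deviation, minimum, maximum, range), the function names `functionals` and `bag_of_frames`, the toy feature values, and the simple rule that every frame inherits its utterance's label are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def functionals(frames):
    """Supra-segmental view: project a variable-length sequence of frame
    vectors onto one fixed-length vector (5 functionals per descriptor)."""
    if not frames:
        raise ValueError("need at least one frame")
    static = []
    for contour in zip(*frames):  # one contour per low-level descriptor
        lo, hi = min(contour), max(contour)
        static.extend([mean(contour), pstdev(contour), lo, hi, hi - lo])
    return static


def bag_of_frames(frames, label):
    """Frame-level view: keep every frame as its own instance. Assumed here:
    each instance simply inherits the label of the whole utterance."""
    return [(list(frame), label) for frame in frames]


# Two hypothetical utterances of different length, 2 descriptors per frame.
short_utt = [[1.0, 10.0], [3.0, 14.0]]
long_utt = [[2.0, 8.0], [4.0, 8.0], [6.0, 8.0], [8.0, 8.0]]

vec_short = functionals(short_utt)   # length 10, independent of frame count
vec_long = functionals(long_utt)     # length 10 as well
bag_long = bag_of_frames(long_utt, label=1)  # 4 labelled instances
```

In this sketch, the functional vectors have the same length for both utterances and could be passed directly to a static classifier such as an SVM, whereas the bag grows with the number of frames and needs a multi-instance scheme to arrive at one decision per utterance.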

Cite as: Schuller, B., Rigoll, G. (2009) Recognising interest in conversational speech - comparing bag of frames and supra-segmental features. Proc. Interspeech 2009, 1999-2002, doi: 10.21437/Interspeech.2009-484

@inproceedings{schuller09b_interspeech,
  author={Björn Schuller and Gerhard Rigoll},
  title={{Recognising interest in conversational speech - comparing bag of frames and supra-segmental features}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1999--2002},
  doi={10.21437/Interspeech.2009-484}
}