It is common knowledge that affective and emotion-related states are acoustically well modelled on a supra-segmental level. Nonetheless, successes are reported for frame-level processing, either by means of dynamic classification or multi-instance learning techniques. In this work, a quantitative, feature-type-wise comparison between frame-level and supra-segmental analysis is carried out for the recognition of interest in human conversational speech. To shed light on the respective differences, the same classifier, namely Support Vector Machines, is used in both cases: once by clustering a bag of frames of unknown sequence length employing Multi-Instance Learning techniques, and once by applying statistical functionals to project the time series onto a static feature vector. The Audiovisual Interest Corpus of naturalistic interest serves as the database.
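The two input representations contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the functional set or the multi-instance algorithm, so the four functionals (mean, standard deviation, minimum, maximum), the rule that each frame inherits its utterance label, and the names `apply_functionals` and `to_bag_instances` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def apply_functionals(frames):
    """Project a variable-length sequence of frame-level feature vectors
    onto one static vector: (mean, std, min, max) per feature dimension.
    The choice of these four functionals is an assumption."""
    if not frames:
        raise ValueError("need at least one frame")
    static = []
    for track in zip(*frames):  # one track per frame-level descriptor
        static.extend([mean(track), pstdev(track), min(track), max(track)])
    return static


def to_bag_instances(frames, label):
    """Naive multi-instance view (assumed, not from the paper): every frame
    becomes one instance that inherits the label of its utterance."""
    return [(list(frame), label) for frame in frames]


# Two hypothetical utterances of different length, two descriptors per frame.
short_utt = [[1.0, 10.0], [3.0, 30.0]]
long_utt = [[2.0, 5.0], [4.0, 5.0], [6.0, 5.0]]

# Supra-segmental view: both utterances map to vectors of identical length.
static_short = apply_functionals(short_utt)
static_long = apply_functionals(long_utt)

# Bag-of-frames view: the number of instances follows the utterance length.
bag_short = to_bag_instances(short_utt, label=1)
bag_long = to_bag_instances(long_utt, label=0)
```

In the supra-segmental view the classifier sees exactly one fixed-length vector per utterance, whereas in the bag-of-frames view it sees as many instances as there are frames, and a per-utterance decision still has to be derived from the frame-level outputs.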
Cite as: Schuller, B., Rigoll, G. (2009) Recognising interest in conversational speech - comparing bag of frames and supra-segmental features. Proc. Interspeech 2009, 1999-2002, doi: 10.21437/Interspeech.2009-484
@inproceedings{schuller09b_interspeech,
  author={Björn Schuller and Gerhard Rigoll},
  title={{Recognising interest in conversational speech - comparing bag of frames and supra-segmental features}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1999--2002},
  doi={10.21437/Interspeech.2009-484}
}