This paper explores several statistical pattern recognition techniques to classify utterances according to their emotional content. We have recorded a corpus of emotional speech containing over 1,000 utterances from different speakers. We present a new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour. To make maximal use of the limited amount of training data available, we introduce a novel pattern recognition technique: majority voting of subspace specialists. Using this technique, we obtain classification performance that is close to human performance on the task.
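The abstract names majority voting of subspace specialists but does not spell out its details. The sketch below is only a generic illustration of what the name suggests, not the authors' implementation: each "specialist" is a classifier that sees only a subset of the feature dimensions, and the final label is the one most specialists vote for. The nearest-centroid classifier, the function names, the feature values, and the labels `"happy"` and `"sad"` are all assumptions made up for this example.

```python
from collections import Counter


def train_centroids(X, y, dims):
    """Return the per-class mean of the selected feature dimensions."""
    sums = {}
    counts = Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(dims))
        for i, d in enumerate(dims):
            acc[i] += row[d]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_specialist(centroids, dims, row):
    """Pick the class whose centroid is nearest within this subspace."""
    def sqdist(centroid):
        return sum((row[d] - centroid[i]) ** 2 for i, d in enumerate(dims))
    # sorted() makes the choice deterministic if two distances are equal.
    return min(sorted(centroids), key=lambda label: sqdist(centroids[label]))


def majority_vote(X, y, subspaces, row):
    """Train one specialist per subspace and return the most common vote."""
    votes = [
        predict_specialist(train_centroids(X, y, dims), dims, row)
        for dims in subspaces
    ]
    # On a tied vote, Counter keeps the label that was voted for first.
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: three features per utterance, two made-up labels.
# Features 0 and 1 separate the classes; feature 2 is uninformative.
X = [[1.0, 1.0, 5.0], [1.2, 0.8, 0.0], [3.0, 3.0, 0.2], [3.2, 2.8, 5.1]]
y = ["sad", "sad", "happy", "happy"]
subspaces = [(0,), (1,), (2,)]  # one single-feature specialist each

prediction = majority_vote(X, y, subspaces, [2.9, 3.1, 0.1])
```

In this toy setup the specialist restricted to the uninformative feature can vote wrongly, and the other two outvote it. That is the general appeal of voting over small subspaces when training data is scarce: each specialist has few parameters to estimate, and individual errors can be averaged out.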
Cite as: Dellaert, F., Polzin, T., Waibel, A. (1996) Recognizing emotion in speech. Proc. 4th International Conference on Spoken Language Processing (ICSLP 1996), 1970-1973, doi: 10.21437/ICSLP.1996-462
@inproceedings{dellaert96_icslp,
  author={Frank Dellaert and Thomas Polzin and Alex Waibel},
  title={{Recognizing emotion in speech}},
  year=1996,
  booktitle={Proc. 4th International Conference on Spoken Language Processing (ICSLP 1996)},
  pages={1970--1973},
  doi={10.21437/ICSLP.1996-462}
}