The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations

  • Original Paper
  • Published in Language Resources and Evaluation

Abstract

Improvised acting is a viable technique for studying expressive human communication and for shedding light on actors’ creativity. The USC CreativeIT database provides a novel, freely available multimodal resource for the study of theatrical improvisation and of rich expressive human behavior (speech and body language) in dyadic interactions. The theoretical design of the database is based on the well-established improvisation technique of Active Analysis, chosen to elicit naturally induced, affective, expressive, goal-driven interactions. The database contains dyadic theatrical improvisations performed by 16 actors, with detailed full body motion capture data and audio data for each participant in an interaction. The carefully engineered data collection, the improvisation design that elicits natural emotions and expressive speech and body language, and the well-developed annotation processes together provide a gateway to studying and modeling various aspects of theatrical performance, expressive behavior, and human communication and interaction.
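
As a rough illustration of how the database’s synchronized streams might be used, the sketch below resamples a continuous emotion annotation trace onto motion-capture frame timestamps so that the annotation and body-motion streams can be analyzed jointly. The sampling rates, durations, values, and variable names are illustrative assumptions, not the corpus’s actual file formats.

    import numpy as np

    # Minimal sketch with synthetic placeholder data: resample one annotator's
    # continuous emotion trace (e.g., activation over time) onto motion-capture
    # frame timestamps. Rates and values are assumptions for illustration only.

    annot_t = np.arange(0.0, 60.0, 0.25)        # annotation timestamps (4 Hz, assumed)
    annot_v = np.sin(annot_t / 10.0)            # placeholder activation values in [-1, 1]

    mocap_fps = 60.0                            # assumed motion-capture frame rate
    mocap_t = np.arange(0.0, 60.0, 1.0 / mocap_fps)

    # Linear interpolation yields one emotion value per motion-capture frame.
    annot_on_mocap = np.interp(mocap_t, annot_t, annot_v)
    print(annot_on_mocap.shape)                 # (3600,)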


Notes

  1. HUMAINE is freely available at http://humaine-db.sspnet.eu/.

  2. An alternative to majority-voting schemes is to explicitly model the diversity of these inherently subjective ratings when the ground truth is hidden from direct observation, as proposed in Audhkhasi and Narayanan (2013); see the sketch below.
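
The contrast between plain averaging of annotator ratings and reliability-aware fusion can be illustrated with the sketch below. This is not the model of Audhkhasi and Narayanan (2013); it merely weights each synthetic annotator by its agreement with the others, in the spirit of evaluator-weighted averaging, and all data and weighting choices are illustrative assumptions.

    import numpy as np

    # Illustrative sketch only (not the model of Audhkhasi & Narayanan, 2013):
    # fuse several annotators' continuous traces either by a plain mean or by
    # weights proportional to each annotator's correlation with the mean of the
    # remaining annotators, so more consistent raters contribute more.

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 500)
    hidden_trend = np.sin(2 * np.pi * t)                  # synthetic "true" emotion trend
    noise_levels = (0.1, 0.3, 0.8)                        # three annotators, varying reliability
    traces = np.stack([hidden_trend + rng.normal(0.0, s, t.size) for s in noise_levels])

    plain_mean = traces.mean(axis=0)                      # majority-style pooling

    weights = []
    for i in range(traces.shape[0]):
        others = np.delete(traces, i, axis=0).mean(axis=0)
        r = np.corrcoef(traces[i], others)[0, 1]          # agreement with the other raters
        weights.append(max(r, 0.0))                       # clip negative agreement to zero
    weights = np.array(weights) / np.sum(weights)

    weighted_mean = weights @ traces                      # reliability-weighted fusion
    print("annotator weights:", np.round(weights, 2))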

References

  • Anolli, L., Mantovani, F., Mortillaro, M., Vescovo, A., Agliati, A., Confalonieri, L., Realdon, O., Zurloni, V., & Sacchi, A. (2005). A multimodal database as a background for emotional synthesis, recognition and training in e-learning systems. In Affective computing and intelligent interaction, pp. 566–573. Berlin: Springer.

  • Audhkhasi, K., & Narayanan, S. S. (2011). Emotion classification from speech using evaluator reliability-weighted combination of ranked lists. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 4956–4959.

  • Audhkhasi, K., & Narayanan, S. (2013). A globally-variant locally-constant model for fusion of labels from multiple diverse experts without using reference labels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4), 769–783.

  • Bachorowski, J. A., Smoski, M. J., & Owren, M. J. (2001). The acoustic features of human laughter. The Journal of the Acoustical Society of America, 110(3), 1581–1597.

  • Bänziger, T., & Scherer, K. R. (2007). Using actor portrayals to systematically study multimodal emotion expression: The GEMEP corpus. In Affective computing and intelligent interaction, pp. 476–487.

  • Beattie, G. (2004). Visible thought: The new psychology of body language. New York: Psychology Press.

  • Busso, C., & Narayanan, S. (2008). Recording audio-visual emotional databases from actors: A closer look. In Second international workshop on emotion: Corpora for research on emotion and affect, international conference on language resources and evaluation, pp. 17–22.

  • Busso, C., Bulut, M., Lee, C. C., Kazemzadeh, A., Mower, E., Kim, S., et al. (2008). IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4), 335–359.

  • Carnicke, S. M. (2009). Stanislavsky in focus: An acting master for the twenty-first century. London: Taylor & Francis.

  • Cowie, R., & Sawey, M. (2011). GTrace: General trace program from Queen’s, Belfast.

  • Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schröder, M. (2000). ’FEELTRACE’: An instrument for recording perceived emotion in real time. In ISCA tutorial and research workshop (ITRW) on speech and emotion.

  • Cowie, R., McKeown, G., & Douglas-Cowie, E. (2012). Tracing emotion: An overview. International Journal of Synthetic Emotions (IJSE), 3(1), 1–17.

  • Crane, E., & Gross, M. (2007). Motion capture and emotion: Affect detection in whole body movement. In Affective computing and intelligent interaction, pp. 95–101. Berlin: Springer.

  • Devillers, L., Cowie, R., Martin, J., Douglas-Cowie, E., Abrilian, S., & McRorie, M. (2006). Real life emotions in French and English TV video clips: An integrated annotation protocol combining continuous and discrete approaches. In 5th international conference on language resources and evaluation (LREC 2006), Genoa, Italy.

  • Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2012). Collecting large, richly annotated facial-expression databases from movies. IEEE Multimedia, 19(3), 34–41.

  • Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, O., Mcrorie, M., Martin, J. C., Devillers, L., Abrilian, S., Batliner, A., Amir, N., & Karpouzis, K. (2007). The HUMAINE database: Addressing the collection and annotation of naturalistic and induced emotional data. In Affective computing and intelligent interaction, pp. 488–500. Berlin: Springer.

  • Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40(1), 33–60.

  • Enos, F., & Hirschberg, J. (2006). A framework for eliciting emotional speech: Capitalizing on the actor’s process. In First international workshop on emotion: Corpora for research on emotion and affect (international conference on language resources and evaluation (LREC 2006)), pp. 6–10.

  • Grafsgaard, J. F., Fulton, R. M., Boyer, K. E., Wiebe, E. N., & Lester, J. C. (2012). Multimodal analysis of the implicit affective channel in computer-mediated textual communication. In Proceedings of the 14th ACM international conference on multimodal interaction, pp. 145–152. New York: ACM.

  • Grimm, M., Kroschel, K., & Narayanan, S. (2008). The Vera am Mittag German audio-visual emotional speech database. In 2008 IEEE international conference on multimedia and expo, pp. 865–868. New York: IEEE.

  • Harrigan, J., Rosenthal, R., & Scherer, K. (2005). The new handbook of methods in nonverbal behavior research. Oxford: Oxford University Press.

  • Hayworth, D. (1928). The social origin and function of laughter. Psychological Review, 35(5), 367.

  • Humphrey, G. (1924). The psychology of the gestalt. Journal of Educational Psychology, 15(7), 401.

  • Johnstone, K. (1981). Impro: Improvisation and the theatre. London: Routledge.

  • Kanluan, I., Grimm, M., & Kroschel, K. (2008). Audio-visual emotion recognition using an emotion space concept. In 16th European signal processing conference, Lausanne, Switzerland.

  • Kapur, A., Kapur, A., Virji-Babul, N., Tzanetakis, G., & Driessen, P. F. (2005). Gesture-based affective computing on motion capture data. In Affective computing and intelligent interaction, pp. 1–7. Berlin: Springer.

  • Kelly, S. D., Kravitz, C., & Hopkins, M. (2004). Neural correlates of bimodal speech and gesture comprehension. Brain and Language, 89(1), 253–260.

  • Koelstra, S., Muhl, C., Soleymani, M., Lee, J. S., Yazdani, A., Ebrahimi, T., et al. (2012). DEAP: A database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3(1), 18–31.

  • Lee, C. C., Busso, C., Lee, S., & Narayanan, S. S. (2009). Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions. In INTERSPEECH, pp. 1983–1986.

  • Levine, S., Theobalt, C., & Koltun, V. (2009). Real-time prosody-driven synthesis of body language. ACM Transactions on Graphics (TOG), 28(5), 172.

  • Lindahl, K. M. (2001). Methodological issues in family observational research. In P. K. Kerig & K. M. Lindahl (Eds.), Family observational coding systems: Resources for systemic research (pp. 23–32). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Malandrakis, N., Potamianos, A., Evangelopoulos, G., & Zlatintsi, A. (2011). A supervised approach to movie emotion tracking. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2376–2379. New York: IEEE.

  • McKeown, G., Curran, W., McLoughlin, C., Griffin, H. J., & Bianchi-Berthouze, N. (2013). Laughter induction techniques suitable for generating motion capture data of laughter associated body movements. In 2013 10th IEEE international conference and workshops on automatic face and gesture recognition (FG), pp. 1–5. New York: IEEE.

  • McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schroder, M. (2012). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1), 5–17.

  • Mendonca, D. J., & Wallace, W. A. (2007). A cognitive model of improvisation in emergency management. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 37(4), 547–561.

  • Metallinou, A., & Narayanan, S. (2013). Annotation and processing of continuous emotional attributes: Challenges and opportunities. In 2013 10th IEEE international conference and workshops on automatic face and gesture recognition (FG), pp. 1–8. New York: IEEE.

  • Metallinou, A., Katsamanis, A., & Narayanan, S. (2013). Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information. Image and Vision Computing, Special Issue on Continuous Affect Analysis, 31(2), 137–152.

  • Metallinou, A., Katsamanis, A., Wang, Y., & Narayanan, S. (2011). Tracking changes in continuous emotion states using body language and prosodic cues. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2288–2291. New York: IEEE.

  • Metallinou, A., Lee, C. C., Busso, C., Carnicke, S., & Narayanan, S. (2010). The USC CreativeIT database: A multimodal database of theatrical improvisation. In Workshop on multimodal corpora, LREC.

  • Narayanan, S., & Georgiou, P. G. (2013). Behavioral signal processing: Deriving human behavioral informatics from speech and language. Proceedings of the IEEE, 101(5), 1203–1233.

  • Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011). Multimodal deep learning. In Proceedings of the 28th international conference on machine learning (ICML-11), pp. 689–696.

  • Niewiadomski, R., Hofmann, J., Urbain, J., Platt, T., Wagner, J., Piot, B., Cakmak, H., Pammi, S., Baur, T., & Dupont, S., et al. (2013). Laugh-aware virtual agent and its impact on user amusement. In Proceedings of the 2013 international conference on autonomous agents and multi-agent systems, pp. 619–626. International Foundation for Autonomous Agents and Multiagent Systems.

  • Pelachaud, C., Carofiglio, V., De Carolis, B., de Rosis, F., & Poggi, I. (2002). Embodied contextual agent in information delivering application. In Proceedings of the first international joint conference on autonomous agents and multiagent systems: Part 2, pp. 758–765. New York: ACM.

  • Perlin, K., & Goldberg, A. (1996). Improv: A system for scripting interactive actors in virtual worlds. In Proceedings of the 23rd annual conference on computer graphics and interactive techniques, pp. 205–216. New York: ACM.

  • Sauter, D. A., Eisner, F., Ekman, P., & Scott, S. K. (2010). Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences, 107(6), 2408–2412.

  • Scherer, K. R., Bänziger, T., & Roesch, E. (2010). A blueprint for affective computing: A sourcebook and manual. Oxford: Oxford University Press.

  • Sneddon, I., McRorie, M., McKeown, G., & Hanratty, J. (2012). The Belfast induced natural emotion database. IEEE Transactions on Affective Computing, 3(1), 32–41.

  • Soleymani, M., Lichtenauer, J., Pun, T., & Pantic, M. (2012). A multimodal database for affect recognition and implicit tagging. IEEE Transactions on Affective Computing, 3(1), 42–55.

  • Szameitat, D. P., Alter, K., Szameitat, A. J., Wildgruber, D., Sterr, A., & Darwin, C. J. (2009). Acoustic profiles of distinct emotional expressions in laughter. The Journal of the Acoustical Society of America, 126(1), 354–366.

  • Wallbott, H. G., & Scherer, K. R. (1986). Cues and channels in emotion recognition. Journal of Personality and Social Psychology, 51(4), 690.

  • Wu, S., Falk, T. H., & Chan, W. Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication, 53(5), 768–785.

  • Yang, Z., & Narayanan, S. (2014). Analysis of emotional effect on speech-body gesture interplay. In Proceedings of Interspeech.

  • Yang, Z., Metallinou, A., & Narayanan, S. (2013). Towards body language generation in dyadic interaction settings from interlocutor multimodal cues. In Proceedings of ICASSP.

  • Yang, Z., Metallinou, A., Erzin, E., & Narayanan, S. (2014a). Analysis of interaction attitudes using data-driven hand gesture phrases. In Proceedings of ICASSP.

  • Yang, Z., Metallinou, A., & Narayanan, S. (2014b). Analysis and predictive modeling of body language behavior in dyadic interactions from multimodal interlocutor cues. IEEE Transactions on Multimedia, 16, 1766–1778.

  • Yang, Z., Ortega, A., & Narayanan, S. (2014c). Gesture dynamics modeling for attitude analysis using graph based transform. In Proceedings of IEEE international conference on image processing.

  • Yildirim, S., Narayanan, S., & Potamianos, A. (2011). Detecting emotional state of a child in a conversational computer game. Computer Speech and Language, 25, 29–44.

Acknowledgments

This material is based upon work supported by DARPA and Space and Naval Warfare Systems Center Pacific under Contract Number N66001-11-C-4006 and the NSF.

Author information

Correspondence to Zhaojun Yang.


About this article

Cite this article

Metallinou, A., Yang, Z., Lee, C.-C., et al. The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations. Language Resources and Evaluation, 50, 497–521 (2016). https://doi.org/10.1007/s10579-015-9300-0
