The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations

  • Original Paper
  • Published in Language Resources and Evaluation

Abstract

Improvised acting is a viable technique for studying expressive human communication and for shedding light on actors’ creativity. The USC CreativeIT database provides a novel, freely available multimodal resource for the study of theatrical improvisation and of rich expressive human behavior (speech and body language) in dyadic interactions. The theoretical design of the database is based on the well-established improvisation technique of Active Analysis, chosen to elicit naturally induced, affective, expressive, goal-driven interactions. The database contains dyadic theatrical improvisations performed by 16 actors, with detailed full body motion capture data and audio data for each participant in an interaction. The carefully engineered data collection, the improvisation design that elicits natural emotions and expressive speech and body language, and the well-developed annotation processes together provide a gateway to studying and modeling various aspects of theatrical performance, expressive behavior, and human communication and interaction.
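
As a rough illustration of how the database’s synchronized streams might be used, the sketch below resamples a continuous emotion annotation trace onto motion-capture frame timestamps so that the annotation and body-motion streams can be analyzed jointly. The sampling rates, durations, values, and variable names are illustrative assumptions, not the corpus’s actual file formats.

    import numpy as np

    # Minimal sketch with synthetic placeholder data: resample one annotator's
    # continuous emotion trace (e.g., activation over time) onto motion-capture
    # frame timestamps. Rates and values are assumptions for illustration only.

    annot_t = np.arange(0.0, 60.0, 0.25)        # annotation timestamps (4 Hz, assumed)
    annot_v = np.sin(annot_t / 10.0)            # placeholder activation values in [-1, 1]

    mocap_fps = 60.0                            # assumed motion-capture frame rate
    mocap_t = np.arange(0.0, 60.0, 1.0 / mocap_fps)

    # Linear interpolation yields one emotion value per motion-capture frame.
    annot_on_mocap = np.interp(mocap_t, annot_t, annot_v)
    print(annot_on_mocap.shape)                 # (3600,)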


Notes

  1. HUMAINE is freely available at http://humaine-db.sspnet.eu/.

  2. An alternative to majority-voting schemes is to explicitly model the diversity of these inherently subjective ratings when the ground truth is hidden from direct observation, as proposed in Audhkhasi and Narayanan (2013); see the sketch below.
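
The contrast between plain averaging of annotator ratings and reliability-aware fusion can be illustrated with the sketch below. This is not the model of Audhkhasi and Narayanan (2013); it merely weights each synthetic annotator by its agreement with the others, in the spirit of evaluator-weighted averaging, and all data and weighting choices are illustrative assumptions.

    import numpy as np

    # Illustrative sketch only (not the model of Audhkhasi & Narayanan, 2013):
    # fuse several annotators' continuous traces either by a plain mean or by
    # weights proportional to each annotator's correlation with the mean of the
    # remaining annotators, so more consistent raters contribute more.

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 500)
    hidden_trend = np.sin(2 * np.pi * t)                  # synthetic "true" emotion trend
    noise_levels = (0.1, 0.3, 0.8)                        # three annotators, varying reliability
    traces = np.stack([hidden_trend + rng.normal(0.0, s, t.size) for s in noise_levels])

    plain_mean = traces.mean(axis=0)                      # majority-style pooling

    weights = []
    for i in range(traces.shape[0]):
        others = np.delete(traces, i, axis=0).mean(axis=0)
        r = np.corrcoef(traces[i], others)[0, 1]          # agreement with the other raters
        weights.append(max(r, 0.0))                       # clip negative agreement to zero
    weights = np.array(weights) / np.sum(weights)

    weighted_mean = weights @ traces                      # reliability-weighted fusion
    print("annotator weights:", np.round(weights, 2))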

References

  • Anolli, L., Mantovani, F., Mortillaro, M., Vescovo, A., Agliati, A., Confalonieri, L., Realdon, O., Zurloni, V., & Sacchi, A. (2005). A multimodal database as a background for emotional synthesis, recognition and training in e-learning systems. In Affective computing and intelligent interaction, pp. 566–573. Berlin: Springer.

  • Audhkhasi, K., & Narayanan, S. S. (2011). Emotion classification from speech using evaluator reliability-weighted combination of ranked lists. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 4956–4959.

  • Audhkhasi, K., & Narayanan, S. (2013). A globally-variant locally-constant model for fusion of labels from multiple diverse experts without using reference labels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4), 769–783.

  • Bachorowski, J. A., Smoski, M. J., & Owren, M. J. (2001). The acoustic features of human laughter. The Journal of the Acoustical Society of America, 110(3), 1581–1597.

  • Bänziger, T., & Scherer, K. R. (2007). Using actor portrayals to systematically study multimodal emotion expression: The GEMEP corpus. In Affective computing and intelligent interaction, pp. 476–487.

  • Beattie, G. (2004). Visible thought: The new psychology of body language. New York: Psychology Press.

  • Busso, C., & Narayanan, S. (2008). Recording audio-visual emotional databases from actors: A closer look. In Second international workshop on emotion: Corpora for research on emotion and affect, international conference on language resources and evaluation, pp. 17–22.

  • Busso, C., Bulut, M., Lee, C. C., Kazemzadeh, A., Mower, E., Kim, S., et al. (2008). IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4), 335–359.

  • Carnicke, S. M. (2009). Stanislavsky in focus: An acting master for the twenty-first century. London: Taylor & Francis.

  • Cowie, R., & Sawey, M. (2011). GTrace: General trace program from Queen’s, Belfast.

  • Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schröder, M. (2000). ’FEELTRACE’: An instrument for recording perceived emotion in real time. In ISCA tutorial and research workshop (ITRW) on speech and emotion.

  • Cowie, R., McKeown, G., & Douglas-Cowie, E. (2012). Tracing emotion: An overview. International Journal of Synthetic Emotions (IJSE), 3(1), 1–17.

  • Crane, E., & Gross, M. (2007). Motion capture and emotion: Affect detection in whole body movement. In Affective computing and intelligent interaction, pp. 95–101. Berlin: Springer.

  • Devillers, L., Cowie, R., Martin, J., Douglas-Cowie, E., Abrilian, S., & McRorie, M. (2006). Real life emotions in French and English TV video clips: An integrated annotation protocol combining continuous and discrete approaches. In 5th international conference on language resources and evaluation (LREC 2006), Genoa, Italy.

  • Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2012). Collecting large, richly annotated facial-expression databases from movies. IEEE Multimedia, 19(3), 34–41.

  • Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, O., Mcrorie, M., Martin, J. C., Devillers, L., Abrilian, S., Batliner, A., Amir, N., & Karpouzis, K. (2007). The HUMAINE database: Addressing the collection and annotation of naturalistic and induced emotional data. In Affective computing and intelligent interaction, pp. 488–500. Berlin: Springer.

  • Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40(1), 33–60.

  • Enos, F., & Hirschberg, J. (2006). A framework for eliciting emotional speech: Capitalizing on the actor’s process. In First international workshop on emotion: Corpora for research on emotion and affect (international conference on language resources and evaluation (LREC 2006)), pp. 6–10.

  • Grafsgaard, J. F., Fulton, R. M., Boyer, K. E., Wiebe, E. N., & Lester, J. C. (2012). Multimodal analysis of the implicit affective channel in computer-mediated textual communication. In Proceedings of the 14th ACM international conference on multimodal interaction, pp. 145–152. New York: ACM.

  • Grimm, M., Kroschel, K., & Narayanan, S. (2008). The Vera am Mittag German audio-visual emotional speech database. In 2008 IEEE international conference on multimedia and expo, pp. 865–868. New York: IEEE.

  • Harrigan, J., Rosenthal, R., & Scherer, K. (2005). The new handbook of methods in nonverbal behavior research. Oxford: Oxford University Press.

  • Hayworth, D. (1928). The social origin and function of laughter. Psychological Review, 35(5), 367.

  • Humphrey, G. (1924). The psychology of the gestalt. Journal of Educational Psychology, 15(7), 401.

  • Johnstone, K. (1981). Impro: Improvisation and the theatre. London: Routledge.

  • Kanluan, I., Grimm, M., & Kroschel, K. (2008). Audio-visual emotion recognition using an emotion space concept. In 16th European signal processing conference, Lausanne, Switzerland.

  • Kapur, A., Kapur, A., Virji-Babul, N., Tzanetakis, G., & Driessen, P. F. (2005). Gesture-based affective computing on motion capture data. In Affective computing and intelligent interaction, pp. 1–7. Berlin: Springer.

  • Kelly, S. D., Kravitz, C., & Hopkins, M. (2004). Neural correlates of bimodal speech and gesture comprehension. Brain and Language, 89(1), 253–260.

  • Koelstra, S., Muhl, C., Soleymani, M., Lee, J. S., Yazdani, A., Ebrahimi, T., et al. (2012). DEAP: A database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3(1), 18–31.

  • Lee, C. C., Busso, C., Lee, S., & Narayanan, S. S. (2009). Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions. In INTERSPEECH, pp. 1983–1986.

  • Levine, S., Theobalt, C., & Koltun, V. (2009). Real-time prosody-driven synthesis of body language. ACM Transactions on Graphics (TOG), 28(5), 172.

  • Lindahl, K. M. (2001). Methodological issues in family observational research. In P. K. Kerig & K. M. Lindahl (Eds.), Family observational coding systems: Resources for systemic research (pp. 23–32). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Malandrakis, N., Potamianos, A., Evangelopoulos, G., & Zlatintsi, A. (2011). A supervised approach to movie emotion tracking. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2376–2379. New York: IEEE.

  • McKeown, G., Curran, W., McLoughlin, C., Griffin, H. J., & Bianchi-Berthouze, N. (2013). Laughter induction techniques suitable for generating motion capture data of laughter associated body movements. In 2013 10th IEEE international conference and workshops on automatic face and gesture recognition (FG), pp. 1–5. New York: IEEE.

  • McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schroder, M. (2012). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1), 5–17.

  • Mendonca, D. J., & Wallace, W. A. (2007). A cognitive model of improvisation in emergency management. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 37(4), 547–561.

  • Metallinou, A., & Narayanan, S. (2013). Annotation and processing of continuous emotional attributes: Challenges and opportunities. In 2013 10th IEEE international conference and workshops on automatic face and gesture recognition (FG), pp. 1–8. New York: IEEE.

  • Metallinou, A., Katsamanis, A., & Narayanan, S. (2013). Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information. Image and Vision Computing, Special Issue on Continuous Affect Analysis, 31(2), 137–152.

  • Metallinou, A., Katsamanis, A., Wang, Y., & Narayanan, S. (2011). Tracking changes in continuous emotion states using body language and prosodic cues. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2288–2291. New York: IEEE.

  • Metallinou, A., Lee, C. C., Busso, C., Carnicke, S., & Narayanan, S. (2010). The USC CreativeIT database: A multimodal database of theatrical improvisation. In Workshop on multimodal corpora, LREC.

  • Narayanan, S., & Georgiou, P. G. (2013). Behavioral signal processing: Deriving human behavioral informatics from speech and language. Proceedings of the IEEE, 101(5), 1203–1233.

  • Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011). Multimodal deep learning. In Proceedings of the 28th international conference on machine learning (ICML-11), pp. 689–696.

  • Niewiadomski, R., Hofmann, J., Urbain, J., Platt, T., Wagner, J., Piot, B., Cakmak, H., Pammi, S., Baur, T., & Dupont, S., et al. (2013). Laugh-aware virtual agent and its impact on user amusement. In Proceedings of the 2013 international conference on autonomous agents and multi-agent systems, pp. 619–626. International Foundation for Autonomous Agents and Multiagent Systems.

  • Pelachaud, C., Carofiglio, V., De Carolis, B., de Rosis, F., & Poggi, I. (2002). Embodied contextual agent in information delivering application. In Proceedings of the first international joint conference on autonomous agents and multiagent systems: Part 2, pp. 758–765. New York: ACM.

  • Perlin, K., & Goldberg, A. (1996). Improv: A system for scripting interactive actors in virtual worlds. In Proceedings of the 23rd annual conference on computer graphics and interactive techniques, pp. 205–216. New York: ACM.

  • Sauter, D. A., Eisner, F., Ekman, P., & Scott, S. K. (2010). Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences, 107(6), 2408–2412.

  • Scherer, K. R., Bänziger, T., & Roesch, E. (2010). A blueprint for affective computing: A sourcebook and manual. Oxford: Oxford University Press.

  • Sneddon, I., McRorie, M., McKeown, G., & Hanratty, J. (2012). The Belfast induced natural emotion database. IEEE Transactions on Affective Computing, 3(1), 32–41.

  • Soleymani, M., Lichtenauer, J., Pun, T., & Pantic, M. (2012). A multimodal database for affect recognition and implicit tagging. IEEE Transactions on Affective Computing, 3(1), 42–55.

  • Szameitat, D. P., Alter, K., Szameitat, A. J., Wildgruber, D., Sterr, A., & Darwin, C. J. (2009). Acoustic profiles of distinct emotional expressions in laughter. The Journal of the Acoustical Society of America, 126(1), 354–366.

  • Wallbott, H. G., & Scherer, K. R. (1986). Cues and channels in emotion recognition. Journal of Personality and Social Psychology, 51(4), 690.

  • Wu, S., Falk, T. H., & Chan, W. Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication, 53(5), 768–785.

  • Yang, Z., & Narayanan, S. (2014). Analysis of emotional effect on speech-body gesture interplay. In Proceedings of Interspeech.

  • Yang, Z., Metallinou, A., & Narayanan, S. (2013). Towards body language generation in dyadic interaction settings from interlocutor multimodal cues. In Proceedings of ICASSP.

  • Yang, Z., Metallinou, A., Erzin, E., & Narayanan, S. (2014a). Analysis of interaction attitudes using data-driven hand gesture phrases. In Proceedings of ICASSP.

  • Yang, Z., Metallinou, A., & Narayanan, S. (2014b). Analysis and predictive modeling of body language behavior in dyadic interactions from multimodal interlocutor cues. IEEE Transactions on Multimedia, 16, 1766–1778.

  • Yang, Z., Ortega, A., & Narayanan, S. (2014c). Gesture dynamics modeling for attitude analysis using graph based transform. In Proceedings of IEEE international conference on image processing.

  • Yildirim, S., Narayanan, S., & Potamianos, A. (2011). Detecting emotional state of a child in a conversational computer game. Computer Speech and Language, 25, 29–44.

Acknowledgments

This material is based upon work supported by DARPA and Space and Naval Warfare Systems Center Pacific under Contract Number N66001-11-C-4006 and the NSF.

Author information

Correspondence to Zhaojun Yang.


About this article

Cite this article

Metallinou, A., Yang, Z., Lee, C.-C., et al. The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations. Language Resources and Evaluation, 50, 497–521 (2016). https://doi.org/10.1007/s10579-015-9300-0
