
The JESTKOD database: an affective multimodal database of dyadic interactions

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

In human-to-human communication, gesture and speech co-exist in time with tight synchrony, and gestures are often used to complement or emphasize speech. In human–computer interaction systems, natural, affective and believable use of gestures would be a valuable key component for adopting and emphasizing human-centered aspects. However, natural and affective multimodal data for studying computational models of gesture and speech are limited. In this study, we introduce the JESTKOD database, which consists of speech and full-body motion capture recordings of dyadic interactions under agreement and disagreement scenarios. The participants in the dyadic interactions are native Turkish speakers, and the recordings of each participant are rated in a dimensional affect space. We present our multimodal data collection and annotation process, as well as preliminary experimental studies on agreement/disagreement classification of dyadic interactions using body gesture and speech data. The JESTKOD database provides a valuable asset for investigating gesture and speech towards designing more natural and affective human–computer interaction systems.
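To make the classification task described above concrete, the following is a minimal, hypothetical sketch of agreement/disagreement classification from pre-extracted gesture and speech features. The file names, feature layout and classifier choice (an SVM with feature-level fusion) are illustrative assumptions for this sketch, not the authors' actual JESTKOD pipeline.

    # Hypothetical sketch: agreement/disagreement classification from
    # pre-extracted gesture and speech features. File names, feature
    # layout and classifier choice are illustrative assumptions, not
    # the authors' actual JESTKOD pipeline.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Assume one row per interaction segment: statistics of body-joint
    # motion (gesture.csv), prosodic/spectral speech features
    # (speech.csv), and a 0/1 agreement label (labels.csv).
    gesture = np.loadtxt("gesture.csv", delimiter=",")
    speech = np.loadtxt("speech.csv", delimiter=",")
    labels = np.loadtxt("labels.csv", delimiter=",")

    # Simple feature-level fusion: concatenate the two modalities.
    fused = np.hstack([gesture, speech])

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

    for name, feats in [("gesture", gesture), ("speech", speech), ("fusion", fused)]:
        scores = cross_val_score(clf, feats, labels, cv=5)
        print(f"{name:8s} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

Comparing the single-modality scores against the fused features in this way mirrors the kind of unimodal-versus-multimodal comparison the preliminary experiments report, under the stated assumptions about feature files.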


Notes

  1. Flex 13 system—http://www.optitrack.com/products/flex-13/.

  2. Motive—optical motion capture software http://www.optitrack.com/products/motive/.

  3. The JESTKOD project is supported by TÜBİTAK under Grant Number 113E102.

  4. The JESTKOD database—http://mvgl.ku.edu.tr/databases/.


Acknowledgements

This work is supported by TÜBİTAK under Grant Number 113E102.

Author information

Corresponding author

Correspondence to Engin Erzin.

About this article

Cite this article

Bozkurt, E., Khaki, H., Keçeci, S. et al. The JESTKOD database: an affective multimodal database of dyadic interactions. Lang Resources & Evaluation 51, 857–872 (2017). https://doi.org/10.1007/s10579-016-9377-0
