
Inter-person Intra-modality Attention Based Model for Dyadic Interaction Engagement Prediction

  • Conference paper
  • In: Social Computing and Social Media (HCII 2023)

Abstract

With the rapid development of artificial agents, researchers have increasingly explored the importance of predicting user engagement. Real-time engagement prediction allows an agent to adjust its interaction policy appropriately. However, existing engagement models lack the element of interpersonal synchrony, a temporal alignment of behavior that is closely related to engagement level, partly because the synchrony phenomenon is complex and hard to delimit. Against this background, we aim to develop a model suited to temporal interpersonal features using modern data-driven machine learning. Based on previous studies, we select multiple non-verbal modalities of dyadic interaction as predictive features and design a multi-stream attention model that captures the interpersonal temporal relationship within each modality. We further experiment with two additional embedding schemes motivated by definitions of synchrony in psychology. Finally, we compare our model with a conventional structure that emphasizes the multimodal features within an individual. Our experiments show the effectiveness of the intra-modality inter-person design for engagement prediction, although manipulating the embeddings did not improve performance. We conclude by discussing the experimental results and the limitations of our work.
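The abstract describes a multi-stream attention model in which each stream handles a single non-verbal modality observed from both interlocutors, in contrast to a conventional design that fuses all modalities of one person. The sketch below is a minimal illustration of that intra-modality inter-person idea, not the paper's exact architecture: concatenating the two persons' sequences along time, the learned person embedding, mean pooling, the standard PyTorch Transformer encoder, and the modality names and feature dimensions are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class InterPersonIntraModalityStream(nn.Module):
    """One attention stream for a single non-verbal modality.

    The two interlocutors' feature sequences for the same modality are
    concatenated along the time axis so that self-attention can relate
    person A's behaviour to person B's (an illustrative choice, not
    necessarily the paper's mechanism).
    """

    def __init__(self, feat_dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        # Learned embedding marking which person each time step belongs to.
        self.person_embed = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, feats_a, feats_b):
        # feats_a, feats_b: (batch, time, feat_dim), same modality, two persons.
        x = torch.cat([self.proj(feats_a), self.proj(feats_b)], dim=1)
        person_ids = torch.cat([
            torch.zeros(feats_a.size(1), dtype=torch.long),
            torch.ones(feats_b.size(1), dtype=torch.long)]).to(x.device)
        x = x + self.person_embed(person_ids)   # broadcast over the batch
        h = self.encoder(x)                     # attention spans both persons
        return h.mean(dim=1)                    # (batch, d_model) stream summary


class MultiStreamEngagementModel(nn.Module):
    """Fuses one inter-person stream per modality and predicts engagement."""

    def __init__(self, modality_dims, d_model=64):
        super().__init__()
        self.streams = nn.ModuleDict({
            name: InterPersonIntraModalityStream(dim, d_model)
            for name, dim in modality_dims.items()})
        self.head = nn.Linear(d_model * len(modality_dims), 1)

    def forward(self, inputs):
        # inputs: {modality_name: (features_person_a, features_person_b)}
        summaries = [self.streams[m](a, b) for m, (a, b) in inputs.items()]
        return self.head(torch.cat(summaries, dim=-1)).squeeze(-1)


if __name__ == "__main__":
    # Hypothetical modalities and feature dimensions, purely for illustration.
    dims = {"head_pose": 6, "gaze": 4}
    model = MultiStreamEngagementModel(dims)
    batch = {m: (torch.randn(2, 50, d), torch.randn(2, 50, d))
             for m, d in dims.items()}
    print(model(batch).shape)  # torch.Size([2]): one engagement score per dyad
```

The person embedding is one simple way to tell the encoder which interlocutor produced each time step; cross-attention between the two sequences would be another reasonable realization of the same inter-person design.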



Acknowledgement

This work was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (Nos. 22K21304, 22H04860, and 22H00536), JST AIP Trilateral AI Research, Japan (No. JPMJCR20G6), and the JST Moonshot R&D program (JPMJMS2237-3).

Author information


Corresponding author

Correspondence to Shogo Okada.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, X., Mawalim, C.O., Okada, S. (2023). Inter-person Intra-modality Attention Based Model for Dyadic Interaction Engagement Prediction. In: Coman, A., Vasilache, S. (eds) Social Computing and Social Media. HCII 2023. Lecture Notes in Computer Science, vol 14025. Springer, Cham. https://doi.org/10.1007/978-3-031-35915-6_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-35915-6_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35914-9

  • Online ISBN: 978-3-031-35915-6

  • eBook Packages: Computer Science, Computer Science (R0)
