Abstract
With the rapid development of artificial agents, researchers have increasingly explored user engagement level prediction. Real-time engagement prediction helps an agent adjust its interaction policy appropriately. However, existing engagement models lack the element of interpersonal synchrony, a temporal alignment of behavior closely related to engagement level, partly because the synchrony phenomenon is complex and hard to delimit. Against this background, we aim to develop a model suited to temporal interpersonal features using modern data-driven machine learning. Based on previous studies, we select multiple non-verbal modalities of dyadic interactions as predictive features and design a multi-stream attention model that captures the interpersonal temporal relationship within each modality. Furthermore, we experiment with two additional embedding schemes derived from psychological definitions of synchrony. Finally, we compare our model with a conventional architecture that emphasizes the multimodal features of an individual. Our experiments show the effectiveness of the intra-modal inter-person design for engagement prediction, although the attempt to manipulate the embeddings did not improve performance. We conclude by discussing the experimental results and the limitations of our work.
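The "intra-modal inter-person" idea above can be illustrated with a minimal sketch: within one non-verbal modality, each person's feature sequence attends to the other person's sequence via scaled dot-product attention, and the two attended streams are pooled and fused. This is only a toy NumPy illustration of the general mechanism under assumed shapes and pooling choices, not the authors' actual architecture; the function names (`cross_attention`, `interpersonal_stream`) and the mean-pool-plus-concatenate fusion are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """Scaled dot-product attention where one person's features query
    the other person's features (inter-person, within one modality)."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)  # (T_a, T_b) alignment scores
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ kv_feats                   # (T_a, d) attended features

def interpersonal_stream(a, b):
    """One modality stream: A attends to B and B attends to A, then the
    attended sequences are mean-pooled and concatenated (assumed fusion)."""
    a2b = cross_attention(a, b).mean(axis=0)
    b2a = cross_attention(b, a).mean(axis=0)
    return np.concatenate([a2b, b2a])

rng = np.random.default_rng(0)
T, d = 10, 16
person_a = rng.normal(size=(T, d))  # e.g. one non-verbal modality of person A
person_b = rng.normal(size=(T, d))  # the same modality of person B
emb = interpersonal_stream(person_a, person_b)
print(emb.shape)  # → (32,)
```

In a full model, one such stream would be built per modality and the stream embeddings combined by a downstream classifier; the temporal attention weights make the learned inter-person alignment within each modality inspectable.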
Acknowledgement
This work was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (Nos. 22K21304, 22H04860, and 22H00536), JST AIP Trilateral AI Research, Japan (No. JPMJCR20G6), and the JST Moonshot R&D program (JPMJMS2237-3).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, X., Mawalim, C.O., Okada, S. (2023). Inter-person Intra-modality Attention Based Model for Dyadic Interaction Engagement Prediction. In: Coman, A., Vasilache, S. (eds) Social Computing and Social Media. HCII 2023. Lecture Notes in Computer Science, vol 14025. Springer, Cham. https://doi.org/10.1007/978-3-031-35915-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35914-9
Online ISBN: 978-3-031-35915-6
eBook Packages: Computer Science (R0)