Abstract
Human-human interaction involves synchronized behaviors such as nodding and turn-taking. Extracting and implementing these synchronization behaviors is crucial for building a communication robot capable of pleasant, "feeling good" conversations. In this research, we propose a framework for extracting synchronization behavior from a dyadic conversation based on self-supervised learning. A "lag operation", which time-shifts the features of one subject, is applied to the conversation data, and a neural network is trained on the shifted data with the amount of shift as its label. After training, a representation space is obtained in which timing-dependent behaviors are expected to be isolated. The proposed method is applied to about four hours of conversation data, and representations of the test data are computed. Data containing social behaviors such as eye contact, turn-taking, and smiling are extracted from isolated regions of the representation. Designing behavior rules for a communication robot and further investigating the characteristics of the proposed framework are left as future work.
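The lag operation described above can be sketched as a self-supervised labeling step: one subject's feature sequence is time-shifted by a known amount relative to the other's, and the shift amount becomes the training label. The following is a minimal illustrative sketch, not the authors' implementation; the feature dimensions, the set of candidate lags, the wrap-around shift via `np.roll`, and the concatenation of the two subjects' features are all assumptions for illustration.

```python
import numpy as np

def lag_operation(features_a, features_b, lags):
    """Generate self-supervised training examples from a dyadic recording.

    For each candidate lag, subject A's feature sequence is time-shifted
    relative to subject B's, and the index of the applied lag serves as
    the classification label a network would be trained to predict.
    """
    examples = []
    for label, lag in enumerate(lags):
        shifted_a = np.roll(features_a, lag, axis=0)  # shift along time axis
        pair = np.concatenate([shifted_a, features_b], axis=1)
        examples.append((pair, label))
    return examples

# Toy data: 100 time frames, 4 feature dimensions per subject
# (hypothetical shapes; real features might come from face/voice tracking).
rng = np.random.default_rng(0)
a = rng.standard_normal((100, 4))
b = rng.standard_normal((100, 4))

data = lag_operation(a, b, lags=[-10, 0, 10])
print(len(data), data[0][0].shape)  # 3 labeled examples, each (100, 8)
```

A network trained to recover the lag label from such pairs must learn timing-sensitive features, which is why timing-dependent (synchronizing) behaviors can separate out in the learned representation space.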
Acknowledgment
The authors would like to thank laboratory members at Osaka University for collecting dyadic conversation data. This work was supported by JSPS KAKENHI Grant Numbers 19H05693 and 23K169770.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Okadome, Y., Nakamura, Y. (2023). Extracting Feature Space for Synchronizing Behavior in an Interaction Scene Using Unannotated Data. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_18
Print ISBN: 978-3-031-44197-4
Online ISBN: 978-3-031-44198-1
eBook Packages: Computer Science (R0)