
Extracting Feature Space for Synchronizing Behavior in an Interaction Scene Using Unannotated Data

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14261)


Abstract

Human-human interaction includes synchronizing behaviors such as nodding and turn-taking. Extracting and implementing these behaviors is crucial for a communication robot that can hold conversations that "feel good" to its partner. In this research, we propose a self-supervised learning framework for extracting synchronization behavior from dyadic conversations. A "lag operation", i.e., a time shift applied to one subject's features, is performed on the conversation data, and a neural network model is trained to predict the amount of shift from the lagged data. After training, a representation space is obtained in which timing-dependent behaviors are expected to be isolated. The proposed method is applied to about four hours of conversation data, and representations of the test data are computed. Data containing social behaviors such as "eye contact", "turn-taking", and "smile" are extracted from the isolated region of the representation space. Designing behavior rules for communication robots and investigating the characteristics of the proposed framework are left for future work.
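The abstract describes the lag-operation pretext task only at a high level. The following is a minimal sketch, in PyTorch, of how such a task could be set up: one participant's feature stream is shifted by a randomly chosen lag, and a network is trained to classify which lag was applied. All names and design choices here (LagPredictor, the GRU encoder, the candidate lag set LAGS, the feature dimension) are illustrative assumptions, not the authors' implementation.

    # Sketch of a lag-prediction pretext task (illustrative assumptions
    # throughout; this is not the paper's actual implementation).
    import torch
    import torch.nn as nn

    LAGS = [-8, -4, 0, 4, 8]  # assumed candidate shifts, in frames


    class LagPredictor(nn.Module):
        """Encode a pair of feature streams and classify which lag was applied."""

        def __init__(self, feat_dim, hidden=128):
            super().__init__()
            self.encoder = nn.GRU(2 * feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, len(LAGS))

        def forward(self, pair):       # pair: (batch, time, 2 * feat_dim)
            _, h = self.encoder(pair)  # h: (1, batch, hidden)
            return self.head(h[-1])    # logits over the candidate lags


    def make_example(feat_a, feat_b):
        """Shift subject B's features by a random lag; the lag index is the label."""
        idx = torch.randint(len(LAGS), (1,)).item()
        shifted_b = torch.roll(feat_b, shifts=LAGS[idx], dims=0)
        return torch.cat([feat_a, shifted_b], dim=-1), idx


    # One training step on dummy data (stand-ins for per-frame features
    # extracted separately for each conversation partner).
    model = LagPredictor(feat_dim=32)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    feat_a, feat_b = torch.randn(200, 32), torch.randn(200, 32)
    pair, label = make_example(feat_a, feat_b)
    loss = nn.CrossEntropyLoss()(model(pair.unsqueeze(0)), torch.tensor([label]))
    opt.zero_grad(); loss.backward(); opt.step()

After training, the encoder's hidden state would play the role of the representation space: segments in which the applied lag is easy to recover are precisely the timing-dependent ones, so synchronizing behaviors should occupy a distinguishable region of that space.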



Acknowledgment

The authors would like to thank the laboratory members at Osaka University for collecting the dyadic conversation data. This work was supported by JSPS KAKENHI Grant Numbers 19H05693 and 23K169770.

Author information

Correspondence to Yuya Okadome.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Okadome, Y., Nakamura, Y. (2023). Extracting Feature Space for Synchronizing Behavior in an Interaction Scene Using Unannotated Data. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_18


  • DOI: https://doi.org/10.1007/978-3-031-44198-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44197-4

  • Online ISBN: 978-3-031-44198-1

  • eBook Packages: Computer Science (R0)
