
Extracting Feature Space for Synchronizing Behavior in an Interaction Scene Using Unannotated Data

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14261)


Abstract

Human-human interaction includes synchronizing behaviors such as nodding and turn-taking. Extracting and implementing these behaviors is crucial for a communication robot that can hold conversations that "feel good" to its partner. In this research, we propose a self-supervised learning framework for extracting synchronization behavior from dyadic conversations. A "lag operation", i.e., a time shift applied to one subject's features, is performed on the conversation data, and a neural network model is trained to predict the amount of shift from the lagged data. After training, a representation space is obtained in which timing-dependent behaviors are expected to be isolated. The proposed method is applied to about four hours of conversation data, and representations of the test data are computed. Data containing social behaviors such as "eye contact", "turn-taking", and "smile" are extracted from the isolated region of the representation space. Designing behavior rules for communication robots and investigating the characteristics of the proposed framework are left for future work.
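The abstract describes the lag-operation pretext task only at a high level. The following is a minimal sketch, in PyTorch, of how such a task could be set up: one participant's feature stream is shifted by a randomly chosen lag, and a network is trained to classify which lag was applied. All names and design choices here (LagPredictor, the GRU encoder, the candidate lag set LAGS, the feature dimension) are illustrative assumptions, not the authors' implementation.

    # Sketch of a lag-prediction pretext task (illustrative assumptions
    # throughout; this is not the paper's actual implementation).
    import torch
    import torch.nn as nn

    LAGS = [-8, -4, 0, 4, 8]  # assumed candidate shifts, in frames


    class LagPredictor(nn.Module):
        """Encode a pair of feature streams and classify which lag was applied."""

        def __init__(self, feat_dim, hidden=128):
            super().__init__()
            self.encoder = nn.GRU(2 * feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, len(LAGS))

        def forward(self, pair):       # pair: (batch, time, 2 * feat_dim)
            _, h = self.encoder(pair)  # h: (1, batch, hidden)
            return self.head(h[-1])    # logits over the candidate lags


    def make_example(feat_a, feat_b):
        """Shift subject B's features by a random lag; the lag index is the label."""
        idx = torch.randint(len(LAGS), (1,)).item()
        shifted_b = torch.roll(feat_b, shifts=LAGS[idx], dims=0)
        return torch.cat([feat_a, shifted_b], dim=-1), idx


    # One training step on dummy data (stand-ins for per-frame features
    # extracted separately for each conversation partner).
    model = LagPredictor(feat_dim=32)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    feat_a, feat_b = torch.randn(200, 32), torch.randn(200, 32)
    pair, label = make_example(feat_a, feat_b)
    loss = nn.CrossEntropyLoss()(model(pair.unsqueeze(0)), torch.tensor([label]))
    opt.zero_grad(); loss.backward(); opt.step()

After training, the encoder's hidden state would play the role of the representation space: segments in which the applied lag is easy to recover are precisely the timing-dependent ones, so synchronizing behaviors should occupy a distinguishable region of that space.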



Acknowledgment

The authors would like to thank the laboratory members at Osaka University for collecting the dyadic conversation data. This work was supported by JSPS KAKENHI Grant Numbers 19H05693 and 23K169770.

Author information

Correspondence to Yuya Okadome.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Okadome, Y., Nakamura, Y. (2023). Extracting Feature Space for Synchronizing Behavior in an Interaction Scene Using Unannotated Data. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_18


  • DOI: https://doi.org/10.1007/978-3-031-44198-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44197-4

  • Online ISBN: 978-3-031-44198-1

  • eBook Packages: Computer Science (R0)
