Abstract
Embodied Conversational Agent (ECA) is a term encompassing virtual agents designed to converse with a human user and represented by a physical body in their virtual environment. Other types of conversational agents, such as chatbots, can incorporate media such as images into text-based communication. However, no design has been proposed that exploits such a medium in the case of Embodied Conversational Agents, making it central to the dialogue with an embodied character that can react to its content and display emotions. We propose a design for an Embodied Conversational Agent with multimodal perception, able to express emotions, in which the conversation revolves around a medium available to both the agent and the user. On each turn, the emotion classification of the user's words, obtained with a BERT-based model, is combined with the classification of his/her facial expressions, captured through a webcam, to select an expression and an answer. The agent features both real-time lip-syncing and expression animation. The application case for the study is a discussion revolving around images of paintings, where the user wants to learn more about the artwork.
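The abstract describes a turn-based pipeline: a BERT-based classifier scores the emotion of the user's utterance, a webcam-based classifier scores the facial expression, and the two are combined to choose the agent's expression and reply. The paper does not specify the fusion rule; the Python sketch below illustrates one plausible reading, a weighted late fusion of the two per-modality emotion distributions. The label set, the weight ALPHA, the reply templates, and all function names are assumptions made for illustration, not the authors' published implementation.

    # Illustrative sketch only: the fusion rule, labels, weight, and replies
    # below are assumptions; the paper does not publish these details.
    from typing import Dict, Tuple

    # A small shared label set; the paper's actual taxonomies (e.g. a
    # GoEmotions-style set for text, an action-unit-based set for the
    # face) may differ and would need to be mapped onto common labels.
    EMOTIONS = ["joy", "sadness", "anger", "surprise", "neutral"]

    ALPHA = 0.6  # assumed weight of the text modality in the fusion


    def fuse_emotions(text_probs: Dict[str, float],
                      face_probs: Dict[str, float],
                      alpha: float = ALPHA) -> str:
        """Combine per-modality emotion scores; return the dominant label."""
        fused = {e: alpha * text_probs.get(e, 0.0)
                    + (1.0 - alpha) * face_probs.get(e, 0.0)
                 for e in EMOTIONS}
        return max(fused, key=fused.get)


    def select_turn(text_probs: Dict[str, float],
                    face_probs: Dict[str, float]) -> Tuple[str, str]:
        """One dialogue turn: pick the agent's expression and a reply."""
        emotion = fuse_emotions(text_probs, face_probs)
        # Hypothetical canned replies about the painting under discussion.
        replies = {
            "joy": "I'm glad you like it! Notice the brushwork up here.",
            "sadness": "It is a melancholic piece; the palette reflects that.",
            "anger": "It can be unsettling. Shall we look at a calmer part?",
            "surprise": "Unexpected, isn't it? This area was reworked twice.",
            "neutral": "This painting dates from the artist's late period.",
        }
        return emotion, replies[emotion]


    if __name__ == "__main__":
        # Stand-ins for the BERT text classifier's output and the
        # webcam expression classifier's output on one turn.
        text_probs = {"joy": 0.7, "neutral": 0.2, "surprise": 0.1}
        face_probs = {"neutral": 0.5, "joy": 0.4, "sadness": 0.1}
        expression, answer = select_turn(text_probs, face_probs)
        print(expression, "->", answer)

A weighted average is only one possible combination rule; the authors' system could equally use a priority scheme (e.g. trusting the facial channel when the text is emotionally neutral), which this sketch does not attempt to reproduce.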
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jolibois, S., Ito, A., Nose, T.: Multimodal expressive embodied conversational agent design. In: Stephanidis, C., Antona, M., Ntoa, S., Salvendy, G. (eds.) HCI International 2023 Posters. HCII 2023. Communications in Computer and Information Science, vol. 1832. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-35989-7_31
Print ISBN: 978-3-031-35988-0
Online ISBN: 978-3-031-35989-7