Abstract
Embodied Conversational Agent (ECA) is a term encompassing virtual agents designed to converse with a human user and represented by a physical body in their virtual environment. Other types of conversational agents, such as chatbots, can incorporate media such as images into text-based communication. However, no design has been proposed that exploits such a medium in the case of Embodied Conversational Agents, making it central to the dialogue with an embodied character that can react to its content and display emotions. We propose a design for an Embodied Conversational Agent with multimodal perception, able to express emotions, in which the conversation revolves around a medium available to both the agent and the user. On each turn, the emotion classification of the user's words, obtained with a BERT-based model, is combined with the classification of his/her facial expressions, captured through a webcam, to select an expression and an answer. The agent features both real-time lip-syncing and expression animation. The application case for the study is a discussion revolving around images of paintings, where the user wants to learn more about the artwork.
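The abstract describes a turn-based pipeline: a BERT-based classifier scores the emotion of the user's utterance, a webcam-based classifier scores the facial expression, and the two are combined to choose the agent's expression and reply. The paper does not specify the fusion rule; the Python sketch below illustrates one plausible reading, a weighted late fusion of the two per-modality emotion distributions. The label set, the weight ALPHA, the reply templates, and all function names are assumptions made for illustration, not the authors' published implementation.

    # Illustrative sketch only: the fusion rule, labels, weight, and replies
    # below are assumptions; the paper does not publish these details.
    from typing import Dict, Tuple

    # A small shared label set; the paper's actual taxonomies (e.g. a
    # GoEmotions-style set for text, an action-unit-based set for the
    # face) may differ and would need to be mapped onto common labels.
    EMOTIONS = ["joy", "sadness", "anger", "surprise", "neutral"]

    ALPHA = 0.6  # assumed weight of the text modality in the fusion


    def fuse_emotions(text_probs: Dict[str, float],
                      face_probs: Dict[str, float],
                      alpha: float = ALPHA) -> str:
        """Combine per-modality emotion scores; return the dominant label."""
        fused = {e: alpha * text_probs.get(e, 0.0)
                    + (1.0 - alpha) * face_probs.get(e, 0.0)
                 for e in EMOTIONS}
        return max(fused, key=fused.get)


    def select_turn(text_probs: Dict[str, float],
                    face_probs: Dict[str, float]) -> Tuple[str, str]:
        """One dialogue turn: pick the agent's expression and a reply."""
        emotion = fuse_emotions(text_probs, face_probs)
        # Hypothetical canned replies about the painting under discussion.
        replies = {
            "joy": "I'm glad you like it! Notice the brushwork up here.",
            "sadness": "It is a melancholic piece; the palette reflects that.",
            "anger": "It can be unsettling. Shall we look at a calmer part?",
            "surprise": "Unexpected, isn't it? This area was reworked twice.",
            "neutral": "This painting dates from the artist's late period.",
        }
        return emotion, replies[emotion]


    if __name__ == "__main__":
        # Stand-ins for the BERT text classifier's output and the
        # webcam expression classifier's output on one turn.
        text_probs = {"joy": 0.7, "neutral": 0.2, "surprise": 0.1}
        face_probs = {"neutral": 0.5, "joy": 0.4, "sadness": 0.1}
        expression, answer = select_turn(text_probs, face_probs)
        print(expression, "->", answer)

A weighted average is only one possible combination rule; the authors' system could equally use a priority scheme (e.g. trusting the facial channel when the text is emotionally neutral), which this sketch does not attempt to reproduce.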
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jolibois, S., Ito, A., Nose, T.: Multimodal expressive embodied conversational agent design. In: Stephanidis, C., Antona, M., Ntoa, S., Salvendy, G. (eds.) HCI International 2023 Posters. HCII 2023. Communications in Computer and Information Science, vol. 1832. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-35989-7_31
Print ISBN: 978-3-031-35988-0
Online ISBN: 978-3-031-35989-7