Multimodal Expressive Embodied Conversational Agent Design

  • Conference paper
  • HCI International 2023 Posters (HCII 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1832)

Abstract

An Embodied Conversational Agent (ECA) is a virtual agent designed to converse with a human user and given a physical representation in its virtual environment. Other conversational agents, such as chatbots, can incorporate media like images into text-based communication. However, no design has been proposed for Embodied Conversational Agents that places such a medium at the center of the dialogue, with an embodied character that reacts to its content and displays emotions. We propose a design for an Embodied Conversational Agent with multimodal perception, able to express emotions, in which the conversation revolves around a medium available to both the agent and the user. Using a BERT-based model, emotion classification of the user's words is combined, turn by turn, with classification of their facial expressions captured through a webcam to select the agent's expression and answer. The agent features both real-time lip-syncing and expression animation. The application case for the study is a discussion about images of paintings, in which the user wants to learn more about the artwork.
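
A minimal sketch of the turn-based fusion step described above is given below. It is illustrative only, not the authors' implementation: the Hugging Face checkpoint name, the classify_face stub standing in for the webcam facial-expression classifier, and the weighted-average fusion rule with its 0.6/0.4 split are all assumptions made for this example.

from transformers import pipeline

# Text-emotion classifier; any BERT-family emotion checkpoint works here.
# The model id below is a placeholder choice, not the paper's model.
text_clf = pipeline(
    "text-classification",
    model="bhadresh-savani/distilbert-base-uncased-emotion",
    top_k=None,  # return a score for every emotion label
)

EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "neutral"]

def classify_face(frame):
    """Hypothetical stand-in for the webcam facial-expression classifier;
    returns a label -> probability dict (uniform dummy scores here)."""
    return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}

def select_expression(utterance, frame, w_text=0.6):
    """Weighted late fusion of the two per-turn emotion distributions."""
    raw = text_clf(utterance)
    per_label = raw[0] if isinstance(raw[0], list) else raw  # normalize output shape
    text_scores = {d["label"]: d["score"] for d in per_label}
    face_scores = classify_face(frame)
    fused = {
        e: w_text * text_scores.get(e, 0.0) + (1.0 - w_text) * face_scores.get(e, 0.0)
        for e in set(text_scores) | set(face_scores)
    }
    # The winning emotion drives both the selected answer and the expression animation.
    return max(fused, key=fused.get)

print(select_expression("I love the colors in this painting!", frame=None))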


Author information

Correspondence to Simon Jolibois.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jolibois, S., Ito, A., Nose, T. (2023). Multimodal Expressive Embodied Conversational Agent Design. In: Stephanidis, C., Antona, M., Ntoa, S., Salvendy, G. (eds) HCI International 2023 Posters. HCII 2023. Communications in Computer and Information Science, vol 1832. Springer, Cham. https://doi.org/10.1007/978-3-031-35989-7_31

  • DOI: https://doi.org/10.1007/978-3-031-35989-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35988-0

  • Online ISBN: 978-3-031-35989-7

  • eBook Packages: Computer Science, Computer Science (R0)
