KRISTINA: A Knowledge-Based Virtual Conversation Agent

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10349)

Abstract

We present an intelligent embodied conversation agent with linguistic, social and emotional competence. Unlike the vast majority of state-of-the-art conversation agents, the proposed agent is built around an ontology-based knowledge model that allows for flexible reasoning-driven dialogue planning instead of relying on predefined dialogue scripts. The agent is further complemented by multimodal communication analysis and generation modules and by a search engine that retrieves from the web the multimedia background content needed to conduct a conversation on a given topic. The evaluation of the first prototype shows a high degree of user acceptance of the agent with respect to, among other criteria, its trustworthiness and naturalness. The individual technologies are being further improved in the second prototype.
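To make the contrast with scripted dialogue concrete, the reasoning-driven planning sketched in the abstract can be illustrated as follows. All triples, predicate names, and the planner logic below are hypothetical illustrations, not the paper's actual ontology or the KRISTINA implementation: the point is only that the next dialogue move is derived by querying a knowledge model rather than read off a fixed script.

```python
# Minimal sketch of reasoning-driven dialogue planning (all names hypothetical).
# The agent's next move is derived by chaining over (subject, predicate, object)
# triples in a small knowledge base, not taken from a predefined script.

KB = {
    ("user", "expressed_emotion", "worried"),
    ("user", "asked_about", "medication_schedule"),
    ("medication_schedule", "is_a", "care_topic"),
    ("care_topic", "requires", "factual_answer"),
}

def holds(s=None, p=None, o=None):
    """Return all triples matching a pattern (None acts as a wildcard)."""
    return [(ts, tp, to) for (ts, tp, to) in KB
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

def plan_move():
    """Yield dialogue moves by reasoning over the knowledge base."""
    # React to the user's emotional state first, if one was detected.
    if holds("user", "expressed_emotion", "worried"):
        yield ("acknowledge_emotion", "worried")
    # Then follow is_a / requires edges to decide how to address the topic.
    for _, _, topic in holds("user", "asked_about"):
        for _, _, cls in holds(topic, "is_a"):
            for _, _, act in holds(cls, "requires"):
                yield (act, topic)

moves = list(plan_move())
# moves == [('acknowledge_emotion', 'worried'),
#           ('factual_answer', 'medication_schedule')]
```

Because the plan is recomputed from the knowledge base on every turn, adding a new triple (say, a new care topic) changes the agent's behaviour without touching any dialogue script, which is the flexibility the abstract claims for the ontology-based design.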


Notes

  1. Due to the lack of space, we cannot present a complete run of an interaction turn. Therefore, we merely introduce in what follows the individual modules and sketch how they interact.

  2. Also essential is the recognition of prosody as a means to detect the thematic and emphatic patterns in the user's move [6, 7].

  3. http://www.vocapia.com/.

  4. https://www.cereproc.com/.

References

  1. Anderson, K., et al.: The TARDIS framework: intelligent virtual agents for social coaching in job interviews. In: Reidsma, D., Katayose, H., Nijholt, A. (eds.) ACE 2013. LNCS, vol. 8253, pp. 476–491. Springer, Cham (2013). doi:10.1007/978-3-319-03161-3_35

  2. Ballesteros, M., Bohnet, B., Mille, S., Wanner, L.: Data-driven sentence generation with non-isomorphic trees. In: Proceedings of the 2015 Conference of the NAACL: Human Language Technologies, pp. 387–397. ACL, Denver, Colorado, May–June 2015. http://www.aclweb.org/anthology/N15-1042

  3. Ballesteros, M., Bohnet, B., Mille, S., Wanner, L.: Data-driven deep-syntactic dependency parsing. Natural Lang. Eng. 22(6), 939–974 (2016)

  4. Baur, T., Mehlmann, G., Damian, I., Gebhard, P., Lingenfelser, F., Wagner, J., Lugrin, B., André, E.: Context-aware automated analysis and annotation of social human-agent interactions. ACM Trans. Interact. Intell. Syst. 5(2) (2015)

  5. Bohnet, B., Wanner, L.: Open source graph transducer interpreter and grammar development environment. In: Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17–23 May, Valletta, Malta (2010)

  6. Domínguez, M., Farrús, M., Burga, A., Wanner, L.: Using hierarchical information structure for prosody prediction in content-to-speech application. In: Proceedings of the 8th International Conference on Speech Prosody (SP 2016), Boston, MA (2016)

  7. Domínguez, M., Farrús, M., Wanner, L.: Combining acoustic and linguistic features in phrase-oriented prosody prediction. In: Proceedings of the 8th International Conference on Speech Prosody (SP 2016), Boston, MA (2016)

  8. Du, S., Tao, Y., Martinez, A.M.: Compound facial expressions of emotion. Proc. Nat. Acad. Sci. 111(15), E1454–E1462 (2014)

  9. Ekman, P., Rosenberg, E.L.: What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS). Oxford University Press, Oxford (1997)

  10. Fillmore, C.J.: Frame Semantics, pp. 111–137. Hanshin Publishing Co., Seoul (1982)

  11. Gangemi, A.: Ontology design patterns for semantic web content. In: Gil, Y., Motta, E., Benjamins, V.R., Musen, M.A. (eds.) ISWC 2005. LNCS, vol. 3729, pp. 262–276. Springer, Heidelberg (2005). doi:10.1007/11574620_21

  12. Gebhard, P., Mehlmann, G.U., Kipp, M.: Visual SceneMaker: a tool for authoring interactive virtual characters. J. Multimodal User Interfaces 6(1–2), 3–11 (2012). Interacting with Embodied Conversational Agents. Springer-Verlag

  13. Gilroy, S.W., Cavazza, M., Niranen, M., André, E., Vogt, T., Urbain, J., Benayoun, M., Seichter, H., Billinghurst, M.: PAD-based multimodal affective fusion. In: Affective Computing and Intelligent Interaction and Workshops (2009)

  14. Gunes, H., Schuller, B.: Categorical and dimensional affect analysis in continuous input: current trends and future directions. Image Vis. Comput. 31(2), 120–136 (2013)

  15. Heckmann, D., Schwartz, T., Brandherm, B., Schmitz, M., Wilamowitz-Moellendorff, M.: Gumo – the general user model ontology. In: Ardissono, L., Brna, P., Mitrovic, A. (eds.) UM 2005. LNCS, vol. 3538, pp. 428–432. Springer, Heidelberg (2005). doi:10.1007/11527886_58

  16. Hofstede, G.H., Hofstede, G.: Culture’s Consequences: Comparing Values, Behaviors, Institutions and Organizations Across Nations. Sage, Thousand Oaks (2001)

  17. Hyde, J., Carter, E.J., Kiesler, S., Hodgins, J.K.: Assessing naturalness and emotional intensity: a perceptual study of animated facial motion. In: Proceedings of the ACM Symposium on Applied Perception, pp. 15–22. ACM (2014)

  18. Hyde, J., Carter, E.J., Kiesler, S., Hodgins, J.K.: Using an interactive avatar’s facial expressiveness to increase persuasiveness and socialness. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 1719–1728. ACM (2015)

  19. Lamel, L., Gauvain, J.: Speech recognition. In: Mitkov, R. (ed.) OUP Handbook on Computational Linguistics, pp. 305–322. Oxford University Press, Oxford (2003)

  20. Lingenfelser, F., Wagner, J., André, E., McKeown, G., Curran, W.: An event driven fusion approach for enjoyment recognition in real-time. In: MM, pp. 377–386 (2014)

  21. Mehlmann, G., André, E.: Modeling multimodal integration with event logic charts. In: Proceedings of the 14th International Conference on Multimodal Interaction, pp. 125–132. ACM, New York (2012)

  22. Mehlmann, G., Janowski, K., André, E.: Modeling grounding for interactive social companions. J. Artif. Intell. 30(1), 45–52 (2016). Social Companion Technologies. Springer-Verlag

  23. Mehlmann, G., Janowski, K., Baur, T., Häring, M., André, E., Gebhard, P.: Exploring a model of gaze for grounding in HRI. In: Proceedings of the 16th International Conference on Multimodal Interaction, pp. 247–254. ACM, New York (2014)

  24. Mori, M., MacDorman, K.F., Kageki, N.: The uncanny valley [from the field]. IEEE Robot. Autom. Mag. 19(2), 98–100 (2012)

  25. Motik, B., Cuenca Grau, B., Sattler, U.: Structured objects in OWL: representation and reasoning. In: Proceedings of the 17th International Conference on World Wide Web, pp. 555–564. ACM (2008)

  26. Ochs, M., Pelachaud, C.: Socially aware virtual characters: the social signal of smiles. IEEE Signal Process. Mag. 30(2), 128–132 (2013)

  27. Posner, J., Russell, J., Peterson, B.: The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development and psychopathology. Dev. Psychopathol. 17(3), 715–734 (2005)

  28. Riaño, D., Real, F., Campana, F., Ercolani, S., Annicchiarico, R.: An ontology for the care of the elder at home. In: Combi, C., Shahar, Y., Abu-Hanna, A. (eds.) AIME 2009. LNCS (LNAI), vol. 5651, pp. 235–239. Springer, Heidelberg (2009). doi:10.1007/978-3-642-02976-9_33

  29. Ruiz, A., Van de Weijer, J., Binefa, X.: From emotions to action units with hidden and semi-hidden-task learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3703–3711 (2015)

  30. Sandbach, G., Zafeiriou, S., Pantic, M., Yin, L.: Static and dynamic 3D facial expression recognition: a comprehensive survey. Image Vis. Comput. 30(10), 683–697 (2012)

  31. Savran, A., Sankur, B., Bilge, M.T.: Regression-based intensity estimation of facial action units. Image Vis. Comput. 30(10), 774–784 (2012)

  32. Shaw, R., Troncy, R., Hardman, L.: LODE: linking open descriptions of events. In: 4th Asian Conference on The Semantic Web, Shanghai, China, pp. 153–167 (2009)

  33. Wagner, J., Lingenfelser, F., André, E.: Building a Robust System for Multimodal Emotion Recognition, pp. 379–419. Wiley, Hoboken (2015)

  34. Wagner, J., Lingenfelser, F., Baur, T., Damian, I., Kistler, F., André, E.: The social signal interpretation (SSI) framework-multimodal signal processing and recognition in real-time. In: Proceedings of ACM International Conference on Multimedia (2013)

  35. Wanner, L., Bohnet, B., Bouayad-Agha, N., Lareau, F., Nicklaß, D.: MARQUIS: generation of user-tailored multilingual air quality bulletins. Appl. Artif. Intell. 24(10), 914–952 (2010)

  36. Yasavur, U., Lisetti, C., Rishe, N.: Let’s talk! speaking virtual counselor offers you a brief intervention. J. Multimodal User Interfaces 8(4), 381–398 (2014)

  37. Zeng, Z., Pantic, M., Roisman, G., Huang, T.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)

Acknowledgments

The presented work is funded by the European Commission as part of the H2020 Programme, under the contract number 645012–RIA. Many thanks to our colleagues from the University of Tübingen, German Red Cross and semFYC for the definition of the use cases, constant feedback, and evaluation!

Author information

Correspondence to Leo Wanner.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wanner, L. et al. (2017). KRISTINA: A Knowledge-Based Virtual Conversation Agent. In: Demazeau, Y., Davidsson, P., Bajo, J., Vale, Z. (eds.) Advances in Practical Applications of Cyber-Physical Multi-Agent Systems: The PAAMS Collection. PAAMS 2017. Lecture Notes in Computer Science, vol. 10349. Springer, Cham. https://doi.org/10.1007/978-3-319-59930-4_23

  • DOI: https://doi.org/10.1007/978-3-319-59930-4_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59929-8

  • Online ISBN: 978-3-319-59930-4
