A Controller-Based Animation System for Synchronizing and Realizing Human-Like Conversational Behaviors

  • Chapter
Development of Multimodal Interfaces: Active Listening and Synchrony

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5967)

Abstract

Embodied Conversational Agents (ECAs) are an application of virtual characters that is the subject of considerable ongoing research. An essential prerequisite for creating believable ECAs is the ability to describe and visually realize multimodal conversational behaviors. The recently developed Behavior Markup Language (BML) addresses this requirement by providing a means to specify the physical realization of multimodal behaviors through human-readable scripts. In this paper we present an approach to implementing a BML-compatible behavior realizer. The system’s architecture is based on hierarchical controllers that apply preprocessed behaviors to body modalities. The animation database is easily extensible and contains behavior examples constructed from existing gesture lexicons and gesture theory. Furthermore, we describe a novel solution, based on neural networks, to the problem of synchronizing gestures with synthesized speech, and we propose improvements to the BML specification.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Čereković, A., Pejša, T., Pandžić, I.S. (2010). A Controller-Based Animation System for Synchronizing and Realizing Human-Like Conversational Behaviors. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_6

  • DOI: https://doi.org/10.1007/978-3-642-12397-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12396-2

  • Online ISBN: 978-3-642-12397-9

  • eBook Packages: Computer Science (R0)
