A Controller-Based Animation System for Synchronizing and Realizing Human-Like Conversational Behaviors

Čereković, Aleksandra; Pejša, Tomislav; Pandžić, Igor S.

doi:10.1007/978-3-642-12397-9_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5967)

Abstract

Embodied Conversational Agents (ECAs) are an application of virtual characters that is the subject of considerable ongoing research. An essential prerequisite for creating believable ECAs is the ability to describe and visually realize multimodal conversational behaviors. The recently developed Behavior Markup Language (BML) seeks to address this requirement by providing a means to specify physical realizations of multimodal behaviors through human-readable scripts. In this paper we present an approach to implementing a BML-compatible behavior realizer. The system’s architecture is based on hierarchical controllers which apply preprocessed behaviors to body modalities. The animation database is readily extensible and contains behavior examples built on existing gesture lexicons and gesture theory. Furthermore, we describe a novel solution to the issue of synchronizing gestures with synthesized speech using neural networks and propose improvements to the BML specification.
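
The abstract does not spell out how the controller hierarchy is organized, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a top-level controller that routes already-timed behaviors to one controller per body modality. All names here (`Behavior`, `ModalityController`, `RootController`, the modality labels, and the timings) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Behavior:
    """One preprocessed behavior; every field is an illustrative assumption."""
    modality: str    # body modality the behavior targets, e.g. "head"
    name: str        # label of the behavior, e.g. "nod"
    start: float     # start time in seconds from the beginning of the script
    duration: float  # length of the behavior in seconds


@dataclass
class ModalityController:
    """Low-level controller that owns a single body modality."""
    modality: str
    queue: list = field(default_factory=list)

    def schedule(self, behavior: Behavior) -> None:
        self.queue.append(behavior)

    def active_at(self, t: float) -> list:
        # Names of behaviors whose interval [start, start + duration) contains t.
        return [b.name for b in self.queue
                if b.start <= t < b.start + b.duration]


class RootController:
    """Top-level controller that routes behaviors to per-modality children."""

    def __init__(self, modalities):
        self.children = {m: ModalityController(m) for m in modalities}

    def schedule(self, behavior: Behavior) -> None:
        self.children[behavior.modality].schedule(behavior)

    def active_at(self, t: float) -> dict:
        return {m: c.active_at(t) for m, c in self.children.items()}


# Hypothetical usage: a head nod overlapping an arm beat gesture.
root = RootController(["head", "face", "arm"])
root.schedule(Behavior("head", "nod", start=0.5, duration=1.0))
root.schedule(Behavior("arm", "beat", start=1.0, duration=0.5))
state = root.active_at(1.2)
```

In this sketch, querying the root at a given time returns, for each modality, the behaviors that would be playing at that moment. A real realizer would additionally have to blend animations and resolve conflicts between behaviors that compete for the same modality.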

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
Aleksandra Čereković, Tomislav Pejša & Igor S. Pandžić

Editor information

Editors and Affiliations

Second University of Naples, and IIASS, Via Pellegrino, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Centre for Language and Communication Studies, Trinity College, The University of Dublin, Dublin 2, Ireland
Nick Campbell & Carl Vogel
Department of Computing Science & Mathematics, University of Stirling, FK9 4LA, Stirling, Scotland, UK
Amir Hussain
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, P.O. Box 217, 7500 AE, Enschede, The Netherlands
Anton Nijholt

About this chapter

Cite this chapter

Čereković, A., Pejša, T., Pandžić, I.S. (2010). A Controller-Based Animation System for Synchronizing and Realizing Human-Like Conversational Behaviors. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_6

DOI: https://doi.org/10.1007/978-3-642-12397-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12396-2
Online ISBN: 978-3-642-12397-9
eBook Packages: Computer Science, Computer Science (R0)
