
Generating context-sensitive ECA responses to user barge-in interruptions

  • Original Paper
  • Journal on Multimodal User Interfaces

Abstract

We present an Embodied Conversational Agent (ECA) that incorporates a context-sensitive mechanism for handling user barge-in. The affective ECA engages the user in social conversation and is fully implemented; we illustrate its behaviour with actual examples of system output. The ECA is designed to recognise, and be empathetic to, the emotional state of the user. It is able to detect, react quickly to, and then follow up with considered responses to different kinds of user interruption. The rules that enable the ECA to respond intelligently to different types of interruption were informed by manually analysed real data from human–human dialogue. The rules represent recoveries from interruptions as two-part structures: an address followed by a resumption. The system is robust enough to manage long, multi-utterance turns by both user and system, which creates good opportunities for the user to interrupt while the ECA is speaking.
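The two-part recovery structure described in the abstract can be sketched in code. This is a minimal illustrative sketch, not the system's actual implementation: the class and function names, and the example address strings, are all hypothetical.

```python
# Hypothetical sketch of the two-part interruption recovery described in the
# abstract: a recovery is an "address" (a quick reaction to the user's
# barge-in) followed by a "resumption" (a considered return to the
# interrupted utterance). Names and strings here are illustrative only.
from dataclasses import dataclass


@dataclass
class Recovery:
    address: str      # immediate reaction to the barge-in
    resumption: str   # considered follow-up resuming the interrupted turn


def recover(interrupted_text: str, interruption_type: str) -> Recovery:
    """Choose an address/resumption pair based on the kind of interruption."""
    if interruption_type == "request-info":
        # e.g. the user asked a question mid-turn
        address = "Good question."
    else:
        # e.g. a hostile or off-topic barge-in
        address = "Sorry, as I was saying..."
    # In the simplest case, resume the utterance that was cut off.
    return Recovery(address=address, resumption=interrupted_text)


r = recover("your meeting with the boss went well, then.", "hostile")
print(r.address)
print(r.resumption)
```

Separating the quick address from the slower resumption mirrors the paper's design goal of reacting fast while still producing a considered follow-up.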


Figures 1–3 (not reproduced here).


Notes

  1. The graphical user interface for the HWYD? prototype, including the animated agent, was created by Telefonica I+D, Madrid.

  2. Telefonica I+D in Madrid designed the skin, lip-synching, and gestures for the HWYD? avatar (which we call Matilda) [30, 31]; the avatar is driven by a Haptek engine. Loquendo in Turin developed the emotional speech synthesis for the HWYD? TTS module [32].

  3. All inter-component messages sent and received during run-time are recorded in a log file.

  4. The user’s turns are reproduced exactly as output by the ASR, and therefore include some ungrammatical structures.

  5. Recall that we equate an interruption with overlapping/simultaneous speech (Sect. 2).

  6. http://www.bbc.co.uk/radio4/news/anyquestions_archive_dated.shtml

  7. The domain of the corpus we used was different from that of the HWYD? system, whose domain was social dialogue focusing on one’s day at the office. There was no opportunity at this late stage for us to collect a corpus in that domain, and no corpus available in this domain that was also rich in interruptions.

  8. 20070112, 20070119, 20070126.

  9. Owing to the preliminary nature of the work, inter-annotator agreement has not yet been measured.

  10. Ignoring turn content (ign-t-content) was the second most frequent way of recovering from an interruption that we observed in the corpus; the most frequent, at 30.65%, was supplying the information requested by the interruption, which we also modelled (see Sect. 5.2.3).

  11. The system did not recognise Turn (b) as a WH question because it did not begin with a WH word. This is one of several known shortcomings that are currently being addressed.

  12. Additionally, much of the interruptions literature considers many interruptions to be hostile, in that the interrupter snatches the conversational floor out of turn. It is therefore reasonable to posit that if a user interrupts the system, the system may have said something that the user did not receive well, which adds weight to the appropriateness of an apology in an interruption recovery.

References

  1. Cavazza M, Santos de la Cámara R, Turunen M (The COMPANIONS Consortium) (2010) How was your day? A companion ECA. In: Proceedings of the 9th international conference on autonomous agents and multiagent systems (AAMAS 2010), Toronto, Canada, 10–14 May 2010, pp 1629–1630

  2. Young S (2010) Still talking to machines (cognitively speaking). In: Proc Interspeech, Chiba, Japan, 26–30 September 2010

  3. Lemon O, Georgila K, Henderson J, Stuttle M (2006) An ISU dialogue system exhibiting reinforcement learning of dialogue policies: generic slot-filling in the TALK in-car system. In: Proceedings of the eleventh conference of the European chapter of the Association for Computational Linguistics: posters & demonstrations, EACL ’06. Association for Computational Linguistics, Stroudsburg, pp 119–122

  4. Allen J, Chambers N, Ferguson G, Galescu L, Jung H, Swift M, Taysom W (2007) PLOW: a collaborative task learning agent. In: Proceedings of the 22nd national conference on artificial intelligence, vol 2. AAAI Press, Menlo Park, pp 1514–1519

  5. West C, Zimmerman D (1983) Small insults: a study of interruptions in cross-sex conversations between unacquainted persons. In: Thorne B, Kramarae C, Henley N (eds) Language, gender and society. Newbury House, Cambridge, pp 102–117

  6. Lakoff RT (1995) Cries and whispers: the shattering of the silence. In: Hall K, Bucholtz M (eds) Gender articulated: language and the socially constructed self. Routledge, New York, pp 25–50

  7. Sacks H, Schegloff EA, Jefferson G (1974) A simplest systematics for the organization of turn-taking for conversation. Language 50(4):696–735

  8. Coates J (1993) Women, men, and language: a sociolinguistic account of gender differences in language, 2nd edn. Longman, London/New York

  9. Bevacqua E, Pammi S, Hyniewska SJ, Schröder M, Pelachaud C (2010) Multimodal backchannels for embodied conversational agents. In: Allbeck JM, Badler NI, Bickmore TW, Pelachaud C, Safonova A (eds) IVA. Lecture notes in computer science, vol 6356. Springer, Berlin, pp 194–200

  10. Morency L-P, de Kok I, Gratch J (2008) Predicting listener backchannels: a probabilistic multimodal approach. In: Prendinger H, Lester JC, Ishizuka M (eds) IVA. Lecture notes in computer science, vol 5208. Springer, Berlin, pp 176–190

  11. Zimmerman D, West C (1975) Sex roles, interruptions and silences in conversation. In: Thorne B, Henley N (eds) Language and sex: difference and dominance. Newbury House, Cambridge, pp 105–129

  12. Murray SO (1985) Toward a model of members’ methods for recognizing interruptions. Lang Soc 14(1):31–40

  13. Raux A, Eskenazi M (2007) A multi-layer architecture for semi-synchronous event-driven dialogue management. In: ASRU, Kyoto, Japan, pp 514–519

  14. Barnett J, Singh M (1996) Designing a portable spoken dialogue system. In: Maier E, Mast M, LuperFoy S (eds) ECAI workshop on dialogue processing in spoken language systems. Lecture notes in computer science, vol 1236. Springer, Berlin, pp 156–170

  15. Rose RC, Kim HK (2003) A hybrid barge-in procedure for more reliable turn-taking in human-machine dialog systems. In: Proceedings of the automatic speech recognition and understanding workshop

  16. Balentine B, Morgan DP (1999) How to build speech recognition applications—a style guide for telephony dialogs. Enterprise Integration Group, San Ramon

  17. Setlur AR, Sukkar RA (1998) Recognition-based word counting for reliable barge-in and early endpoint detection in continuous speech recognition. In: Proceedings of the international conference on spoken language processing, pp 2135–2138

  18. Matsuyama K, Komatani K, Ogata T, Okuno HG (2009) Enabling a user to specify an item at any time during system enumeration—item identification for barge-in-able conversational dialogue systems. In: Proceedings of the 10th annual conference of the International Speech Communication Association (INTERSPEECH 2009), Brighton, UK, 6–10 September 2009, pp 252–255

  19. Komatani K, Rudnicky AI (2009) Predicting barge-in utterance errors by using implicitly-supervised ASR accuracy and barge-in rate per user. In: Proceedings of the ACL-IJCNLP conference short papers, Suntec, Singapore, August 2009. Association for Computational Linguistics, Stroudsburg, pp 89–92

  20. Brooks RA (1985) A robust layered control system for a mobile robot. Technical report, Massachusetts Institute of Technology, Cambridge, MA, USA

  21. Brooks RA (1995) Intelligence without representation. In: Computation & intelligence: collected readings. American Association for Artificial Intelligence, Menlo Park, pp 343–362

  22. Moore RK (2007) PRESENCE: a human-inspired architecture for speech-based human-machine interaction. IEEE Trans Comput 56(9):1176–1188

  23. Reidsma D, de Kok I, Neiberg D, Pammi S, van Straalen B, Truong K, van Welbergen H (2011) Continuous interaction with a virtual human. J Multimodal User Interfaces 4:97–118

  24. Santos de la Cámara R, Turunen M, Hakulinen J, Field D (2010) How was your day? An architecture for multimodal ECA systems. In: Proc 11th annual meeting of the special interest group on discourse and dialogue (SIGDIAL), 24–25 September 2010. University of Tokyo, Tokyo, pp 47–50

  25. Vogt T, André E, Bee N (2008) EmoVoice—a framework for online recognition of emotions from voice. In: Proceedings of the 4th IEEE tutorial and research workshop on perception and interactive technologies for speech-based systems: perception in multimodal dialogue systems, PIT ’08. Springer, Berlin, pp 188–199

  26. Moilanen K, Pulman S (2007) Sentiment composition. In: Proceedings of the recent advances in natural language processing international conference (RANLP-2007), Borovets, Bulgaria, 27–29 September 2007, pp 378–382

  27. Bremond C (1973) Logique du récit. Editions du Seuil, Paris

  28. Cavazza M, Smith C, Charlton D, Crook N, Boye J, Pulman S, Moilanen K, Pizzi D, Santos de la Cámara R, Turunen M (2010) Persuasive dialogue based on a narrative theory: an ECA implementation. In: Proceedings of the fifth international conference on persuasive technology (Persuasive 2010), Copenhagen, Denmark, 7–10 June 2010

  29. Smith C, Crook N, Boye J, Charlton D, Dobnik S, Pizzi D, Cavazza M, Pulman S, Santos de la Cámara R, Turunen M (2010) Interaction strategies for an affective conversational agent. In: Proc of the 10th int conf on intelligent virtual agents (IVA 2010), Philadelphia, PA, September 2010

  30. Hernández A, López B, Pardo D, Santos R, Hernández L, Relaño Gil J, Rodríguez M (2008) Modular definition of multimodal ECA communication acts to improve dialogue robustness and depth of intention. In: Proc 1st functional markup language workshop, 7th international joint conference on autonomous agents and multiagent systems (AAMAS 2008), Estoril, Portugal, 12–16 May 2008

  31. López B, Hernández A, Pardo D, Santos R, Rodríguez M (2008) ECA gesture strategies for robust SLDS. In: Proc artificial intelligence and simulation of behaviour convention (AISB 2008) symposium on multimodal output generation, Aberdeen, UK, 1–4 April 2008

  32. Danieli M, Zovato E (2010) The affective dimension of speech acts and voice expressiveness. In: Pettorino M, Giannini A, Chiari I, Dovetto F (eds) Spoken communication. Cambridge Scholars Publishing, Newcastle upon Tyne, pp 191–204

  33. Stoness S, Tetreault J, Allen J (2004) Incremental parsing with reference interaction. In: ACL workshop on incremental parsing, pp 18–25

  34. Aist G, Allen J, Campana E, Gallo C, Stoness S, Swift M, Tanenhaus M (2007) Incremental understanding in human-computer dialogue and experimental evidence for advantages over nonincremental methods. In: Proceedings of the 11th workshop on the semantics and pragmatics of dialogue, Trento, Italy, 30 May–1 June 2007, pp 149–154

  35. Brick T, Scheutz M (2007) Incremental natural language processing for HRI. In: Proceedings of the ACM/IEEE international conference on human-robot interaction, Arlington, Virginia, USA, pp 263–270

  36. Skantze G, Schlangen D (2009) Incremental dialogue processing in a micro-domain. In: Proceedings of the 12th conference of the European chapter of the ACL (EACL 2009), Athens, Greece, April 2009, pp 745–753

  37. Schlangen D, Skantze G (2009) A general, abstract model of incremental dialogue processing. In: Proc of the 12th conference of the European chapter of the ACL (EACL 2009), Athens, Greece, April 2009, pp 710–718

  38. Duncan S (1972) Some signals and rules for taking speaking turns in conversations. J Pers Soc Psychol 23:283–292

  39. Wiemann JM, Knapp ML (1975) Turn-taking in conversations. J Commun 25:75–92

  40. Schegloff EA (2000) Overlapping talk and the organization of turn-taking for conversation. Lang Soc 29(1):1–63

  41. Kennedy CW, Camden CT (1983) A new look at interruptions. West J Commun 47:45–58

  42. Roger D (1989) Experimental studies of dyadic turn-taking behaviour. In: Roger D, Bull P (eds) Conversation: an interdisciplinary perspective. Multilingual Matters, Clevedon

  43. Hutchby I (1992) Confrontation talk: aspects of interruption in argument sequences on talk radio. Interdiscip J Study Discourse 12:343–372

  44. Walker M, Whittaker S (1990) Mixed initiative in dialogue: an investigation into discourse segmentation. In: Proc 28th annual meeting of the ACL, pp 70–79

  45. Heins R, Franzke M, Durian M, Bayya A (1997) Turn-taking as a design principle for barge-in in spoken language systems. Int J Speech Technol 2:155–164. doi:10.1007/BF02208827

  46. Kompe R, Kießling A, Kuhn T, Mast M, Niemann H, Nöth E, Ott K, Batliner A (1994) Prosody takes over: towards a prosodically guided dialog system. Speech Commun 15:155–167

  47. Austin JL (1962) How to do things with words, 2nd edn. Oxford University Press, New York

  48. Bunt HC (2000) Dynamic interpretation and dialogue theory. In: Taylor MM, Neel F, Bouwhuis DG (eds) The structure of multimodal dialogue, vol 2. North-Holland, Amsterdam, pp 139–166

  49. Traum DR (2000) 20 questions on dialogue act taxonomies. J Semant 17:7–30

  50. Jurafsky D, Shriberg E, Biasca D (1997) Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual, draft 13. Institute of Cognitive Science Technical Report 97-02, University of Colorado, Boulder

  51. Thomason RH (1990) Accommodation, meaning, and implicature: interdisciplinary foundations for pragmatics. In: Intentions in communication, pp 325–363

  52. Lewis D (1979) Scorekeeping in a language game. J Philos Log 8:339–359. Reprinted in: Lewis D (1983) Philosophical papers, vol I. Oxford University Press, New York/Oxford, pp 233–249

  53. Stalnaker R (1972) Pragmatics. In: Davidson D, Harman G (eds) Semantics of natural language. Synthese library, vol 40. Reidel, Dordrecht, pp 380–397


Acknowledgements

This work was partially funded by the COMPANIONS project (http://www.companions-project.org) sponsored by the European Commission (EC) as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-034434. We thank the University of Augsburg (Prof. Elisabeth André) for supplying a version of the EmoVoice [25] system. Other contributors to the prototype described in this paper are Ramon Granell, Simon Dobnik, Karo Moilanen and Manjari Chandran-Ramesh (University of Oxford), Raúl Santos de la Cámara (Telefonica I+D, Madrid), Markku Turunen (University of Tampere) and Enrico Zovato (Loquendo, Torino).


Corresponding author

Correspondence to Nigel Crook.

Additional information

N. Crook and D. Field are joint 1st authors.

Appendix: Interruption and Recovery Types

Table 3 Types of recovery from an interruption and their relative frequencies
Table 4 Types of interruption and their relative frequencies


Cite this article

Crook, N., Field, D., Smith, C. et al. Generating context-sensitive ECA responses to user barge-in interruptions. J Multimodal User Interfaces 6, 13–25 (2012). https://doi.org/10.1007/s12193-012-0090-z

