
Generating context-sensitive ECA responses to user barge-in interruptions

  • Original Paper
  • Journal on Multimodal User Interfaces

Abstract

We present an Embodied Conversational Agent (ECA) that incorporates a context-sensitive mechanism for handling user barge-in. The affective ECA engages the user in social conversation and is fully implemented; we illustrate its behaviour with actual examples of system output. The ECA is designed to recognise, and be empathetic to, the emotional state of the user. It is able to detect, react quickly to, and then follow up with considered responses to different kinds of user interruption. The rules that enable the ECA to respond intelligently to different types of interruption were informed by manually analysed real data from human–human dialogue. The rules represent recoveries from interruptions as two-part structures: an address followed by a resumption. The system is robust enough to manage long, multi-utterance turns by both user and system, which creates good opportunities for the user to interrupt while the ECA is speaking.
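The two-part recovery structure described in the abstract can be sketched in code. This is a minimal illustrative sketch, not the system's actual implementation: the class and function names, and the example address strings, are all hypothetical.

```python
# Hypothetical sketch of the two-part interruption recovery described in the
# abstract: a recovery is an "address" (a quick reaction to the user's
# barge-in) followed by a "resumption" (a considered return to the
# interrupted utterance). Names and strings here are illustrative only.
from dataclasses import dataclass


@dataclass
class Recovery:
    address: str      # immediate reaction to the barge-in
    resumption: str   # considered follow-up resuming the interrupted turn


def recover(interrupted_text: str, interruption_type: str) -> Recovery:
    """Choose an address/resumption pair based on the kind of interruption."""
    if interruption_type == "request-info":
        # e.g. the user asked a question mid-turn
        address = "Good question."
    else:
        # e.g. a hostile or off-topic barge-in
        address = "Sorry, as I was saying..."
    # In the simplest case, resume the utterance that was cut off.
    return Recovery(address=address, resumption=interrupted_text)


r = recover("your meeting with the boss went well, then.", "hostile")
print(r.address)
print(r.resumption)
```

Separating the quick address from the slower resumption mirrors the paper's design goal of reacting fast while still producing a considered follow-up.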


Figures 1–3 (not reproduced here).


Notes

  1. The graphical user interface for the HWYD? prototype, including the animated agent, was created by Telefonica I+D, Madrid.

  2. Telefonica I+D in Madrid designed the skin, lip-synching, and gestures for the HWYD? avatar (which we call Matilda) [30, 31]; the avatar is driven by a Haptek engine. Loquendo in Turin developed the emotional speech synthesis for the HWYD? TTS module [32].

  3. All inter-component messages sent and received during run-time are recorded in a log file.

  4. The user’s turns are reproduced exactly as output by the ASR, and therefore include some ungrammatical structures.

  5. Recall that we equate an interruption with overlapping/simultaneous speech (Sect. 2).

  6. http://www.bbc.co.uk/radio4/news/anyquestions_archive_dated.shtml

  7. The domain of the corpus we used was different from that of the HWYD? system, whose domain was social dialogue focusing on one’s day at the office. There was no opportunity at this late stage for us to collect a corpus in that domain, and no corpus available in this domain that was also rich in interruptions.

  8. 20070112, 20070119, 20070126.

  9. Owing to the preliminary nature of the work, inter-annotator agreement has not yet been measured.

  10. Ignoring turn content (ign-t-content) was the second most frequent way of recovering from an interruption that we observed in the corpus; the most frequent, at 30.65%, was supplying the information requested by the interruption, which we also modelled (see Sect. 5.2.3).

  11. The system did not recognise Turn (b) as a WH question because it did not begin with a WH word. This is one of several known shortcomings that are currently being addressed.

  12. Additionally, much of the interruptions literature considers many interruptions to be hostile, in that the interrupter snatches the conversational floor out of turn. It is therefore reasonable to posit that if a user interrupts the system, the system may have said something that the user did not receive well, which adds weight to the appropriateness of an apology in an interruption recovery.

References

  1. Cavazza M, Santos de la Cámara R, Turunen M (The COMPANIONS Consortium) (2010) How was your day? A companion ECA. In: Proceedings of the 9th international conference on autonomous agents and multiagent systems (AAMAS 2010), Toronto, Canada, 10–14 May 2010, pp 1629–1630

  2. Young S (2010) Still talking to machines (cognitively speaking). In: Proc Interspeech, Chiba, Japan, 26–30 September 2010

  3. Lemon O, Georgila K, Henderson J, Stuttle M (2006) An ISU dialogue system exhibiting reinforcement learning of dialogue policies: generic slot-filling in the TALK in-car system. In: Proceedings of the eleventh conference of the European chapter of the Association for Computational Linguistics: posters & demonstrations, EACL ’06. Association for Computational Linguistics, Stroudsburg, pp 119–122

  4. Allen J, Chambers N, Ferguson G, Galescu L, Jung H, Swift M, Taysom W (2007) PLOW: a collaborative task learning agent. In: Proceedings of the 22nd national conference on artificial intelligence, vol 2. AAAI Press, Menlo Park, pp 1514–1519

  5. West C, Zimmerman D (1983) Small insults: a study of interruptions in cross-sex conversations between unacquainted persons. In: Thorne B, Kramarae C, Henley N (eds) Language, gender and society. Newbury House, Cambridge, pp 102–117

  6. Lakoff RT (1995) Cries and whispers: the shattering of the silence. In: Hall K, Bucholtz M (eds) Gender articulated: language and the socially constructed self. Routledge, New York, pp 25–50

  7. Sacks H, Schegloff EA, Jefferson G (1974) A simplest systematics for the organization of turn-taking for conversation. Language 50(4):696–735

  8. Coates J (1993) Women, men, and language: a sociolinguistic account of gender differences in language, 2nd edn. Longman, London/New York

  9. Bevacqua E, Pammi S, Hyniewska SJ, Schröder M, Pelachaud C (2010) Multimodal backchannels for embodied conversational agents. In: Allbeck JM, Badler NI, Bickmore TW, Pelachaud C, Safonova A (eds) IVA. Lecture notes in computer science, vol 6356. Springer, Berlin, pp 194–200

  10. Morency L-P, de Kok I, Gratch J (2008) Predicting listener backchannels: a probabilistic multimodal approach. In: Prendinger H, Lester JC, Ishizuka M (eds) IVA. Lecture notes in computer science, vol 5208. Springer, Berlin, pp 176–190

  11. Zimmerman D, West C (1975) Sex roles, interruptions and silences in conversation. In: Thorne B, Henley N (eds) Language and sex: difference and dominance. Newbury House, Cambridge, pp 105–129

  12. Murray SO (1985) Toward a model of members’ methods for recognizing interruptions. Lang Soc 14(1):31–40

  13. Raux A, Eskenazi M (2007) A multi-layer architecture for semi-synchronous event-driven dialogue management. In: ASRU, Kyoto, Japan, pp 514–519

  14. Barnett J, Singh M (1996) Designing a portable spoken dialogue system. In: Maier E, Mast M, LuperFoy S (eds) ECAI workshop on dialogue processing in spoken language systems. Lecture notes in computer science, vol 1236. Springer, Berlin, pp 156–170

  15. Rose RC, Kim HK (2003) A hybrid barge-in procedure for more reliable turn-taking in human-machine dialog systems. In: Proceedings of the automatic speech recognition and understanding workshop

  16. Balentine B, Morgan DP (1999) How to build speech recognition applications—a style guide for telephony dialogs. Enterprise Integration Group, San Ramon

  17. Setlur AR, Sukkar RA (1998) Recognition-based word counting for reliable barge-in and early endpoint detection in continuous speech recognition. In: Proceedings of the international conference on spoken language processing, pp 2135–2138

  18. Matsuyama K, Komatani K, Ogata T, Okuno HG (2009) Enabling a user to specify an item at any time during system enumeration—item identification for barge-in-able conversational dialogue systems. In: Proceedings of the 10th annual conference of the International Speech Communication Association (INTERSPEECH 2009), Brighton, UK, 6–10 September 2009, pp 252–255

  19. Komatani K, Rudnicky AI (2009) Predicting barge-in utterance errors by using implicitly-supervised ASR accuracy and barge-in rate per user. In: Proceedings of the ACL-IJCNLP conference short papers, Suntec, Singapore, August 2009. Association for Computational Linguistics, Stroudsburg, pp 89–92

  20. Brooks RA (1985) A robust layered control system for a mobile robot. Technical report, Massachusetts Institute of Technology, Cambridge, MA, USA

  21. Brooks RA (1995) Intelligence without representation. In: Computation & intelligence: collected readings. American Association for Artificial Intelligence, Menlo Park, pp 343–362

  22. Moore RK (2007) PRESENCE: a human-inspired architecture for speech-based human-machine interaction. IEEE Trans Comput 56(9):1176–1188

  23. Reidsma D, de Kok I, Neiberg D, Pammi S, van Straalen B, Truong K, van Welbergen H (2011) Continuous interaction with a virtual human. J Multimodal User Interfaces 4:97–118

  24. Santos de la Cámara R, Turunen M, Hakulinen J, Field D (2010) How was your day? An architecture for multimodal ECA systems. In: Proc 11th annual meeting of the special interest group on discourse and dialogue (SIGDIAL), 24–25 September 2010. University of Tokyo, Tokyo, pp 47–50

  25. Vogt T, André E, Bee N (2008) EmoVoice—a framework for online recognition of emotions from voice. In: Proceedings of the 4th IEEE tutorial and research workshop on perception and interactive technologies for speech-based systems: perception in multimodal dialogue systems, PIT ’08. Springer, Berlin, pp 188–199

  26. Moilanen K, Pulman S (2007) Sentiment composition. In: Proceedings of the recent advances in natural language processing international conference (RANLP-2007), Borovets, Bulgaria, 27–29 September 2007, pp 378–382

  27. Bremond C (1973) Logique du récit. Editions du Seuil, Paris

  28. Cavazza M, Smith C, Charlton D, Crook N, Boye J, Pulman S, Moilanen K, Pizzi D, Santos de la Cámara R, Turunen M (2010) Persuasive dialogue based on a narrative theory: an ECA implementation. In: Proceedings of the fifth international conference on persuasive technology (Persuasive 2010), Copenhagen, Denmark, 7–10 June 2010

  29. Smith C, Crook N, Boye J, Charlton D, Dobnik S, Pizzi D, Cavazza M, Pulman S, Santos de la Cámara R, Turunen M (2010) Interaction strategies for an affective conversational agent. In: Proc of the 10th int conf on intelligent virtual agents (IVA 2010), Philadelphia, PA, September 2010

  30. Hernández A, López B, Pardo D, Santos R, Hernández L, Relaño Gil J, Rodríguez M (2008) Modular definition of multimodal ECA communication acts to improve dialogue robustness and depth of intention. In: Proc 1st functional markup language workshop, 7th international joint conference on autonomous agents and multiagent systems (AAMAS 2008), Estoril, Portugal, 12–16 May 2008

  31. López B, Hernández A, Pardo D, Santos R, Rodríguez M (2008) ECA gesture strategies for robust SLDS. In: Proc artificial intelligence and simulation of behaviour convention (AISB 2008) symposium on multimodal output generation, Aberdeen, UK, 1–4 April 2008

  32. Danieli M, Zovato E (2010) The affective dimension of speech acts and voice expressiveness. In: Pettorino M, Giannini A, Chiari I, Dovetto F (eds) Spoken communication. Cambridge Scholars Publishing, Newcastle upon Tyne, pp 191–204

  33. Stoness S, Tetreault J, Allen J (2004) Incremental parsing with reference interaction. In: ACL workshop on incremental parsing, pp 18–25

  34. Aist G, Allen J, Campana E, Gallo C, Stoness S, Swift M, Tanenhaus M (2007) Incremental understanding in human-computer dialogue and experimental evidence for advantages over nonincremental methods. In: Proceedings of the 11th workshop on the semantics and pragmatics of dialogue, Trento, Italy, 30 May–1 June 2007, pp 149–154

  35. Brick T, Scheutz M (2007) Incremental natural language processing for HRI. In: Proceedings of the ACM/IEEE international conference on human-robot interaction, Arlington, Virginia, USA, pp 263–270

  36. Skantze G, Schlangen D (2009) Incremental dialogue processing in a micro-domain. In: Proceedings of the 12th conference of the European chapter of the ACL (EACL 2009), Athens, Greece, April 2009, pp 745–753

  37. Schlangen D, Skantze G (2009) A general, abstract model of incremental dialogue processing. In: Proc of the 12th conference of the European chapter of the ACL (EACL 2009), Athens, Greece, April 2009, pp 710–718

  38. Duncan S (1972) Some signals and rules for taking speaking turns in conversations. J Pers Soc Psychol 23:283–292

  39. Wiemann JM, Knapp ML (1975) Turn-taking in conversations. J Commun 25:75–92

  40. Schegloff EA (2000) Overlapping talk and the organization of turn-taking for conversation. Lang Soc 29(1):1–63

  41. Kennedy CW, Camden CT (1983) A new look at interruptions. West J Commun 47:45–58

  42. Roger D (1989) Experimental studies of dyadic turn-taking behaviour. In: Roger D, Bull P (eds) Conversation: an interdisciplinary perspective. Multilingual Matters, Clevedon

  43. Hutchby I (1992) Confrontation talk: aspects of interruption in argument sequences on talk radio. Interdiscip J Study Discourse 12:343–372

  44. Walker M, Whittaker S (1990) Mixed initiative in dialogue: an investigation into discourse segmentation. In: Proc 28th annual meeting of the ACL, pp 70–79

  45. Heins R, Franzke M, Durian M, Bayya A (1997) Turn-taking as a design principle for barge-in in spoken language systems. Int J Speech Technol 2:155–164. doi:10.1007/BF02208827

  46. Kompe R, Kießling A, Kuhn T, Mast M, Niemann H, Nöth E, Ott K, Batliner A (1994) Prosody takes over: towards a prosodically guided dialog system. Speech Commun 15:155–167

  47. Austin JL (1962) How to do things with words, 2nd edn. Oxford University Press, New York

  48. Bunt HC (2000) Dynamic interpretation and dialogue theory. In: Taylor MM, Neel F, Bouwhuis DG (eds) The structure of multimodal dialogue, vol 2. North-Holland, Amsterdam, pp 139–166

  49. Traum DR (2000) 20 questions on dialogue act taxonomies. J Semant 17:7–30

  50. Jurafsky D, Shriberg E, Biasca D (1997) Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual, draft 13. Institute of Cognitive Science Technical Report 97-02, University of Colorado, Boulder

  51. Thomason RH (1990) Accommodation, meaning, and implicature: interdisciplinary foundations for pragmatics. In: Intentions in communication, pp 325–363

  52. Lewis D (1979) Scorekeeping in a language game. J Philos Log 8:339–359. Reprinted in: Lewis D (1983) Philosophical papers, vol I. Oxford University Press, New York/Oxford, pp 233–249

  53. Stalnaker R (1972) Pragmatics. In: Davidson D, Harman G (eds) Semantics of natural language. Synthese library, vol 40. Reidel, Dordrecht, pp 380–397


Acknowledgements

This work was partially funded by the COMPANIONS project (http://www.companions-project.org) sponsored by the European Commission (EC) as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-034434. We thank the University of Augsburg (Prof. Elisabeth André) for supplying a version of the EmoVoice [25] system. Other contributors to the prototype described in this paper are Ramon Granell, Simon Dobnik, Karo Moilanen and Manjari Chandran-Ramesh (University of Oxford), Raúl Santos de la Cámara (Telefonica I+D, Madrid), Markku Turunen (University of Tampere) and Enrico Zovato (Loquendo, Torino).


Corresponding author

Correspondence to Nigel Crook.

Additional information

N. Crook and D. Field are joint 1st authors.

Appendix: Interruption and Recovery Types

Table 3 Types of recovery from an interruption and their relative frequencies
Table 4 Types of interruption and their relative frequencies


Cite this article

Crook, N., Field, D., Smith, C. et al. Generating context-sensitive ECA responses to user barge-in interruptions. J Multimodal User Interfaces 6, 13–25 (2012). https://doi.org/10.1007/s12193-012-0090-z

