ABSTRACT
In this paper, we present our humanoid robot "Meka" in a multi-party human-robot dialogue scenario. Active arbitration of the robot's attention based on multi-modal stimuli is utilised to observe persons outside the robot's field of view. We investigate the impact of this attention management, together with addressee recognition, on the robot's capability to distinguish utterances directed at it from communication between humans. Based on the results of a user study, we show that mutual gaze at the end of an utterance, as a means of yielding a turn, is a substantial cue for addressee recognition. Verifying the speaker through the detection of lip movements can further increase precision. Furthermore, we show that even a rather simplistic fusion of gaze and lip-movement cues allows a considerable improvement in addressee estimation, and can be adjusted to the requirements of a particular scenario.
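The fusion described above can be illustrated with a minimal sketch: mutual gaze at the end of an utterance acts as the primary cue, and detected lip movement of the candidate speaker serves as a verification step that trades recall for precision. All names, parameters, and thresholds below are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class UtteranceCues:
    gaze_at_robot_ratio: float   # fraction of the utterance's final window with mutual gaze
    lips_moving: bool            # lip-movement detector fired for the candidate speaker

def addressed_to_robot(cues: UtteranceCues,
                       gaze_threshold: float = 0.5,
                       require_lip_movement: bool = True) -> bool:
    """Decide whether an utterance was directed at the robot.

    Raising gaze_threshold or enabling require_lip_movement increases
    precision at the cost of recall, which is how such a fusion can be
    tuned to the requirements of a particular scenario.
    """
    if cues.gaze_at_robot_ratio < gaze_threshold:
        return False                      # speaker did not yield the turn to the robot
    if require_lip_movement and not cues.lips_moving:
        return False                      # cannot verify that this person was speaking
    return True

# Speaker looks at the robot while finishing the utterance and is visibly
# speaking: classified as robot-directed.
print(addressed_to_robot(UtteranceCues(gaze_at_robot_ratio=0.8, lips_moving=True)))   # True
# Gaze without verified lip movement is rejected under the strict setting.
print(addressed_to_robot(UtteranceCues(gaze_at_robot_ratio=0.8, lips_moving=False)))  # False
```

Relaxing the verification step (`require_lip_movement=False`) would accept the second case as well, shifting the operating point toward recall.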
Are you talking to me? Improving the Robustness of Dialogue Systems in a Multi-Party HRI Scenario by Incorporating Gaze Direction and Lip Movement of Attendees