research-article
DOI: 10.1145/2974804.2974823

Are you talking to me? Improving the Robustness of Dialogue Systems in a Multi-Party HRI Scenario by Incorporating Gaze Direction and Lip Movement of Attendees

Published: 04 October 2016

ABSTRACT

In this paper, we present our humanoid robot "Meka" participating in a multi-party human-robot dialogue scenario. Active arbitration of the robot's attention based on multi-modal stimuli is utilised to observe persons who are outside of the robot's field of view. We investigate the impact of this attention management and addressee recognition on the robot's capability to distinguish utterances directed at it from communication between humans. Based on the results of a user study, we show that mutual gaze at the end of an utterance, as a means of yielding a turn, is a substantial cue for addressee recognition. Verifying the speaker through the detection of lip movements can further increase precision. Furthermore, we show that even a rather simplistic fusion of gaze and lip-movement cues allows a considerable improvement in addressee estimation, and can be tuned to the requirements of a particular scenario.
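To make the fusion idea concrete, the following is a minimal sketch (not the authors' implementation) of the kind of rule-based cue combination the abstract describes: an utterance is attributed to the robot when the speaker's gaze meets the robot's at the end of the utterance, optionally verified by detected lip movement. All names, types, and the decision rule are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class UtteranceCues:
    """Per-utterance observations from the perception pipeline (assumed)."""
    mutual_gaze_at_end: bool     # speaker looked at the robot as the utterance ended
    lip_movement_detected: bool  # speaker's lips moved while speech was heard


def is_addressed(cues: UtteranceCues, require_lip_movement: bool = True) -> bool:
    """Decide whether the robot was the addressee of an utterance.

    Mutual gaze at the end of the utterance is the primary cue; requiring
    lip-movement verification trades recall for precision, which is the
    kind of scenario-specific tuning the abstract mentions.
    """
    if not cues.mutual_gaze_at_end:
        return False
    if require_lip_movement and not cues.lip_movement_detected:
        return False
    return True


# Example: speech between two humans, without gaze toward the robot, is ignored.
print(is_addressed(UtteranceCues(mutual_gaze_at_end=True, lip_movement_detected=True)))   # True
print(is_addressed(UtteranceCues(mutual_gaze_at_end=False, lip_movement_detected=True)))  # False
```

Toggling require_lip_movement is one way such a fusion could be adapted per scenario: enabled for noisy rooms where false positives are costly, disabled where missed addressals are worse.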



Published in

          HAI '16: Proceedings of the Fourth International Conference on Human Agent Interaction
          October 2016
          414 pages
ISBN: 9781450345088
DOI: 10.1145/2974804

          Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

Published: 4 October 2016


          Acceptance Rates

HAI '16 Paper Acceptance Rate: 29 of 182 submissions, 16%. Overall Acceptance Rate: 121 of 404 submissions, 30%.
