ABSTRACT
In this paper, we present our humanoid robot "Meka" in a multi-party human-robot dialogue scenario. Active arbitration of the robot's attention based on multi-modal stimuli is utilised to observe persons outside the robot's field of view. We investigate the impact of this attention management, together with addressee recognition, on the robot's capability to distinguish utterances directed at it from communication between humans. Based on the results of a user study, we show that mutual gaze at the end of an utterance, as a means of yielding a turn, is a substantial cue for addressee recognition. Verifying the speaker through the detection of lip movements can further increase precision. Furthermore, we show that even a rather simplistic fusion of gaze and lip-movement cues allows a considerable improvement in addressee estimation, and can be adjusted to the requirements of a particular scenario.
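The fusion described above can be illustrated with a minimal sketch: mutual gaze at the end of an utterance acts as the primary cue, and detected lip movement of the candidate speaker serves as a verification step that trades recall for precision. All names, parameters, and thresholds below are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class UtteranceCues:
    gaze_at_robot_ratio: float   # fraction of the utterance's final window with mutual gaze
    lips_moving: bool            # lip-movement detector fired for the candidate speaker

def addressed_to_robot(cues: UtteranceCues,
                       gaze_threshold: float = 0.5,
                       require_lip_movement: bool = True) -> bool:
    """Decide whether an utterance was directed at the robot.

    Raising gaze_threshold or enabling require_lip_movement increases
    precision at the cost of recall, which is how such a fusion can be
    tuned to the requirements of a particular scenario.
    """
    if cues.gaze_at_robot_ratio < gaze_threshold:
        return False                      # speaker did not yield the turn to the robot
    if require_lip_movement and not cues.lips_moving:
        return False                      # cannot verify that this person was speaking
    return True

# Speaker looks at the robot while finishing the utterance and is visibly
# speaking: classified as robot-directed.
print(addressed_to_robot(UtteranceCues(gaze_at_robot_ratio=0.8, lips_moving=True)))   # True
# Gaze without verified lip movement is rejected under the strict setting.
print(addressed_to_robot(UtteranceCues(gaze_at_robot_ratio=0.8, lips_moving=False)))  # False
```

Relaxing the verification step (`require_lip_movement=False`) would accept the second case as well, shifting the operating point toward recall.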
Are you talking to me? Improving the Robustness of Dialogue Systems in a Multi-Party HRI Scenario by Incorporating Gaze Direction and Lip Movement of Attendees