ABSTRACT
This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors. Our primary evaluation metric is the human-likeness of the robot. Whereas previous research has often relied on subjective evaluations from users, our approach evaluates the robot’s human-likeness indirectly, based on observable user behaviors, thus enhancing objectivity and reproducibility. To begin, we created an annotated dataset of human-likeness scores, utilizing user behaviors found in an attentive listening dialogue corpus. We then analyzed the correlation between multimodal user behaviors and human-likeness scores, demonstrating the feasibility of our proposed behavior-based evaluation method.
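As a rough illustration of the correlation analysis described above, the following minimal sketch computes a Pearson coefficient between per-dialogue behavior features and annotated human-likeness scores. The abstract does not state which correlation coefficient or which features were used, so the choice of Pearson, the feature names (`utterance_length`, `silence_ratio`), and the toy values are all assumptions made purely for illustration and are not data from the study.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlate_features(
    features: Dict[str, List[float]], scores: List[float]
) -> Dict[str, float]:
    """Correlate each behavior feature with the human-likeness scores."""
    return {name: pearson(values, scores) for name, values in features.items()}


# Hypothetical toy values, one entry per dialogue session (not real data).
human_likeness_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
behavior_features = {
    "utterance_length": [2.0, 4.0, 6.0, 8.0, 10.0],  # assumed feature name
    "silence_ratio": [5.0, 4.0, 3.0, 2.0, 1.0],      # assumed feature name
}

correlations = correlate_features(behavior_features, human_likeness_scores)
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

In this sketch, a feature whose coefficient is far from zero would be a candidate indicator of human-likeness. A rank-based coefficient such as Spearman's may be preferable when the scores are ordinal annotations.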