Interactive Visual Task Learning for Robots

Authors

  • Weiwei Gu, Arizona State University
  • Anant Sah, Arizona State University
  • Nakul Gopalan, Arizona State University

DOI:

https://doi.org/10.1609/aaai.v38i9.28896

Keywords:

ROB: Human-Robot Interaction, CV: Visual Reasoning & Symbolic Representations, NLP: Language Grounding & Multi-modal NLP, ML: Neuro-Symbolic Learning

Abstract

We present a framework for robots to learn novel visual concepts and tasks via in-situ linguistic interactions with human users. Previous approaches have either used large pre-trained visual models to infer novel objects zero-shot, or added novel concepts along with their attributes and representations to a concept hierarchy. We extend the approaches that focus on learning visual concept hierarchies by enabling them to learn novel concepts and solve unseen robotics tasks with them. To enable a visual concept learner to solve robotics tasks one-shot, we develop two distinct techniques. First, we propose a novel approach, Hi-Viscont (HIerarchical VISual CONcept learner for Task), which propagates information about a novel concept to its parent nodes within a concept hierarchy. This information propagation allows all concepts in the hierarchy to update as novel concepts are taught in a continual learning setting. Second, we represent a visual task as a scene graph with language annotations, allowing us to create novel permutations of a demonstrated task zero-shot in-situ. We present two sets of results. First, we compare Hi-Viscont with the baseline model (FALCON) on visual question answering (VQA) in three domains. While comparable to the baseline on leaf-level concepts, Hi-Viscont achieves an improvement of over 9% on non-leaf concepts on average. Second, we conduct a human-subjects experiment in which users teach our robot visual tasks in-situ, and we compare our model's performance against the baseline FALCON model. Our framework achieves a 33% improvement in success rate and a 19% improvement in object-level accuracy compared to the baseline model. With both of these results, we demonstrate the ability of our model to learn tasks and concepts in a continual learning setting on the robot.
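The abstract describes two ideas at a high level: propagating a newly taught concept's information to its ancestors in a concept hierarchy, and representing a demonstrated task as a language-annotated scene graph whose object slots can be permuted zero-shot. The sketch below is a minimal, hypothetical illustration of those two data structures; the class names, the running-average update, and the `anchored_to` relation are assumptions made for illustration and do not reflect the paper's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ConceptNode:
    """A node in a visual concept hierarchy (e.g. 'red block' under 'block')."""
    name: str
    embedding: list[float]
    parent: "ConceptNode | None" = None

    def add_child_concept(self, name: str, embedding: list[float]) -> "ConceptNode":
        """Attach a newly taught concept and propagate its information upward,
        so that ancestor concepts also update (here: a simple running average)."""
        child = ConceptNode(name=name, embedding=embedding, parent=self)
        node: "ConceptNode | None" = self
        while node is not None:
            node.embedding = [0.5 * a + 0.5 * b for a, b in zip(node.embedding, embedding)]
            node = node.parent
        return child


@dataclass
class SceneNode:
    """An object slot in a demonstrated task, annotated with language."""
    concept: str                            # e.g. "red block"
    annotation: str                         # e.g. "place on top of the blue block"
    anchored_to: "SceneNode | None" = None  # spatial/assembly relation to another slot


# A toy scene graph for a two-object structure; swapping the concept strings
# yields a novel permutation of the same task without a new demonstration.
base = SceneNode(concept="blue block", annotation="place on the table")
top = SceneNode(concept="red block",
                annotation="place on top of the blue block",
                anchored_to=base)
```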

Published

2024-03-24

How to Cite

Gu, W., Sah, A., & Gopalan, N. (2024). Interactive Visual Task Learning for Robots. Proceedings of the AAAI Conference on Artificial Intelligence, 38(9), 10297-10305. https://doi.org/10.1609/aaai.v38i9.28896

Section

Intelligent Robots (ROB)