Object–Action Complexes: Grounded abstractions of sensory–motor processes

https://doi.org/10.1016/j.robot.2011.05.009

Abstract

This paper formalises Object–Action Complexes (OACs) as a basis for symbolic representations of sensory–motor experience and behaviours. OACs are designed to capture the interaction between objects and associated actions in artificial cognitive systems. This paper gives a formal definition of OACs, provides examples of their use for autonomous cognitive robots, and enumerates a number of critical learning problems in terms of OACs.

Highlights

  • A formal definition of Object–Action Complexes (OACs) is given.

  • OACs capture the interaction between objects and associated actions.

  • OACs provide a framework for grounded symbolic representations of behaviours.

  • A number of critical learning problems in artificial cognitive systems are enumerated in terms of OACs.

  • OACs realise a paradigm of ‘ongoing learning at all levels at all times’.

Introduction

Autonomous cognitive robots must be able to interact with the world and reason about the results of those interactions, a problem that presents a number of representational challenges. On the one hand, physical interactions are inherently continuous, noisy, and require feedback (e.g., consider the problem of moving forward by 42.8 cm or until a sensor indicates an obstacle). On the other hand, the knowledge needed for reasoning about high-level objectives and plans is more conveniently expressed in a symbolic form, as predictions about discrete state changes (e.g., going into the kitchen enables retrieving the coffee pot). Bridging the gap between low-level control knowledge and high-level abstract reasoning has been a fundamental concern of autonomous robotics [1], [2], [3], [4]. However, the task of providing autonomous robots with the ability to build symbolic representations of continuous sensory–motor experience de novo has received much less attention, even though this capability is crucial if robots are ever to perform at levels comparable to humans.

To address this need, this paper proposes a formal entity called an Object–Action Complex (OAC, pronounced “oak”) as the basis for symbolic representations of sensory–motor experience. The OAC formalism is designed to achieve two ends. First, OACs provide a computational account that brings together several existing concepts from developmental psychology, behavioural and cognitive robotics, and artificial intelligence. Second, by formalising these ideas together in a shared computational model, OACs allow us to enumerate and clarify a number of learning problems faced by embodied agents. Some of these learning problems are known and have been well studied in the literature, while others have received little or no attention.

OACs are designed to formalise adaptive and predictive behaviours at all levels of a cognitive processing hierarchy. In particular, the formalism ensures that OACs are grounded in real-world experiences: all learning and refinement of OACs will be based on statistics gained from an agent’s ongoing interaction with the world. To relate OACs operating at different processing levels, we will also allow OACs to be defined as combinations of other OACs in a hierarchy, in order to produce more complex behaviours. As a result, this formalism enables consistent, repeatable hierarchies of behaviour to be learnt, based on statistics gained during real-world interaction, that can also be used for probabilistic reasoning and planning. It also provides a framework that allows the OAC designer to focus on those design ideas that are essential for developing cognitive agents.

The goal of the OAC formalism is to provide a unifying framework for representing diverse interactions, from low-level reactive behaviour to high-level deliberative planning. To this end, we will build our computational models on existing assumptions and ideas that have been shown to be productive in previous research, in a number of different fields. In particular, we note six design ideas (DI) that have helped motivate our formalism:

  • DI-1

    Attributes: Actions, objects, and interactions must be formalised over an appropriate attribute space, defined as a collection of properties with sets of associated values. An agent’s expectations and predictions (see [DI-2]) as to how the world will change if an action is performed must also be defined over such an attribute space. Different representations may require different attribute spaces, plus a method of mapping between them if they are to be used together.

  • DI-2

    Prediction: A cognitive agent performing an action to achieve some effect must be able to predict how the world will change as a result of this action. That is, it must know which attributes of the world must hold for an action to be possible (which will typically involve reasoning about the presence of objects), which attributes will change when the action is performed, and how those attributes will change.

  • DI-3

    Execution: Many previous efforts to produce fully autonomous robotic agents have been limited by simplifying assumptions about sensor, action, and effector models. We instead take the approach that complete robotic systems must be built with the ability to actually execute actions in the world and evaluate their success. This requires agents to be embodied within physical systems that can interact with the physical world.

  • DI-4

    Verification: In order to improve its performance in a nondeterministic physical world, an agent must be able to evaluate the effectiveness of its actions, by recognising the difference between the states it predicted would arise from its actions, and those states that actually resulted from action execution.

  • DI-5

    Learning: State and action representations are dynamic entities that can be extended by learning in a number of ways: continuous parameters can be optimised, attribute spaces can be refined or extended, new control programs can be added, and prediction functions can be improved. Embodied physical experiences characterised in terms of actions, predictions, and outcomes provide data for learning at all levels of a system.

  • DI-6

    Reliability: It is not sufficient for an agent to merely have a model of the changing world. It must also learn the reliability of this model. Thus, our representations must measure and track the accuracy of their predictions over past executions.
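
To make these design ideas concrete before relating them to prior work, the following minimal sketch shows one way they could fit together in code. It is an illustration under our own assumptions, not the formal definition given in Section 4; all names (`AttributeState`, `Oac`, `run`) are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# DI-1: a state in the attribute space is a set of property/value pairs.
AttributeState = Dict[str, object]


@dataclass
class Oac:
    """Illustrative container tying the six design ideas together."""
    name: str
    predict: Callable[[AttributeState], AttributeState]  # DI-2: prediction
    execute: Callable[[AttributeState], AttributeState]  # DI-3: embodied execution
    trials: int = 0
    successes: int = 0

    def run(self, state: AttributeState) -> AttributeState:
        predicted = self.predict(state)
        observed = self.execute(state)  # sensed outcome from the real world
        self.trials += 1
        # DI-4: verification by comparing predicted and sensed attributes.
        if all(observed.get(k) == v for k, v in predicted.items()):
            self.successes += 1
        # DI-5: the (state, predicted, observed) triple is the raw material
        # for learning; a real system would refine `predict` here.
        return observed

    @property
    def reliability(self) -> float:
        """DI-6: tracked accuracy of past predictions."""
        return self.successes / self.trials if self.trials else 0.0
```

On this reading, a planner consults `predict` and `reliability`, while the embodied system supplies `execute`; Section 4 makes this separation precise.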

These design ideas are widely accepted in the literature, where they have been discussed by various authors (see, e.g., [5], [6]). For example, a STRIPS-style planning operator [7] can be seen as a prediction function [DI-2] built from action preconditions and effects defined over an attribute space [DI-1]. Significant work has also been done on learning such prediction functions given an appropriate attribute space [8], [9]. The importance of embodiment [DI-3] in real-world cognitive systems has been pointed out by Brooks [1], [2]. Sutton [10] has discussed the necessity of verifying the expected effects of actions [DI-4] to arrive at meaningful knowledge in AI systems. The interplay between execution [DI-3] and verification [DI-4] is associated with the grounding problem [11]. For example, Stoytchev [5] defines grounding as “successful verification”, and discusses the importance of evaluating the success of actions [DI-6] and maintaining “probabilistic estimates of repeatability”. We discuss the relation of our work to prior research further in Section 3.
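
To illustrate the STRIPS reading of [DI-2] just mentioned, here is a hedged sketch of a planning operator as a prediction function over a propositional attribute space [DI-1]. The `goto-kitchen` operator and all atom names are hypothetical, echoing the kitchen example from the introduction.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# An atom is a ground proposition over the attribute space [DI-1].
Atom = str


@dataclass(frozen=True)
class StripsOperator:
    """A STRIPS-style operator read as a prediction function [DI-2]."""
    name: str
    preconditions: FrozenSet[Atom]
    add_effects: FrozenSet[Atom]
    del_effects: FrozenSet[Atom]

    def applicable(self, state: Set[Atom]) -> bool:
        return self.preconditions <= state

    def predict(self, state: Set[Atom]) -> Set[Atom]:
        """Predicted successor state: delete effects, then add effects."""
        return (state - self.del_effects) | self.add_effects


# Hypothetical operator echoing the kitchen example from the introduction.
goto_kitchen = StripsOperator(
    name="goto-kitchen",
    preconditions=frozenset({"robot-in-hall"}),
    add_effects=frozenset({"robot-in-kitchen"}),
    del_effects=frozenset({"robot-in-hall"}),
)

state = {"robot-in-hall", "coffee-pot-in-kitchen"}
assert goto_kitchen.applicable(state)
print(goto_kitchen.predict(state))  # {'coffee-pot-in-kitchen', 'robot-in-kitchen'}
```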

In the remainder of the paper we will develop the OAC concept using the above design ideas. In particular, this paper will:

  • formally define OACs for use by autonomous cognitive agents,

  • identify problems associated with learning OACs, and

  • provide examples of OACs and their interaction within embodied systems.

The rest of the paper is organised as follows. Section 2 further motivates this work and provides some basic terminology. Section 3 discusses the relation to prior research. Section 4 provides a formal definition of OACs, based on the above design ideas. Section 5 characterises a number of learning problems in terms of OACs. Section 6 describes how OACs are executed within a physical robot system. Section 7 provides detailed examples of OACs. Finally, Section 8 demonstrates a set of OACs interacting with each other to realise cognitive behaviour, including object grounding and associated grasping affordances, as well as planning with partly grounded entities.

Section snippets

Prerequisites for modelling OACs

To achieve its goals in the real world, an embodied agent must develop predictive models that capture the dynamics of the world and describe how its actions affect the world. Building such models, by interacting with the world, requires facing a number of representational challenges resulting from

  • the continuous nature of the world,

  • the limitations of the agent’s sensors, and

  • the stochasticity of real-world environments.

These problems make the task of efficiently predicting the results of…

Relation to other approaches

The OAC concept provides a framework for formalising actions and their effects in artificial cognitive systems, while ensuring that relevant components and prerequisites of action acquisition, refinement, chaining and execution are defined (e.g., the attribute space, a prediction of the change of the attribute space associated with an action together with an estimate of the reliability of this prediction, an execution function, and a means of verifying the outcome of an action). In particular, …

Defining OACs

Our OAC definition is split into two parts, (1) a symbolic description consisting of a prediction function [DI-2] defined over an attribute space [DI-1], together with a measure of the reliability of the OAC [DI-6], and (2) an execution specification [DI-3] defining how the OAC is executed by the embodied system and how learning is realised [DI-5] by verification [DI-4].

This separation is intended to capture the difference between the knowledge needed for cause and effect reasoning (represented…

Learning OACs

The definition of an OAC as an entity that captures both symbolic and control knowledge for actions gives rise to a number of learning problems that must be considered for OACs to be effective. We note that each of these learning problems can be addressed by recognising that differences can exist between predicted states and actual sensed states. In practice, these problems may require different learning algorithms (e.g., Bayesian, neural network-like, parametric, non-parametric, etc.), and it…
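
Since each of these learning problems is driven by the gap between predicted and sensed states, a toy illustration may help: the estimator below records (state-before, state-after) pairs from executions and predicts, per attribute, the most frequently observed outcome. It is our own minimal example, not one of the algorithm families listed above.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Tuple

State = Dict[str, Hashable]


class OutcomeEstimator:
    """Toy per-attribute outcome statistics for a single OAC."""

    def __init__(self) -> None:
        # (attribute, value before execution) -> counts of values after
        self.stats: Dict[Tuple[str, Hashable], Counter] = defaultdict(Counter)

    def record(self, before: State, after: State) -> None:
        """Store one (state before, state after) execution experience [DI-5]."""
        for attr, value in before.items():
            self.stats[(attr, value)][after.get(attr)] += 1

    def predict(self, before: State) -> State:
        """Predict the most frequently observed outcome per attribute [DI-2]."""
        predicted = {}
        for attr, value in before.items():
            counts = self.stats.get((attr, value))
            predicted[attr] = counts.most_common(1)[0][0] if counts else value
        return predicted
```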

Representational congruency and hierarchical execution

Before we introduce hierarchical executions of OACs in Sections 6.2 (Towers of OACs) and 6.3 (One-to-many execution), we begin by discussing a fundamental problem connected to OAC modelling, and a structural property of OACs that we earlier referred to as representational congruency.
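
As a preview of the hierarchical execution developed in Sections 6.2 and 6.3, the sketch below composes lower-level OAC executions into a single higher-level one. The `tower` helper and the pick-and-place example are hypothetical names of our own, covering only the sequencing case; the one-to-many case of Section 6.3 would instead select among several such programs at run time.

```python
from typing import Callable, Dict, Sequence

State = Dict[str, object]
# An OAC execution program: state in, sensed state out.
OacProgram = Callable[[State], State]


def tower(sub_oacs: Sequence[OacProgram]) -> OacProgram:
    """Compose lower-level OAC executions into one higher-level execution:
    each sub-OAC runs in the state left behind by its predecessor."""
    def run(state: State) -> State:
        for sub in sub_oacs:
            state = sub(state)
        return state
    return run


# Hypothetical usage, with invented names for the lower-level OACs:
# pick_and_place = tower([reach, grasp, transport, release])
```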

Examples of OACs

In this section, we give formal descriptions for a number of OACs. Some of these OACs have already been discussed informally as part of our running examples (Ex-1–Ex-4), while others are new. For each OAC, we provide a definition of its attribute space (S), prediction function (T), success measure (M), and execution specification (E). We also discuss learning in these OACs, and show how they can be embedded within procedural structures to produce more complex behaviour. In Section 8, we will…
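
Written in the (S, T, M, E) notation this section uses, a single OAC definition might be sketched as below. The `push` example, its attribute names, and the lambda bodies are all invented placeholders; the actual examples are given in the full text of this section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Dict[str, object]


@dataclass
class OacDefinition:
    """An OAC written out in the (S, T, M, E) notation of this section."""
    S: Tuple[str, ...]            # attribute space: the relevant properties
    T: Callable[[State], State]   # prediction function
    M: float                      # success/reliability measure
    E: Callable[[State], State]   # execution specification (control program)


# Hypothetical 'push-object' OAC; attribute names and effects are invented.
push = OacDefinition(
    S=("object-position", "gripper-free"),
    T=lambda s: {**s, "object-position": "moved"},
    M=0.0,  # no experience yet; updated from verified executions
    E=lambda s: {**s, "object-position": "moved"},  # stands in for a controller
)
```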

Interacting OACs

In this section, we describe two examples of OACs interacting in a single architecture. In Section 8.1, we illustrate the grounding of objects and object-related grasp affordances. In Section 8.2, we describe how such grounded representations can be used to execute plans.

Conclusion

This paper introduced Object–Action Complexes (OACs) as a framework for modelling actions and their effects in artificial cognitive systems. We provided a formal definition of OACs and a set of concrete examples, showing how OACs operate and interact with other OACs, and also how certain aspects of an OAC can be learnt.

The importance of OACs lies in their ability to combine the properties of multiple action formalisms, from a diverse range of research fields, to provide a dynamic, learnable, …

Acknowledgements

The research leading to these results received funding from the European Union through the Sixth Framework PACO-PLUS project (IST-FP6-IP-027657) and the Seventh Framework XPERIENCE project (FP7/2007–2013, Grant No. 270273). We thank Frank Guerin for fruitful discussions and for his insights into Piaget’s understanding of sensory–motor schemas.

References (59)

  • R.A. Brooks et al., The Cog project: building a humanoid robot, Lecture Notes in Computer Science (1999)

  • V. Braitenberg, Vehicles: Experiments in Synthetic Psychology (1986)

  • M. Huber, A hybrid architecture for adaptive robot control, Ph.D. Thesis, University of Massachusetts Amherst, ...

  • A. Stoytchev, Some basic principles of developmental robotics, IEEE Transactions on Autonomous Mental Development (2009)

  • J. Modayil, B. Kuipers, Bootstrap learning for object discovery, in: Proceedings of the IEEE/RSJ International ...

  • K. Mourão, R. Petrick, M. Steedman, Using kernel perceptrons to learn action effects for planning, in: Proceedings of ...

  • E. Amir et al., Learning partially observable deterministic action models, Journal of Artificial Intelligence Research (2008)

  • R. Sutton, Verification, the key to AI, 2001 [Online]. Available from: ...

  • J. Piaget, The Origins of Intelligence in Children (1936)

  • F.J. Corbacho, M.A. Arbib, Schema-based learning: towards a theory of organization for biologically-inspired autonomous ...

  • A. Newell et al., GPS, a program that simulates human thought

  • C. Green, Application of theorem proving to problem solving

  • E.D. Sacerdoti, The nonlinear nature of plans

  • A. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development (1959)

  • N.J. Nilsson, Learning Machines (1965)

  • T. Mitchell, Machine Learning (1997)

  • H. Pasula et al., Learning symbolic models of stochastic domains, Journal of Artificial Intelligence Research (2007)

  • D. Vernon et al., A survey of artificial cognitive systems: implications for the autonomous development of mental capabilities in computational agents, IEEE Transactions on Evolutionary Computation (2007)

  • J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (1988)

    Norbert Krüger is a Professor at the Mærsk McKinney Møller Institute, University of Southern Denmark. He holds an M.Sc. from the Ruhr-Universität Bochum, Germany and Ph.D. from the University of Bielefeld. Norbert Krüger leads the Cognitive Vision Lab which focuses on computer vision and cognitive systems, in particular, the learning of object representations in the context of manipulation. He has also been working in the areas of computational neuroscience and machine learning.

    Christopher Geib is a Research Fellow at the University of Edinburgh School of Informatics. He holds an M.S. and Ph.D. from the University of Pennsylvania. His research focuses broadly on decision making and reasoning about actions under conditions of uncertainty, including planning, scheduling, constraint-based reasoning, human–computer interaction, human–robot interaction, and probabilistic reasoning. His recent research has focused on probabilistic intent recognition through weighted model counting and planning based on grammatical formalisms.

    Justus Piater is a Professor of computer science at the University of Innsbruck, Austria. He earned his Ph.D. degree at the University of Massachusetts Amherst, USA, where he held a Fulbright graduate student fellowship. After a European Marie-Curie Individual Fellowship at INRIA Rhône-Alpes, France, he was a Professor at the University of Liège, Belgium, and a Visiting Research Scientist at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany. His research in computer vision and machine learning is motivated by intelligent and interactive systems, where he focuses on visual learning, closed-loop interaction of sensory–motor systems, and video analysis.

    Ronald Petrick is a Research Fellow in the School of Informatics at the University of Edinburgh. He received an M.Math. degree in computer science from the University of Waterloo and a Ph.D. in computer science from the University of Toronto. His research interests include planning with incomplete information and sensing, cognitive robotics, knowledge representation and reasoning, generalised planning, and natural language dialogue. He is currently the Scientific Coordinator of the EU JAMES project.

    Mark Steedman is a Professor of Cognitive Science in Informatics at the University of Edinburgh, working in computational linguistics, artificial intelligence, the communicative use of prosody, tense and aspect, and wide-coverage parsing using Combinatory Categorial Grammar (CCG). Prof. Steedman is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), the Royal Society of Edinburgh (FRSE), and the British Academy (FBA). He is a member of the Academy of Europe and a former President of the Association for Computational Linguistics (ACL).

    Florentin Wörgötter studied Biology and Mathematics in Düsseldorf. He received his Ph.D. in 1988 in Essen, working experimentally on the visual cortex, before turning to computational issues at Caltech, USA (1988–1990). After 1990, he was a researcher at the University of Bochum, concerned with experimental and computational neuroscience of the visual system. Between 2000 and 2005, he was Professor for Computational Neuroscience at the Psychology Department of the University of Stirling, Scotland, where his interests turned strongly towards “Learning in Neurons”. Since July 2005, he has led the Department for Computational Neuroscience of the Bernstein Center at the University of Göttingen. His main research interest is information processing in closed-loop perception–action systems, including aspects of sensory processing, motor control, and learning/plasticity. These approaches are tested in walking as well as driving robotic implementations. His group has developed the RunBot, a fast and adaptive biped walking robot.

    Aleš Ude studied applied mathematics at the University of Ljubljana, Slovenia, and received his doctoral degree from the Faculty of Informatics, University of Karlsruhe, Germany. He was awarded the STA fellowship for postdoctoral studies in ERATO Kawato Dynamic Brain Project, Japan. He has been a visiting researcher at ATR Computational Neuroscience Laboratories, Kyoto, Japan, for a number of years and is still associated with this group. Currently he is a senior researcher at the Department of Automatics, Biocybernetics, and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia. His research focuses on imitation and action learning, perception of human activity, humanoid robot vision, and humanoid cognition.

    Tamim Asfour received his diploma degree in electrical engineering and his Ph.D. degree in computer science from the University of Karlsruhe, Germany in 1994 and 2003, respectively. He is leader of the humanoid robotics research group at the Institute for Anthropomatics at the Karlsruhe Institute of Technology (KIT). His research interests include humanoid robotics, grasping and manipulation, imitation learning, system integration and mechatronics.

    Dirk Kraft obtained a diploma degree in computer science from the University of Karlsruhe (TH), Germany in 2006 and a Ph.D. degree from the University of Southern Denmark in 2009. He is currently employed as an assistant professor at the Mærsk McKinney Møller Institute, University of Southern Denmark. His research interests lie within cognitive systems, robotics and computer vision.

    Damir Omrčen received his Ph.D. in robotics from the University of Ljubljana, Slovenia, in 2005. He is employed as a research assistant at the Department of Automation, Biocybernetics and Robotics at the “Jozef Stefan” Institute in Ljubljana. His fields of interest include vision and robot control where he combines classical model-based approaches and more advanced approaches based on exploration and learning.

    Alejandro Agostini received B.S. degrees in Bioengineering with honours from the National University of Entre Ríos, Argentina, and in Electronic Engineering from the National University of Catalonia (UPC), Spain. He is currently a senior Ph.D. student in Artificial Intelligence at the UPC, and is working at the Institut de Robòtica Industrial (IRI) under a work contract drawn up by the Spanish Research Council (CSIC). His research interests include machine learning, robotics, decision making, and cognitive systems. He has carried out several research stays at the Bernstein Center for Computational Neuroscience, Göttingen, Germany, and at the University of Karlsruhe, Germany.

    Rüdiger Dillmann is a Professor at the Computer Science Faculty, Karlsruhe Institute of Technology (KIT), Germany. He is director of the Research Centre for Information Science (FZI), Karlsruhe. He is scientific leader of German collaborative research centre Humanoid Robots (SFB 588). His research interests include humanoid robotics, technical cognitive systems, machine learning, computer vision, robot programming by demonstration, and medical applications of informatics.
