Open access peer-reviewed chapter

Machine Learning and Cognitive Robotics: Opportunities and Challenges

Written By

Thomas Tawiah

Submitted: 16 May 2022 Reviewed: 17 August 2022 Published: 25 October 2022

DOI: 10.5772/intechopen.107147

From the Edited Volume

Cognitive Robotics and Adaptive Behaviors

Edited by Maki K. Habib


Abstract

The chapter reviews recent developments in cognitive robotics and the challenges and opportunities brought by new developments in machine learning (ML) and information communication technology (ICT), with a view to stimulating research. To draw insights into the current trends and challenges, a review of algorithms and systems is undertaken. Furthermore, a case study involving human activity recognition, as well as face and emotion recognition, is also presented. Open research questions and future trends are then presented.

Keywords

  • neural networks
  • cognitive control architectures
  • software frameworks
  • imitation learning
  • reinforcement learning

1. Introduction

Cognitive robotics aims at endowing robots with intelligent behaviour by providing processing architectures that allow them to interact with the environment; learn, understand and reason about the environment; and behave like humans in response to complex world dynamics. These behaviours include problem-solving, intentional (planning), reactive, learning, understanding and explaining behaviours. They are based on modelling biological systems, optimal control theory (engineering), the neurosciences and other behavioural sciences. Typical manufacturing applications where cognitive capabilities are important are pick and place, machine inspection, and collaboration and assistance. Service robots are specialized robots [1] that operate either semi- or fully automatically to perform services useful to humans (excluding manufacturing operations), such as caring for the elderly and rehabilitation. The autonomy of such robots is fully oriented towards navigation in human environments and/or human-robot interactions. Enabling more autonomous object manipulation with some level of eye-hand coordination and high precision in a complex environment is a challenge [2]. To embed systems with a greater sense of intelligence, collaborations between the AI, machine learning and robotics communities are essential to achieve remarkable progress. Robot learning refers to the robot learning about itself and the effects of its motor commands and actions. Examples include learning sensorimotor skills (locomotion, grasping and object manipulation) or interactive skills (manipulation of an object in collaboration with a human being). The fields of developmental robotics and evolutionary robotics have also emerged to deal with how robots learn. In cognitive robotics, an integrated view is taken of the robot, its motor and perceptual subsystems, and the body's interaction with the environment. The main challenge is a lack of adequate knowledge of the human brain at different stages of development to enable adequate modelling.

Mobile agents are the principal means of embedding cognitive processing capabilities in robotic systems. These are software components that can carry out functions autonomously on behalf of another entity to realize tasks and can migrate from one robot to another through Wi-Fi networks. Embedded cognitive robotics focuses on understanding and modelling perception, cognition and action in artificial agents through bodily interactions with the environment to be able to perform cognitive tasks autonomously [3]. Several authors have reported works using mobile agents [4, 5]. From a technical point of view, there are several open challenges in the implementation of motor and cognitive skills in artificial agents. State-of-the-art robots are still not properly able to learn, adapt, react to unexpected conditions and exhibit a level of intelligence to operate in an unconstrained environment.

Machine learning (ML) algorithms are computationally intensive, data-driven analysis, modelling and inference techniques based on statistics (clustering), evolutionary computing, neural networks (deep neural networks) and mathematical optimization [6]. Given a set of data, the processing pipeline sequentially consists of pre-processing, feature extraction, modelling, inference and prediction. The modelling stage may involve iterative minimization of a criterion of model fit between a discriminant and the data. ML focuses on the development of algorithms that allow computers to automatically discover patterns in data and improve with experience, without being given a set of explicit instructions. ML has been applied in experimental robotics to acquire new skills; however, the need for carefully gathered data, clever initialization and conditioning limits the autonomy with which behaviours can be learned. In particular, deep neural networks with several levels of composition have achieved remarkable performance in vision and natural language processing. Deep learning can be leveraged via transfer learning to generalize from simulation to the real world via domain randomization [7, 8, 9] and to learn end-to-end visuomotor controllers [10, 11]. The limitations of deep neural network (DNN) techniques, such as poor interpretability, susceptibility to adversarial attacks, privacy issues and stability under perturbations, are worth addressing when designing end-to-end control policies. In particular, reliable long-term prediction is desirable to enable re-planning to adapt to a changing environment [12].
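As a concrete illustration, the following minimal sketch (assuming scikit-learn is available; the data arrays are random placeholders, not a real dataset) chains the pipeline stages named above, with pre-processing, feature extraction and modelling followed by prediction:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: 500 samples of 64-dimensional sensor features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
y = rng.integers(0, 3, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-processing -> feature extraction -> modelling, as in the text.
pipeline = Pipeline([
    ("scale", StandardScaler()),                    # pre-processing
    ("features", PCA(n_components=16)),             # feature extraction
    ("model", LogisticRegression(max_iter=1000)),   # iterative model fitting
])
pipeline.fit(X_train, y_train)
print("accuracy:", pipeline.score(X_test, y_test))  # inference/prediction
```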

Machine learning techniques embedded within current AI systems (via agents) have increasingly shown sophisticated cognitive capabilities. For example, an existing ML approach to lexicon acquisition focuses on the symbol grounding problem of how to connect sound information from a human with sensor information captured by a robot from the environment. A multi-sensory approach based on co-occurrence probabilities between words and visual features observed by a robot [13] improves when the robot actively selects motions based on saliency [14, 15]. Several developments in cognitive robotics reflecting its multi-disciplinary nature are presented.

Traditional processing approaches are bottom-up, with the pipeline running sequentially from sensing to perception, cognition and action under a control architecture such as in ref. [16], which is essentially behaviour based; high-level decision processing [17] was later incorporated to enable more autonomy. Fundamental to robotics are the control policies that guide the behaviour of a robot. They are mainly based on control theory and mathematical optimization or on biologically inspired models, with control relying on vision in combination with other sensing modalities (e.g., olfactory) [18, 19]. Many models have been developed governing the behaviour of a robot itself and how it interacts with its environment [12, 20, 21].

There are three main control architectures, namely, logic-based, subsumption and hybrid architectures [16]. The logic-based architecture uses a set of rules and provides pro-active behaviour, whilst the subsumption architecture incorporates intelligence and interaction with the environment as a means of introducing cognition, with behaviour organized hierarchically. The hybrid architecture achieves modularity and interactivity between layers. Because the models used were relatively simple, these architectures suffer from problems of scalability and of modelling complex scenarios. Instead of providing all information to the robot a priori, for example, possible motions to reach a certain target position, the agent will, through some process, 'learn' which motor commands lead to what action. For autonomous systems, a decision level incorporating the capacity to produce plans and supervise their execution, whilst at the same time remaining reactive to events from the layer below, has been added at the top of the hierarchy [6]. Such architectures are typically used in controlling the robot (motion control) or in carrying out tasks. Different multi-robot configurations, including robotic swarms, use multi-agent systems to carry out complex tasks.

Predictive processing (PP) [3], a processing approach from the cognitive sciences, is increasingly being used in cognitive robotics. It is a top-down approach that aims at unifying perception, cognition and action as a single inference process. It is predominantly based on the free-energy principle [22], which is associated with frameworks such as predictive coding, active inference and perceptual inference. The free-energy principle seeks to minimize prediction errors [20]. It asserts that through bodily interaction with the environment, agents are expected to learn and then be capable of performing cognitive tasks autonomously [23]. The core information flow is the top-down flow of predictions and the bottom-up flow of prediction errors. Motor commands are replaced by proprioceptive top-down predictions using a forward model [24]. PP is typically used in motor control and in the estimation of the body states of a robot [25, 26]. A neural network is typically used as the generative model. Active inference, a related framework, aims at minimizing prediction error or free energy using variational inference. It involves constructing a forward model involving hidden states to reduce proprioceptive noise for control [21].
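The prediction-error minimization at the heart of PP can be illustrated with a toy numerical sketch (a linear generative model in plain numpy; all variable names are illustrative and not taken from the cited works):

```python
import numpy as np

# Toy generative model: observation o = W @ x + noise, with known W.
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 2))        # generative mapping from hidden cause to data
x_true = np.array([1.0, -0.5])     # hidden cause of the sensory input
o = W @ x_true + 0.01 * rng.normal(size=4)  # noisy sensory observation

# Perceptual inference: descend on the prediction error to find the
# hidden state that best explains the observation.
x = np.zeros(2)                    # current belief about the hidden state
lr = 0.1
for _ in range(200):
    prediction = W @ x             # top-down prediction
    error = o - prediction         # bottom-up prediction error
    x += lr * W.T @ error          # update belief to reduce the error

print("inferred hidden state:", x)  # approaches x_true
```

Here perception is cast as inference: the belief about the hidden state is updated by gradient descent until the top-down prediction matches the observation.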

To address issues in cognitive robotics, researchers in developmental robotics build artificial systems capable of acquiring motor and cognitive capabilities by interacting with the environment, inspired by human development [27]. Traditionally, mobile agents, simulated robots, humanoids or specially designed apparatus are used for research into higher-order cognitive capabilities (learning, communication and understanding), mimicking the functionalities of the human brain such as its internal structure, infrastructure and social structures. Modelling starts from foetal sensorimotor mapping (mechanisms of dynamic motions and motor skill development) in the womb, through body and motor representation and spatial perception, to social behaviour learning (communication, action execution and understanding). Important insights have been gained; for example, ref. [28] indicates that control and body structure are strongly connected, with the body having a role in controlling its motion. In ref. [29], dynamic walkers realize walking on slopes without any explicit control or actuation, saving energy.

Models of human communication mechanisms have been used in developing human-robot interactions such as caregiver-robot interaction, action execution and understanding, the development of vocal imitation and joint attention [27]. Sumioku et al. [30] proposed an open-ended learning loop of social action by which artificial infants reproduce experienced contingency using an information-theoretic measure of contingency. Typically, gaze-following or utterances about the focus of attention are used for joint attention. Human-like robots able to show distinct facial expressions in specific situations have been developed [31, 32], but these robots are unable to adapt to non-pre-specified situations. From a control perspective, some of the capabilities required [33] for collaboration and assistance between robots and humans are as follows: the ability to perceive the world in a similar way to humans; the ability to communicate with humans using natural language; the ability to develop cognition through sensorimotor association; the ability to use attention and emotion to control behaviours; and the ability to produce appropriate behaviours in a variety of situations. Clearly, this calls for a multidisciplinary approach involving the neural sciences, developmental robotics, psychology and engineering. Kawamura and Brown [34] approached the problem using working memory-based multi-agent systems for robot behaviour generation.

Evolution has equipped humans with a wide range of tools for collaboration, including the use of language, gestures, touch and facial expressions, to facilitate interactions. Robots must support many of these communication methods to effectively collaborate with or assist humans. In particular, for robots working in human environments, there is an urgent need to anticipate and recognize bodily movements and facial expressions in order to offer timely and effective assistance when needed. To this end, a case study involving facial and action recognition is presented to illustrate some capabilities in this regard. The rest of the chapter is structured as follows: Section 1.1 introduces computational architectures and platforms for cognitive robotic systems. Sections 1.2 and 1.3 cover the roles of technology and software, respectively. Section 1.4 deals with the role of decision-making in cognitive systems. Sections 1.5, 1.5.1 and 1.5.2 briefly introduce the main algorithms used in cognitive robotics, namely, reinforcement learning and imitation learning algorithms, highlighting developments in ML that have renewed interest in these algorithms. Section 1.5.3 reviews deep learning networks for feature learning and classification. Sections 1.6 and 1.6.1 provide a case study on human activity recognition. Section 1.7 briefly reviews current trends, whilst Section 1.8 discusses successes, challenges and research directions. Finally, Section 1.9 concludes the chapter.

1.1 Architecture and platforms for cognitive robot research

To facilitate the development of mature cognitive robotic systems, several computing platforms are available, including real robots such as humanoids (iCub), Panda and Hobo, simulators, and middleware such as ROS and YARP. To develop mature cognitive systems, robots must continuously interact with the environment, know where objects are in the scene and understand the consequences of their generated actions. The iCub [35] is a 53-degree-of-freedom humanoid robot of approximately the same size as a three-year-old child. It can crawl on all four limbs and sit up. Its hands allow dexterous manipulation, and its head and eyes are fully articulated. It is an open-systems platform available for research under the GNU General Public License. Its capabilities are built based on an ontogenetic pathway of human development. Figure 1 shows different postures of the iCub. Robotic simulators are of interest despite not being able to provide a full model of the complexity present in the real environment. For example, the iCub simulator [35] has been designed to reproduce as accurately as possible the physics and dynamics of the robot and its environment, with the constraint of running approximately in real-time. It is composed of multiple rigid bodies connected via joint structures and consists of the following components: physics and rendering engines, the YARP protocol for the simulated iCub, and the body model. All commands sent to and from the robot are based on YARP instructions. More details are provided in ref. [36]. Besides, there are several platforms for humanoids and other robots in studies reported in refs. [37, 38, 39]. Details of the Pioneer 3-AT Bender robotic platform are provided in ref. [40]. There are also several European Union-funded research projects on cognitive robotics that have resulted in several architectures, system concepts and benchmark datasets [41]. Several simulators for robotic systems are covered in ref. [42]. To build cognitive systems, several computational architectures have been designed and built to realise different cognitive platforms.

Figure 1.

iCub robot in different postures from ref. [35].

The following are representative architectures. The Clarion [43, 44, 45] architecture is a broadly scoped computational psychological model based on the dual theory of the mind, capturing essential structures, mechanisms and processes of the mind. It provides a framework, essential structures and a computational model for realising processes of the mind. It also facilitates detailed exploration of the mind and psychological theories. Clarion consists of four subsystems, namely, the action-centred subsystem (ACS), non-action-centred subsystem (NACS), motivational subsystem (MS) and metacognitive subsystem (MC). The MS provides the impetus for action and cognition, whilst the MC provides for monitoring and regulating other processes. Together, these subsystems address action, skill learning, memory, reasoning, motivation, personality, emotions and their interactions. Figure 2 is a high-level diagram of Clarion. Each subsystem consists of two levels in a dual-representation structure. The top level encodes explicit knowledge, potentially corresponding to the 'conscious', and the bottom level encodes implicit, 'unconscious' knowledge; the two levels also correspond to symbolic versus connectionist representations.

Figure 2.

CLARION architecture [44].

Computationally, the ACS is realised with multilayer perceptrons or reinforcement learning, whilst the NACS implements implicit declarative processes with associative memory. Explicit declarative processes are captured as symbolic associative rules. Implicit processes deal with drive activations captured by MLPs, and explicit processes deal with goals. More details of the architecture are provided in ref. [44]. Other general-purpose architectures include Soar [46], which integrates knowledge-intensive reasoning, reactive execution, hierarchical reasoning and learning from experience. It has the goal of creating systems with cognitive capabilities like humans. Several other projects that target specific robotic platforms have produced application-specific cognitive architectures. These include the HAMMER architecture [47, 48] and the ArmarX and Xperience architectures on the Armar humanoid robot [49]. The HAMMER architecture is for assistive robotic agents cooperating with humans to carry out tasks. It provides for sensing user states and actions, modelling skills, predicting intentions and personalising to maximise assistance effectiveness over extended periods of interaction. ArmarX is a hybrid architecture proposed for learning from human observation and experience. Interactions with humans occur in natural language, and the system recognises the need for help and reasons about the world. The original architecture has been continuously extended in several projects. It consists of three layers, namely, a high-level layer for planning and reasoning; a mid-level layer mediating between symbolic knowledge and sensory-motor data; and a low-level layer for robotic behaviour focusing on functions and skills, a hardware abstraction layer, and bridging middleware to other robot software frameworks.

Using virtual environments for simulation is very important: simulation ensures the safety of robots, humans and other objects in the environment, whereas physical trials are slow and costly, wall-clock time makes them too slow to generate enough data in a reasonable time frame, and the behaviours learned this way are limited. Increasingly, complex simulation environments are being used for experimentation and research. By training a virtual robot in countless situations, including low-probability scenarios, the objective is for the system to learn to generalize from these scenarios and safely handle future, yet unseen scenarios. When the physical properties of the environment, such as gravity, friction coefficients and the objects' visual appearance, are randomized during training, the learned models transfer successfully to the physical robot; this is domain randomization [50]. One such platform is the Unity [51, 52] 3-D rendering platform, a cloud-scalable infrastructure for generating thousands of frames per second. For video games, the Arcade Learning Environment (ALE) [53] is a standard test bed for deep reinforcement learning (DRL) algorithms, and it supports discrete actions. The TORCS car racing simulator [54], on the other hand, supports continuous actions for deep reinforcement learning algorithms.
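A schematic sketch of domain randomization follows (the `Sim` class and its methods are hypothetical placeholders, not a real simulator API; parameter ranges are illustrative):

```python
import random

# Hypothetical simulator interface; `Sim`, `set_params` and `run_episode`
# are placeholders standing in for a real engine such as a physics simulator.
class Sim:
    def set_params(self, gravity, friction, texture_seed): ...
    def run_episode(self, policy): ...

def randomized_training(sim, policy, episodes=10_000):
    """Domain randomization: resample physical and visual parameters each
    episode so the learned policy cannot overfit one simulated world."""
    for _ in range(episodes):
        sim.set_params(
            gravity=random.uniform(9.0, 10.6),        # m/s^2, around Earth gravity
            friction=random.uniform(0.4, 1.2),        # friction coefficient range
            texture_seed=random.randrange(1_000_000), # random visual appearance
        )
        transitions = sim.run_episode(policy)
        policy.update(transitions)                    # any RL/IL update rule
```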

1.2 The role of technology

The pervasiveness of information communication technology (ICT) is evident everywhere in our daily lives. In industrial settings, the following are some examples:

  • Monitoring and control of all tools of production;

  • Collecting data from many sensors for monitoring, control and predictive maintenance of equipment;

  • Use of machine intelligence, wireless connectivity and cloud computing to integrate physical systems and processes (the Industry 4.0 effort);

  • Use of key enabling networking technologies, namely, edge, fog and 5G using AI agents;

  • Applications of service robots (non-industrial) include shopping, travelling, home assistance and elderly care.

In our daily lives, examples include the numerous gadgets in our homes that assist us and help care for the elderly. Networked robotics and cloud robotics have evolved to connect robots and allow a central or distributed intelligence to command and control any set of robots. Advantages include flexibility, simplification of the hardware and software on the robot, and ease of re-planning and task management for complex robots. Several configurations exist, namely, stand-alone robots, networked robots and cloud robotics [55]. Networked robots address the problems associated with stand-alone robotic systems by sharing perceived data with each other and executing tasks in a cooperative and coordinated manner. Cloud computing empowers robots by providing faster and more powerful computation through massively parallel computing (using CPUs, GPUs, clusters and data centres), higher storage capacity, and access to open source software, big datasets and cooperative learning capabilities.

Typical applications include human-assisted driving and self-driving vehicles for safe transportation; the Industry 4.0 drive to create cloud-based cyber-physical systems for industrial processes by creating a replica in cyberspace for closed-loop feedback [56]; and support for autonomous and smarter processes. Cloud robotics also caters for the convergence of sensing, computation and communication by providing a common platform integrating data acquisition, processing, storage and decision-making. AI agents for the digital twin in Industry 4.0 provide movement prediction, task learning, risk reduction and predictive maintenance. Fundamental to most of these developments are AI and ML for continuous decision-making.

Cognitive robots (CR) are expected to continuously learn from and adapt to their environments and make decisions in real-time when required, under uncertainties in sensor data, processing complexity, and privacy and security constraints, to arrive at timely and effective decisions. AI- and ML-empowered agents are one approach to realising this goal. Current robotics has made significant progress on sensing, perception and control problems but finds it challenging to provide integrated thinking, feeling and knowing [57]. It is still very challenging for two-legged robots to walk naturally in unconstrained environments. Several challenges exist in using robotic platforms, such as the high cost of prototyping, a steep learning curve and programming robots to carry out complex tasks like autonomous driving in unconstrained dynamic environments.

1.3 The role of software

Closely related to cognitive robotics is cognitive computing (CC), a multidisciplinary field aiming at devising computational models and decision-making mechanisms based on the neurobiological processes of the brain, the cognitive sciences and psychology. It aims to endow computers with the ability to think, feel and know. Since there is no commonly accepted definition of cognition, there are several definitions of cognitive computing [58, 59]. Wang [59] defines cognitive computing in terms of cognitive informatics, which applies to the information sciences how the brain processes information and copes with decision-making. CC is defined as an emerging paradigm of intelligent computing methodologies and systems, based on cognitive informatics, that implements computational intelligence by autonomous inferences and perceptions mimicking the mechanisms of the brain. Research in cognitive computing is focused on three thematic areas, namely, computer systems with faculties of knowing, thinking and feeling. Applications of CC include education, healthcare, commerce and industry.

When software adds intelligence to information-intensive processes, it is known as robotic process automation. The process uses AI to extend and improve actions, save cost and increase customer satisfaction. It is typically used to complete a complex business process that uses unstructured data or persists over a long period [57]. Typically, a bot (an agent acting for the user of a program) observes the process in order to automate it.

One of the requirements for robust and effective CR is software integration frameworks. This is justified when one considers the following:

  1. Cognitive models are derived from a large spectrum of computational paradigms that are not necessarily compatible when considering the underlying software architecture;

  2. Changes in application requirements due to hardware interfaces, computational and network latencies and the need for integration;

  3. Cognitive research projects utilize robotic systems as demonstrators, which therefore serve as important proofs of concept and might also require integration;

  4. The need to provide common interfaces and functions;

  5. Specific software frameworks may be required to take advantage of innovations in hardware (new development of brain-like hardware architecture) and the development of relationships among concepts of a given domain.

Software frameworks enable thinking by taking advantage of brain-like computer machinery or determine causal relationships among concepts of a given domain. There have been several published works on software frameworks [60], prototyping, middleware development, sustainable software design and architectural paradigms. MARIE [61] is a component-based software architecture for integrating and combining heterogeneous software and computational paradigms. It adapts the mediator design pattern to create a mediator interoperability layer (MIL). The MIL is implemented as a virtual space where applications can interact using a common language. ROS (Robot Operating System) [62], an open-source robotic middleware suite, is frequently used in robotic projects. ROS provides a set of software frameworks for software development, offering the following services: hardware abstraction, low-level device control, message passing between processes, package management and other functions; ROS 2 [63] and above add real-time and embedded-system support. ROS is made up of three components: language- and platform-independent tools for building and distributing ROS-based systems; ROS client implementations (roscpp, rospy, roslisp, etc.); and packages containing application-related code. ROS typically connects to robots via WebSockets and can operate on cloud servers. There are several platforms on which ROS runs, including ROSbot, the Nao humanoid [64] and the Raven II surgical robotic research platform. Peira et al. [65] provide a framework for using ROS on the cloud. Davinci [66] is another cloud-based software framework for service robots exploiting parallelism and scalability. It is based on a Hadoop cluster combined with ROS as the messaging framework. The FastSLAM algorithm, an environmental mapping algorithm for large-scale mapping, was implemented on this platform with significant performance improvement.
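To illustrate the ROS message-passing model described above, here is a minimal ROS 1 node in Python (a sketch; the topic names `cmd_vel` and `scan` are conventional choices and the control logic is illustrative):

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan

def scan_callback(scan):
    # React to laser data; here we just log the closest obstacle distance.
    rospy.loginfo("closest obstacle: %.2f m", min(scan.ranges))

if __name__ == "__main__":
    rospy.init_node("simple_controller")            # register with the ROS master
    pub = rospy.Publisher("cmd_vel", Twist, queue_size=10)
    rospy.Subscriber("scan", LaserScan, scan_callback)
    rate = rospy.Rate(10)                           # 10 Hz control loop
    while not rospy.is_shutdown():
        cmd = Twist()
        cmd.linear.x = 0.2                          # drive slowly forward
        pub.publish(cmd)                            # message passing to the base
        rate.sleep()
```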

A framework for unifying multi-level computing platforms and orchestrating heterogeneous edge, fog and cloud computing resources compliant with MEC [67] was proposed in ref. [68]. It is suitable for integrating different computing, communication and software technologies.

1.4 The role of decision-making

At the core of most ML tasks is decision-making based on information fed to the decision maker, for example steering or braking a car. Decision-makers used to be a human or a group of humans; now they can be AI using different combinations of ML and traditional algorithms via agent technology. According to Kahneman [69], the human brain operates in two modes, namely, system 1 and system 2, and most ML methods emulate the mode of operation of system 1. ML establishes empirical associations through training and learning. When given scenarios resembling training scenarios, ML yields results quickly. However, it struggles when given scenarios not covered during training or for which training was inadequate. In human decision-making, errors tend to occur when system 2 fails to intervene because it is fooled by an apparently coherent picture created by system 1. Thus, if ML is to be used in decision-making, it needs the ability to detect difficult and dangerous situations that should trigger system 2.

In cardiovascular medicine, ML is routinely used to perceive an individual by collecting and interpreting his/her clinical data, on which clinicians then reason to suggest actions to maintain or improve the individual's health. Thus, it mimics the clinicians' approach to examining and treating sick patients [70]. Big data leveraged by ML can provide well-curated information to clinicians so that they can make better informed diagnoses and treatment decisions. ML analyses have demonstrated human-like performance in low-level tasks in robotics and cardiology.

There have been studies reporting on the success of the sensing-perception-control/action loop in autonomous vehicles [56].

Higher-level tasks involving reasoning, such as patient status interpretation and decision support, and reasoning under uncertainty in dynamic environments in robotics, have proven to be challenging. Intention prediction in a dynamic environment is also challenging. Similarly, human-robot cooperation for safe road transportation faces challenges in infrastructure [71] (sensors, communication subsystems, computing and storage) and in predicting driving behaviour, motion prediction and gesture recognition.

1.5 Review of algorithms

From the cognitive architecture descriptions discussed, at the high level the actions of a robot are goal-directed, with the middle layer responsible for intermediate organisation, planning and execution using some memory hierarchy. The bottom layer is reactive and deals with the environment. For a robot to be able to interact with other objects and its environment, it needs to predict the consequences of its actions, typically using a forward model f: (S, πθ) -> Y, where S is the state space of the robot, πθ: S -> A is a parameterized action policy with action space A, and Y is the space of effects (task space). Similarly, the inverse model computes the action policies that can generate a given effect: (S, Y) -> πθ. Some examples are the mapping of movements of the hand in the visual field to the movement of the end point of a tool, and of oscillations of the legs to body translation of a robot. There are two main approaches: the analytical approach based on control engineering, and the learning-based approach. The main challenge is to model a priori all the possible interactions between a robot and its environment. Learning is additionally confronted with multimodal sensing and perception, high-dimensional spaces, and continuous, highly non-stationary spatial and temporal state spaces. Typically, statistical regression is used to guide autonomous exploration and data collection. Alternatively, active learning is an approach for learning while constraining the environment. Several learning paradigms have been used, including reinforcement learning and imitation learning. Several machine learning techniques, such as deep learning, have been used to model robotic agents in the real world. Deep learning networks build a model that produces an end-to-end learning and inference system driven purely by data. Most of the approaches reported in the literature make use of neural networks to construct forward and inverse models. To overcome the problem of catastrophic forgetting (training a model with new information interferes with previously learned knowledge [72]) in neural networks, special memory architectures may be used [34] besides pure algorithmic approaches. Additionally, other cognitive approaches from developmental robotics, neuroscience and other behavioural sciences have been used. Active learning and inference approaches constrain the search space and allow self-exploration. These methods generally begin with random and sparse exploration, build meta-models of the performance of the motor learning mechanism, and concurrently guide the exploration of the various subspaces for which a notion of interest is defined [73]. Interest is defined in terms of variants of information gain (variance, entropy or uncertainty). Motivational and goal-driven approaches, where exploration and search are goal-, curiosity- or attention-driven [74, 75, 76], reduce the large search spaces. Cognitive processing techniques can be split into two main approaches, namely, the control theory approach and the free energy-based approach. Although both use optimization techniques, the latter seeks to minimize free energy (prediction error) using variational or Bayesian approaches.
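As a sketch of the learning-based approach to forward models, the following toy example (scikit-learn, with a made-up dynamics function and placeholder dimensions) regresses the next state from the current state and action:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Placeholder robot dynamics: next state depends nonlinearly on state+action.
def true_dynamics(s, a):
    return s + 0.1 * np.tanh(a) - 0.01 * s**2

# Exploration data: (state, action) pairs and the resulting next states.
S = rng.uniform(-1, 1, size=(2000, 3))   # 3-D state (illustrative)
A = rng.uniform(-1, 1, size=(2000, 3))   # 3-D action (illustrative)
S_next = true_dynamics(S, A)

# Forward model: predict the consequence of an action in a given state.
forward = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000)
forward.fit(np.hstack([S, A]), S_next)

# Use the learned model to predict the effect of a candidate action.
s, a = np.zeros((1, 3)), 0.5 * np.ones((1, 3))
print("predicted next state:", forward.predict(np.hstack([s, a])))
```

An inverse model could be sketched analogously by regressing actions from pairs of states and desired effects.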

1.5.1 Review of reinforcement learning

There are three main classes of machine learning algorithms, namely, supervised, unsupervised and reinforcement learning. In supervised learning, data defining the input and corresponding output (often called 'labelled' data) are available. In unsupervised learning, only the input is available, and the underlying structure of the data is typically sought; it is used to explore the hidden structure of the data. In reinforcement learning (RL), learning takes place by trial-and-error interactions with the environment. It is goal-directed learning that constructs a model specifying outputs so as to maximize long-term reward. Deep RL (DRL) uses deep learning methods (multi-layer neural networks) to learn models and representations at different levels of abstraction [77] in an unsupervised manner. It leverages deep learning as a function approximator to deal with high-dimensional data. DRL algorithms have been applied in robotics, allowing control policies for robots to be learned directly from camera inputs in the real world [11]. The basic model of RL is shown in Figure 3.

Figure 3.

RL algorithm using a single agent.

At time t, the agent receives state st from the environment. The agent uses its policy to choose an action at. Once the action is executed, the environment transitions one step, providing the next state st+1 as well as feedback in the form of the reward rt+1. The agent uses knowledge of state transitions of the form (st, at, st+1, rt+1) to learn and improve its policy. A policy (π) is a mapping function from any perceived state s to the action taken from that state. Alternatively, a policy can be interpreted as a probability distribution over the candidate actions that can be selected in state s, as in Eq. (1):

$$\pi = \phi(s) = \left\{\, p(a_i \mid s) \;\middle|\; a_i \in A_\pi \ \wedge\ \sum_i p(a_i \mid s) = 1 \,\right\} \tag{1}$$

Aπ denotes the set of candidate actions under policy π, and p(ai|s) denotes the probability of taking action ai given the state s. A policy is deterministic if the probability of choosing an action a from s is p(a|s) = 1 for all states s; otherwise, it is stochastic, i.e., p(a|s) < 1. A value function is used to evaluate how good a certain state s or state-action pair (s, a) is. For this purpose, the generalized return Rt defined by Eq. (2) is used, where γ (0 < γ < 1) is the discount factor.

$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots + \gamma^{T-t-1} r_T = \sum_{i=0}^{T-t-1} \gamma^i\, r_{t+i+1} \tag{2}$$

The value of a state under policy π is evaluated as the expectation of Rt, defined by Eqs. (3) and (4) for the state and state-action pair, respectively, where 𝔼 denotes the expectation operator.

$$V_\pi(s) = \mathbb{E}\left[R_t \mid s_t = s, \pi\right] \tag{3}$$

$$Q_\pi(s, a) = \mathbb{E}\left[R_t \mid s_t = s, a_t = a, \pi\right] \tag{4}$$
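Eqs. (2) and (3) translate directly into code; a small illustrative sketch computing the discounted return of an episode and a Monte Carlo estimate of a state value:

```python
import numpy as np

def discounted_return(rewards, gamma=0.99):
    """Eq. (2): R_t as the discounted sum of rewards over one episode."""
    return sum(gamma**i * r for i, r in enumerate(rewards))

def mc_state_value(sample_episode, n=1000, gamma=0.99):
    """Eq. (3): V_pi(s) as the expected return, estimated by averaging the
    returns of n episodes started from state s under policy pi."""
    return np.mean([discounted_return(sample_episode(), gamma) for _ in range(n)])

# Toy usage: episodes are sequences of 5 random rewards.
rng = np.random.default_rng(3)
print(mc_state_value(lambda: rng.normal(size=5)))
```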

Underlying RL are dynamic programming [78] and the Bellman optimality equations under Markov decision process modelling. RL algorithms have been successfully applied to several real-world problems with limited state spaces, including problems in control and navigation. However, RL faces the following challenges:

  • The optimal policy must be inferred by trial and error interaction with the environment with the only learning signal being the reward.

  • Since the observations of the agent depend on its actions, they may contain strong temporal dependencies.

  • Long-range dependencies may only emerge after many transitions.

  • Balancing exploitation versus exploration.

Underlying RL is the Markov property: the next state depends only on the current state and action and is conditionally independent of the past given the present state. Partially observable Markov decision processes (POMDPs) are Markov decision processes (MDPs) in which the agent receives an observation ot+1 ~ p(ot+1|st+1, at), where the distribution depends on the current state and previous action [79]. An episodic MDP resets after each episode of length T, and the sequence of states, actions and rewards in an episode constitutes a trajectory or rollout of the policy. There are three main types of reinforcement learning algorithms, namely, policy-search, value-function-based and those that combine both policy and value function approaches. They include actor-critic, temporal difference and Monte Carlo-based methods [80, 81]. The increasing use of deep reinforcement learning (DRL) algorithms has been attributed to the low-dimensional representations learned by deep neural networks and their powerful function approximation. The following significant recent developments in DRL have made it possible to scale to large-dimensional state spaces (a code sketch follows the list):

  • The combination of the duelling DQN architecture with prioritized experience replay, providing better estimates of expected return functions [82, 83].

  • The use of an experience replay buffer and a target network that initially contains the weights of the network enacting the policy but is kept frozen for a long period [83, 84, 85].

  • Introduction of hierarchical reinforcement learning.

  • Improvements in guided policy algorithms.

  • The asynchronous advantage actor-critic (A3C) algorithm [86], developed for both single-machine and distributed settings. A3C combines the advantage of updates with the actor-critic formulation and relies on asynchronously updating policy and value networks in parallel.
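A compact didactic sketch of the experience replay and frozen target network ideas from the list above, using a linear Q-function in numpy (a toy, not a full DQN; dimensions and hyperparameters are placeholders):

```python
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS, GAMMA = 8, 4, 0.99
rng = np.random.default_rng(4)

W = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))  # online Q weights
W_target = W.copy()                   # frozen target-network weights
replay = deque(maxlen=10_000)         # experience replay buffer

def q_values(weights, s):
    return weights @ s                # Q(s, a) for every action a

def act(s, eps=0.1):
    """Epsilon-greedy action selection (exploration vs. exploitation)."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_values(W, s)))

def train_step(batch_size=32, lr=0.01):
    """One semi-gradient TD update on a minibatch sampled from replay."""
    if len(replay) < batch_size:
        return
    for s, a, r, s_next, done in random.sample(replay, batch_size):
        target = r if done else r + GAMMA * np.max(q_values(W_target, s_next))
        td_error = target - q_values(W, s)[a]
        W[a] += lr * td_error * s     # update only the chosen action's weights

# In an environment loop one would store transitions with
# replay.append((s, a, r, s_next, done)), call train_step() each step,
# and periodically refresh the target with W_target[:] = W.
```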

1.5.2 Review of imitation learning

Imitation learning (IL) aims to mimic human behaviour in a given task by facilitating the teaching of complex tasks with minimal knowledge through demonstration. There are three main classes of ML algorithms for imitation, namely, behaviour cloning, inverse reinforcement learning and generative adversarial learning [87]. Behaviour cloning applies supervised learning by learning a mapping between input observations and the corresponding actions, provided there is enough data. Generative adversarial imitation is inspired by generative adversarial networks [88]. Typically, an agent uses instances of performed actions to learn a policy that solves a given task using ML techniques. The agent could learn from trial and error or observe other agents. IL has been applied to problems requiring real-time perception and reaction, such as humanoid robots, self-driving cars, human-computer interfaces and computer games. The assumption is that an expert (teacher) is more efficient than an agent learning from scratch when given a task [89]. Imitation learning is an interdisciplinary field of research, and it is sometimes difficult to define a suitable reward function for complex tasks. For example, it is often the case that direct imitation of an expert's motion does not suffice due to variations in the task, such as the position of the object, environmental conditions and inadequate demonstrations [90]. It is therefore difficult to learn policies from demonstrations that generalize to unseen scenarios; the policy must be able to adapt to variations in the task and the surrounding environment. Argall et al. [91] address different challenges in the process of IL, such as the computational methods used to learn from demonstrated behaviour and the processing pipeline. A typical sample representation for IL consists of pairs of action and state, such as position, velocity and geometric information, with the process modelled as an MDP. The learning process consists of pre-processing, sample creation and direct or indirect imitation.
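Behaviour cloning in its simplest form is ordinary supervised learning on demonstration pairs; a minimal sketch (scikit-learn, with placeholder demonstration data and a toy expert rule):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)

# Placeholder demonstrations: expert observations and the actions taken.
obs = rng.normal(size=(5000, 12))                  # e.g. joint angles + target pose
actions = (obs[:, 0] + obs[:, 1] > 0).astype(int)  # toy expert decision rule

# Behaviour cloning: fit a policy mapping observations to expert actions.
policy = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
policy.fit(obs, actions)

# At run time the learned policy chooses actions for new observations.
new_obs = rng.normal(size=(1, 12))
print("cloned action:", policy.predict(new_obs)[0])
```

Its well-known weakness, noted below, is that small prediction errors compound once the policy drifts into states not covered by the demonstrations.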

The following are some of the challenges of IL [90]: noisy or unreliable sensing; the correspondence problem and observability, where the kinematics of the teacher are not known to the learner; the fact that complex behaviour is often viewed as a trajectory of dependent micro-actions, which violates the independent and identically distributed assumption in machine learning; and, lastly, safety concerns in human-robot interactions, including the ability of the robot to react to human force and to adapt to the task. A typical flowchart [90] is shown in Figure 4.

Figure 4.

Imitation learning flowchart [89].

There are different methods for learning from demonstrations, namely, structured prediction [92], dynamic movement primitives [93], inverse optimal control (inverse reinforcement learning) [94], active learning [95], transfer learning and other techniques. Active learning needs a dedicated oracle that can be queried for demonstrations. Inverse RL techniques use demonstrations to learn cost functions over extracted features; they first recover a utility function that makes the demonstration near-optimal and then search for the optimal policy using the cost function as the optimization objective. Closely related is apprenticeship learning, which uses demonstrations from an expert, or observation, to learn a reward function; a policy that optimises the reward function is then learned through experience (trial and error). Transfer learning uses experience from old tasks or knowledge from other agents to learn a new policy. The reader is referred to refs. [87, 96] for details of imitation learning and its applications in robotics. Learning a direct mapping between state and action is not enough to achieve the required behaviour in most cases, due to cascading errors, insufficient demonstrations and the difficulty of reproducing conditions and settings; the learner has to learn actions and re-optimise policies with respect to quantifiable reward functions. Figure 4 is a flowchart showing different variants of imitation learning. The following are some recent developments:

  • Use of goal-directed (motivation or curiosity-driven) learning to exploit and explore multi-task spaces;

  • Use of developmental robotics concept of goal babbling for visuomotor coordination tasks for coordination of multiple subsystems (head and arms) [97].

  • Use of predictive processing techniques [3].

  • Use of memory systems for storage of knowledge of agents’ beliefs, goals and short and long-term memory, together with efficient integration with other components of cognitive architectures.

  • Use of machine learning for integration of perceptual processing, feature extraction, learning and control.

1.5.3 Review of deep learning algorithms

The recent success of deep neural networks (DNNs) in computer vision and natural language processing has led to their application in cognitive robotics. Traditionally, cognitive robotics architectures have been built with artificial intelligence at the top level, using a restricted form of natural language and gestures for communication, and biologically inspired mechanisms at the lower levels. Deep learning using DNNs has been applied to perceptual processing, motor control, object manipulation and the different cognitive processing levels of the generic architecture discussed earlier. A deep learning survey focusing on deep reinforcement learning and imitation, including applications of ML in robotics, has been provided by Tai et al. [81]. Perception processing is passive, since an intelligent agent receives observations from the environment and then infers the desired properties from the sensory input. Guo et al. [98] provide a comprehensive overview of deep learning for perception. Similarly, for manipulation applications, Gu et al. [99] present deep reinforcement learning for robotic manipulation, and Gupta et al. [100] present robotic manipulation learned from human demonstrations. Several works relating to deep reinforcement learning in robotic navigation [101, 102, 103] have been published, including those using SLAM [104, 105]. Zhang et al. [104] propose neural SLAM based on the neural map proposed by Parisotto and Salakhutdinov [106], which in turn uses a neural Turing machine for the deep RL agent to interact with. The main challenge with DRL is the reality gap, which refers to discrepancies between models trained with data from a simulated environment, transferred to the real world and deployed on real robotic platforms. It is due to unrealistic environmental conditions such as lighting conditions, noise patterns and texture, and to differences between synthetic rendering and real-world sensory readings. It is particularly severe with visual data (images and videos). Domain adaptation techniques based on generative adversarial networks (GANs) are typically used to mitigate the problem [107].

Other DNN architectures include convolutional autoencoders for low-dimensional image representation [108], deep recurrent neural networks [109] and deep convolutional networks [11, 110]. To improve the robustness of deep learning networks, several strategies have been adopted, including the following: use of auxiliary tasks in either a supervised or unsupervised fashion; experience replay; hindsight experience replay; curriculum learning; curiosity-driven exploration; self-play; and noise in parameter space for exploration. Table 1 provides a summary of representative research works covering different ML approaches to solving cognitive problems and the functionality provided. Typical ML algorithms for the Industry 4.0 initiative are provided in ref. [56].
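As an example of the convolutional autoencoder representation mentioned above, a minimal PyTorch sketch that compresses camera frames into a low-dimensional code (the layer sizes and the 64 × 64 single-channel input are illustrative):

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Encode 1x64x64 camera frames into a compact code and reconstruct them."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)           # low-dimensional image representation
        return self.decoder(code), code

model = ConvAutoencoder()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.rand(8, 1, 64, 64)        # placeholder batch of camera frames
recon, code = model(frames)
loss = nn.functional.mse_loss(recon, frames)  # reconstruction objective
optim.zero_grad()
loss.backward()
optim.step()
```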

Machine learning paradigm | Reinforcement learning | Imitation learning | Deep learning | End-to-end processing task
Transfer learning | [111] | [112] DQN | | Games
Representational learning | [113] | [114] | | Object recognition; Navigation
Feature extraction | [115] k-means | | [116] autoencoder; [117] autoencoder; [118, 119] recurrent neural networks; [120] LSTM; [121, 122] CNN | Language and behaviour learning; Trajectory planning; Object grasping
ML plus other techniques | [123] | [124, 125] | |

Table 1.

Comparison of different ML techniques reported in the literature.

1.6 Use case

For robots acting as human companions, autonomy is fully oriented towards navigation in a human-centred environment and human-robot interactions. This is facilitated if the robot's behaviour is as natural as possible. Some requirements are that the robot's independent movement must appear familiar and predictable to humans and that the robot has a similar appearance to humans. Human-robot interactions include the following: use of natural language, or a subset, for communication; gesture or activity interpretation, which involves tracking and action recognition; gesture imitation, which involves tracking and reproduction; and person following, which involves 2-D or 3-D tracking. Acceptable performance at the task level requires meeting a real-time processing constraint of 50 milliseconds. Safety is also very important, as robots are expected to evolve in a dynamic environment well populated with humans. The main challenge is that robotic systems lack learned representations, and interactions are often limited to pre-programmed actions. One solution strategy is to conceptualize cognitive robots as permanent learners, who evolve and grow their capacities in close interaction with users [86]. Robots must learn new tasks and actions relative to humans by observing and imitating (imitation learning). Thus, human detection and tracking, activity recognition and face detection are some basic tasks that must be performed robustly in real-time. A use case is presented next, which deals with daily activity recognition at home and face recognition using publicly available datasets; these tasks fit in several robotic studies investigating human-centred environments [40]. The algorithms are first described, followed by an evaluation.
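For the face detection building block, one standard baseline is OpenCV's pre-trained Haar cascade; a minimal sketch (the camera index is a placeholder, and a deep detector would be preferred for higher accuracy):

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return bounding boxes (x, y, w, h) of faces in a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Typical use on a camera stream; each frame must be processed well
# within the real-time budget discussed above.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    for (x, y, w, h) in detect_faces(frame):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()
```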

1.6.1 Activity recognition

Research activity in domestic service robots has increased in recent years. Some of the main drivers are the projected future use of domestic robots for improving elderly people's quality of life, childcare, entertainment and education. Several benchmark datasets [126, 127, 128, 129] and methodologies for evaluating the capabilities and performance of robotic platforms are available. Action recognition is used in several application domains, such as surveillance, patient monitoring systems, human-computer interfaces, housekeeping activities and human assistance by robots (guiding humans). There are two processing techniques: the spatial approach, which recognizes activities from single images, and the spatio-temporal approach, which detects a specific activity as a space-time volume.

HMDB51 [130] is an action dataset whose action categories mainly differ in motion rather than static poses. It contains 51 distinct action categories, each containing at least 101 video clips extracted from a wide range of sources. The clips have been annotated and validated by at least two human observers. Additionally, meta-information tags allow for a precise selection of clips for training, testing and validation. Meta-data tags include information on camera viewpoint, presence or absence of camera motion, video quality and the number of actors involved. The training procedure is also described.

A simulation study on activity recognition based on spatio-temporal analysis of a large video database of human motion [130] is provided. The main processing steps are shown in Figure 5. The algorithm consists of six main processing steps, namely, pre-processing, spatio-temporal analysis in the wavelet domain, class model construction (class dictionary), batch singular value factorization (BSVF), similarity feature computation and classification. The pre-processing step involves filtering for noise removal and, optionally, contrast enhancement using histogram equalization. The wavelet analysis step applies an orthogonal or biorthogonal wavelet (9/7 or 5/3 filter) to produce subband frames. A silhouette feature map is constructed by combining the low-low and high-low subbands as described in Tawiah et al. [130]. The map is a tiling of rectangular features describing the dominant objects in the frame. A sparse dictionary is constructed for each activity as described in refs. [131, 132]. Spatial frame resizing and temporal frame subsampling by interpolation are applied to construct an action volume of 64 × 32 × 100 pixels for each action, which is then reshaped to a vector of size 51200. Batch singular value prediction (BSVP) is based on the classical singular value decomposition [133] used in signal processing, applied to batch data input (a matrix). Each column of the input matrix represents a sample action. The output is a decomposition consisting of left and right-hand singular vectors (or matrices) for vector (or matrix) input, and a covariance matrix as the diagonal matrix.

Figure 5.

Similarity-based feature construction and classification.
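The wavelet analysis step can be sketched with PyWavelets, where the 'bior4.4' filter corresponds to the biorthogonal 9/7 wavelet; the combination of subbands into the silhouette map is shown only schematically:

```python
import numpy as np
import pywt

frame = np.random.rand(128, 256)   # placeholder pre-processed video frame

# Single-level 2-D wavelet decomposition with the 9/7 (bior4.4) filter.
LL, (LH, HL, HH) = pywt.dwt2(frame, "bior4.4")

# Schematic silhouette feature map: combine the approximation (low-low)
# subband with a detail subband, as described in the text.
silhouette_map = np.abs(LL) + np.abs(HL)
print(silhouette_map.shape)        # roughly half the frame resolution
```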

The BSVP prediction step consists of two sub-steps: first, apply singular value decomposition to the same batch training sample used in constructing the dictionary, replacing one column (e.g., the first) with the incoming action sample; then, apply the computation step in Eq. (5). The class dictionary is constructed using a batch sample matrix, with each column representing an action volume. The prediction for an input action sample is computed using Eq. (5):

$$\mathrm{Est}(r,j) = \sum_{j=1}^{N_{\mathrm{sample}}} \Phi(r,:) \left[ \sum_{i=1}^{\mathrm{Dim}_S} \mathrm{LHS}(r,i)\,\alpha(j,j) + \sum_{i=1}^{\mathrm{Dim}_S} \mathrm{RHS}(r,j)\,\alpha(j,j) \right] \tag{5}$$

Φ denotes the class dictionary matrix, Nsample denotes the number of samples in the batch dataset, DimS denotes the dimension of each sample, RHS(r, i) denotes the right-hand singular vector, LHS(r, i) denotes the left-hand singular vector, α denotes the covariance matrix and Est denotes the estimate of the sample. The indices r and i are used to identify specific elements in a matrix. The similarity between the input spatio-temporal volume and Est (refer to Eq. (5)) is computed using five similarity measures, namely, canonical correlation [134], Bhattacharyya distance [135], modified Bhattacharyya distance, histogram intersection [136] and cityblock distance. A similarity vector is formed by concatenating all the similarity values. A multi-class feed-forward classifier [137], consisting of 51 one-versus-all classifiers, is constructed; the classifier is able to assign an action volume to multiple classes. Samples of input video frames and the corresponding object outline maps are shown in Figures 6 and 7.
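Three of the five similarity measures have simple closed forms; a numpy sketch (histograms are assumed normalized; canonical correlation and the modified Bhattacharyya variant are omitted):

```python
import numpy as np

def histogram_intersection(p, q):
    """Similarity: sum of bin-wise minima of two normalized histograms."""
    return np.minimum(p, q).sum()

def bhattacharyya_distance(p, q, eps=1e-12):
    """Distance: -ln of the Bhattacharyya coefficient sum(sqrt(p*q))."""
    return -np.log(np.sum(np.sqrt(p * q)) + eps)

def cityblock(p, q):
    """L1 (Manhattan) distance between feature vectors."""
    return np.abs(p - q).sum()

# Example: compare an input action volume's feature vector with its
# dictionary-based estimate Est (both flattened and normalized here).
p = np.random.rand(100); p /= p.sum()
q = np.random.rand(100); q /= q.sum()
similarity_vector = np.array([
    histogram_intersection(p, q),
    bhattacharyya_distance(p, q),
    cityblock(p, q),
])
print(similarity_vector)
```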

Figure 6.

Brush hair sample video clip, showing frames 1, 2 and 3.

Figure 7.

Cartwheel sample video clip, showing frames 1, 3 and 5.

BSVP does not reconstruct a sample using the sparsest representation, as is the case in classical sparse coding, but instead uses a one-time reconstruction from a batch sample whose representations are known (represented as LHS and RHS singular matrices with known covariance) and applies the BSVP algorithm. This provides a representation for a sample that takes into consideration the statistical characteristics of all samples in the batch. It is computationally efficient, avoids solving an L1-norm optimization and is suitable for real-time classification problems. The results of applying the proposed algorithm to all fifty-one action classes are summarised in Figure 8, using the action categories provided by the HMDB51 dataset (Table 2).

Figure 8.

Confusion matrix for HMDB51 dataset.

Action class | Scene content precision (%) | Meta descriptors precision (%)
Facial expression | 75 | 25
Facial action | 78 | 23
Body movement | 86 | 37
Body movement with object interaction | 97 | 43
Body movement with human interaction | 97 | 43

Table 2.

HMDB 51 action classification.

The confusion matrix is also shown in Figure 8 to illustrate action classes prone to misclassification.

For robotics applications, facial expression recognition and gesture recognitions are also very important. Reference [138] provides a good review of facial expression recognition.

1.7 Trends in cognitive robotics

Early approaches to imitation aimed to reproduce reaching or grasping with simple grippers. Imitation learning provides a desired sequencing of basic sub-skills to achieve an observed task behaviour. Later, more sophisticated systems were developed, including modules for visual attention, speech recognition, and the integration of visual and linguistic inputs for instructing robots to grasp everyday objects [139]. Online learning and machine learning techniques, such as neural networks, have been used in low-level and reactive tasks, from trajectory learning and adaptive control of multi-DOF robots to task learning from demonstrations. ML provides different learning paradigms, from transfer learning and representation learning to curriculum learning, etc., which provide systematic means of acquiring models for making inferences [140]. The following are some trends that are apparent from the literature review:

  • The use of neuroscience and behavioural psychology to synthesize computational models for high-order cognitive skills in artificial agents.

  • The use of neural networks as functional approximators.

  • Use of motivations or goal-directed mechanisms to balance exploration and exploitation in tasks space rather than in motor space.

  • Use of robotic platforms for research into higher-order capabilities (social robots).

  • Use of predictive coding mechanisms to synthesise higher-order cognitive behaviours.

  • Classical control theory is unable to handle complex scenarios with many parameters.

  • Use of swarm robotics to study social behaviours in robotic swarms.

  • The increasing use of networked and cloud robotics and cyber-physical systems.

The use of artificial intelligence, especially machine learning, wireless connectivity and cloud computing, is increasing to integrate physical systems and processes, including robotics. At the core of most ML tasks, decision-making is based on information fed to the decision maker. The study of decision-making is closely connected with psychology and cognitive sciences.

1.8 Success, challenges and research directions

Several projects involving the use of cognitive robotics have been reported in industrial settings (Industry 4.0), service robots, robotic surgery, cardiovascular surgery [70], assistive technology [141] and several other fields. In ref. [70], ML methods are used to perceive an individual's health by collecting and interpreting his/her clinical data and to reason about suggested actions to maintain or improve the individual's cardiovascular health. ML-augmented decisions show potential to improve outcomes at a lower cost of care and increase satisfaction. In assistive technology [141], vision-based hand-gesture wheelchair control using a Kinect sensor system enables the user to control the wheelchair without wearing or touching anything.

As cognitive robotics continues to make remarkable progress in industrial process automation under the Industry 4.0 initiative, cloud robotics and service robots, it has also brought new challenges [142, 143, 144]. For example, standardisation efforts [56] have ushered in a new era of robotics linked to cyber-physical systems for the effective control and monitoring of industrial processes. Classical approaches to robotics have made significant progress in control-based, stand-alone robotic applications, but there are challenges in multi-robot and multi-agent systems applied to complex tasks in dynamic environments.

The main goal of integrating thinking, knowing and feeling in an artificial intelligent system as a cognitive process has yet to be realised despite advances [57]. In particular, integrating feeling into existing systems has proved very challenging.

The Industry 4.0 trend of providing a cyber-physical framework for unifying industrial processes is expected to extend to the service robot domain as well. More robust and sophisticated ML algorithms are needed to enable AI agents to carry out complex tasks in a coordinated and cooperative fashion while ensuring reliability and cost-effectiveness. The robustness of ML algorithms under adversarial learning will also have to be investigated.

More research is needed into the decision-making process (using ML) to make it robust, timely and relevant to the situation, as well as to meet real-time requirements. For multi-robot systems, the cooperation and coordination of tasks remain very challenging for improving effectiveness and resource utilisation. Underlying these problems is the need for research into more robust ML algorithms, transparent model interpretation, and guarantees against adversarial attacks [145].
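To make the adversarial concern concrete, the following sketch applies the fast gradient sign method (FGSM) to a toy logistic classifier. The weights and inputs are made up for illustration; real attacks of the kind surveyed in ref. [145] target deep perception networks rather than this linear model.

```python
import numpy as np

# Toy linear classifier f(x) = w.x + b. Its input gradient is analytic,
# so the fast-gradient-sign perturbation is a one-liner.
w = np.array([0.8, -1.2, 0.5])
b = 0.1

def fgsm(x, y, eps=0.1):
    """FGSM on the logistic loss (illustrative only)."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))       # sigmoid probability
    grad_x = (p - y) * w               # d(loss)/dx for logistic loss
    return x + eps * np.sign(grad_x)   # worst-case L-infinity perturbation

x = np.array([1.0, 0.5, -0.3])
x_adv = fgsm(x, y=1)
print(w @ x + b, w @ x_adv + b)        # the adversarial input lowers the score
```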

Cost-effective management of resources (computing, network, storage and devices), all interconnected for ambient intelligence, is also needed. The problems of scheduling, recovering from unexpected events and scalability require urgent attention. Similarly, the integration of heterogeneous platforms (software and hardware) into processes is required. Robust and generic processing architectures for social robotics are another area worthy of investigation.

Protocols to ensure effective and robust cooperation between humans and robots via human-machine interfaces, guaranteeing trust and autonomy, as well as ethical considerations, also ought to be investigated.

To meet privacy and security concerns, distributed learning approaches [146, 147] train models on the cloud while keeping data localised and applying privacy-preserving analysis. However, this raises issues of network latency and model consistency, which have proven very challenging. Approaches to solving these challenges include MEC-based training, federated learning and capsule networks for the internet of vehicles. Other persistent challenges are latency, security and the management of network infrastructure. Autonomic systems [148] appear very attractive for managing problems related to network and computing infrastructure.
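A minimal sketch of the federated averaging idea follows, assuming three clients holding private linear-regression data and simple weight averaging on the server; learning rates, epochs and the data themselves are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5, 3.0])

def make_client(n=50):
    """One robot's private dataset, which never leaves the device."""
    X = rng.standard_normal((n, 4))
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    return X, y

clients = [make_client() for _ in range(3)]

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few steps of local gradient descent on one client's data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(4)
for _ in range(20):                              # communication rounds
    local = [local_update(w_global, X, y) for X, y in clients]
    w_global = np.mean(local, axis=0)            # FedAvg: only weights travel
print(w_global)                                  # converges close to w_true
```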

1.9 Conclusion

The chapter has presented a review of recent developments in ML techniques for cognitive robotic systems in the overall context of artificial intelligence. The main algorithms for learning, namely reinforcement and imitation learning techniques, have been discussed.

The recent Industry 4.0 initiative and the increasing research on service robots, telemedicine and computer-assisted medical delivery systems mean that the field holds much promise for research and personal applications.

Several processing architectures, as well as software frameworks for integrating heterogeneous hardware and software components, have also been presented. To stimulate further research, current trends and research issues have been highlighted. An example scenario involving human action recognition and facial expression recognition has also been presented.

References

  1. Hacker M. Humanoid Robots: Human-like Machines. Vienna, Austria; 2007. pp. 367-396
  2. Jurgen J. A bottom-up integration of vision and actions to create cognitive humanoids. In: Samani H, editor. Cognitive Robotics. Boca Raton, FL: CRC; 2015. pp. 191-214
  3. Ciria A, Schillaci G, Pezzulo G, Hafner VV, Lara B. Predictive processing in cognitive robotics: A review. 2021
  4. Kambayashi Y, Yajima H, Shiyoji T, Oikawa R, Takimoto M. Formation control of swarm robots using mobile agents. Vietnam Journal of Computer Science. 2019;6(2):193-222
  5. Schillaci G, Hafner V, Lara B. Exploration behaviors, body representations, and simulation processes for the development of cognition in artificial agents. Frontiers in Robotics and AI;3:39
  6. Alami R, Chatila R, Fleury S, Ghallab M, Ingrand F. An architecture for autonomy. International Journal of Robotics Research (Special Issue on Integrated Architecture for Robot Control and Programming). 1998;17:315-337
  7. Sun B, Saenko K. From virtual to reality: Fast adaptation of virtual object detectors to real domains. BMVC. 2014;1
  8. Tobin J et al. Domain randomization for transferring deep neural networks from simulation to the real world. March 2017
  9. Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion: Maximizing for domain invariance. 2014
  10. Tzeng E, Devin C, Hoffman J, et al. Adapting deep visuomotor representations with pairwise constraints. 2017
  11. Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research. 2016;17:1-40
  12. Guido S. Sensorimotor Learning and Simulation of Experience as a Basis for the Development of Cognition in Robotics. Germany: Humboldt University of Berlin; 2013
  13. Hayamizu S, Hasegawa O, Itou K, Yoshimura T, Akiba T, Asoh H, Kurita T, Sakaue K. Multimodal interaction systems that integrate speech and visual information. Bulletin of the Electrotechnical Laboratory. 2000;64(4-5):37-44
  14. Steels L, Kaplan F. Aibo's first words: The social learning of language and meaning. Evolution of Communication. 2001;4(1):3-21
  15. Iwahashi N. Language acquisition through a human-robot interface by combining speech, visual, and behaviour information. Information Science. 2003;156:109-121
  16. Brooks RA. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation. 1986;RA-2:1
  17. Agostini A, Torras C, Worgotter F. Efficient interactive decision-making framework for robotic applications. Artificial Intelligence. 2017;247:187-212
  18. Yeon ASA, Visvanathan R, Mamdah SM, Kamarudin K, Kamarusin LM, Zakaria A. Implementation of behavior based robot with sense of smell and sight. In: 2015 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS 2015). 2015. pp. 119-125
  19. Zucker M, Ratliff N, Stolle M, et al. Optimization and learning for rough terrain legged locomotion. The International Journal of Robotics Research;30(2):175-191
  20. Schillaci G, Ciria A, Lara B. Tracking emotions: Intrinsic motivation grounded on multi-level prediction error dynamics. In: Proceedings of the 10th Joint International Conference on Development and Learning and Epigenetic Robotics (IEEE ICDL-EpiRob 2020). 2020
  21. Pio-Lopez L, Nizard A, Friston K, Pezzulo G. Active inference and robot control: A case study. Journal of the Royal Society Interface. 2016;12:616
  22. Buckley C, Kim CS, McGregor S, Seth AK. The free energy principle for action and perception: A mathematical review. Journal of Mathematical Psychology;81:55-79
  23. Lara B, Astorga D, Mendoza-Bock E, Pardo M, Escobar E, Ciria A. Embedded cognitive robotics and the learning of sensorimotor schemes. Adaptive Behaviour;26(5):225-238
  24. Pickering M, Clark A. Getting ahead: Forward models and their place in cognitive architecture. Trends in Cognitive Sciences;18(9):451-454
  25. Lanillos P, Cheng G. Adaptive robot body learning and estimation through predictive coding. In: Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2018. pp. 4083-4090
  26. Lanillos P, Cheng G, et al. Robot self/other distinction: Active inference meets neural networks in a mirror. 2020
  27. Asada M, Hosoda K, Kuniyoshi Y, et al. Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental Development. 2009;1(1):12-34
  28. Pfeifer R, Iida F, Gomez G. Morphological computation for adaptive behaviour and cognition. International Congress Series. 2006;1291:22-29
  29. McGeer T. Passive walking with knees. In: Proc. 1990 IEEE Int. Conf. Robot. Autom. 1990
  30. Sumioka H, Yoshikawa Y, Asada M. Development of joint attention related actions based on reproducing contingency. In: Proceedings of the 7th International Conference on Developmental Learning. 2008
  31. Hashimoto T, Senda M, Kobayashi H. Realization of realistic and rich facial expressions by face robot. In: Proceedings of 2004 IEEE Techn. Exhib. Based Conf. Robot. Autom. 2004. pp. 37-38
  32. Matsui D, Minato T, MacDorman KF, Ishiguro H. Generating natural motion in an android by mapping human motion. In: Proceedings IEEE/RSJ Int. Conf. Intell. Robots Syst. 2005. pp. 1089-1096
  33. Tobin J, Fang A, Scheider R, Zaremba W, Abbeel P. Domain randomization for transferring deep neural networks from simulation to the real world. 2017
  34. Kawamura K, Brown W. Cognitive robotics. In: Springer Encyclopedia of Complexity and System Science. Springer Science; 2010. pp. 1109-1126
  35. Metta G, Fitzpatrick P, Natale L. YARP: Yet Another Robot Platform. International Journal of Advanced Robotic Systems, Special Issue on Software Development and Integration in Robotics. 2006;3(1)
  36. Frank M, Leitner J, Stollenga M, Harding S, Forster A, Schmidhuber J. The modular behavioural environment for humanoids and other robots (MoBeE). In: Proceedings of the International Conference on Informatics in Control, Automation & Robotics (ICINCO). 2012
  37. Stollenga M, Pape L, Frank M, Leitner J, Forster A, Schmidhuber J. Task-relevant roadmaps: A framework for humanoid motion planning. In: Proceedings of the International Conference on Intelligent Robotics and Systems (IROS). 2013
  38. Leitner J et al. A modular software framework for hand-eye coordination in humanoid robots. Frontiers in Robotics and AI. 2016:1-16
  39. Courtney et al. Cognitive systems platforms using open source. 2009
  40. Correa M, Hermosilla G, Verschae R, Ruiz-del-Solar J. Human detection and identification by robots using thermal and visual information in domestic environments. Journal of Intelligent Robotic Systems;66:223-243
  41. Cheraghi AR, Shahzad S, Graffi K. Past, present, and future of swarm robotics. 2021
  42. Baranes A, Oudeyer P-Y. Intrinsically motivated goal exploration for active motor learning in robots: A case study. In: Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems. 2010. pp. 1766-1773. DOI: 10.1109/IROS.2010.5651385
  43. Sun R. The importance of cognitive architecture: An analysis based on CLARION. Journal of Experimental and Theoretical Artificial Intelligence. 2007;19(2):159-193
  44. Sun R. Anatomy of the Mind. Oxford University Press; 2016
  45. Laird JE. The Soar Cognitive Architecture. MIT Press; 2012. p. 390
  46. Demiris Y. Prediction of intent in robotics and multi-agent systems. Cognitive Processing. 2007;8:151-158
  47. Demiris Y, Khadhouri B. Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems. 2006;54:361-369
  48. Vahrenkamp N, Wachter M, Krohnert M, Welke K, Asfour T. The robot software framework ArmarX. Information Technology. 2015;57(2):99-111
  49. Metta G et al. The iCub humanoid robot: An open-systems platform for research in cognitive development. Neural Networks. 2010
  50. Unity Technologies. 2019. Available: https://unity.com
  51. Juliani A, Berges V-P, Teng E, et al. Unity: A general platform for intelligent agents. 2020
  52. Bellemare MG, Naddaf Y, Veness J, Bowling M. The arcade learning environment: An evaluation platform for general agents. In: Proc. International Joint Conference on Artificial Intelligence. 2015. pp. 253-279
  53. Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, et al. Asynchronous methods for deep reinforcement learning. In: Proc. Int. Conf. Learning Representations. 2016
  54. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998
  55. Kehoe B et al. A survey of research on cloud robotics and automation. IEEE Transactions on Automation Science and Engineering. 2015;12(2):398-409
  56. Groshev M et al. Toward intelligent cyber-physical systems: Digital twin meets artificial intelligence. IEEE Communications Magazine. 2021;59(8):14-20
  57. Gutierrez-Garcia J, Lopez-Neri O. Cognitive computing: A brief survey and open research challenges. In: Proceedings of the 2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence. Japan; 2015
  58. Brasil L et al. Hybrid expert systems for decision support in the medical area: Complexity and cognitive computing. International Journal of Medical Informatics. 2001;63(11):19-30
  59. Wang Y. Towards the synergy of cognitive informatics, neural informatics, brain informatics, and cognitive computing. In: Cognitive Information for Revealing Human Cognition: Knowledge Manipulations in Natural Intelligence. First ed. Hershey, PA, USA: IGI Global; 2012. pp. 159-177
  60. Cote C et al. Prototyping cognitive models with MARIE. In: IEEE/RSJ 2008 International Conference on Intelligent Robots and Systems (IROS) Workshop on Current Software Frameworks in Cognitive Robotics Integrating Different Computational Paradigms. Nice, France; 2008
  61. Cote C, Letourneau D, Raievsky C, Michaud F. Robotic software integration using MARIE. International Journal of Advanced Robotic Systems. 2006;3(1):55-60
  62. Robot Operating System (ROS). https://en.wikipedia.org/wiki/Robot_Operating_System
  63. ROS 2 for real-time applications. https://discourse.ros.org/t/ros2-for-real-time-applications/6493. ROS.org. Open Robotics, 17 October 2018
  64. Nao ROS Wiki. http://www.ros.org/wiki/nao. ROS.org. Open Robotics, 28 October 2013
  65. Pereira A, Bastos GS. ROSRemote: Using ROS on cloud to access robots remotely. In: Proceedings of the 2017 IEEE 18th International Conference on Advanced Robotics (ICAR). Hong Kong, China; 2017
  66. Arumugam R et al. DAvinCi: A cloud computing framework for service robots. In: Proceedings of the 2010 IEEE International Conference on Robotics and Automation (ICRA). Anchorage, AK, USA; 2010. pp. 3084-3089
  67. Multi-Access Edge Computing (MEC): Framework and reference architecture. https://www.etsi.org/deliver/etsi gs/MEC/001099/003/02.02.0160/gsmec003v020201p.pdf
  68. Borsatti D et al. Enabling industrial IoT as a service with multi-access edge computing. IEEE Communications Magazine. 2021;59(8):21-27
  69. Kahneman D. Thinking, Fast and Slow. First ed. Farrar, Straus and Giroux; 2011
  70. Sanchez-Martinez M et al. Machine learning for clinical decision making: Challenges and opportunities in cardiovascular imaging. Frontiers in Cardiovascular Medicine. 2022
  71. Aoki S et al. Human-robot cooperation for autonomous vehicles and human drivers: Challenges and solutions. IEEE Communications Magazine. 2021;59(8):36-41
  72. Schillaci G, Villalpando AP, Hafner VV, Hanappe P, Colliaux D, Wintz T. Intrinsic motivation and episodic memories for robot exploration of high-dimensional sensory spaces. Adaptive Behaviour. 2020;29(6):549-566
  73. Baranes A, Oudeyer P. R-IAC: Robust intrinsically motivated exploration and active learning. IEEE Transactions on Autonomous Mental Development. 2009;1(3):155-169. DOI: 10.1109/TAMD.2009.2037513
  74. Lee K, Ognibene D, Chang HJ, Kim TK, Demiris Y. STARE: Spatio-temporal attention relocation for multiple structured activities detection. IEEE Transactions on Image Processing. 2015;24(12):5916-5927
  75. Demiris Y. Prediction of intent in robotics and multi-agent systems. Cognitive Processing. 2007;8:151-158
  76. McClelland JL, McNaughton BL, O'Reilly RC. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review. 1995;102(3):419-457
  77. Bellman R. On the theory of dynamic programming. Proceedings of the National Academy of Sciences. 1952;38(8):716-719
  78. Arulkumaran K, Deisenroth MP, et al. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine. 2017;34(6):28-38
  79. Billard A, Calinon S, Dillmann R, et al. Robot Programming by Demonstration. Springer; pp. 1371-1391
  80. Tai L, Zhang J, Liu M, et al. A survey of deep network solutions for learning control in robotics: From reinforcement learning to imitation. 2016
  81. Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In: Proc. Int. Conf. Learning Representations. 2016
  82. Hasselt HV. Double Q-learning. In: Proc. Neural Information Processing Systems. 2010. pp. 2613-2621
  83. Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. In: Proc. Int. Conf. Learning Representations. 2016
  84. Mnih V, Silver D, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529-533
  85. Kaelbling LP, Littman ML, Cassandra AR. Planning and acting in partially observable stochastic domains. Artificial Intelligence. 1998;101(1):99-134
  86. Nachum O, Norouzi M, Xu K, Schuurmans D. Bridging the gap between value and policy based reinforcement learning
  87. Ho J, Ermon S. Generative adversarial imitation learning. 2016
  88. Schmerling M, Schillaci G, Hafner V. Goal-directed learning of hand-eye coordination in a humanoid robot. In: Proceedings of the 5th International Conference on Development and Learning and on Epigenetic Robotics (ICDL-EpiRob). 2015
  89. Hussein A, Gaber MM, Elyan E, Jayne C. Imitation learning: A survey of learning methods. ACM Computing Surveys. 2017;50(2):35
  90. Argall B, Browning B, Veloso M. Learning by demonstration with critique from a human teacher. In: Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. ACM; 2007. pp. 57-64
  91. Bitzer S, Vijayakumar S. Latent spaces for dynamic movement primitives. In: Proc. 9th IEEE-RAS International Conference on Humanoid Robots (Humanoids '09). 2009
  92. Bagnell JA. An invitation to imitation. Pittsburgh, PA: Carnegie Mellon University; 2015
  93. Abbeel P, Ng AY. Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st International Conference on Machine Learning (ICML). 2004
  94. Baranes A, Oudeyer P-Y. Active learning of inverse models with intrinsically motivated goal exploration in robots. 2013
  95. Daume H, Langford J, Marcu D. Search-based structured prediction. Machine Learning. 2009;75:297
  96. Duan Y, et al. One-shot imitation learning. In: Advances in Neural Information Processing Systems. 2017. pp. 1087-1098
  97. Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS. Deep learning for visual understanding: A review. Neurocomputing. 2016;187:27-48
  98. Gu S, Holly E, Lillicrap T, Levine S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). 2017. pp. 3386-3396
  99. Gupta A, Eppner C, Levine S, Abbeel P. Learning dexterous manipulation for a soft robotic hand from human demonstrations. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2016. pp. 3786-3793
  100. Zhang J, Springenberg JT, Boedecker J, Burgard W. Deep reinforcement learning with successor features for navigation across similar environments. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2017. pp. 2371-2378
  101. Chen Y, Everett M, Liu M, How JP. Socially aware motion planning with deep reinforcement learning
  102. Tai L, Paolo G, Liu M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2017. pp. 31-36
  103. Zhang J, Tai L, Boedecker J, Burgard W, Liu M. Neural SLAM
  104. Khan A, Zhang C, Atanasov N, Karydis K, Kumar V, Lee D. Memory augmented control networks
  105. Parisotto E, Salakhutdinov R. Neural map: Structured memory for deep reinforcement learning
  106. Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion: Maximizing for domain invariance
  107. Chen L-C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In: Proceedings of the International Conference on Learning Representations (ICLR). 2015
  108. Fitzpatrick P, Metta G, Natale L, Rao S. Learning about objects through action: Initial steps towards artificial cognition. In: Proceedings of the International Conference on Robotics and Automation (ICRA '03). Taipei, Taiwan. pp. 3140-3145
  109. Masci J, Meier U, Ciresan D, Schmidhuber J. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Proceedings of the International Conference on Artificial Neural Networks. Springer; 2011. pp. 52-59
  110. Kanitscheider I, Fiete I. Training recurrent networks to generate hypotheses about how the brain solves hard navigation problems. Advances in Neural Information Processing Systems. pp. 4532-4541
  111. Taylor M, Stone P. Cross-domain transfer for reinforcement learning. In: Proc. 24th International Conference on Machine Learning (ICML '07). 2007. pp. 879-886
  112. Wang Z et al. Dueling network architectures for deep reinforcement learning. In: Proc. 33rd International Conference on Machine Learning (ICML '16). 2016. pp. 1995-2003
  113. Radford A, Metz L. Unsupervised representation learning with deep convolutional generative adversarial networks. 2016
  114. Ran et al. Convolutional neural network-based robot navigation using uncalibrated spherical images. Sensors. 2017;17:1341
  115. Coates A, Ng AY. Learning Feature Representations with k-means. Springer; 2012. pp. 561-580
  116. Schillaci G et al. Intrinsic motivation and episodic memories for robot exploration of high-dimensional sensory spaces. 2020
  117. Cheng R, Jin Y. A social learning particle swarm optimization algorithm for scalable optimization. Information Science. 2015;291:43-60
  118. Rahmatizadeh R, Abolghasemi P, Boloni L. Learning manipulation trajectories using recurrent neural networks. 2016
  119. Yamada T, Murata S, Arie H, Ogata T. Dynamic integration of language and behavior in a recurrent neural network for human-robot interaction. Frontiers in Neurorobotics. 2016
  120. Molina-Leal A et al. Trajectory planning for a mobile robot in a dynamic environment using an LSTM neural network. Applied Sciences. 2021;11(22):10689
  121. Redmon J, Angelova A. Real-time grasp detection using convolutional neural networks. 2015
  122. Levine S et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. 2016
  123. Li C, Lowe R, Ziemke T. Humanoids learning to walk: A natural CPG-actor-critic architecture. Frontiers in Neurorobotics. 2013
  124. Calinon S, Li S, Alizadeh T, Tsagarakis G. Statistical dynamical systems for skills acquisition in humanoids. In: Proc. 2012 IEEE-RAS International Conference on Humanoid Robots (Humanoids '12). pp. 232-329
  125. Triesch J, Wieghardt J, Mael E. Towards imitation learning of grasping movements by an autonomous robot. In: Proc. of the International Gesture Workshop on Gesture-Based Communication in Human-Computer Interaction (GW '99). 1999. pp. 73-84
  126. Wisspeintner T, van der Zant T, Iocchi L, Schiffer S. RoboCup@Home: Scientific competition and benchmarking for domestic service robots. Interaction Studies. 2009;10(3):392-426
  127. RoboCup@Home official website. [Accessed: December 2010]
  128. Vrigkas M, Nikou C, Kakadiaris IA. A review of human activity recognition methods. Frontiers in Robotics and AI. 2015;2:28
  129. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T. HMDB: A large video database for human motion recognition. In: Proceedings of the IEEE International Conference on Computer Vision. 2011
  130. Andzi-Quainoo TT, Mike LR. A bank of classifiers for robust object modeling in wavelet domain. In: Proceedings of the IEEE International Conference on Industrial Technology. Busan, South Korea; 2014
  131. Lee H, Battle A, Raina R, Ng AY. Efficient sparse coding algorithms. In: Advances in Neural Information Processing Systems (NIPS). 2007
  132. Aharon M, Elad M, Bruckstein AM. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representations. IEEE Transactions on Signal Processing. 2006;54(11):4311-4322
  133. Golub GH, Reinsch C. Singular value decomposition and least squares solutions. Numerische Mathematik. 1970;14:403-420
  134. Hardle W, Simar L. Applied Multivariate Statistical Analysis. Berlin, Heidelberg: Springer; 2007. pp. 321-330
  135. Kailath T. The divergence and Bhattacharyya distance measures in signal selection. IEEE Transactions on Communication Technology. 1967;15(1):54-60
  136. Swain M, Ballard DH. Color indexing. International Journal of Computer Vision. 1991;7(1):11-32
  137. Liu C-L. One-versus-all training of prototype classifiers for pattern classification and retrieval. In: Proceedings of the 2010 20th International Conference on Pattern Recognition. 2010. pp. 3328-3331
  138. Pramerdorfer C, Kampel M. Facial expression recognition using convolutional neural networks: State of the art. 2016
  139. Steil J, Wersing H. Recent trends in online learning for cognitive robotics. In: Proceedings of ESANN 2006 - European Symposium on Artificial Neural Networks. Bruges, Belgium; 2006
  140. Tani J. Learning to generate articulated behavior through bottom-up and top-down interaction processes. Neural Computation. 2003;16(1):11-23
  141. Trigueiros P, Ribeiro F. Vision-based hand wheelchair control. In: Proc. of the 12th International Conference on Autonomous Robot Systems and Competitions (Robotics 2012). Guimaraes, Portugal; 2012. pp. 39-43
  142. Saha O, Dasgupta P. A comprehensive survey of recent trends in cloud robotics architectures and applications. Robotics. 2018;7:47
  143. ISO 8373 (en). Robots and robotic devices - Vocabulary: ISO/TC299. 2012. https://www.iso.org/obp/ui/#iso:std:iso:8373:ed-2:v1:en [Accessed: November 2020]
  144. Fong T, Nourbakhsh I, Dautenhahn K. A survey of socially interactive robots. Robotics and Autonomous Systems;42:143-166
  145. Modas A, Sanchez-Matilla R, Frossard P, Cavallaro A. Toward robust sensing for autonomous vehicles: An adversarial perspective. IEEE Signal Processing Magazine. 2020;47:14-24
  146. Xu et al. Capsule network distributed learning with multi-access edge computing for internet of vehicles. IEEE Communications Magazine. 2021;59(8):52-57
  147. Li L et al. A survey on federated learning. In: 2020 IEEE International Conference on Control & Automation (ICCA). 2020. pp. 791-796
  148. Maidana RG et al. Autonomic computing towards resource management in embedded mobile robots. In: Proceedings of the 2019 Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE). 2019. pp. 192-197
