Gesture Learning by Imitation Architecture for a Social Robot

Learning by imitation allows people to teach social robots new tasks using natural and intuitive interaction channels, of which vision is the most important. This chapter describes a learning-by-imitation architecture that uses stereo vision to perceive, recognize, learn, and imitate social gestures. The description is based on a set of generic components that can be found in any learning-by-imitation architecture. It highlights the main contribution of the proposed architecture: the use of an inner human model to help perceive, recognize, and learn human gestures. This allows different robots to share the same perceptual and knowledge modules. Experimental results show that the proposed architecture is able to meet the requirements of learning-by-imitation scenarios. It can also be integrated into complete software structures for social robots, which involve complex attention mechanisms and decision layers.


INTRODUCTION
Robots have been widely used in industrial environments for the last fifty years. Industrial robots are designed to perform repetitive, predictable tasks, but they cannot easily adapt or learn new behaviours (Craig, 1986). To execute their programmed tasks they need to sense only a constrained set of environmental parameters, so the perceptual systems mounted on industrial robots are simple, practical and task-oriented. Moreover, they are designed to work in environments in which human presence is limited and controlled, if it is allowed at all. Thus, while their usefulness is evident, industrial robots are strongly limited. To remove these limitations, a new generation of robots began to appear more than thirty-five years ago (Inoue, Tachi, Nakamura, Hirai, Ohyu, Hirai, Tanie, Yokoi & Hirukawa, 2001). These robots were designed to cooperate with people in everyday activities, to adapt to uncontrolled environments and new tasks, and to become engaging companions for people to interact with. They usually benefit from sharing human perceptual and motor capabilities, and thus the term humanoid robot was used to name these agents. In the last decade, however, the difficulties of creating robots that resemble human beings have favoured the use of the more generic term social robot. Thus, today it is accepted that, although humanoid robots are certainly designed to be social, social robots do not need to be humanoid.
According to an early definition (Dautenhahn & Billard, 1999), social robots are agents designed to be part of a heterogeneous group. They should be able to recognize, explicitly communicate with and learn from other individuals in this group. They also possess a history (i.e. they sense and interpret their environment in terms of their own experience). While this is a generic definition, in practice social robots are designed to work in human societies. Thus, later definitions present social robots as agents that have to interact with people (Breazeal, Brooks, Gray, Hancher, McBean, Stiehl & Strickon, 2003). This chapter follows the same ideas, and social robots are understood as "robots that work in real social environments, and that are able to perceive, interact with and learn from other individuals, being these individuals people or other social agents" (Bandera, 2010, p. 9).
Social robots can learn in different ways. Individual learning mechanisms (e.g. trial-and-error, imprinting, classical conditioning, etc.) are one option. However, applying them to a social robot may lead it to learn incorrect, disturbing or even dangerous behaviours. Thus, they should be restricted to specific scenarios and tasks (e.g. games based on controlled stigmergy) (Breazeal et al., 2003; Bandera, 2010). Social learning mechanisms are a different option, one that allows the human teacher to supervise the learning process and so avoid most issues of individual learning. Among the different social learning strategies, learning by imitation appears as one of the most intuitive and powerful.
This chapter describes a robot learning by imitation (RLbI) architecture that provides a social robot with the ability to learn and imitate upper-body social gestures. This architecture, which is the main topic of the first author's thesis (Bandera, 2010), uses an interface based on a pair of stereo cameras and a model-based perception component to capture human movements from input image data. Perceived human motion is segmented into discrete gestures and represented using features. These features are subsequently employed to recognize and learn gestures. One of the main differences between this proposal and previous approaches is that all these processes are executed in the human motion space, not in the robot motion space. This strategy prevents the physical limitations of the robot from constraining its perceptual capabilities. It also eases the sharing of knowledge among different robots. A translation module is used only if the social robot needs to perform physical imitation; it combines different strategies to produce valid robot motion from learned human gestures.
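The flow of data through such an architecture can be illustrated with a minimal Python sketch. It is not the implementation described in this chapter: every name, threshold and criterion below (segmentation at near-still frames, a per-joint mean as the feature vector, nearest-neighbour matching, clamping to joint limits as the translation step) is a hypothetical placeholder chosen only to show that segmentation, representation, recognition and learning operate on human poses, and that the robot body enters only in the final translation step.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Optional, Sequence, Tuple

Pose = Tuple[float, ...]  # assumed: one value per perceived human joint angle


def segment(poses: Sequence[Pose], rest_eps: float = 0.05) -> List[List[Pose]]:
    """Cut a pose stream wherever the body is nearly still (assumed criterion)."""
    gestures: List[List[Pose]] = []
    current: List[Pose] = []
    for prev, cur in zip(poses, poses[1:]):
        if dist(prev, cur) > rest_eps:   # the body is moving: extend the gesture
            current.append(cur)
        elif current:                    # the body stopped: close the gesture
            gestures.append(current)
            current = []
    if current:
        gestures.append(current)
    return gestures


def features(gesture: Sequence[Pose]) -> Pose:
    """Placeholder descriptor: per-joint mean over the gesture."""
    n = len(gesture)
    return tuple(sum(p[j] for p in gesture) / n for j in range(len(gesture[0])))


@dataclass
class Repertoire:
    """Known gestures, stored in human motion space so robots could share them."""
    known: Dict[str, Pose] = field(default_factory=dict)
    match_eps: float = 0.2  # hypothetical acceptance threshold

    def recognize(self, feat: Pose) -> Optional[str]:
        """Return the nearest known label, or None if nothing is close enough."""
        best = min(self.known, key=lambda k: dist(self.known[k], feat), default=None)
        if best is not None and dist(self.known[best], feat) <= self.match_eps:
            return best
        return None

    def learn(self, label: str, feat: Pose) -> None:
        self.known[label] = feat


def translate(gesture: Sequence[Pose],
              limits: Sequence[Tuple[float, float]]) -> List[Pose]:
    """Stand-in translation: clamp each human joint value to the robot's limits."""
    return [tuple(min(max(v, lo), hi) for v, (lo, hi) in zip(p, limits))
            for p in gesture]


# Usage: perceive, segment, represent, learn, recognize, and only then translate.
stream = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
gestures = segment(stream)
repertoire = Repertoire()
repertoire.learn("raise_arm", features(gestures[0]))
robot_motion = translate(gestures[0], limits=[(-0.6, 0.6), (-1.0, 1.0)])
```

Because the repertoire never refers to robot joints, the same stored gestures could in principle be reused by a robot with different joint limits simply by passing different `limits` to the translation step.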
The rest of the chapter is organized as follows. A Background section first details state-of-the-art solutions and discusses their advantages and drawbacks. Then, the main body of the chapter starts by introducing the set of components that can be identified in any RLbI architecture. These components are used to describe the RLbI architecture proposed in this chapter, which is analyzed in depth in the following sections. Thus, the Human Motion Perception section describes the perceptual modules used to capture human gestures. The Gesture Representation and Recognition section details the methods used by the robot to encode and recognize gestures. The Learning section describes the learning system used to update the repertoire of the robot. As noted above, the differences between the bodies of the human and the robot suggest the use of a translation module to map the movements of the former to the latter. The Mo-