Modeling maturational constraints for learning biped humanoid locomotion

This paper outlines a new developmental approach to motor learning in very high-dimensional spaces, applied to learning biped locomotion in humanoid robots. The approach relies on the formal modeling and coupling of several advanced mechanisms for actively controlling the growth of complexity and harnessing the curse of dimensionality: 1) adaptive, multi-objective and staged fitness functions; 2) maturational constraints for the progressive release of new degrees of freedom; 3) artificial curiosity; 4) motor synergies; 5) morphological computation. An experimental setup involving both the Acroban humanoid robot and a simulated version of the robot is presented.


I. INTRODUCTION
Fundamental in humans, bipedal walking poses a real challenge in robotics. The human body and brain evolved over two million years before bipedal walking became as easy and efficient as it is today. This ability, unique among mammals, appears very difficult to reproduce, especially in robots, which have no bio-organic materials. Managing the gait and balance of biped robots in varied environments is therefore a major challenge of robotics, whether the goal is to understand human walking or to design robots that move in a human-like and robust manner.
For more than twenty years, several techniques have been proposed to generate the gait that biped robots need in order to move. Most humanoids address bipedal walking with dynamic control based on computation of the zero moment point (ZMP); famous examples are Honda's ASIMO and HRP-4. These are, however, generally heavy and powerful robots, which makes them quite dangerous for the user, sensitive to the type of ground, and not really robust to external disturbances. Other, smaller robots (NAO, QRIO, ...) use an open-loop motor primitive, tuned by an engineer, consisting of a succession of positions taken by the legs of the robot. These primitives are very efficient but not really robust: if the ground differs from the one the walk was tuned for, walking becomes very unsafe. In addition, in personal robotics, it is not possible to ship an engineer with each robot to tune the primitives to the client's living environment. Finally, some teams have explored learning techniques so that the robot can tune itself and adapt to its environment. Nevertheless, learning becomes a challenge in large, high-dimensional spaces, which is the case for learning to walk. It is then necessary to reduce the exploration space, for example using [2] or [3].
We decided to explore other ways of generating a human-like gait by learning, taking inspiration from humans and especially from young children. Whether in terms of control, morphology or learning, the study of human walking is a fundamental source of information and inspiration. The aim of this paper is to present a new way to address the problem of bipedal locomotion: a developmental learning algorithm based on the work of psychologists who have highlighted processes in children that explain their ability to learn to walk in a robust way.

II. MODEL
Several solutions for learning biped walking have been investigated, with positive results. This paper proposes an alternative way to model learning to walk for humanoid robots with many degrees of freedom (DoF), using developmental constraints.

A. Objectives
Learning new skills in a high-dimensional space is still a challenge in robotics. Given the time needed, a random exploration of all dimensions is not conceivable. In the animal kingdom, many species are able to quickly learn new skills involving many joints and sensors. Humans, and children in particular, are a perfect example: they learn many basic mechanisms necessary for their survival, although they have many more degrees of freedom than any robot, both in number of joints and in number of sensors (the human body contains 230 movable or slightly movable joints and thousands of sensors). Despite this, a child learns new skills with relative ease, and in a much faster and more robust way than a robot. Humans undergo physical and mental maturation, which spares the child from having to manage all of its mobility at the same time: the child learns to control its body step by step. Our main objective is therefore to propose a model of processes observed in children in order to help robots solve the problem of learning to walk in a high-dimensional space.

B. Description
We focus on three processes that exist in humans and seem particularly relevant to learning in high-dimensional spaces.
1) Learning in stages: Before walking, children must learn to maintain postural balance; learning to walk itself then proceeds in two stages. [4] suggested that this two-phase development could be due to the use of two successive learning mechanisms. The first phase, the integration of posture and movement, lasts between 3 and 6 months depending on the child. The second phase lasts several years, until the child develops the adult gait pattern, which is characterized by a vertical acceleration of the foot at touchdown reflecting the transition from controlled falling to propulsion. At the beginning, when taking the first steps, the child's biomechanical strategy is to minimize the risk of falling: the child takes small steps, spreads the feet to keep lateral balance, spends little time on one foot and has a low step rate. During the first 80 months of life, the gait becomes faster and more effective, and the child takes more and more risks as the central nervous system (CNS) matures [1]. Finally, once the adult gait is acquired, the objective becomes to minimize energy consumption during walking.
Learning objectives thus change as the child grows. At first, the child tries to minimize head movements and the risk of falling. Gradually, the child takes more risks in order to reach distant points as quickly as possible. Finally, the child tries to minimize the energy consumed by the gait. To model this behavior, we perform a multi-objective optimization using six fitness functions:
f1: minimize head movement;
f2: maximize the time spent in balance before falling;
f3: maximize the distance travelled by the center of gravity;
f4: minimize torso joint torques;
f5: minimize the forces between the hands and the trolley;
f6: minimize leg joint torques.
The optimization can manage priorities between fitness functions: we can assign each fitness value a weight that depends on the maturation time, in order to model changes of priority.
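As an illustration of maturation-dependent priorities, the following minimal Python sketch combines the six fitness values into a single score, using weights that vary with a normalized maturation time m in [0, 1]. The weighted-sum scalarization, the three weight profiles and the names SIGNS, PROFILES, weights_at and staged_score are assumptions made for this example only; the text does not specify how the weights are chosen or combined.

```python
# Direction of each objective as listed in the text:
# +1 means "maximize", -1 means "minimize".
SIGNS = {"f1": -1, "f2": +1, "f3": +1, "f4": -1, "f5": -1, "f6": -1}

# Hypothetical weight profiles at three maturation anchors
# (early: balance, middle: distance, late: energy). Each profile sums to 1.
PROFILES = [
    (0.0, {"f1": 0.40, "f2": 0.40, "f3": 0.00, "f4": 0.00, "f5": 0.20, "f6": 0.00}),
    (0.5, {"f1": 0.10, "f2": 0.20, "f3": 0.50, "f4": 0.05, "f5": 0.10, "f6": 0.05}),
    (1.0, {"f1": 0.05, "f2": 0.10, "f3": 0.25, "f4": 0.30, "f5": 0.00, "f6": 0.30}),
]


def weights_at(m):
    """Linearly interpolate the weight profiles at maturation time m in [0, 1]."""
    m = min(max(m, 0.0), 1.0)
    for (m0, w0), (m1, w1) in zip(PROFILES, PROFILES[1:]):
        if m <= m1:
            a = (m - m0) / (m1 - m0)
            return {k: (1 - a) * w0[k] + a * w1[k] for k in w0}
    return dict(PROFILES[-1][1])


def staged_score(values, m):
    """Scalar score to maximize; `values` maps f1..f6 to normalized measurements."""
    w = weights_at(m)
    return sum(w[k] * SIGNS[k] * values[k] for k in SIGNS)
```

With these hypothetical profiles, a gait that travels far but produces large head movements scores poorly when m is close to 0 and much better around m = 0.5, which mirrors the shift from caution to speed described above.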
2) CPG generation: We model the central pattern generator (CPG) of each joint as a periodic Bézier spline giving the angular position of the joint over time. Each spline is defined by nine parameters: one for the period, seven for angular positions evenly distributed over the period, and one for the phase. The existence of such rhythmic oscillators, integrated multi-joint units that control limb activity, has now been demonstrated at the spinal level [5].
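The nine-parameter joint trajectory can be sketched as follows in Python. This is only one possible reading of the description: the seven angles are treated as knots that the curve passes through, and the inner Bézier control points of each segment are derived from neighbouring knots (Catmull-Rom tangents), which is an assumption of this example. The function name cpg_angle and the convention that the phase is a fraction of the cycle are also hypothetical.

```python
def cpg_angle(t, period, knots, phase):
    """Joint angle at time t for one CPG.

    period: duration of one cycle (seconds)
    knots:  seven joint angles, evenly spaced over one cycle
    phase:  offset expressed as a fraction of the cycle, in [0, 1)
    """
    n = len(knots)
    # Position inside the cycle, expressed in segment units [0, n).
    u = ((t / period + phase) % 1.0) * n
    i = int(u) % n          # index of the current segment
    s = u - int(u)          # local parameter in [0, 1)

    # Four neighbouring knots, wrapping around so the curve is periodic.
    p0, p1, p2, p3 = (knots[(i + k - 1) % n] for k in range(4))

    # Cubic Bezier control points of the segment from p1 to p2.
    # Inner points use Catmull-Rom tangents (assumption of this sketch).
    b0 = p1
    b1 = p1 + (p2 - p0) / 6.0
    b2 = p2 - (p3 - p1) / 6.0
    b3 = p2

    # Evaluate the cubic Bezier segment in Bernstein form.
    r = 1.0 - s
    return r**3 * b0 + 3 * r**2 * s * b1 + 3 * r * s**2 * b2 + s**3 * b3
```

Because the knot indices wrap around, the trajectory repeats every period and stays smooth across the cycle boundary; shifting the phase of one joint relative to another changes their coordination without altering either waveform.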
3) Release of degrees of freedom: Children gradually become able to master the various degrees of freedom of their articulated body, which have to be controlled simultaneously in situations of dynamic equilibrium. The release of degrees of freedom is a key element of learning to walk in children: it dramatically reduces the dimensionality of the space to explore. In our model, the robot begins with 4 DoF (knees and hips) and increases the complexity of its gait by releasing new DoF, up to 17. The release of DoF follows the principle of curiosity. As long as the robot is progressing (the derivative of the fitness function is negative), it continues to learn in the current space. When progress stops or becomes very slow, the robot's curiosity releases some degrees of freedom in order to explore a new, potentially interesting space.

III. EXPERIMENTS
Experiments are carried out using MATLAB for the main algorithm. The simulation is run with a new physics simulator, V-REP (Virtual Robot Experimentation Platform). Communication between these two programs is handled by Urbi, which is also responsible for motor control of the robot.
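The curiosity-driven release of degrees of freedom described in Section II.B (item 3) can be illustrated by the minimal Python sketch below. It treats the fitness as a cost to be minimized, so that progress corresponds to a negative slope. The window length, the stagnation threshold, the number of DoF released at a time and the function names are assumptions of this example; only the bounds of 4 and 17 DoF come from the text.

```python
MIN_DOF, MAX_DOF = 4, 17  # bounds given in the text


def stagnating(cost_history, window, threshold):
    """True when the cost no longer decreases fast enough over `window` steps."""
    if len(cost_history) <= window:
        return False  # not enough data yet to estimate progress
    slope = (cost_history[-1] - cost_history[-1 - window]) / window
    return slope > -threshold


def release_dofs(active, step=2):
    """Unlock `step` additional DoF (hypothetical group size), capped at MAX_DOF."""
    return min(active + step, MAX_DOF)


def simulate(costs, window=3, threshold=0.01):
    """Return the number of active DoF after each learning iteration."""
    active = MIN_DOF
    history = []
    trace = []
    for cost in costs:
        history.append(cost)
        if active < MAX_DOF and stagnating(history, window, threshold):
            active = release_dofs(active)
            history = []  # restart the progress estimate in the enlarged space
        trace.append(active)
    return trace
```

In this sketch the learner stays in its current subspace while the cost keeps falling, and enlarges the search space only once the measured progress drops below the threshold.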
For the robot, we add a trolley to avoid some balance problems and to focus on learning a set of CPGs, in a high-dimensional space, that can rapidly and effectively achieve a robust and safe gait.
The robot is a virtual model of the Acroban platform, developed in our laboratory. This article comes with accompanying videos, available at www.youtube.com/watch?v=zHbl-ozA h0