Elsevier

Signal Processing

Volume 108, March 2015, Pages 136-146

Whole-body humanoid robot imitation with pose similarity evaluation

https://doi.org/10.1016/j.sigpro.2014.08.030

Highlights

  • Whole-body imitation of humanoid robot with balance control.

  • Exploration of the shared structure between human motion space and robot motion space.

  • Quantitative pose similarity evaluation of the imitation performance.

Abstract

Imitation is considered a form of social learning that allows the transfer of information, actions, behaviors, etc. Since current robots cannot perform as many tasks as humans, imitating humans is a natural way for them to learn, just as humans do. As humanoid robots become more intelligent, the field of robot imitation has made noticeable advances.

In this paper, we focus on pose imitation between a human and a humanoid robot and on learning a similarity metric between human poses and robot poses. In contrast to recent approaches that capture human data using expensive motion capture systems or imitate only upper-body movements, our framework adopts a Kinect instead and can handle complex, whole-body motions by maintaining both single-pose balance and pose-sequence balance. Meanwhile, unlike previous work that relies on subjective evaluation, we propose a pose similarity metric based on the shared structure of the motion spaces of the human and the robot. Qualitative and quantitative experimental results demonstrate satisfactory imitation performance and indicate that the proposed pose similarity metric is discriminative.

Introduction

With the development of robotics, robots, especially humanoid robots, have become much smarter than they used to be. However, they are still unable to perform many tasks as naturally as human beings, and imitation is considered an effective solution to this problem. Specifically, imitation is an advanced behavior whereby an individual observes and replicates the behaviors of others. Robots have replaced humans in repetitive and dangerous tasks in fields such as construction, medical surgery, toxic substance cleaning and space exploration, where imitating humans can be advantageous to some degree.

Imitation is about generating stable humanoid movements from human motions; an overview and computational approaches to this problem can be found in [1]. Many imitation studies focus on the upper body and employ complex system settings. In [2], an analytical method was proposed to transfer upper-body motion from a human to a humanoid robot. Riley et al. [3] placed colored markers on a human upper body so that it could be abstracted by a vision system based on external cameras and a camera mounted on a humanoid's head. These markers were used, together with a kinematic model of the human, to estimate the angular ranges of some joints to perform the imitation. Similar to [3], with the help of 34 markers placed on the human upper body and 2 markers attached to a conductor's stick, Ott et al. [4] applied data obtained by a motion capture system to let a humanoid robot mimic human motion through a Cartesian control approach. Aleotti et al. [5] adopted neural networks to learn a mapping between the positions of a human arm and an industrial robot arm. Stanton et al. [6] extended Aleotti's work to a humanoid robot by training a feed-forward neural network with particle swarm optimization for each degree of freedom (DOF). In the data collection process, a robot was used to lead a human operator through a series of paired synchronized movements captured by a motion capture system, which was time-consuming and tedious. As the authors mentioned, to ensure robot stability, the positions of the robot's ankles were not controlled by the neural networks. Since the neural networks could not always output ideal angles, the robot, as a rigid body, was apt to lose its balance. Meanwhile, training a unified neural network for the whole body was infeasible owing to convergence trouble, while training separate networks caused the loss of correlations among the DOFs. Other imitation studies are mainly dedicated to humanoid gait or walking movements [7], [8], [9].
In summary, existing works have the following limitations:

  • Imitation of the upper body or a single body part is insufficient to meet the needs of humanoid robots [2], [3], [4], [5].

  • Requiring motion capture equipment is expensive and inconvenient for general use, and unnatural for human–robot interaction [5], [6], [10].

  • Lack of balance control and whole-body control [3], [4], [6].

  • The imitation results are not quantitatively evaluated [6], [10], [11].

After performing the pose imitation, another important issue is how to evaluate the imitation similarity between a robot slave and its master. In [11], Zuher et al. gave a subjective evaluation by asking people to rate the quality of an imitation as bad, poor, fair, good or excellent. Other existing research efforts concentrate mostly on the pose similarity of a single agent. The simplest metric is the L2 distance, which does not sufficiently exploit the dependencies between DOFs. In [12], different weights were learned for the DOFs, reflecting the fact that some DOFs have more influence in determining similarity. Chen et al. [13] proposed a rich pose feature set that effectively encodes pose similarity by exploiting geometric relations among body parts; based on this feature set, a distance metric was learned in a semi-supervised manner. By matching the related DOFs of a robot and a human, these methods could be applied to evaluate the imitation similarity. However, robots and humans differ in DOF dimensions and physical constraints, i.e., they have different motion spaces, so it is inappropriate to compare them directly.
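As a concrete illustration of the baseline metrics discussed above, a per-DOF weighted L2 distance between angle vectors can be sketched as follows. The weights here are hypothetical placeholders, not the values learned in [12]:

```python
import numpy as np

def weighted_pose_distance(theta_a, theta_b, weights=None):
    """Weighted L2 distance between two poses given as DOF-angle vectors.

    Plain L2 treats every DOF equally; per-DOF weights let joints that
    matter more for perceived similarity dominate the distance.
    """
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    if weights is None:
        weights = np.ones_like(theta_a)  # unweighted L2 as the default
    diff = theta_a - theta_b
    return float(np.sqrt(np.sum(np.asarray(weights) * diff ** 2)))

# Example with two made-up 2-DOF poses:
d = weighted_pose_distance([0.0, 0.0], [3.0, 4.0])  # plain L2 → 5.0
```

Such a metric still assumes both vectors live in the same motion space; as argued above, that assumption does not hold across a human and a robot.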

The problem we face here can be regarded as a metric learning problem. Learning a good distance metric in feature space is crucial in real-world applications. Good distance metrics are important to many computer vision tasks, such as image classification [14], [15], [16], content-based image retrieval [17], [18] and their applications [19], [20]. Many useful algorithms and ideas for combining multiple feature sets were proposed in these papers, such as high-order distance-based multiview stochastic learning (HD-MSL [14]) and semi-supervised multiview distance metric learning (SSM-DML [16]). In our case, we believe that human poses and humanoid robot poses have much in common, owing to their highly similar skeleton structures; their differences lie in the number of DOFs and the physical constraints. Consequently, the shared motion space between the two agents can be a good metric space in which to study pose similarity.
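As one minimal illustration of how poses from spaces of different dimensionality can be compared in a shared space, consider projecting paired human/robot pose vectors with linear canonical correlation analysis (CCA). This is a stand-in sketch, not the latent structure model used in the paper:

```python
import numpy as np

def shared_space(H, R, k=2, reg=1e-6):
    """Project human poses H (n x d_h) and robot poses R (n x d_r),
    recorded in correspondence, into a k-dim shared space via linear CCA.
    reg regularizes the covariance inverses for numerical stability."""
    Hc = H - H.mean(axis=0)
    Rc = R - R.mean(axis=0)

    def inv_sqrt(C):
        # Inverse square root of a symmetric covariance matrix.
        w, V = np.linalg.eigh(C + reg * np.eye(C.shape[0]))
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Ch = Hc.T @ Hc / len(H)          # within-space covariances
    Cr = Rc.T @ Rc / len(R)
    Chr = Hc.T @ Rc / len(H)          # cross-covariance
    # Singular vectors of the whitened cross-covariance give the
    # canonical directions for each space.
    U, s, Vt = np.linalg.svd(inv_sqrt(Ch) @ Chr @ inv_sqrt(Cr))
    Wh = inv_sqrt(Ch) @ U[:, :k]
    Wr = inv_sqrt(Cr) @ Vt[:k].T
    return Hc @ Wh, Rc @ Wr           # comparable shared coordinates
```

Poses mapped through the two projections can then be compared with an ordinary Euclidean distance, even though the original DOF dimensions differ.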

This paper proposes a novel humanoid robot imitation framework with pose similarity metric learning between human poses and robot poses, using a consumer camera (the Microsoft Kinect) and a humanoid robot (the Aldebaran Nao H25). The proposed framework, summarized in Fig. 1, adopts dynamic balance control with real-time imitation performance. A shared representation of both robot poses and human poses is learned to evaluate the imitation similarity. Both qualitative and quantitative experimental results demonstrate satisfactory imitation performance and indicate that the proposed pose similarity evaluation is discriminative.

Our main contributions are the following: (a) We propose a novel framework to perform pose imitation on whole-body motions rather than the upper body alone. (b) We actively keep single-pose balance and introduce transient poses to achieve smooth pose-sequence balance. (c) We demonstrate how a shared structure can provide a quantitative evaluation to define the similarity between a human pose and a robot pose.

Section snippets

Pose representation

The Kinect consists of an RGB camera and a depth sensor, and provides 3D human skeleton tracking at 30 frames per second. Based on the position data obtained, we can calculate the 20 DOF angles listed in Table 1, which are angles between pairs of related vectors. For example,

θ^H_HeadPitch = ∠(DV(Pos_Spine, Pos_ShoulderCenter), DV(Pos_ShoulderCenter, Pos_Head))

where DV stands for the direction vector between two 3D points and ∠(·,·) denotes the angle between two vectors. We thus get 20 angles in total for a skeleton (or a human pose), denoted as θ^H = {θ^H_d} where d is the
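The DOF-angle computation described above, an angle between two direction vectors built from 3D joint positions, can be sketched as follows. The joint positions are made-up values standing in for Kinect skeleton output:

```python
import numpy as np

def direction_vector(p_from, p_to):
    """Unit direction vector DV between two 3D joint positions."""
    v = np.asarray(p_to, dtype=float) - np.asarray(p_from, dtype=float)
    return v / np.linalg.norm(v)

def dof_angle(p_a, p_b, p_c, p_d):
    """Angle (radians) between DV(p_a, p_b) and DV(p_c, p_d).

    The dot product is clipped to [-1, 1] so rounding noise in the
    unit vectors cannot push arccos out of its domain.
    """
    u = direction_vector(p_a, p_b)
    w = direction_vector(p_c, p_d)
    return float(np.arccos(np.clip(np.dot(u, w), -1.0, 1.0)))

# E.g. a head-pitch-style angle from three illustrative joint positions:
spine = [0.0, 0.9, 2.0]
shoulder_center = [0.0, 1.3, 2.0]
head = [0.0, 1.5, 1.9]
theta = dof_angle(spine, shoulder_center, shoulder_center, head)
```

Repeating this over the 20 vector pairs of Table 1 yields the full angle vector θ^H for one skeleton frame.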

Motion space

As mentioned before, we believe the motion spaces of humans and Nao robots are different, and it is inappropriate to compare a human pose with a robot pose directly. The reasons can be roughly summarized as follows:

  • The bones of humans are pliable while those of Nao robots are not.

  • The weight distributions of the two agents are different.

  • Compared with Nao robots, humans are better at coordinating the whole body to keep balance, and thus have more flexibility.

This raises the question:

Experimental results

The values of the parameters used in the experiment are summarized in Table 3. We record a human pose sequence of 1066 frames using one Kinect at a rate of 0.1 s per frame. The Kinect device can provide color frames, depth frames and skeleton frames in ordinary scenes. The 3D joint positions of the skeleton data are used in our framework as the input, i.e., Pos. Considering the movement speed of the Nao robot, we perform the imitation at an interval rather than on continuous frames. In the qualitative

Conclusion

In this paper, we propose a novel framework for humanoid robot imitation with pose similarity metric learning. DOF angles are used to represent poses. Given a human pose, we adopt the related angles as the target pose of a Nao robot. Through whole-body balance control, a stable pose is achieved. To handle the physical constraints of the Nao robot, we apply three transient poses to the original pose transfer, making some otherwise infeasible cases feasible. Furthermore, a latent structure model is

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (61170142), by the National Key Technology R&D Program under Grant 2011BAG05B04, and by the Program of International S&T Cooperation (2013DFG12840).

References (27)

  • H. Lee et al.

    Efficient sparse coding algorithms

    Adv. Neural Inf. Process. Syst.

    (2007)
  • M. Lopes, F. Melo, L. Montesano, J. Santos-Victor, Abstraction levels for robotic imitation: overview and computational...
  • A.R. Ibrahim et al.

    Analytical upper body human motion transfer to Nao humanoid robot

    Int. J. Electr. Eng. Inf.

    (2012)
  • M. Riley, A. Ude, K. Wade, C.G. Atkeson, Enabling real-time full-body imitation: a natural way of transferring human...
  • C. Ott, D. Lee, Y. Nakamura, Motion capture based human motion recognition and imitation by direct marker control, in:...
  • J. Aleotti, A. Skoglund, T. Duckett, Position teaching of a robot arm by demonstration with a wearable input device,...
  • C. Stanton, A. Bogdanovych, E. Ratanasena, Teleoperation of a humanoid robot using full-body motion capture, example...
  • S. Wehner et al.

    Humanoid gait optimization based on human data

    Automat.: J. Control Meas. Electron. Comput. Commun.

    (2011)
  • T. Sugihara, Y. Nakamura, H. Inoue, Real-time humanoid motion generation through zmp manipulation based on inverted...
  • B. Stephens, C. Atkeson, Modeling and control of periodic humanoid balance using the linear biped model, in: Ninth...
  • J. Koenemann, M. Bennewitz, Whole-body imitation of human motions with a nao humanoid, in: 2012 Seventh ACM/IEEE...
  • F. Zuher, R. Romero, Recognition of human motions for imitation and control of a humanoid robot, in: Robotics Symposium...
  • T. Harada, S. Taoka, T. Mori, T. Sato, Quantitative evaluation method for pose and motion similarity based on human...