Neural Networks

Volume 129, September 2020, Pages 323-333

2020 Special Issue
Energy-efficient and damage-recovery slithering gait design for a snake-like robot based on reinforcement learning and inverse reinforcement learning

https://doi.org/10.1016/j.neunet.2020.05.029

Abstract

Similar to real snakes in nature, the flexible trunks of snake-like robots enhance their movement capability and adaptability in diverse environments. However, this flexibility corresponds to a complex control task involving highly redundant degrees of freedom, where traditional model-based methods usually fail to propel the robots energy-efficiently or to adapt to unforeseeable joint damage. In this work, we present an approach for designing an energy-efficient and damage-recovery slithering gait for a snake-like robot using reinforcement learning (RL) and inverse reinforcement learning (IRL). Specifically, we first present an RL-based controller for generating locomotion gaits at a wide range of velocities, trained with the proximal policy optimization (PPO) algorithm. Then, taking the RL-based controller as an expert and collecting trajectories from it, we train an IRL-based controller with the adversarial inverse reinforcement learning (AIRL) algorithm. For comparison, a traditional parameterized gait controller serves as the baseline, with its parameter sets optimized by grid search and Bayesian optimization. Based on the analysis of the simulation results, we first demonstrate that the RL-based controller exhibits very natural and adaptive movements that are also substantially more energy-efficient than the gaits generated by the parameterized controller. We then demonstrate that the IRL-based controller not only matches the performance of the RL-based controller, but can also recover from unpredictably damaged body joints and still outperform the model-based controller, which has an undamaged body, in terms of energy efficiency. Videos can be viewed at https://videoviewsite.wixsite.com/rlsnake.

Introduction

Snake-like robots, as a class of hyper-redundant mechanisms, are a promising type of mobile robot capable of traveling and performing tasks in diverse environments, such as disaster rescue, underwater exploration, and industrial inspection (Liljebäck, Pettersen, Stavdahl, & Gravdahl, 2012). Since snake-like robots can only carry limited energy resources and usually suffer unpredictable damage during field operations, it is important to develop gaits that are both energy-efficient and adaptive, to reduce the impact of power constraints and to increase the chances of surviving non-lethal hardware failures (e.g., one or several broken joints). On the one hand, optimizing power consumption can prolong the service time of a robot while maximizing its locomotion performance; a more efficient energy system may in return allow us to design a lighter robot or add other functional components (Tesch et al., 2009). On the other hand, enabling the robots to complete normal tasks with a damaged body ensures their usability in severe environments. However, it is challenging to design both energy-efficient and damage-recovery gaits for snake-like robots due to their redundant degrees of freedom (DOF) and complex interactions with the environment (Tucker, 1975).

Since the first snake-like robot was built in 1972, researchers have been working constantly on designing more advanced snake-like robots (Bing, Cheng, Huang, et al., 2017a, Liljebäck et al., 2012) and sophisticated gaits for robots with different mechanical configurations or for different terrains. Meanwhile, the slithering gait, which imitates the serpentine locomotion of real snakes (Hu, Nirody, Scott, & Shelley, 2009), has been considered the most promising gait for snake-like robots performing autonomous locomotion tasks. Hirose first used the serpenoid curve to control a snake-like robot, an effective approach that imitates the movement of real snakes (Hirose, 1993). Ma proposed another model, the serpentine curve, which describes the locomotion of snakes by modeling their muscle characteristics and achieved higher locomotive efficiency than the serpenoid curve in simulation (Ma, 1999). Inspired by the central pattern generator (CPG), Bing et al. proposed a biologically inspired controller for smoothing the gait transition process of a snake-like robot (Bing, Cheng, Chen, et al., 2017a, Bing, Cheng, Huang, Zhou, et al., 2017). On the basis of these snake-like movement curves, the gait equation, a robust and effective method, serves as an abstract expression of the gaits of a snake-like robot by describing joint angles as parameterized sinusoidal functions (Tesch et al., 2009). It allows complex behaviors to emerge from low-dimensional representations with only a few key parameters, greatly expanding the robots' maneuverability and simplifying user control. Using this method, researchers have developed several biological gaits for snake-like robots to move in indoor and outdoor environments (Melo & Paez, 2014).
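As an illustration, such a gait equation typically commands each joint with a phase-shifted sinusoid so that a traveling wave propagates down the body. A minimal sketch follows; the parameter values are placeholders for illustration, not the ones used in the paper:

```python
import math

def gait_equation(t, n_joints, amplitude, phase_shift, frequency, offset=0.0):
    """Parameterized slithering-gait equation: joint i is commanded with
    theta_i(t) = offset + amplitude * sin(2*pi*frequency*t + i*phase_shift),
    so the phase shift between neighboring joints forms a traveling wave.
    """
    return [offset + amplitude * math.sin(2 * math.pi * frequency * t + i * phase_shift)
            for i in range(n_joints)]

# Example: 8 joints, 40-degree amplitude, 1 Hz body wave (placeholder values).
angles = gait_equation(t=0.25, n_joints=8, amplitude=math.radians(40),
                       phase_shift=math.radians(60), frequency=1.0)
```

The few scalar parameters (amplitude, phase shift, frequency, offset) are exactly the low-dimensional representation that the optimization methods discussed below search over.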

However, optimizing these parameterized gaits for adaptability and energy savings is difficult and of limited effect, since the gaits are confined to the abstracted gait parameters, and only a few studies have been reported. Crespi et al. adopted a heuristic optimization algorithm to rapidly adjust the travel speed of the robot (Crespi & Ijspeert, 2008). Tesch et al. used Bayesian optimization to regulate the open-loop gait parameters of a snake robot, which made the robot move faster and more reliably (Tesch, Schneider, & Choset, 2011). Gong et al. proposed a shape basis optimization algorithm to simplify the gait design parameter space and discovered a novel gait in granular materials (Gong, Goldman, & Choset, 2016). Even so, all these studies still optimized gaits on the basis of the parameterized gait generation system and had very limited effect on further improving gait efficiency. The scarcity of research on gait design methods also makes it difficult to study damage-recovery control of snake-like robots. Stoy et al. presented a study in which a snake-like robot was controlled so as to recover from signal loss and continue effective locomotion (Stoy, Shen, & Will, 2002). However, their controller was hand-designed and incapable of adapting to complete the locomotion after the robot suffered unpredictable damage. Mahdavi et al. presented a self-adaptive snake robot that used shape memory alloys as muscles (Mahdavi and Bentley, 2003, Mahdavi and Bentley, 2006). Using an evolutionary algorithm, the robot recovered its moving ability when some of the muscles were deliberately damaged.

This gait design or optimization task, however, corresponds to a complex control problem for two primary reasons (Liljebäck et al., 2012). The extrinsic challenge comes from the complex dynamic interaction between the ground and the redundant mechanism with many degrees of freedom, which must be modeled both precisely and rapidly; once damage changes the dynamics of the robot, model-based methods fail to control it. The intrinsic challenge is how to synchronize and coordinate all the body joints so that the robot exhibits a proper overall motion pattern that is both robust and efficient. Different joint configurations affect the locomotion performance differently and also lead to a huge parameter space to explore.

As an emerging technology, reinforcement learning (RL) mirrors the way animals learn locomotion and offers a model-free learning process for mastering new skills or adapting to diverse environments. Many agile or complicated motions have been developed for robots using RL methods (Duan, Chen, Houthooft, Schulman, & Abbeel, 2016), such as legged robots (Hwangbo et al., 2019), humanoid robots (Peters, Vijayakumar, & Schaal, 2003), and dexterous robotic hands (Rajeswaran et al., 2017). However, RL-based methods still suffer from drawbacks that prevent their application to more complicated tasks involving unpredictable dynamics. For example, reward shaping remains a significant barrier for multi-objective optimization tasks, in which the overall performance depends on many factors and cannot be explicitly defined or balanced. On the basis of RL, inverse reinforcement learning (IRL) extends the learning process by observing the behaviors of an expert, saving the effort of shaping a reward function, which is usually difficult to define properly in most robotic applications. In particular, by taking advantage of the generative adversarial network (GAN) (Goodfellow et al., 2014), adversarial IRL algorithms offer a way to both exploit the efficient adversarial formulation and recover a transferable reward function that represents the desirable intentions of the task and can adapt to new environments (Fu et al., 2017, Qureshi et al., 2018).
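For reference, the core of PPO, the RL algorithm used later in this work, is the clipped surrogate objective, which keeps each policy update close to the previous policy. A minimal NumPy sketch of that loss term (an illustration of the standard objective, not the authors' implementation):

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - eps, 1 + eps] removes the incentive to move the policy
    too far from the one that collected the data.
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Taking the elementwise minimum gives a pessimistic (lower) bound.
    return np.mean(np.minimum(unclipped, clipped))

# With a positive advantage and ratio 2.0, the objective is capped at
# (1 + eps) * advantage rather than growing with the ratio.
capped = ppo_clip_objective(np.log([2.0]), np.log([1.0]), np.array([1.0]))
```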

To this end, we aim to explore new gait design methods that further improve the field operation capabilities of snake-like robots in terms of energy efficiency. Specifically, on the basis of our previous research (Bing, Lemke, Jiang, Huang, & Knoll, 2019), we propose a novel alternative for designing the slithering gait of a snake-like robot using RL and IRL techniques. Our main contributions are summarized as follows.

  • First, we define the energy efficiency metrics and introduce the widely used parameterized slithering gait design method as the baseline, whose parameters are optimized for energy efficiency using a grid search method and the Bayesian optimization method.

  • Second, we propose a gait controller using the state-of-the-art RL algorithm PPO. The learned gait is surprisingly energy-efficient and closely resembles the natural movement of real snakes.

  • Third, we train another IRL-based gait controller by learning from demonstrations of the RL-based method. The learned gait allows the robot to recover from a damaged joint and still propel itself forward.

  • Last, the results from both the RL-based and IRL-based controllers demonstrate that the learned gaits outperform the parameterized slithering gait in terms of energy efficiency at a range of velocities.

Section snippets

Related work

As our paper discusses the locomotion control of snake-like robots using model-based, RL-based, and IRL-based methods, we will briefly review the state-of-the-art research on these three aspects in the following.

Models and metric definition

In this section, we first introduce the snake-like robot model used for exploring different gaits. Then, we present our energy efficiency metric for comparing different gaits based on our robot model.
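The section body is truncated in this excerpt, so the paper's exact metric is not reproduced here. A common choice for such a metric is the dimensionless cost of transport, sketched below as an illustrative placeholder only:

```python
def cost_of_transport(energy_joules, mass_kg, distance_m, g=9.81):
    """Dimensionless cost of transport: energy consumed per unit weight
    per unit distance traveled. Lower values mean a more energy-efficient
    gait. NOTE: an illustrative placeholder, not necessarily the metric
    defined in the (truncated) section above.
    """
    return energy_joules / (mass_kg * g * distance_m)

# Example: 98.1 J spent moving a 1 kg robot 10 m gives a cost of 1.0.
cot = cost_of_transport(energy_joules=98.1, mass_kg=1.0, distance_m=10.0)
```

Normalizing by weight and distance makes gaits comparable across velocities, which is what the comparisons in the results section require.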

Baseline examples

This section provides two baseline examples, in which a parameterized gait equation controller generates the slithering gait for our snake-like robot. By searching a grid of gait parameters with fixed intervals, we try to determine the most energy-efficient gaits that this controller can achieve at different velocities. Then we use the Bayesian optimization algorithm to explore better parameter combinations within the range of the searching grid, since the searching grid is
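The grid search described above can be sketched as follows. `simulate_gait` is a hypothetical stand-in (here a toy analytic surrogate) for running the robot simulation and measuring velocity and energy; the grids and the surrogate are illustrative assumptions, not the paper's values:

```python
import itertools

def simulate_gait(amplitude, frequency):
    # Toy surrogate for the real simulation: returns (velocity, energy).
    # In the actual setup this would roll out the gait in the simulator.
    velocity = amplitude * frequency
    energy = amplitude ** 2 + 0.5 * frequency ** 2
    return velocity, energy

def grid_search(target_velocity, tol=0.05):
    """Search a fixed-interval grid of gait parameters for the
    lowest-energy parameter set that reaches the target velocity."""
    amplitudes = [0.2 + 0.1 * i for i in range(9)]    # 0.2 .. 1.0
    frequencies = [0.5 + 0.25 * i for i in range(7)]  # 0.5 .. 2.0
    best = None
    for amp, freq in itertools.product(amplitudes, frequencies):
        vel, energy = simulate_gait(amp, freq)
        if abs(vel - target_velocity) <= tol and (best is None or energy < best[2]):
            best = (amp, freq, energy)
    return best

best = grid_search(target_velocity=1.0)
```

Because the grid has fixed intervals, the true optimum can fall between grid points, which is why Bayesian optimization is then used to refine the search within the same parameter ranges.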

Proposed RL-based controller

We begin this section by introducing the key ingredients of our reinforcement learning-based controller. Next, we introduce the RL network architecture and the training configuration.

Proposed IRL-based controller

This section first introduces the generation of the expert trajectories and then presents the training algorithm adapted from AIRL.
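In AIRL, the discriminator takes a particular functional form that entangles a learned reward-like term f(s, a) with the policy likelihood, D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + pi(a|s)), and the policy is updated with the reward log D - log(1 - D). A minimal sketch of these two expressions (f and the policy probability are placeholder scalars, standing in for network outputs):

```python
import math

def airl_discriminator(f_value, policy_prob):
    """AIRL discriminator: D = exp(f) / (exp(f) + pi(a|s)).
    f_value approximates a reward/advantage term f(s, a);
    policy_prob is the current policy's probability pi(a|s)."""
    ef = math.exp(f_value)
    return ef / (ef + policy_prob)

def airl_reward(f_value, policy_prob):
    """Policy-update reward log D - log(1 - D), which algebraically
    simplifies to f(s, a) - log pi(a|s)."""
    d = airl_discriminator(f_value, policy_prob)
    return math.log(d) - math.log(1.0 - d)
```

When exp(f) equals the policy probability the discriminator outputs 0.5, i.e. it can no longer distinguish the policy from the expert at that state-action pair.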

Results and comparisons

In this section, we first describe the performances of the baseline methods. Second, we present the performance of the gaits generated by the RL controller. Third, the results from the IRL controller are discussed. Finally, we compare our gaits to the scripted slithering gaits in terms of energy efficiency.

Conclusion

Designing power-efficient and adaptive gaits for snake-like robots remains a challenging task, since they come with redundant degrees of freedom and have complicated interactions with the environment. In this paper, we present two novel gait design methods: one based on reinforcement learning for energy-efficient gaits and one based on inverse reinforcement learning for adaptive locomotion. The RL-based gait has been shown to achieve much better energy efficiency at different travel velocities

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This project/research has received funding from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 785907 (Human Brain Project SGA2) and the Specific Grant Agreement No. 945539 (Human Brain Project SGA3), and the National Natural Science Foundation of China (grant number: 61902442).

References (60)

  • Chatzilygeroudis, K., et al. Reset-free trial-and-error learning for robot damage recovery. Robotics and Autonomous Systems (2018).
  • Dong, X., et al. Dynamical hyperparameter optimization via deep reinforcement learning in tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence (2019).
  • Liu, P., et al. Modelling and analysis of dynamic frictional interactions of vibro-driven capsule systems with viscoelastic property. European Journal of Mechanics. A. Solids (2019).
  • Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., & Dean, J., et al. (2016). Tensorflow: A system for large-scale...
  • Abbeel, P., et al. Apprenticeship learning via inverse reinforcement learning.
  • Bhounsule, P. A., et al. Design and control of Ranger: an energy-efficient, dynamic walking robot.
  • Bing, Z., et al. Towards autonomous locomotion: CPG-based control of smooth 3D slithering gait transition of a snake-like robot. Bioinspiration & Biomimetics (2017).
  • Bing, Z., et al. Towards autonomous locomotion: Slithering gait design of a snake-like robot for target observation and tracking.
  • Bing, Z., et al. CPG-based control of smooth transition for body shape and locomotion speed of a snake-like robot.
  • Bing, Z., et al. Energy-efficient slithering gait exploration for a snake-like robot based on reinforcement learning.
  • Brockman, G., et al. OpenAI gym (2016).
  • Calandra, R., et al. Bayesian gait optimization for bipedal locomotion.
  • Chernova, S., et al. An evolutionary approach to gait learning for four-legged robots.
  • Crespi, A., et al. Online optimization of swimming and crawling in an amphibious snake robot. IEEE Transactions on Robotics (2008).
  • Cully, A., et al. Robots that can adapt like animals. Nature (2015).
  • Dong, X., et al. Quadruplet network with one-shot learning for fast visual object tracking. IEEE Transactions on Image Processing (2019).
  • Dowling, K. J. (1996). Limbless locomotion: Learning to crawl with a snake robot (Ph.D. thesis). The Robotics...
  • Dowling, K. Power sources for small robots (1997).
  • Duan, Y., Chen, X., Houthooft, R., Schulman, J., & Abbeel, P. (2016). Benchmarking deep reinforcement learning for...
  • Fu, J., et al. Learning robust rewards with adversarial inverse reinforcement learning (2017).
  • Gong, C., et al. Simplifying gait design via shape basis optimization.
  • Goodfellow, I., et al. Generative adversarial nets.
  • Grande, R., Walsh, T., & How, J. (2014). Sample efficient reinforcement learning with Gaussian processes. In...
  • Hirose, S. Biologically inspired robots: Snake-like locomotors and manipulators, Vol. 1093 (1993).
  • Ho, J., et al. Generative adversarial imitation learning.
  • Hu, D. L., et al. The mechanics of slithering locomotion. Proceedings of the National Academy of Sciences (2009).
  • Hwangbo, J., et al. Learning agile and dynamic motor skills for legged robots. Science Robotics (2019).
  • Kim, M. S., et al. Automatic gait optimisation for quadruped robots.