Research on Target Trajectory Planning Method of Humanoid Manipulators Based on Reinforcement Learning

  • Conference paper
  • In: Intelligent Robotics and Applications (ICIRA 2023)

Abstract

The goals of most asymmetrically coordinated manipulation tasks performed by humanoid manipulators are multilevel. For example, a bottle-cap screwing task comprises several sub-objectives, such as reaching, grasping, aligning, and screwing. In addition, the flexible interaction requirements of dual-arm robots challenge trajectory planning methods, because the planning problem is high-dimensional and strongly coupled. Traditional reinforcement learning algorithms cannot quickly learn and generate the required trajectories. Based on the idea of multi-agent control, this paper proposes a dual-agent deep deterministic policy gradient algorithm that uses two agents to simultaneously plan the coordinated trajectories of the left and right arms online, solving the problem of online trajectory planning for multi-objective tasks of humanoid manipulators. The design of observations and actions in the dual-agent structure reduces the dimension of, and partially decouples, the trajectory planning problem, thereby accelerating learning. Moreover, a reward function is constructed to realize coordinated control between the two agents and to encourage them to generate continuous trajectories for multi-objective tasks. Finally, the effectiveness of the proposed algorithm is verified in a Baxter multi-objective task simulation environment under Gym. The results show that the algorithm can quickly learn and plan online the coordinated trajectories of humanoid manipulators for multi-objective tasks.
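The dual-agent structure described in the abstract, in which each agent observes and controls only its own arm while a shared reward couples the two, can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the toy 2D environment, the per-agent observation layout, and the reward weights are all assumptions, and the learning update is replaced by a simple goal-seeking placeholder policy where a real system would train a DDPG actor-critic pair per agent.

```python
import math
import random


class ToyDualArmEnv:
    """Toy stand-in for a dual-arm Gym environment (assumed, not the
    paper's Baxter setup). Each 'arm' is a point in 2D that must reach
    its own target; the shared reward also penalizes the arms drifting
    out of alignment (a simple coordination term)."""

    def __init__(self):
        self.targets = [(1.0, 0.5), (-1.0, 0.5)]

    def reset(self):
        self.pos = [[0.0, 0.0], [0.0, 0.0]]
        return self._obs()

    def _obs(self):
        # Per-agent observation: own position + own target only.
        # This is the dimension reduction / decoupling idea: each agent
        # never sees the other arm's full state.
        return [self.pos[i] + list(self.targets[i]) for i in range(2)]

    def step(self, actions):
        for i, (dx, dy) in enumerate(actions):
            # Clip each action component, mimicking bounded joint velocities.
            self.pos[i][0] += max(-0.1, min(0.1, dx))
            self.pos[i][1] += max(-0.1, min(0.1, dy))
        dists = [math.dist(self.pos[i], self.targets[i]) for i in range(2)]
        misalign = abs(self.pos[0][1] - self.pos[1][1])
        # Shared reward: task progress for both arms plus a coordination
        # term, so each agent's return depends on the other's behavior.
        reward = -sum(dists) - 0.5 * misalign
        done = max(dists) < 0.05
        return self._obs(), reward, done


def policy(obs):
    # Placeholder per-agent policy: step toward the target with small
    # exploration noise. A real implementation would be one trained
    # DDPG actor network per agent.
    x, y, tx, ty = obs
    return (tx - x + random.gauss(0, 0.01), ty - y + random.gauss(0, 0.01))


env = ToyDualArmEnv()
obs = env.reset()
total = 0.0
for t in range(200):
    actions = [policy(o) for o in obs]  # both agents act simultaneously
    obs, r, done = env.step(actions)
    total += r
    if done:
        break
print("finished at step", t, "return", round(total, 2))
```

The key design choice mirrored here is that coordination is achieved through the reward signal rather than through a shared observation space, which keeps each agent's state and action dimensions small.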

Supported in part by the National Natural Science Foundation of China (U2013602, 52075115, 51521003, 61911530250), National Key R&D Program of China (2020YFB13134), Self-Planned Task (SKLRS202001B, SKLRS202110B) of State Key Laboratory of Robotics and System (HIT), Shenzhen Science and Technology Research and Development Foundation (JCYJ20190813171009236), and Basic Scientific Research of Technology (JCKY2020603C009).



Author information

Correspondence to Pengfei Wang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liang, K., Zha, F., Sheng, W., Guo, W., Wang, P., Sun, L. (2023). Research on Target Trajectory Planning Method of Humanoid Manipulators Based on Reinforcement Learning. In: Yang, H., et al. Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol 14270. Springer, Singapore. https://doi.org/10.1007/978-981-99-6492-5_39

  • DOI: https://doi.org/10.1007/978-981-99-6492-5_39

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6491-8

  • Online ISBN: 978-981-99-6492-5

  • eBook Packages: Computer Science (R0)
