Computer Science, 2024, Vol. 51, Issue (2): 268-277. doi: 10.11896/jsjkx.230500113

• Artificial Intelligence •


DQN-based Multi-agent Motion Planning Method with Deep Reinforcement Learning

SHI Dianxi1,2, PENG Yingxuan2,3, YANG Huanhuan2,3, OUYANG Qianying1,2, ZHANG Yuhui2, HAO Feng1   

  1 Intelligent Game and Decision Lab (IGDL), Beijing 100091, China
    2 Tianjin (Binhai) Artificial Intelligence Innovation Center, Tianjin 300457, China
    3 College of Computer, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-05-17 Revised: 2023-11-03 Online: 2024-02-15 Published: 2024-02-22
  • Corresponding author: HAO Feng (no1haofeng@163.com)
  • About author: SHI Dianxi (dxshi@nudt.edu.cn), born in 1966, Ph.D, professor, Ph.D supervisor. His main research interests include artificial intelligence, robot operating systems, distributed computing and cloud computing. HAO Feng, born in 1977, master, associate professor. His main research interests include artificial intelligence, mechanical and electronic engineering, and computer applications.
  • Supported by:
    Science and Technology Innovation 2030-Major Project (2020AAA0104802) and National Natural Science Foundation of China (91948303).



Abstract: DQN, as a classical value-based deep reinforcement learning method, has been widely used in multi-agent motion planning. However, DQN faces a series of challenges: it tends to overestimate Q values, its Q-value computation is complicated, its neural network has no memory of historical information, and its ε-greedy exploration strategy is inefficient. To address these problems, this paper proposes a DQN-based multi-agent deep reinforcement learning motion planning method that helps agents learn an efficient and stable motion planning policy and reach their target points without collision. Firstly, on the basis of DQN, a Dueling-based Q-value calculation mechanism is proposed, which decomposes the Q value into a state value and an advantage value and selects the optimal action using the parameters of the Q network currently being updated, making the Q-value calculation simpler and more accurate. Secondly, a GRU-based memory mechanism is proposed: a GRU module is introduced so that the network can capture temporal information and process the agents' historical information. Thirdly, a noise-based exploration mechanism is proposed, which replaces ε-greedy exploration in DQN with parameterized noise, improving the agents' exploration efficiency and bringing the multi-agent system to an exploration-exploitation balance. The method is tested in six different simulation scenarios on the PyBullet simulation platform. Experimental results show that it enables a multi-agent team to collaborate efficiently and reach their respective target points without collision, with a stable policy training process.
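
For concreteness, the Dueling decomposition and the online-network action selection described above can be written as follows; this is the standard Dueling/Double DQN formulation, reconstructed here rather than quoted from the paper:

$$Q(s,a;\theta) = V(s;\theta) + \Big(A(s,a;\theta) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s,a';\theta)\Big)$$

$$y = r + \gamma\, Q\big(s',\ \operatorname{argmax}_{a'} Q(s',a';\theta);\ \theta^{-}\big)$$

where $\theta$ denotes the parameters of the online Q network, $\theta^{-}$ the target network parameters, and $y$ the temporal-difference target used to update Q.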
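
A minimal sketch of how the three mechanisms could be combined in a single per-agent Q-network, assuming PyTorch; the class names (NoisyLinear, DuelingGRUQNet) and hyperparameters are illustrative assumptions, not the authors' implementation:

    # Illustrative sketch, not the paper's released code.
    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyLinear(nn.Module):
        """Linear layer with factorized Gaussian parameter noise
        (NoisyNet-style), replacing epsilon-greedy exploration."""
        def __init__(self, in_features, out_features, sigma0=0.5):
            super().__init__()
            self.in_features, self.out_features = in_features, out_features
            self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
            self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
            self.mu_b = nn.Parameter(torch.empty(out_features))
            self.sigma_b = nn.Parameter(torch.empty(out_features))
            bound = 1.0 / math.sqrt(in_features)
            nn.init.uniform_(self.mu_w, -bound, bound)
            nn.init.uniform_(self.mu_b, -bound, bound)
            nn.init.constant_(self.sigma_w, sigma0 * bound)
            nn.init.constant_(self.sigma_b, sigma0 * bound)

        @staticmethod
        def _f(x):  # noise-scaling function from the NoisyNet paper
            return x.sign() * x.abs().sqrt()

        def forward(self, x):
            # Resample factorized noise on every forward pass
            eps_in = self._f(torch.randn(self.in_features, device=x.device))
            eps_out = self._f(torch.randn(self.out_features, device=x.device))
            w = self.mu_w + self.sigma_w * eps_out.outer(eps_in)
            b = self.mu_b + self.sigma_b * eps_out
            return F.linear(x, w, b)

    class DuelingGRUQNet(nn.Module):
        """Per-agent Q-network: GRU memory + dueling value/advantage heads."""
        def __init__(self, obs_dim, n_actions, hidden=128):
            super().__init__()
            self.encoder = nn.Linear(obs_dim, hidden)
            self.gru = nn.GRUCell(hidden, hidden)   # carries the agent's history
            self.value = NoisyLinear(hidden, 1)     # state value V(s)
            self.advantage = NoisyLinear(hidden, n_actions)  # advantage A(s, a)

        def forward(self, obs, h):
            x = F.relu(self.encoder(obs))
            h = self.gru(x, h)                      # updated hidden state
            v, a = self.value(h), self.advantage(h)
            # Dueling combination: Q = V + (A - mean(A))
            q = v + a - a.mean(dim=-1, keepdim=True)
            return q, h

At each timestep an agent passes its local observation and previous hidden state through the network; history is carried by the GRU hidden state, and exploration comes from resampling the noisy-layer weights on every forward pass instead of from ε-greedy action selection.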

Key words: Multi-agent system, Motion planning, Deep reinforcement learning, DQN

CLC number: TP391