Multi-agent differential game based cooperative synchronization control using a data-driven method

Shi, Yu; Hua, Yongzhao; Yu, Jianglong; Dong, Xiwang; Ren, Zhang

doi:10.1631/FITEE.2200001

Chinese title: 基于多智能体微分博弈的数据驱动协同一致控制 (Data-driven cooperative consensus control based on multi-agent differential games)

Published: 25 July 2022

Frontiers of Information Technology & Electronic Engineering, Volume 23, pages 1043–1056 (2022)

Yu Shi (石宇)¹ (ORCID: orcid.org/0000-0001-8618-7395),
Yongzhao Hua (化永朝)²,
Jianglong Yu (于江龙)¹,
Xiwang Dong (董希旺)¹,² (ORCID: orcid.org/0000-0002-4778-248X) &
Zhang Ren (任章)¹

Abstract

This paper studies the multi-agent differential game problem and its application to cooperative synchronization control. A systematic formulation and analysis method for the multi-agent differential game is proposed, and a data-driven methodology based on the reinforcement learning (RL) technique is given. First, it is pointed out that typical distributed controllers do not necessarily lead to a global Nash equilibrium of the differential game in general cases because of the coupling of networked interactions. Second, to address this, an alternative local Nash solution is derived by defining the best response concept, and the problem is decomposed into local differential games. An off-policy RL algorithm using neighboring interactive data is constructed to update the controller without requiring a system model, and its stability and robustness properties are proved. Third, to further tackle the dilemma, another differential game configuration is investigated based on modified coupling index functions. In contrast to the previous case, the distributed solution achieves a global Nash equilibrium while guaranteeing stability. An equivalent parallel RL method is constructed corresponding to this Nash solution. Finally, simulation results illustrate the effectiveness of the learning process and the stability of synchronization control.
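
To make the term "typical distributed controller" concrete, the following is a minimal sketch of a generic neighbor-based synchronization law, not the method of this paper. It assumes single-integrator agents, a hypothetical undirected path graph, and a hand-picked gain and step size; none of these values come from the article. Each agent uses only its local neighborhood error, which is the kind of networked coupling the abstract refers to.

```python
# Illustrative only: single-integrator agents x_i' = u_i on an undirected graph.
# The topology, gain, step size, and initial states are assumptions for this
# sketch and are not taken from the paper.

def neighborhood_error(x, adjacency, i):
    """Local neighborhood synchronization error e_i = sum_j a_ij * (x_i - x_j)."""
    return sum(a_ij * (x[i] - x[j]) for j, a_ij in enumerate(adjacency[i]))


def simulate(x0, adjacency, gain=1.0, dt=0.01, steps=2000):
    """Forward-Euler simulation of the distributed law u_i = -gain * e_i."""
    x = list(x0)
    for _ in range(steps):
        # Every control input depends only on the agent's own neighbors.
        u = [-gain * neighborhood_error(x, adjacency, i) for i in range(len(x))]
        x = [xi + dt * ui for xi, ui in zip(x, u)]
    return x


# Hypothetical path graph 0 - 1 - 2 - 3 (symmetric adjacency matrix).
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    final_states = simulate([1.0, -2.0, 4.0, 0.5], ADJACENCY)
    print(final_states)
```

Because the assumed graph is undirected and connected, the states in this sketch converge toward the average of the initial values. Whether such a law is also a Nash equilibrium of a given differential game is a separate question, which is the issue the abstract raises.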

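Abstract (translated from the Chinese)

This paper studies the multi-agent differential game problem and its application to cooperative consensus control. A systematic method for constructing and analyzing multi-agent differential games is proposed, together with a data-driven method based on reinforcement learning. First, it is shown that, because of the coupling of networked interactions, typical distributed controllers cannot fully guarantee a global Nash equilibrium of the differential game. Second, by defining the best response concept, the problem is decomposed into local differential games and a local Nash equilibrium solution is given. An off-policy reinforcement learning algorithm that requires no system model information is constructed; it uses online neighbor interaction data to optimize and update the controller, and the stability and robustness of the controller are proved. Furthermore, a differential game model based on modified coupling index functions and its equivalent reinforcement learning solution method are proposed. Compared with existing studies, this model resolves the coupling of the information required by the agents and achieves global Nash equilibrium and stable control in a distributed framework. A parallel reinforcement learning method equivalent to this Nash solution is constructed. Finally, simulation results verify the effectiveness of the learning process and the stability of consensus control.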

Author information

Authors and Affiliations

School of Automation Science and Electrical Engineering, Beihang University, Beijing, 100191, China
Yu Shi (石宇), Jianglong Yu (于江龙), Xiwang Dong (董希旺) & Zhang Ren (任章)
Institute of Artificial Intelligence, Beihang University, Beijing, 100191, China
Yongzhao Hua (化永朝) & Xiwang Dong (董希旺)

Corresponding author

Correspondence to Xiwang Dong (董希旺).

Additional information

Project supported by the Science and Technology Innovation 2030, China (No. 2020AAA0108200), the National Natural Science Foundation of China (Nos. 61873011, 61973013, 61922008, and 61803014), the Defense Industrial Technology Development Program, China (No. JCKY2019601C106), the Innovation Zone Project, China (No. 18-163-00-TS-001-001-34), the Foundation Strengthening Program Technology Field Fund, China (No. 2019-JCJQ-JJ-243), and the Fund from the Key Laboratory of Dependable Service Computing in Cyber Physical Society, China (No. CPSDSC202001).

Contributors

Yu SHI designed the research, conducted the simulations, and drafted the paper. Yongzhao HUA and Jianglong YU helped organize the paper. Xiwang DONG and Zhang REN revised and finalized the paper.

Compliance with ethics guidelines

Yu SHI, Yongzhao HUA, Jianglong YU, Xiwang DONG, and Zhang REN declare that they have no conflict of interest.

About this article

Cite this article

Shi, Y., Hua, Y., Yu, J. et al. Multi-agent differential game based cooperative synchronization control using a data-driven method. Front Inform Technol Electron Eng 23, 1043–1056 (2022). https://doi.org/10.1631/FITEE.2200001

Received: 03 January 2022
Accepted: 21 April 2022
Published: 25 July 2022
Issue Date: July 2022
DOI: https://doi.org/10.1631/FITEE.2200001

CLC number

TP273
