• Journal indexed in the CSCD core collection
  • Chinese core journal
  • China science and technology core journal

Electric Power Construction ›› 2024, Vol. 45 ›› Issue (5): 80-93. doi: 10.12204/j.issn.1000-7229.2024.05.009

• Smart Grid •

Voltage Optimization Control of Distribution Networks Considering Inter-Regional Auxiliary Rewards

ZHOU Xiang1, LI Xiaolu1, LIU Jinsong2, LIN Shunfu1

  1. College of Electrical Engineering, Shanghai University of Electric Power, Shanghai 200090, China
    2. Electric Power Research Institute, State Grid Shanghai Electric Power Company, Shanghai 200437, China
  • Received: 2023-08-25  Published: 2024-05-01  Online: 2024-04-29
  • Corresponding author: LI Xiaolu (1971), female, Ph.D., associate professor; her research interests include power grid dispatch automation and power system analysis and operation. E-mail: lixiaolu_sh@163.com
  • About the authors: ZHOU Xiang (1998), male, master's student; his research interests include optimal operation and control of distribution networks. E-mail: zxdxyx1@163.com;
    LIU Jinsong (1971), male, Ph.D., senior engineer; his research interests include distribution automation and distribution management systems;
    LIN Shunfu (1983), male, Ph.D., professor; his research interests include power quality and smart grid end-user technologies.
  • Supported by: National Natural Science Foundation of China (51977127)

Abstract:

A soft open point can effectively solve the voltage fluctuation problem caused by the large-scale integration of distributed photovoltaics into a distribution network, but it also deepens the cooperation required between regions. At present, when multi-agent deep reinforcement learning algorithms are used for voltage optimization, each agent is trained only on the reward from its own region, so the agents lack coordination and the optimality of their output policies is difficult to guarantee. To address this problem, a voltage optimization method for distribution networks that considers inter-regional auxiliary rewards was proposed. First, a multi-timescale voltage optimization framework based on multi-agent deep reinforcement learning was established. Second, for the agents controlling the soft open points, the reward from an agent's own region was defined as its primary reward, whereas the rewards from neighboring regions were defined as auxiliary rewards. The benefit of an auxiliary reward to training was then assessed using the dot product of the gradients of the primary and auxiliary reward loss functions with respect to the network parameters, and the participation factor of the auxiliary reward was adapted with an evolutionary game approach. Finally, tests on a modified IEEE 33-node system verified that the proposed method stabilizes the agents' training process and improves the optimization performance of their policies.
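The core mechanism described in the abstract (weighting a neighboring region's auxiliary reward by a participation factor and judging its usefulness through the dot product of loss gradients) can be illustrated with a minimal PyTorch sketch. Everything below (network size, placeholder losses, the sign-based update of the participation factor) is an illustrative assumption rather than the paper's implementation; in particular, the evolutionary-game update is replaced here by a simple sign-based nudge.

```python
# Hypothetical sketch (not the authors' code): judging an auxiliary reward by the
# dot product of loss gradients and nudging its participation factor accordingly.
import torch
import torch.nn as nn

# Toy stand-in for one agent's policy/critic network.
policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
params = list(policy.parameters())

def flat_grad(loss):
    """Gradient of `loss` w.r.t. the network parameters, flattened into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def combine_losses(primary_loss, auxiliary_loss, alpha, step=0.01):
    """
    Mix the own-region (primary) loss with a neighboring-region (auxiliary) loss.
    A positive dot product between their parameter gradients indicates that the
    auxiliary signal points in a direction that also helps the primary objective;
    the participation factor `alpha` is nudged up or down by `step` accordingly
    (a crude stand-in for the paper's evolutionary-game update).
    """
    benefit = torch.dot(flat_grad(primary_loss), flat_grad(auxiliary_loss))
    alpha = min(1.0, max(0.0, alpha + step * torch.sign(benefit).item()))
    return primary_loss + alpha * auxiliary_loss, alpha

# Illustrative usage with placeholder losses.
obs = torch.randn(16, 8)
out = policy(obs)
primary_loss = out.pow(2).mean()            # placeholder own-region objective
auxiliary_loss = (out - 1.0).pow(2).mean()  # placeholder neighboring-region objective
total_loss, alpha = combine_losses(primary_loss, auxiliary_loss, alpha=0.5)
total_loss.backward()
```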

Key words: multi-agent deep reinforcement learning, voltage optimization, auxiliary rewards, evolutionary game, participation factor
