Gym-ANM: Open-source software to leverage reinforcement learning for power system management in research and education

Gym-ANM is a Python package that facilitates the design of reinforcement learning (RL) environments that model active network management (ANM) tasks in electricity networks. Here, we describe how to implement new environments and how to write code to interact with pre-existing ones. We also provide an overview of ANM6-Easy, an environment designed to highlight common ANM challenges. Finally, we discuss the potential impact of Gym-ANM on the scientific community, both in terms of research and education. We hope this package will facilitate collaboration between the power system and RL communities in the search for algorithms to control future energy systems.


Introduction
Active network management (ANM) of electricity distribution networks is the process of controlling generators, loads, and storage devices for specific purposes (e.g., minimizing operating costs, keeping voltages and currents within operating limits) [1]. The modernization of distribution networks is taking place with the addition of distributed renewable energy resources and storage devices. This attempt to transition towards sustainable energy systems leaves distribution network operators (DNO) facing many new complex ANM problems (overvoltages, transmission line congestion, voltage coordination, investment issues, etc.) [2].
There is a growing belief that reinforcement learning (RL) algorithms have the potential to tackle these complex ANM challenges more efficiently than traditional optimization methods. This optimism results from the fact that RL approaches have been successfully and extensively applied to a wide range of fields with similarly difficult decision-making problems, including games [3,4,5,6], robotics [7,8,9,10], and autonomous driving [11,12,13].
What games, robotics, and autonomous driving all have in common is that the environment in which the decisions have to be taken can be efficiently replicated using open-source software simulators. In addition, these software libraries usually provide interfaces tailored for writing code for RL research. Hence, the availability of such packages makes it easier for RL researchers to apply their algorithms to decision-making problems in these fields, without needing to first develop a deep understanding of the underlying dynamics of the environments with which their agents interact.
Put simply, we believe that ANM-related problems would benefit from a similar amount of attention from the RL community if open-source software simulators were available to model them and provide a simple interface for writing RL research code. With that in mind, we designed Gym-ANM, an open-source Python package that facilitates the design and implementation of RL environments that model ANM tasks [14]. Its key features, which differentiate it from traditional power system modeling software (e.g., MATPOWER [15], pandapower [16]), are:
• Very little background in power system modeling is required, since most of the complex dynamics are abstracted away from the user.
• The environments (tasks) built using Gym-ANM follow the OpenAI Gym interface [17], with which a large part of the RL community is already familiar.
• The flexibility of Gym-ANM, with its different customizable components, makes it a suitable framework to model a wide range of ANM tasks, from simple ones that can be used for educational purposes to complex ones designed to conduct advanced research.
Finally, as an example of the type of environment that can be built using Gym-ANM, we also released ANM6-Easy, an environment that highlights common ANM challenges in a 6-bus distribution network.
Both the Gym-ANM framework and the ANM6-Easy environment, including detailed mathematical formulations, were previously introduced in [14]. Here, our goal is to provide a short practical guide to the use of the package and discuss the impact that it may have on the research community.

The Gym-ANM package
The Gym-ANM package was designed around two main use cases. The first is the design of novel environments (ANM tasks), which requires writing code that simulates generation and demand curves for each device connected to the power grid (Section 2.1). The second is the training of RL algorithms on an existing environment (Section 2.2).

Design a Gym-ANM environment
The internal structure of a Gym-ANM environment is shown in Figure 1. At each timestep, the agent passes an action a_t to the environment. The latter generates a set of stochastic variables by calling the next_vars() function, which are then used along with a_t to simulate the distribution network and transition to a new state s_{t+1}. Finally, the environment outputs an observation vector o_{t+1} and a reward r_t through the observation() and reward() functions.
The core of the power system modeling is abstracted from the user in the next_state() call. The grey blocks, next_vars() and observation(), are the only components that are fully customizable when designing new Gym-ANM environments.
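As a concrete illustration, the stochastic-process logic that a designer supplies through next_vars() could, for instance, combine a daily sinusoidal shape with random noise. The sketch below is self-contained Python and does not use the actual gym_anm API: the function name, signature, and profile values are all illustrative assumptions, not the package's interface.

```python
import numpy as np

rng = np.random.default_rng(42)

def next_vars(t, delta_t=1.0):
    """Toy stand-in for a Gym-ANM next_vars() implementation.

    Returns the exogenous variables of timestep t: here, one load
    (a negative power injection) and one maximum renewable generation
    (before curtailment), both built from a daily sinusoidal shape
    plus Gaussian noise. All numbers are made up for illustration.
    """
    hour = (t * delta_t) % 24
    # Demand: always negative (power withdrawn from the grid).
    p_load = -(2.0 + 1.0 * np.sin(2 * np.pi * hour / 24)) \
             - abs(rng.normal(scale=0.05))
    # Maximum generation: non-negative, peaking around midday.
    p_max_gen = max(0.0, 1.5 * np.sin(np.pi * hour / 24)
                    + rng.normal(scale=0.05))
    return np.array([p_load, p_max_gen])
```

In the real package, these values would then be consumed by the environment's internal transition step; here the function simply returns them.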

Use a Gym-ANM environment
A code snippet illustrating how a custom Gym-ANM environment can be used alongside an RL agent implementation is shown in Listing 2. Note that, for clarity, this example omits the agent-learning procedure. Because Gym-ANM is built on top of the Gym toolkit [17], all Gym-ANM environments provide the same interface as traditional Gym environments, as described in the online documentation.

```python
env = gym.make('MyANMEnv')  # Initialize the environment.
obs = env.reset()           # Reset the env. and collect o_0.
```

Listing 2: A Python code snippet illustrating environment-agent interactions [14].
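The interaction loop omitted from Listing 2 follows the standard Gym pattern of alternating env.step() calls with action selection. The sketch below is self-contained: ToyEnv and the constant policy are hypothetical stand-ins written so the loop runs without any Gym-ANM environment installed, but run_episode() itself is the generic loop one would use with a real task.

```python
import numpy as np

class ToyEnv:
    """Minimal Gym-style environment standing in for a Gym-ANM task."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(3)              # Initial observation o_0.

    def step(self, action):
        self.t += 1
        obs = np.full(3, float(self.t))  # Placeholder observation.
        reward = -abs(float(action))     # Toy reward signal.
        done = self.t >= self.horizon    # Episode ends at the horizon.
        return obs, reward, done, {}

def run_episode(env, policy):
    """Generic Gym interaction loop: reset, then act until done."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total += reward
    return total

total = run_episode(ToyEnv(), policy=lambda obs: 0.0)
```

With a real Gym-ANM environment, ToyEnv would be replaced by the object returned by gym.make() and the lambda by a trained agent's action-selection function.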

Example: the ANM6-Easy environment
ANM6-Easy is the first Gym-ANM environment that we have released [14]. It models a 6-bus network and was engineered to highlight some of the most common ANM challenges faced by network operators. A screenshot of the rendering of the environment is shown in Figure 2.
Figure 2: The ANM6-Easy Gym-ANM environment, taken from [14].
In order to limit the complexity of the task, the environment was designed to be fully deterministic: both the demand from loads (1: residential area, 3: industrial complex, 5: EV charging garage) and the maximum generation (before curtailment) profiles from the renewable energies (2: solar farm, 4: wind farm) are modelled as fixed 24-hour time series that repeat every day, indefinitely.
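The repeating-profile mechanism described above can be sketched in a few lines: a fixed 24-value series indexed modulo 24 makes the environment's exogenous variables deterministic and periodic. The profile values below are invented for illustration; the actual ANM6-Easy time series differ.

```python
import numpy as np

# Hypothetical fixed 24-hour demand profile (p.u.), one value per hour.
# Loads are modelled as negative power injections, as in Gym-ANM.
daily_profile = -(1.0 + 0.5 * np.sin(2 * np.pi * np.arange(24) / 24))

def next_vars(t):
    """Deterministic next_vars(): the same 24-value series repeats
    every day, indefinitely, so the task has no stochasticity."""
    return daily_profile[t % 24]
```

Because the series repeats, the value at any hour is identical from one day to the next, which is what makes the task fully deterministic.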
More information about the ANM6-Easy environment can be found in the online documentation.

Research and educational impact
Many software applications exist for modeling steady-state power systems in industrial settings, such as PowerFactory [18], ERACS [19], ETAP [20], IPSA [21], and PowerWorld [22], all of which require a paid license. In addition, these programs are not well suited to conducting RL research, since they do not integrate well with the two programming languages most widely used by the RL community: MATLAB and Python. Among the power system software packages that do not require an additional license and that are compatible with these languages, those most commonly used in power system management research are MATPOWER (MATLAB) [15], PSAT (MATLAB) [23], PYPOWER (a Python port of MATPOWER) [24], and pandapower (Python) [16].
Nevertheless, using the aforementioned software libraries to design RL environments that model ANM tasks is not ideal. First, the user needs to become familiar with the modeling language of the library, which already requires a good understanding of the inner workings of the various components making up power systems and of their interactions. Second, these packages often include a large number of advanced features, which is likely to overwhelm the inexperienced user and get in the way of designing even simple ANM scenarios. Third, because these libraries were designed to facilitate a wide range of simulations and analyses, they often do so at the cost of solving simpler problems more slowly (e.g., simple AC load flows). Fourth, in the absence of a programming framework agreed upon by the RL research community interested in tackling energy system management problems, various research teams are likely to spend time and resources implementing the same underlying dynamics common to all such problems.
By releasing Gym-ANM, we hope to address all the shortcomings of traditional modeling packages described in the previous paragraph. Specifically:
• The dissociation between the design of the environment (Section 2.1) and the training of RL agents on it (Section 2.2) encourages collaboration between researchers experienced in power system modeling and in RL algorithms. Thanks to the general framework provided by Gym-ANM, each researcher may focus on their particular area of expertise (designing or solving the environment), without having to worry about coordinating their implementations.
• This dissociation also means that RL researchers are able to tackle the ANM tasks modelled by Gym-ANM environments without having to first understand the complex dynamics of the system. As a result, existing Gym-ANM environments can be explored by many in the RL community, from novices to experienced researchers. This is further facilitated by the fact that all Gym-ANM environments implement the Gym interface, which allows RL users to apply their own algorithms to any Gym-ANM task with little code modification (assuming they have used Gym in the past).
• Gym-ANM focuses on a particular subset of ANM problems. This specificity has two advantages.
The first is that it simplifies the process of designing new environments, since only a few components need to be implemented by the user. The second is that, during the implementation of the package, it allowed us to focus on simplicity and speed. That is, rather than providing a large range of modeling features like most other packages, we focused on optimizing the computational steps behind the next_state() block of Figure 1 (i.e., solving AC load flows). This effectively reduces the computational time required to train RL agents on environments built with Gym-ANM.
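To give a flavour of the computation hidden behind next_state(), the sketch below solves the smallest possible AC load flow, a 2-bus network, by Gauss-Seidel iteration. This is not Gym-ANM's actual solver (which handles general N-bus networks); the impedance and load values are arbitrary illustrative numbers.

```python
# Minimal 2-bus AC load flow solved by Gauss-Seidel iteration.
# Bus 1 is the slack bus; bus 2 is a PQ bus with a fixed load.
V1 = 1.0 + 0.0j                  # Slack bus voltage (p.u.).
y = 1.0 / (0.01 + 0.05j)         # Line admittance (p.u.).
s_inj = -(0.5 + 0.2j)            # Net power injection at bus 2 (a load).

V2 = 1.0 + 0.0j                  # Flat start.
for _ in range(100):
    # Fixed-point update derived from I_2 = conj(s_inj / V2)
    # and I_2 = y * (V2 - V1).
    V2_new = V1 + s_inj.conjugate() / (y * V2.conjugate())
    if abs(V2_new - V2) < 1e-10:  # Converged.
        V2 = V2_new
        break
    V2 = V2_new

# At the solution, the computed injection matches the specified one.
mismatch = V2 * (y * (V2 - V1)).conjugate() - s_inj
```

Repeating this kind of iterative solve at every environment step is exactly why the speed of the load-flow routine dominates training time, and why Gym-ANM optimizes it.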
The simplicity with which Gym-ANM can be used by both the power system modeling and the RL communities has an additional advantage: it makes it a great teaching tool. This is particularly true for individuals interested in working at the intersection of power system management and RL research. One of the authors, Damien Ernst, has recently started incorporating the ANM6-Easy task in his RL course, Optimal decision making for complex systems, at the University of Liège [25].
Finally, we also compared the performance of the soft actor-critic (SAC) and proximal policy optimization (PPO) RL algorithms against that of an optimal model predictive control (MPC) policy on the ANM6-Easy task in [14]. We showed that, with almost no hyperparameter tuning, the RL policies were already able to reach near-optimal performance. These results suggest that state-of-the-art RL methods have the potential to compete with, or even outperform, traditional optimization approaches in the management of electricity distribution networks. Of course, ANM6-Easy is only a toy example, and confirming this hypothesis will require the design of more complex and advanced Gym-ANM environments.

Conclusions and future works
In this paper, we discussed the usage of the Gym-ANM software package first introduced in [14], as well as its potential impact on the research community. We created Gym-ANM as a framework for the RL and energy system management communities to collaborate on tackling ANM problems in electricity distribution networks. As such, we hope to contribute to the gathering of momentum around the applications of RL techniques to challenges slowing down the transition towards more sustainable energy systems.
In the future, we plan to design and release Gym-ANM environments that model real-world distribution networks more accurately than ANM6-Easy does. However, we also highly encourage other teams to design and release their own Gym-ANM tasks and/or to attempt to solve existing ones.

Declaration of competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.