Physics-Informed Neural Networks for AC Optimal Power Flow

This paper introduces, for the first time to our knowledge, physics-informed neural networks to accurately estimate the AC-OPF result, and delivers rigorous guarantees about their performance. Power system operators, along with several other actors, increasingly use Optimal Power Flow (OPF) algorithms for a wide range of applications, including planning and real-time operation. However, in its original form, the AC Optimal Power Flow problem is often challenging to solve, as it is non-linear and non-convex. Besides the large number of approximations and relaxations, recent efforts have also focused on machine learning approaches, especially neural networks. So far, however, these approaches have only partially considered the physical models available during training. More importantly, they have offered no guarantees about potential constraint violations of their output. Our approach (i) introduces the AC power flow equations inside neural network training and (ii) integrates methods that rigorously determine and reduce the worst-case constraint violations across the entire input domain, while maintaining the optimality of the prediction. We demonstrate how physics-informed neural networks achieve higher accuracy and lower constraint violations than standard neural networks, and show how we can further reduce the worst-case violations for all neural networks.


I. INTRODUCTION
Power system operators, electricity market operators, and several other actors increasingly use Optimal Power Flow (OPF) for the planning and real-time operation of power systems. Countless OPF instances need to be solved when evaluating uncertain scenarios, finding optimal control setpoints, determining bidding strategies, or clearing the electricity market. However, the exact formulation of the AC power flow equations in the OPF problem is nonlinear and non-convex [1], which can result in significant convergence difficulties and long computation times. A large body of literature exists on deriving approximations of the AC-OPF problem, the most popular being the linearized DC-OPF, other linear or convex approximations, and convex relaxations, which ensure fast computation and convergence guarantees [2].
Most recently, there has been revived interest in machine learning methods to accurately estimate the AC-OPF solution [3], [4], which have demonstrated a computational speedup of 100-1'000 times compared with conventional methods. This means that in the time it would take to assess one scenario by solving one AC-OPF instance, we can now assess up to 1'000 scenarios. However, these machine learning algorithms face two significant challenges. First, the availability and quality of training datasets: to train a neural network to considerable accuracy, we need OPF results for a huge set of operating points covering both normal and abnormal situations. Such datasets often do not exist and are challenging to generate. Convex relaxation techniques were proposed in [5], [6] to efficiently generate such large datasets, concentrating on points closer to the security boundary. Along the lines of improving the performance of such machine learning algorithms, a method was proposed in [7] to identify adversarial examples and include them in the training dataset during training.
The second major issue limiting widespread Neural Network (NN) adoption is that, so far, none of the proposed machine learning algorithms supply any worst-case performance guarantee. With OPF often used for safety-critical applications, the neural network estimates must not violate any OPF constraints, e.g., line, voltage, or generator limits. To mitigate these limitations, the NN predictions can be post-processed to satisfy the generation constraints, as proposed in [8], [9] for AC-OPF. However, this can negatively impact the optimality of the solution. A few methods have also proposed to include the constraint violations in the NN loss function [10], [8]. However, none of these algorithms provides worst-case performance guarantees for the AC-OPF problem. The only works that have derived worst-case performance guarantees have so far focused on the DC-OPF problem [11], [12].
This paper addresses both of these challenges by using a physics-informed neural network for AC-OPF applications. First, we introduce the physical equations, in the form of the AC-OPF Karush-Kuhn-Tucker (KKT) conditions, inside the neural network training. By doing so, the neural network reduces its dependency on the size and quality of the training dataset and can instead determine its optimal parameters based on the actual equations it aims to emulate [13].
Second, we introduce methods that extract worst-case guarantees for generation and line flow constraint violations of the neural network AC-OPF predictions, extending our previous work presented in [11] and [12]. Through that, we (i) determine the worst violations that can result from any neural network output across the whole input domain, and (ii) propose methods to reduce them. This paper is structured as follows: Section II describes the AC-OPF problem and its KKT formulation, introduces the proposed physics-informed neural network training architecture, and presents the optimization algorithm used to determine the worst-case guarantees. Section III provides the simulation setup and delivers the results demonstrating the performance of physics-informed neural networks. Section IV discusses possible opportunities to improve performance and concludes.

A. AC Optimal Power Flow
The AC Optimal Power Flow (AC-OPF) problem for generation cost minimization in a system with N_b buses, N_g generators, and N_d loads is a Quadratically Constrained Quadratic Programming (QCQP) problem. The objective function for cost minimization can be written as

  min_{P_g, Q_g}  c_p^T P_g + c_q^T Q_g,   (1)

where the vectors c_p and c_q contain the linear cost terms for the active and reactive power generation P_g and Q_g, respectively. The optimal generation values depend on the active and reactive power demand, denoted by P_d and Q_d, and on the network constraints. For a given demand, the active and reactive power injections at each node n ∈ N_b are

  p_n = p_n^g − p_n^d,   q_n = q_n^g − q_n^d,   (2)−(3)

where p_n and q_n denote the active and reactive power injection at bus n, and p_n^g, q_n^g, p_n^d, and q_n^d the active and reactive power generation and demand at bus n, respectively. The power flow equations in the network, in rectangular voltage coordinates, are

  p_n = v_n^r Σ_k (G_nk v_k^r − B_nk v_k^i) + v_n^i Σ_k (G_nk v_k^i + B_nk v_k^r),
  q_n = v_n^i Σ_k (G_nk v_k^r − B_nk v_k^i) − v_n^r Σ_k (G_nk v_k^i + B_nk v_k^r),

where the real and imaginary parts of the voltage at bus n are given by v_n^r and v_n^i, and the conductance and susceptance of line nk are denoted by G_nk and B_nk. If the real and imaginary parts of the voltage are combined into a vector of size 2N_b × 1 as v = [(v^r)^T, (v^i)^T]^T, the power flow equations can be simplified into quadratic forms in v [14]. Besides the power flow equations, the optimal generation values must also satisfy the active and reactive power generation limits.
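As an illustration of the rectangular power flow equations above, the following minimal numpy sketch computes the nodal injections for a made-up 2-bus admittance matrix and cross-checks them against the complex form S = V ⊙ conj(Y V); all numerical values are hypothetical.

```python
import numpy as np

# Hypothetical 2-bus system: Y is the bus admittance matrix.
Y = np.array([[1.0 - 5.0j, -1.0 + 5.0j],
              [-1.0 + 5.0j, 1.0 - 5.0j]])
G, B = Y.real, Y.imag
v_r = np.array([1.00, 0.98])    # real parts of the bus voltages
v_i = np.array([0.00, -0.05])   # imaginary parts of the bus voltages

# Rectangular power-flow equations: active/reactive injection at each bus.
a = G @ v_r - B @ v_i           # sum_k (G_nk v_k^r - B_nk v_k^i)
b = G @ v_i + B @ v_r           # sum_k (G_nk v_k^i + B_nk v_k^r)
p = v_r * a + v_i * b
q = v_i * a - v_r * b

# Cross-check against the complex formulation S = V * conj(Y V).
V = v_r + 1j * v_i
S = V * np.conj(Y @ V)
assert np.allclose(p, S.real) and np.allclose(q, S.imag)
```

The cross-check confirms that the rectangular form is just the real and imaginary parts of the standard complex power balance.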
Similarly, the voltage and line current flow constraints of the power system can be represented in matrix form, where M_n^v := e_n e_n^T + e_{N_b+n} e_{N_b+n}^T and e_n is a 2N_b × 1 unit vector with zeros at all locations except n. The squared magnitudes of the upper and lower voltage limits are denoted by v̄_n and v_n, respectively, the squared magnitude of the current flow in line mn is represented by ī_mn, and the corresponding matrix is built from the admittance y_mn of branch mn. Assuming that the slack bus N_sb acts as the angle reference for the voltage, we obtain the reference constraint (12). The constraints (2)−(3), (6)−(12) and the objective function of the AC-OPF problem (1) can be written compactly as (13), where c^T collects the linear cost terms for the active and reactive power generation. The equality constraints (6)−(7) and (12) are then represented by the L = 2N_b + 1 constraints in (13b). Similarly, the inequality constraints (8)−(11) are represented by the M = 4N_g + 2N_b + N_l constraints in (13c). The corresponding Lagrange multipliers are denoted by λ_l and μ_m.
The Lagrangian function L for the AC-OPF can be formulated accordingly, where x = {G, v}. From it, the KKT conditions follow: the stationarity condition is given in (15) and (16), the complementary slackness condition in (17), and dual feasibility in (18). For the AC-OPF problem, these KKT conditions are necessary conditions for optimality.

B. Physics Informed Neural Network
This section introduces the physics-informed neural network architecture used for predicting the AC-OPF active and reactive power generation setpoints G, given the active and reactive power demand D as input. A conventional neural network (NN) is a group of interconnected nodes correlating the input and the output layers, as shown in Fig. 1. The nodes connecting the input and output layers are divided into K hidden layers with N_k neurons in hidden layer k, and the edges connecting the neurons have a weight w and a bias b associated with them. Each neuron in the neural network also has a nonlinear activation function linked with it. The output of each layer in the NN can be written as

  Ẑ_{k+1} = w_{k+1} Z_k + b_{k+1},   Z_{k+1} = π(Ẑ_{k+1}),   (21)

where Z_{k+1} is the output of layer k + 1, w_{k+1} and b_{k+1} are the weights and biases connecting layers k and k + 1, and π is the nonlinear activation function. We chose the ReLU as the nonlinear activation function in this work, as it has been observed to accelerate neural network training [14]. The ReLU activation function returns the input if the input is positive, and zero if the input is negative or zero:

  π(Ẑ) = max(Ẑ, 0).   (22)

During the NN training, the backpropagation algorithm modifies the weights and biases to accurately predict the optimal generation setpoints for the AC-OPF problem.
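A minimal sketch of the forward pass described above; the layer sizes and the random placeholder weights are illustrative, not the ones used in the paper.

```python
import numpy as np

def relu(z_hat):
    # ReLU: return the input if positive, zero otherwise.
    return np.maximum(z_hat, 0.0)

def forward(weights, biases, d):
    """Forward pass through the hidden layers:
    Z_hat_{k+1} = w_{k+1} Z_k + b_{k+1},  Z_{k+1} = relu(Z_hat_{k+1}).
    The output layer is kept linear so predictions are not clipped at zero."""
    z = d
    for w, b in zip(weights[:-1], biases[:-1]):
        z = relu(w @ z + b)
    return weights[-1] @ z + biases[-1]

# Illustrative sizes: 2 inputs (P_d, Q_d), one hidden layer of 3 neurons,
# 2 outputs (P_g, Q_g). Weights are random placeholders.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(2, 3))]
biases = [np.zeros(3), np.zeros(2)]
g_hat = forward(weights, biases, np.array([0.8, 0.2]))
assert g_hat.shape == (2,)
```

During training, backpropagation adjusts `weights` and `biases` so that `g_hat` approaches the optimal generation setpoints.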
In the case of a physics-informed neural network, on top of comparing the NN predictions to the AC-OPF setpoints of the training database, the validity of the physical equations governing the problem is also assessed during NN training (see [15], [13], and our previous work [16] for DC-OPF applications). Since the optimal value must satisfy the KKT conditions given in (15)−(19), the discrepancies in the KKT conditions, as shown in (23a)−(23d), are added to the NN training loss function (24) and minimized during training. The proposed physics-informed neural network structure is given in Fig. 2. The voltage values and the dual variables required for calculating the discrepancy in the KKT conditions are predicted using a separate set of hidden layers, as we have observed that this increases the accuracy while maintaining a smaller neural network size.
The discrepancies in the KKT conditions are calculated as follows: v̂ denotes the voltage values predicted by the hidden layers V, and λ̂_l and μ̂_m are the dual variables predicted by the hidden layers L_m in Fig. 2. The absolute value of the stationarity condition is measured by ε_stat, and the error in the complementary slackness condition (17) is given by ε_com. The ReLU activation function, represented by π in (23c), is used to measure the dual feasibility violation. If the neural network prediction is the optimal value, then all these errors are zero. Since the KKT conditions can measure the accuracy of the NN prediction, we can include collocation points in the training set. Like the NN training data points, the collocation points are a set of random input values from the input domain. However, unlike the training data points, we do not spend resources to determine the optimal generation dispatch values, the voltage setpoints, or the dual variables before training. Instead, the error terms in (23a)−(23d) are used to measure the prediction accuracy and train the NN.
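To make the discrepancy terms concrete, here is a small sketch of how they could be evaluated for a generic problem min f(x) s.t. g(x) ≤ 0; the function and variable names are ours, not the paper's.

```python
import numpy as np

def kkt_discrepancies(grad_lagrangian, g_vals, mu):
    """Discrepancy terms in the spirit of (23a)-(23d) for a generic
    problem min f(x) s.t. g(x) <= 0 (names are illustrative).
    grad_lagrangian : gradient of the Lagrangian at the predicted point
    g_vals          : inequality constraint values g(x) at the prediction
    mu              : predicted dual variables of the inequalities
    """
    eps_stat = np.abs(grad_lagrangian)   # stationarity violation
    eps_com = np.abs(mu * g_vals)        # complementary slackness violation
    eps_dual = np.maximum(-mu, 0.0)      # dual feasibility via a ReLU
    return eps_stat, eps_com, eps_dual

# At an optimum all discrepancies vanish. Example: min x^2 s.t. x >= 1,
# i.e. g(x) = 1 - x <= 0; optimum x* = 1 with mu* = 2 (grad L = 2x - mu).
x, mu = 1.0, 2.0
es, ec, ed = kkt_discrepancies(np.array([2 * x - mu]),
                               np.array([1.0 - x]),
                               np.array([mu]))
assert es[0] == ec[0] == ed[0] == 0.0
```

Away from the optimum at least one of the three terms becomes positive, which is what drives the physics-informed part of the training signal.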
The following loss function is used to modify the shared parameters of the three neural networks, where N_t is the number of training data points and N_c is the number of collocation points. The mean absolute error of the active and reactive power generation predictions with respect to the actual values is denoted by MAE_g; MAE_v and MAE_l indicate the mean absolute errors of the voltage and dual variable predictions, and MAE_ε is the mean absolute value of the KKT condition violations given in (23a)−(23d). Λ_P, Λ_V, Λ_L, and Λ_ε are the corresponding weights of each loss term. The physics-informed neural network performance depends significantly on these weights, so they have to be selected appropriately to reduce either the average error or the maximum constraint violations.
Since the three NNs are independent, each NN is trained to minimize its corresponding MAE along with MAE_ε. For the collocation points, as discussed earlier, we do not compute the OPF output a priori. As the optimal generation dispatch values G, the voltages v, and the dual variables L_m are not computed for the collocation points, MAE_g, MAE_v, and MAE_l are set to zero for these points, and only MAE_ε enters the NN training.
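The weighted loss of (24), with the collocation-point special case, could be sketched as follows; the weight values and the function signature are illustrative assumptions.

```python
import numpy as np

def mae(pred, target):
    return float(np.mean(np.abs(pred - target)))

def pinn_loss(kkt_abs, pred=None, target=None,
              lam_p=1.0, lam_v=1.0, lam_l=1.0, lam_eps=0.1):
    """Sketch of the weighted loss (24); weight values are placeholders.
    kkt_abs       : absolute KKT-condition violations (23a)-(23d)
    pred / target : (G, v, duals) tuples, or None for a collocation point.
    """
    loss = lam_eps * float(np.mean(kkt_abs))      # MAE_eps term
    if pred is not None:                          # labelled training point
        lams = (lam_p, lam_v, lam_l)
        loss += sum(l * mae(p, t) for l, p, t in zip(lams, pred, target))
    return loss

# Collocation point: only the KKT term contributes.
assert abs(pinn_loss(np.array([0.2, 0.4])) - 0.03) < 1e-12
# Training point: the supervised MAE terms are added on top.
pred = (np.array([1.0]), np.array([1.0]), np.array([0.0]))
target = (np.array([0.5]), np.array([1.0]), np.array([0.0]))
assert abs(pinn_loss(np.array([0.0]), pred, target) - 0.5) < 1e-12
```

The split mirrors the text: labelled points contribute all four terms, while collocation points contribute only the KKT-violation term.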

C. Evaluating the Worst Case Performance of the Physics-Informed Neural Network
The average performance of the NN prediction on an unseen test dataset is typically used to evaluate and compare different NN architectures. In our case, besides the average performance on the test dataset, we also use the worst-case generation and line flow constraint violations over the entire input domain to evaluate and improve the performance of the proposed NN training architecture. This section describes the optimization problems used for extracting the worst-case guarantees. The NNs used for predicting the voltages and the dual variables are independent of the NN used for predicting the optimal generation setpoints. Since they are not required after training, when the system is ready to be deployed, we can ignore them and focus only on the hidden layers used for predicting the generation values.
1) Worst-Case Guarantees for Generation Constraint Violations: This section discusses the Mixed-Integer Linear Programming (MILP) problem formulation used to determine the maximum active and reactive power generation constraint violations, denoted by v_g. The maximum constraint violation over the input domain can be formulated accordingly, where Ḡ and G are the maximum and minimum active and reactive power generation capacities. Since the ReLU activation function, used in (22), is nonlinear, the mixed-integer reformulation of the ReLU activation function proposed in [11] is used, as in (26a)−(26e), where Ẑ_k^i and Z_k^i are the inputs and outputs of the ReLU activation function. Z_max,i and Z_min,i are the maximum and minimum values possible for the respective Ẑ_k^i; they should be large enough that the constraints (26a) and (26c) are not binding, and small enough that the problem remains bounded and numerically tractable. y_k^i is a binary variable. If Ẑ_k^i is less than zero, then (26d) is active, and Z_k^i and y_k^i must be zero to satisfy the other constraints. Similarly, if Ẑ_k^i is greater than zero, then (26a) and (26b) are active, and y_k^i must be one to satisfy the other constraints.
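The case analysis of the binary variable can be checked in a few lines of pure Python. In the actual method these relations are posed as MILP constraints and handed to a solver; the function below is only a feasibility check of a big-M ReLU reformulation of this kind, with our own names for the bounds.

```python
def relu_bigM_feasible(z_hat, z, y, z_min, z_max, tol=1e-9):
    """Feasibility check of a mixed-integer ReLU reformulation,
    in the spirit of (26a)-(26e) (bound names are ours):
      z <= z_hat - z_min * (1 - y)
      z >= z_hat
      z <= z_max * y
      z >= 0,   y in {0, 1}
    Together these force z = max(z_hat, 0), with y as the on/off indicator.
    """
    return (y in (0, 1)
            and z <= z_hat - z_min * (1 - y) + tol
            and z >= z_hat - tol
            and z <= z_max * y + tol
            and z >= -tol)

# Inactive neuron (z_hat < 0): only z = 0, y = 0 is feasible.
assert relu_bigM_feasible(-0.5, 0.0, 0, z_min=-1.0, z_max=1.0)
assert not relu_bigM_feasible(-0.5, -0.5, 0, z_min=-1.0, z_max=1.0)
# Active neuron (z_hat > 0): only z = z_hat, y = 1 is feasible.
assert relu_bigM_feasible(0.5, 0.5, 1, z_min=-1.0, z_max=1.0)
assert not relu_bigM_feasible(0.5, 0.0, 0, z_min=-1.0, z_max=1.0)
```

The four assertions trace exactly the case analysis in the text: the binary variable forces either the zero branch or the identity branch of the ReLU.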
2) Worst-Case Guarantees for Line Flow Constraint Violations: Similar to the generation constraints, the line flow limits play an equally significant role in ensuring the proper functioning of the network. However, unlike the generation setpoints, the line flow values are not a direct output of the NN prediction. In our effort to formulate the most computationally efficient approach, we observed that simplified convex relaxations such as semidefinite programming for the AC Power Flow (AC-PF) were not binding for the worst-case guarantees. Therefore, we had to formulate a non-convex power flow problem to obtain the line current flows from the generation setpoints, where v_pf and î_mn are the voltages and squared line flows obtained from the power flow equations. The AC-PF equations (27)−(30), together with (21) and (26a)−(26e), form the constraints (31c) of the resulting MIQCQP problem. When the MILP and MIQCQP problems are solved to zero optimality gap, we can ensure that the obtained values v_g and v_l are the global optima. Thus, we can guarantee that there is no input {P_d, Q_d} ∈ D in the entire input domain leading to constraint violations larger than the obtained values v_g and v_l.

A. Simulation Setup
The accuracy and scalability of the proposed physics-informed neural network training architecture are analyzed against a standard NN on four different test systems. The test system characteristics are given in Table I. The network details for case 39, case 118, and case 162 are from the PGLib-OPF network library v19.05 [12], and the network details for case 14 are from [17]. In all test cases, the active and reactive power demands at each node are assumed to be independent of each other, and they are specified to lie between 60% and 100% of their respective maximum loading, i.e., 0.6 D̄ ≤ D ≤ D̄, where D̄ denotes the maximum active and reactive power demand. The sum of the maximum loading over all nodes for each system is given in Table I. Ten thousand sets of random active and reactive power input values were generated using Latin hypercube sampling [18]. Of these, 50% were allotted to the collocation dataset, for which we do not have to calculate and provide the OPF setpoints; 20% were used as training data points and the remaining 30% as the unseen test dataset. The AC-OPF in MATPOWER was used to determine the optimal active and reactive power generation values and voltage setpoints for the input data points in the training and test datasets, and afterwards the KKT conditions given in (15)−(19) were used to obtain the dual variables.
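The sampling and splitting step could look as follows; this is a hand-rolled Latin hypercube sketch (one stratum per sample and dimension), not the exact routine of [18], and the maximum loadings are made up.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Hand-rolled Latin hypercube sample on [0, 1]^n_dims:
    each of the n_samples equal-width strata is hit exactly once per dim."""
    strata = np.argsort(rng.random((n_samples, n_dims)), axis=0)
    return (strata + rng.random((n_samples, n_dims))) / n_samples

rng = np.random.default_rng(42)
d_max = np.array([100.0, 80.0, 60.0])    # hypothetical maximum loadings
u = latin_hypercube(10000, 3, rng)
samples = (0.6 + 0.4 * u) * d_max        # demand between 60% and 100%
assert np.all(samples >= 0.6 * d_max) and np.all(samples <= d_max)

# 50% collocation (no OPF labels needed), 20% training, 30% test.
colloc, train, test_set = np.split(samples, [5000, 7000])
assert colloc.shape == (5000, 3) and train.shape == (2000, 3)
```

Only the 2,000 training points (and the 3,000 test points) need an AC-OPF solve for labels; the 5,000 collocation points are used label-free through the KKT residuals.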
The properties of both the standard and the physics-informed neural networks used for the analysis are given in Table II. We used TensorFlow [19] with Python for the neural network training, fixed the number of training epochs to 1,000, and split the dataset into 200 batches during training. A High-Performance Computing (HPC) server with an Intel Xeon E5-2650v4 processor and 256 GB RAM was used to train the neural networks. The training time of the physics-informed neural network, indicated by PINN, as a multiple of the time taken by the standard NN, denoted by NN, is given in Table II. In our experiments, the proposed physics-informed neural network took almost three times as long to train as the standard NN. On the other hand, physics-informed neural networks result in significant computational savings once we account for the generation of the training database, as PINNs require considerably fewer training samples for which both input and output need to be computed (i.e., we do not need to run an OPF for any collocation point).
The MILP problem used for obtaining the worst-case guarantees for generation constraint violations and the MIQCQP problem used for the worst-case guarantees for line flow violations are formulated in YALMIP [20] and solved using Gurobi. The code to reproduce all simulation results is available online [21].

B. Average Performance over Test Data Set
This section examines the performance of the physics-informed neural networks versus the standard NNs, considering the most common performance metric for regression NNs: the mean absolute error over an unseen test dataset. As shown in (33)−(36) and explained below, along with the MAE we also define metrics that measure the average constraint violation, sub-optimality, and distance to the optimal setpoint: MAE_T is the mean absolute error with respect to the total generation in the test dataset, N_T is the test dataset size, v_avg_g is the average active and reactive power generation constraint violation, v_opt is the average sub-optimality, and v_dist is the average distance of the predicted values to the optimal decision variables, in percent.
During training, both the standard and the physics-informed neural networks were optimized to minimize the MAE. However, it was observed that the average performance of the physics-informed neural network on the test dataset depends on the hyperparameters used in (24) during training. Therefore, different hyperparameter values were tested, and the combination that produced the lowest MAE was used to generate the results given in Table III. In Table III, NN denotes the standard NN, and PINN_avg indicates the results from the physics-informed neural network.
Incorporating the KKT conditions in the neural network training resulted in a 20% to 30% reduction of the MAE on the test dataset. Similarly, a considerable improvement was observed in the average generation constraint violation, sub-optimality, and distance from the optimal setpoint. This indicates that we can achieve higher prediction accuracy using a physics-informed neural network with the same or less data.

C. Worst-Case Performance
While examining the worst-case guarantees for generation constraint violations, we observed no direct correlation between the average performance of the physics-informed neural network on a test dataset and the maximum constraint violations. Since worst-case guarantees play a crucial role in the trustworthiness of a NN, similar to the previous case, different hyperparameters were tested to see which would produce the lowest worst-case generation constraint violations. Those worst-case violation values, denoted by PINN, are compared in Table IV against (i) the standard NN and (ii) the physics-informed neural network that delivered the lowest MAE, denoted by PINN_avg, to assess the importance of the hyperparameters for the worst-case performance. From Table IV, we can infer that the physics-informed neural network which produced the lowest MAE also offered a significant reduction in the worst-case generation constraint violation compared to the standard NN in all test cases. This establishes that physics-informed neural networks offer both better generalization capabilities and lower worst-case violations than standard NNs with the same training data. However, an additional 15% to 30% reduction in the worst-case generation constraint violation is achieved when we tune the hyperparameters of the physics-informed neural networks with the objective of specifically improving the worst-case guarantees instead of minimizing the mean absolute error. This indicates that we can further tighten the worst-case guarantees by optimizing the hyperparameters to minimize the worst-case constraint violations.
In 'Case 14', along with the generation constraint violations, we were also able to compute the worst-case line flow constraint violations. As Table IV shows, physics-informed neural networks also resulted in lower line flow constraint violations. For Case 39, Case 118, and Case 162, we were unable to compute the worst-case line flow constraint violation, since the MIQCQP problem could not be solved to zero optimality gap within 5 hours. This highlights the computational challenges associated with extracting worst-case guarantees for the AC-OPF and, in general, non-linear problems. The present work focuses on introducing the first formulations that can extract such guarantees for a non-linear optimization problem. Future work shall focus on addressing the computational issues in order to arrive at scalable algorithms for any system size.
The average performance of PINN (i.e., mean absolute error, average generation constraint violations, average sub-optimality, and average distance to the optimal setpoint) is given in Table V. Comparing Table V with Table III, we observe that in most cases the average performance worsens when we tune the hyperparameters with the objective of reducing the worst-case generation constraint violations instead of obtaining the lowest mean absolute error. This possibly indicates a trade-off between achieving the best average performance and achieving the lowest worst-case violations. However, further analysis is required to ascertain the relationship between good average performance and low worst-case violations.

D. Input Domain Reduction
Ref. [11] has shown that, for linear optimization programs, the worst-case constraint violation of the NN can be reduced by training the NN on a broader input domain than the one on which it will be deployed. We explore how this approach performs for non-linear programs, both for standard NNs and PINNs. For both neural network types, the input domain of the test set was symmetrically reduced by δ as follows: (0.6 + δ) D̄ ≤ D ≤ (1 − δ) D̄ (37). The resulting worst-case generation constraint violation with respect to the maximum active power loading is given in Fig. 3. We observe that, similar to linear programs, for the AC-OPF the input domain reduction of the test sets also improves the worst-case guarantees considerably, especially for the standard NNs. The reduction of the violations is significantly higher for standard NNs than for the Physics-Informed Neural Networks (PINNs), although across the whole range of input domain reductions the PINNs always exhibit lower worst-case violations. This implies that a physics-informed neural network only requires a slightly larger input domain for the training dataset to maintain the worst-case constraint violations below a certain threshold, while standard NNs would require significantly larger input domains to achieve a similarly low level of worst-case constraint violation, if at all possible.
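The symmetric domain reduction of (37) is a one-liner; the helper below and its loadings are illustrative.

```python
import numpy as np

def reduced_domain(d_max, delta):
    """Symmetrically shrunk input domain of Eq. (37):
    (0.6 + delta) * d_max <= d <= (1 - delta) * d_max."""
    assert 0.0 <= delta <= 0.2, "delta > 0.2 empties the domain"
    return (0.6 + delta) * d_max, (1.0 - delta) * d_max

d_max = np.array([100.0, 80.0])          # hypothetical maximum loadings
lo, hi = reduced_domain(d_max, 0.05)
assert np.allclose(lo, [65.0, 52.0]) and np.allclose(hi, [95.0, 76.0])
```

With delta = 0 the original deployment domain 0.6 D̄ ≤ D ≤ D̄ is recovered; increasing delta shrinks the test domain on which the worst-case violations are evaluated.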
IV. CONCLUSION
This paper offers two key contributions. First, we introduced a framework to incorporate the physical equations of the AC-OPF inside the neural network training. We showed that we can improve the NN prediction accuracy by using physics-informed neural networks while using considerably fewer input data points. Second, we proposed a method to extract and minimize the worst-case generation and line flow constraint violations of the neural network predictions.

Fig. 1. Illustration of the neural network architecture to predict the optimal active and reactive power generation outputs Ĝ using the active and reactive power demand D as input. There are K hidden layers in the neural network, with N_k neurons in hidden layer k, k = 1, ..., K.

Fig. 2. Illustration of the physics-informed neural network architecture to predict the optimal active and reactive power generation outputs Ĝ, voltage setpoints v̂, and dual variables L_m utilizing the active and reactive power demand D as input. Hidden layers that predict Ĝ, v̂, and L_m are independent of each other. During training, the weights w and biases b are modified to minimize the mean absolute errors MAE_g, MAE_v, MAE_l, and MAE_ε.
June 27 - July 1, 2022

Fig. 3. Input Domain Reduction.