Finding a Closest Saddle-Node Bifurcation in Power Systems: An Approach by Unsupervised Deep Learning

Abstract—We propose a neural network trained with an unsupervised learning strategy for the direct computation of closest saddle-node bifurcations, eliminating the need for labeled training data. Our method not only estimates worst-case load increase scenarios but also significantly reduces, at inference time, the computational complexity traditionally associated with this task. Simulation results validate the effectiveness and real-time applicability of our approach, demonstrating its potential as a robust tool for modern power system analysis.


I. INTRODUCTION
The rapid growth of intermittent energy sources, primarily wind and solar power, over the last decade makes it reasonable to investigate their effect on voltage stability. Unpredictable fluctuations in power generation and inaccuracies in forecasting can lead to overloaded transmission lines, insufficient reactive power supply, and failures to meet scheduled demands. These situations pose substantial risks to voltage stability, potentially triggering a voltage collapse.
This paper addresses the aforementioned issue of voltage stability by employing a direct method to compute the closest saddle-node bifurcation. More specifically, we consider the load flow model with k unknown system variables (voltage magnitudes at PQ-buses and voltage angles at non-slack buses) and ℓ uncertain injections (e.g. active and reactive power injected at load buses and at buses connecting variable production sources to the grid), represented as

f(x, λ) = 0,  x ∈ R^k, λ ∈ R^ℓ.  (1)

Following an approach similar to [1], we focus on the proximity of a voltage collapse, avoiding assumptions regarding the dynamics of load change and instead estimating the distance to the worst-case load increase.

Submitted to the 23rd Power Systems Computation Conference (PSCC 2024).
To circumvent the online computational burden traditionally associated with loading margin assessments, we introduce an approach using deep learning. Specifically, we define the map Λ* that takes a nominal parameter vector λ_p to a closest point of the saddle-node bifurcation surface, and we train a multi-layered (deep) neural network (DNN), denoted Λ, to approximate Λ*.
When training neural networks, it is crucial to differentiate between supervised and unsupervised learning. The former utilizes a dataset D_s = {(λ_i, Λ*(λ_i))}. Here, λ_i represents an input feature to the neural network, while Λ*(λ_i) corresponds to the ground-truth output that the neural network is expected to produce given the input feature. Supervised learning uses a suitable performance metric to make the neural network closely mimic the ground truth. However, acquiring a comprehensive set of ground truths can be difficult, and in many instances impractical, due to the significant computational resources required. In contrast, unsupervised learning diverges from the mimicking approach and instead enables the training process to autonomously produce the true solution, as will be detailed further in this paper.
By employing an unsupervised learning strategy, our method eliminates the necessity for labeled training data, i.e. training data augmented with the ground truth. The proposed neural network learns directly from the data, effectively harnessing the inherent structure and physical laws of power systems. We assess the effectiveness of the proposed scheme through a numerical simulation on a small power system model, comparing our results to those obtained with conventional iterative methods. The results reveal the proposed method's potential for generating stability margins appropriate for real-time use.
In conclusion, this paper offers a new solution to the issue of voltage stability in power systems. Our approach combines the robustness of saddle-node bifurcation computations with the computational efficiency afforded by deep learning, providing a promising tool for managing the dynamics of today's power grids with increasing renewable energy penetration.

A. Related literature
The examination of voltage stability often entails approximating operational boundaries through loadability surfaces in parameter space [2]-[4]. However, due to the complexity of practical power systems, achieving a parametrization of the loadability surface is practically unattainable. To remedy this, the approach of finding the closest point on the loadability surface has instead gained considerable attention in the literature [1], [5]-[7]. Exploring the loadability surface by means of continuation methods has been proposed in [5], [8]-[10]. Other methods that have been investigated are loadability surface approximations, such as local approximations [11]-[13] and global approximations based on Galerkin methods [14]. A challenge with local approximations is the non-smoothness of the saddle-node bifurcation surface, which arises mostly due to changes in power system limits [13], [15].
In addition to conventional iterative methods, various alternative approaches have been explored in the literature. For instance, genetic algorithms [26] have been employed to find the closest saddle-node bifurcation. On the subject of voltage stability assessment, researchers have also utilized neural networks [28]; there, the authors combine graph neural networks with long short-term memory networks, using time series data as inputs to evaluate short-term voltage stability. An approach similar to the one proposed in our paper can be found in [27], where the author utilized neural networks for load margin assessment through the interleaving of two neural networks. Care was taken to ensure that constraints were not violated, employing a methodology that combines supervised learning with physics-guided principles.
While we have not been able to find any paper that approximates the distance to the saddle-node bifurcation surface using DNNs, the optimal power flow problem has seen a recent resurgence due to the application of DNNs [16]-[18].

II. THE LOADABILITY SURFACE
In the context of voltage stability analysis, the saddle-node bifurcation loadability limit is one of the most common ways of assessing stability. Saddle-node bifurcations are crucial as they denote the boundary at which voltage collapse may occur. A saddle-node bifurcation is a point in parameter space at which the system moves from a stable state into an unstable state. The collection of these points forms the saddle-node bifurcation surface Σ := ∂U, with U := {λ ∈ R^m : ∃x ∈ R^n, f(x, λ) = 0}. In particular, [6] offers a characterization of the saddle-node bifurcation surface as

Σ = {λ ∈ R^m : ∃x ∈ R^n, v ∈ R^n \ {0}, f(x, λ) = 0, f_x(x, λ)v = 0}.

At a saddle-node bifurcation, the Jacobian of the power flow equations f(x, λ) becomes singular. This fact is reflected by the eigenvector v corresponding to the zero eigenvalue of the Jacobian matrix.
Given a nominal parameter vector λ p ∈ U, it is of interest to find a point λ * ∈ Σ for which the distance ∥λ p − λ * ∥ is minimal, as it can be used to assess the risk of voltage collapse.
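The defining property above can be illustrated on a minimal scalar toy system (not a power flow; the function f below is chosen purely for illustration): for f(x, λ) = λ − x², the two equilibria x = ±√λ exist for λ > 0 and collide at λ = 0, where the Jacobian vanishes.

```python
import numpy as np

# Toy scalar system f(x, lam) = lam - x^2 (not a power flow): equilibria
# x = +/- sqrt(lam) exist for lam > 0 and collide at lam = 0, which is
# the saddle-node bifurcation point.
def f(x, lam):
    return lam - x**2

def f_x(x, lam):
    # "Jacobian" of f with respect to x (a scalar derivative here)
    return -2.0 * x

# On the equilibrium branch away from the bifurcation the Jacobian is
# nonsingular; at the bifurcation (lam = 0, x = 0) it is singular.
lam = 1.0
x_eq = np.sqrt(lam)
print(f(x_eq, lam), f_x(x_eq, lam))   # residual zero, Jacobian -2 (nonsingular)
print(f_x(0.0, 0.0))                  # Jacobian vanishes at the bifurcation
```

In a multi-dimensional setting, the same check amounts to monitoring the smallest singular value (or eigenvalue) of f_x along an equilibrium branch.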

III. PROXIMITY TO THE SNB-SURFACE
With the present paper's focus on saddle-node bifurcations, we assume an n-bus power system with a predefined set of PQ-buses and voltage-controlled buses. Furthermore, let f : R^k × R^ℓ → R^k denote the power flow equations, under the assumption that the parameter space has dimension ℓ. The saddle-node bifurcation surface can then be defined as

Σ = {λ ∈ R^ℓ : ∃x ∈ R^k, v ∈ R^k \ {0}, f(x, λ) = 0, f_x(x, λ)v = 0}.  (2)

A. An iterative approach

The normal vector n_Σ(λ_0) to the surface Σ at the point λ_0 ∈ Σ can be determined from the Jacobian matrix f_x(x_0, λ_0), where x_0 ∈ R^k satisfies f(x_0, λ_0) = 0. Specifically, if w ∈ R^k is the left eigenvector (assuming multiplicity one) corresponding to the zero eigenvalue of the Jacobian matrix, then

n_Σ(λ_0) = α wᵀ f_λ(x_0, λ_0),  (3)

where α ∈ R is chosen such that ∥n_Σ(λ_0)∥ = 1 and such that the normal is oriented outwards from the saddle-node bifurcation surface.
In a method involving successive computations of n_Σ(λ_0), the authors of [1] introduced an iterative scheme for finding a closest point on the saddle-node bifurcation surface. Initializing with a parameter λ_p ∈ U and a direction d_0 ∈ S^{ℓ−1} (the (ℓ−1)-dimensional unit sphere), the authors of [1] use continuation methods to find the point λ_1 ∈ Σ at which the ray λ_p + r d_0, r ≥ 0, intersects Σ. Effectively, the continuation method yields r*_0 such that λ_1 := λ_p + r*_0 d_0, ending the first iteration. In the successive iteration, the directional vector of load increase is chosen as d_1 := n_Σ(λ_1), computed from (3). The algorithm proposed in [1] generates a sequence of points (λ_i)_{i≥0} on Σ such that lim_{j→∞} λ_j ∈ arg min{∥λ − λ_p∥ : λ ∈ Σ}, provided Σ has sufficiently nice properties, e.g. being convex or not too concave.
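The alternation between ray intersection and normal update can be sketched on a toy stand-in for Σ. Here the surface is taken to be the unit circle, chosen so that both the ray intersection (which the method of [1] obtains by continuation) and the outward normal are available in closed form; nothing power-system-specific is computed.

```python
import numpy as np

# Iterative closest-point scheme of [1], sketched on a toy surface:
# Sigma is the unit circle, so the ray intersection and the outward
# normal have closed-form expressions.
def ray_hit_circle(lam_p, d):
    # Smallest r >= 0 with ||lam_p + r*d|| = 1, for lam_p inside the circle.
    a = d @ d
    b = 2.0 * (lam_p @ d)
    c = lam_p @ lam_p - 1.0
    r = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return lam_p + r * d

def normal(lam):
    # Outward unit normal of the unit circle at a point on it.
    return lam / np.linalg.norm(lam)

lam_p = np.array([0.3, 0.1])          # nominal parameter inside Sigma
d = np.array([0.0, 1.0])              # arbitrary initial direction d_0
lam = ray_hit_circle(lam_p, d)        # first intersection point lam_1
for _ in range(50):                   # d_{i+1} := n_Sigma(lam_i)
    lam = ray_hit_circle(lam_p, normal(lam))

# For the circle, the closest point to lam_p is lam_p / ||lam_p||.
print(lam, lam_p / np.linalg.norm(lam_p))
```

For this convex toy surface the iteration contracts to the closest point; in the power system setting each `ray_hit_circle` call is replaced by a continuation run along the load-increase direction.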

B. A direct method
The work of [6] proposes an alternative to the iterative approach. In this context, λ* serves as a fixed point of the previously mentioned iterative algorithm. Consequently, the fact that n_Σ(λ*) is parallel to λ* − λ_p implies that there are points (x, w, λ*) satisfying

f(x, λ*) = 0,
wᵀ f_x(x, λ*) = 0,  ∥w∥ = 1,
λ* − λ_p = β wᵀ f_λ(x, λ*),  β ∈ R.  (4)

A direct method emerges by searching for solutions to (4) using standard numerical algorithms.
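As a sketch of this idea under simplifying assumptions, consider the toy problem f(x, λ) = λ₁ + λ₂ − x² with scalar state x (not a power flow). Its saddle-node surface is the line λ₁ + λ₂ = 0, so the closest point to λ_p = (−2, −1) is (−0.5, 0.5); with the left eigenvector normalized to w = 1, the fixed-point conditions become a square nonlinear system solved here by Newton's method.

```python
import numpy as np

# Direct method on a toy problem: f(x, lam) = lam1 + lam2 - x^2.
# The SNB surface is lam1 + lam2 = 0; the closest point to lam_p = (-2, -1)
# is (-0.5, 0.5).  With w = 1 fixed, the conditions form a square system.
lam_p = np.array([-2.0, -1.0])

def F(z):
    x, l1, l2, beta = z
    return np.array([
        l1 + l2 - x**2,          # f(x, lam) = 0
        -2.0 * x,                # w * f_x = 0 (singular Jacobian)
        l1 - lam_p[0] - beta,    # lam - lam_p = beta * w * f_lam
        l2 - lam_p[1] - beta,
    ])

def J(z):
    x, l1, l2, beta = z
    return np.array([
        [-2.0 * x, 1.0, 1.0,  0.0],
        [-2.0,     0.0, 0.0,  0.0],
        [0.0,      1.0, 0.0, -1.0],
        [0.0,      0.0, 1.0, -1.0],
    ])

z = np.array([0.5, -1.0, -1.0, 0.0])    # initial guess
for _ in range(20):                     # Newton iteration
    z = z - np.linalg.solve(J(z), F(z))

print(z[1:3])   # closest point on the toy SNB surface, approx (-0.5, 0.5)
```

In practice the same system, with w as an additional unknown and the normalization ∥w∥ = 1 appended, is handed to a general-purpose nonlinear solver.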

IV. A DEEP LEARNING METHOD
The iterative and direct methods described in the previous section can be used to find the closest point on the SNB surface to a given parameter vector λ_p. Effectively, this gives us a single point of the map Λ* : U → Σ, where U ⊂ R^m is the set of feasible parameter vectors, satisfying

Λ*(λ_p) ∈ arg min{∥λ − λ_p∥ : λ ∈ Σ}.  (5)

In power system operation, the randomness of demand and variable production sources introduces uncertainty in the system parameters, and for online supervision and planning purposes the system operator would ideally like to have full access to the entire map Λ*. In this section we explain how unsupervised deep learning can be used to find an accurate approximation Λ of Λ* using limited computational resources at inference time.
The main ingredient in the algorithm that we propose is to rewrite the optimization problem using an augmented Lagrangian formulation to obtain an unconstrained minimization problem, and then use the corresponding objective function to train our neural network.

A. Power system model
In the context of this paper, an n-bus power system with fixed network topology is considered. The indices for the nodes of the power system are assumed to be in the set N = {0, 1, ..., n − 1}. It is further assumed that the power system under consideration has m voltage-controlled buses, whose bus indices form the set N_PV ⊂ N. Consequently, there are n − m − 1 load buses, located in N_PQ := N \ (N_PV ∪ {0}), with bus 0 taken as the slack bus. The focus of the present paper is on steady-state analysis of power systems; it is therefore reasonable to assume constant power loads. Hence, it is assumed that there are subsets D_p ⊆ N_PQ and D_q ⊆ N_PQ on which there are non-zero active and reactive loads, respectively. The load vector is represented as λ = (λ_P, λ_Q) ∈ R^{|D_p|} × R^{|D_q|}. For each i ∈ N_PV, the generator at the corresponding bus produces the fixed active power p_{g,i} ∈ R_+ and variable reactive power q_{g,i} ∈ R, as determined by the solution of the load flow equations. The active and reactive power outputs of the generators can compactly be denoted as the vectors p_G = (p_{g,i})_{i∈N_PV} and q_G = (q_{g,i})_{i∈N_PV}. Within the scope of this paper we do not assign any upper or lower bounds on the reactive power generation and thus let q_G take values in R^m. Given this power system model, the state can be defined as x = (x_v, x_δ), collecting the voltage magnitudes at PQ-buses and the voltage angles at non-slack buses. With the model described as above, the power flow equations can be written as

f(x, λ) = (f_G(x) − p_G, f_0(x), f_L(x) − λ) = 0,

where
• f_G corresponds to the active power injection at PV nodes in N_PV;
• f_0 gives the active and reactive injected power at PQ nodes where there is no generation or load present, i.e. at buses N_PQ \ D_p and N_PQ \ D_q, respectively;
• f_L outputs the active and reactive power injection at nodes where these are uncertain, that is, in D_p and D_q, respectively.
It should be noted that while the model presented in this section has loads exclusively on PQ-buses, the method presented in the subsequent section allows for the more general case where arbitrary buses can be equipped with passive loads. The chosen model is adopted merely for ease of notation.
B. The augmented Lagrangian method

By leveraging the approximation capabilities of neural networks, a neural network Λ(λ_p, θ) is proposed in this paper to approximate Λ*. The parameter θ is a vector of trainable weights for the neural network. Therefore, the objective of the training process is to find a parameter θ* such that the neural network output is a good approximation of Λ*. Acquiring labeled data for the problem at hand would entail finding a multitude of mappings Λ*(λ_p) from a sample data set D. To circumvent this, the neural network proposed in this paper is trained using unsupervised learning. This necessitates the replacement of labels, typically used in a supervised setting, with a strategy that enables the network to learn from the data itself, guided by physical laws and other constraints. Therefore, (6) is relaxed into an unconstrained optimization problem through the application of the Augmented Lagrangian Method (ALM).
The ALM iteratively solves a sequence of sub-optimization problems. The Lagrangian of (6) is supplemented with a penalty term to form the augmented Lagrangian; for a generic problem min_x J(x) subject to h(x) = 0, it is represented as

L_{ρ_k}(x, µ_k) = J(x) + µ_kᵀ h(x) + (ρ_k/2) ∥h(x)∥².

The augmented Lagrangian penalizes violations of the equality constraints, each weighted by a factor ρ_k. Each sub-optimization problem is indexed by k, and µ_k serves as a parameter approximating the Lagrange multiplier µ*. Hence, for each k we solve

x_{k+1} ∈ arg min_x L_{ρ_k}(x, µ_k).

For each iteration k, µ_k and ρ_k are kept constant while minimizing the augmented Lagrangian. Once a (local) minimum has been obtained, the multipliers are updated according to the rule

µ_{k+1} = µ_k + ρ_k h(x_{k+1}).

Then, under mild regularity conditions, see e.g. [19], one can show that {x_k} → x* and {µ_k} → µ*, giving the necessary conditions for local minima, ∇_x L(x*, µ*) = 0 and ∇_µ L(x*, µ*) = 0.
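The alternation between inner minimization and multiplier update can be sketched on a toy equality-constrained problem (chosen here purely for illustration): minimize J(x) = ∥x − c∥² with c = (1, 1) subject to h(x) = x₁ + x₂ − 1 = 0, whose solution is x* = (0.5, 0.5). Because J and h are quadratic/affine, each inner problem is solved exactly by a linear solve.

```python
import numpy as np

# ALM sketch on a toy problem: minimize ||x - c||^2 subject to
# h(x) = x1 + x2 - 1 = 0, with c = (1, 1); solution x* = (0.5, 0.5).
c = np.array([1.0, 1.0])
g = np.array([1.0, 1.0])                # gradient of h (constant)

def h(x):
    return x @ g - 1.0

mu, rho = 0.0, 1.0
x = np.zeros(2)
for k in range(20):                     # outer ALM iterations
    # Inner problem: set the gradient of J + mu*h + (rho/2)*h^2 to zero,
    # i.e. (2I + rho*g*g^T) x = 2c + (rho - mu) g  (exact for quadratics).
    A = 2.0 * np.eye(2) + rho * np.outer(g, g)
    x = np.linalg.solve(A, 2.0 * c + (rho - mu) * g)
    mu = mu + rho * h(x)                # multiplier update
    rho = min(2.0 * rho, 100.0)         # increase penalty, capped

print(x, mu)    # x -> approximately (0.5, 0.5), mu -> multiplier approx 1
```

In the training procedure described later, the exact linear solve is replaced by stochastic gradient steps on the network weights θ, while the multiplier and penalty updates retain exactly this form.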

C. Unsupervised learning
A supervised approach to the present problem would be to train the neural network using a dataset D = {(λ_p^i, λ*,i)}_{i=1}^M, with λ_p^i being the input feature and λ*,i := Λ*(λ_p^i) the ground truth obtained by employing either of the techniques proposed in the previous section. The network is trained by minimizing the empirical risk

(1/M) Σ_{i=1}^M ∥Λ(λ_p^i, θ) − λ*,i∥².

While this is a common regression approach for neural networks, it does not ensure that the output is feasible. To remedy this, a regularization term can be added, which in this constrained setting is a function penalizing constraint violations. Similar approaches have been used within the realm of deep-learning-based optimal power flow, see e.g. [17].
A key observation is that our optimization problem does not need to involve a direct search over parameters λ ∈ Σ. To approach the deep learning approximation of Λ*, we consider a different network F(λ_p, θ) = (X̂(λ_p, θ), Ŵ(λ_p, θ)), the output of which represents the vector of unknown voltage components x ∈ R^{2n−m−2} and the eigenvector w of the power flow Jacobian corresponding to the zero eigenvalue. The principal layout of the neural network used is illustrated in Fig. 1.
We then search for a weight θ* such that ∥f_L(X̂(λ_p; θ*)) − λ_p∥ ≈ ∥Λ*(λ_p) − λ_p∥ for each λ_p within the set of feasible parameters. Effectively, this corresponds to setting Λ(λ_p, θ) := f_L(X̂(λ_p; θ)), a formulation designed to implicitly guarantee that load-flow feasibility is retained in all load buses. Given a dataset D = {λ_p^i}_{i=1}^M, the distance ∥Λ(λ_p^i, θ) − λ_p^i∥ should be minimized over all samples in D while adhering to the constraints characterizing the bifurcation surface. We can therefore formulate the training as the optimization problem

min_θ (1/M) Σ_{i=1}^M ∥Λ(λ_p^i, θ) − λ_p^i∥²
s.t. f(X̂(λ_p^i; θ), Λ(λ_p^i, θ)) = 0,
     Ŵ(λ_p^i; θ)ᵀ f_x(X̂(λ_p^i; θ), Λ(λ_p^i, θ)) = 0,  i = 1, ..., M,  (6)

where Ŵ(λ_p^i; θ) is an auxiliary output of the neural network that, when properly trained, outputs the eigenvector of the power flow Jacobian corresponding to the zero eigenvalue at the closest saddle-node bifurcation point.
To incorporate the above optimization problem in the backpropagation of the network, we first need to relax it into an unconstrained optimization problem. Lagrangian relaxation is a common method in conventional constrained optimization that has been successfully applied in the training of neural networks for constrained optimization problems. In [16] it is used to solve supervised AC-OPF, using two networks for predicting the primal and dual variables, respectively. In [17], [18] a penalty function is used as a regularization term in the loss function for penalizing deviations from the constraints. In this paper, we resort to the ALM.
In the field of constrained optimization using machine learning, the augmented Lagrangian has been reported successful. In particular, physics-informed neural networks such as those in [20]-[22] make use of the ALM as relaxation.

D. Neural network architecture
In common with the previously cited work on using deep neural networks for approximating optimization problems, the neural network architecture employed in this paper is a feed-forward deep neural network.
The input to the neural network is the load vector λ = (λ_P, λ_Q), giving a total of |D_p| + |D_q| input neurons. The input layer is followed by a sequence of hidden layers, the i-th having h_i neurons. The number of neurons per hidden layer depends on the power grid model under study; in general, power grids with a large number of nodes often call for more neurons in the hidden layers for better approximations. In terms of activation functions, the rectified linear unit (ReLU), ϕ(x) = max{0, x}, is used throughout the hidden layers.
For the output layer, 2|N| − |N_PV| − 2 neurons are dedicated to representing the state vector X̂ = (x_v, x_δ). The voltage magnitude predictions take values in the set [V_min, V_max]^{n−m−1}. Similarly, the voltage angles are defined to take values in the set [−π/2, π/2]^{n−1}. To enforce that voltage magnitudes and angles fall within these sets, we make use of the sigmoid activation function.
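A minimal sketch of such an architecture, assuming PyTorch (the library the paper's experiments use), is given below. The hidden widths 512/256/128 and the output dimensions follow the numerical example (IEEE 24-bus: 12 voltage magnitudes, 23 angles, a 35-dimensional eigenvector); the input width of 17 is an illustrative placeholder for |D_p| + |D_q|, which the text does not state.

```python
import torch
import torch.nn as nn

# Sketch of the feed-forward network F(lam_p, theta): ReLU hidden layers
# followed by separate heads for voltage magnitudes, angles and the
# eigenvector estimate.  Sizes are illustrative (see lead-in).
class SNBNet(nn.Module):
    def __init__(self, n_in, n_vm, n_delta, n_w):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(n_in, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.head_vm = nn.Linear(128, n_vm)        # voltage magnitudes
        self.head_delta = nn.Linear(128, n_delta)  # voltage angles
        self.head_w = nn.Linear(128, n_w)          # eigenvector estimate

    def forward(self, lam):
        z = self.hidden(lam)
        vm = torch.sigmoid(self.head_vm(z))        # in (0, 1), rescaled later
        delta = torch.sigmoid(self.head_delta(z))  # in (0, 1), rescaled later
        w = self.head_w(z)
        w = w / w.norm(dim=-1, keepdim=True)       # enforce ||w|| = 1
        return vm, delta, w

net = SNBNet(n_in=17, n_vm=12, n_delta=23, n_w=35)
vm, delta, w = net(torch.randn(4, 17))             # batch of 4 load vectors
print(vm.shape, delta.shape, w.shape)
```

The sigmoid outputs are deliberately left in (0, 1); the rescaling to physical ranges is described next.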
While it is possible to directly output the state vector in terms of voltage magnitude and angle, it was found during testing that a more suitable approach in terms of numerical stability was to use an implicit representation of the state vector x. As such, a complex representation of the voltages is used. Specifically, on the neurons representing the state vector X̂ (the V_m and δ neurons in Figure 1), the sigmoid activation function

σ(z) = 1/(1 + e^{−z})

is applied. The predicted voltage magnitude can then be recovered as

V̂_i = V_min + (V_max − V_min) σ(z_i).

The voltage angles are determined implicitly by considering the complex representation of the voltages, letting the neural network output the imaginary part of a complex number with unit magnitude, from which the angle is recovered within [−π/2, π/2].
Given that the network is designed to output the eigenvector w, we can enforce the constraint (13) at the output directly, by normalizing the output.
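One plausible realization of this output post-processing, consistent with the ranges stated above (the exact formulas are a reconstruction, not taken verbatim from the paper), is the following: sigmoid outputs are rescaled to [V_min, V_max] for magnitudes, angles are recovered from the imaginary part of a unit-magnitude complex number via arcsin, and the eigenvector output is normalized.

```python
import numpy as np

# Decode raw output neurons into physical quantities: magnitudes in
# [v_min, v_max], angles in (-pi/2, pi/2) via the imaginary part of a
# unit-magnitude complex number, and a unit-norm eigenvector.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(z_vm, z_delta, z_w, v_min=0.9, v_max=1.1):
    v = v_min + (v_max - v_min) * sigmoid(z_vm)   # magnitudes in [v_min, v_max]
    s = 2.0 * sigmoid(z_delta) - 1.0              # imaginary part, in (-1, 1)
    delta = np.arcsin(s)                          # angles in (-pi/2, pi/2)
    w = z_w / np.linalg.norm(z_w)                 # enforce ||w|| = 1
    return v, delta, w

rng = np.random.default_rng(0)
v, delta, w = decode(rng.normal(size=12), rng.normal(size=23),
                     rng.normal(size=35))
print(v.min(), v.max(), delta.min(), delta.max(), np.linalg.norm(w))
```

The per-unit bounds v_min = 0.9 and v_max = 1.1 are illustrative values, not taken from the paper.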

E. Training
Based on a base-case power system model with a feasible load λ_0, the set of load samples D is created by randomly sampling points from the set B_r(λ_0) = {λ : ∥λ − λ_0∥ < r} ∩ (R_+^{|D_p|} × R_+^{|D_q|}). D is further split into two disjoint sets D_T and D_E, where the former is used for training the neural network and the latter for evaluating the performance of the trained network.

Following the ALM, the training of F involves solving a sequence of optimization problems (Data: training dataset D_T; Result: optimal parameter θ*; initialize µ_0 ← 0). A pre-defined number κ of outer iterations is performed. Within each outer iteration, we randomly sample a mini-batch B ⊂ D_T and update θ_k using the Adam optimizer [23]. The entire dataset is processed up to n_epoch times; hence the inner iteration follows a conventional training process. However, in an effort to avoid overfitting and reduce training time, early stopping with a fixed patience is used: the loss is evaluated between epochs, and if it has not improved for a fixed number of epochs, the inner iteration is interrupted. Further, an adaptive learning rate is used, with the learning rate reduced when the loss does not improve. Note that the learning rate is reverted back to its initial value in each new outer iteration. Once an inner iteration is completed, the Lagrange multipliers are updated in accordance with the ALM algorithm.

V. NUMERICAL EXAMPLE

To illustrate the proposed approach, we consider the same power system topology as that of the IEEE 24-bus system, with 11 voltage-controlled buses and 12 PQ-buses.
The neural network is implemented using the PyTorch library and trained on a MacBook Pro (2 GHz quad-core Intel Core i5, 16 GB RAM).

A. Hyperparameters
The three hidden layers comprise 512, 256, and 128 neurons, respectively. The dataset consists of a total of |D| = 10 986 samples, out of which 986 are reserved for testing the neural network. The samples were obtained by randomly selecting points in a ball centered at the base-case load λ_0 with radius r = 50 MVA.
Regarding the loss function and the augmented Lagrangian algorithm, the outer loop was set to 50 iterations, whereas the inner iteration count was set to 50 epochs. The learning rate was set to 10^{−4}; however, an exponential decay strategy was used with a decay factor γ = 0.9. The learning rate is decayed if the loss has not improved for 5 consecutive epochs, and is reset between outer iterations. Furthermore, early stopping is practiced, where the inner iteration is halted if the loss has not improved for 10 consecutive epochs. The initial penalty factor for the augmented Lagrangian algorithm was set to ρ_0 = 100. To avoid numerical instability, the penalty factor is capped at ρ_max = 12000. Finally, the Lagrange multipliers are initialized to zero.
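The scheduling logic described above (decay on stagnation, early stopping, learning-rate reset per outer iteration) can be sketched as follows, with a toy scalar objective standing in for the augmented Lagrangian loss and plain gradient steps standing in for Adam; the patience values and γ follow the text, the rest is illustrative.

```python
import numpy as np

# Skeleton of the outer/inner training schedule: early stopping with
# patience 10, learning-rate decay (factor gamma) after 5 stagnant
# epochs, and a learning-rate reset at each outer iteration.  The toy
# objective (theta - 3)^2 stands in for the network loss.
def loss(theta):
    return float((theta - 3.0) ** 2)

def grad(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0
kappa, n_epoch = 5, 200          # outer iterations / max epochs per inner run
base_lr, gamma = 0.1, 0.9        # initial learning rate, decay factor
for k in range(kappa):
    lr = base_lr                 # learning rate reset per outer iteration
    best, stall = np.inf, 0
    for epoch in range(n_epoch):
        theta -= lr * grad(theta)            # stand-in for an Adam step
        cur = loss(theta)
        if cur < best - 1e-12:               # loss improved
            best, stall = cur, 0
        else:
            stall += 1
            if stall % 5 == 0:               # decay after 5 stagnant epochs
                lr *= gamma
            if stall >= 10:                  # early stopping, patience 10
                break
    # (here the ALM multipliers would be updated before the next outer loop)

print(theta)   # converges toward 3.0
```

In the actual training loop, `loss` is the augmented Lagrangian evaluated on mini-batches of D_T and the multiplier/penalty updates of the ALM run between outer iterations.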

B. Performance
To evaluate the accuracy of the neural network in approximating Λ*, the samples in D_E were used as input to the iterative approach presented in Section III-A to serve as ground truth. The results obtained on the test set are displayed in Table I, together with histograms of the optimality gap, shown in Figure 2.
The optimality gap presented in both the table and the histogram is computed as the relative error in the achieved distance,

gap_i = (∥Λ(λ_p^i, θ) − λ_p^i∥ − ∥Λ*(λ_p^i) − λ_p^i∥) / ∥Λ*(λ_p^i) − λ_p^i∥.

VI. CONCLUSION

Based on the iterative and direct approaches detailed in [6] and [1], we proposed the parametric approximation Λ(λ_p, θ) of Λ*(λ_p) to find the minimal distance to the saddle-node bifurcation surface, given a stable point (x_p, λ_p). While accurate, the iterative and direct methods are computationally intensive, which may pose a problem in time-sensitive applications, such as real-time assessment of the margin to voltage collapse. While training neural networks requires substantial computational resources in terms of memory and processing power, a trained neural network provides outputs near-instantaneously given an input.
Owing to the fact that the saddle-node bifurcation surface can be characterized by a set of equality constraints, the problem of finding the closest saddle-node bifurcation can be formulated as a constrained optimization problem. As such, the problem can be relaxed using the augmented Lagrangian method to formulate a concise loss function suitable for training the neural network. The augmented Lagrangian formulation allows the training to learn an optimal parameter θ* by penalizing constraint violations while minimizing the distance to the saddle-node bifurcation surface. While it is possible to train the neural network using supervised learning, generating a large sample set D and the corresponding labels Λ*(D) is highly time-consuming. Comparing the different approaches is nevertheless an interesting topic for future work.
In the numerical example, the IEEE 24-bus system was used. Using the iterative approach as ground truth in evaluating the accuracy of the neural network, 80% of the points in the test dataset had a relative error of less than 2%. While this work focused on the common fully connected feed-forward neural networks, other architectures may be worth exploring in future work. In particular, practical power systems comprising thousands of nodes may be difficult for the neural network architecture used in this paper. In terms of potential scalability, it is evident from the literature that architectures similar to the one used in this paper have been utilized to address various computational tasks within the scope of power system analysis. Specifically, solving the Optimal Power Flow (OPF) problem, characterized by its non-convexity and nonlinearity, has been approached in several papers using deep neural networks to approximate the results. In [24] the authors predict the OPF solution using neural networks and showcase their findings on the PEGASE 1354-bus system. Similarly, in [25] the authors demonstrate the application of deep neural networks on a power system model comprising 30 000 buses.
A key aspect of this paper is that we demonstrate a neural network's ability to approximate the solution of the SNB problem without a large dataset containing ground truths. Specifically, we showcase a method allowing the network to autonomously learn the approximate solution through the power flow equations, thereby circumventing the need for ground truth altogether. It is well known that training neural networks is a computationally demanding task. However, this process occurs offline, allowing a trained network to produce accurate solutions on demand. This capability is, in our view, a key strength of the proposed method, as we envision scenarios where numerous solutions are needed for various load situations, aiding in prediction tasks.

Fig. 1. Neural network architecture.

TABLE I. RESULTS OF TESTING THE TRAINED NEURAL NETWORK.