Physics-Informed Neural Networks for Time-Domain Simulations: Accuracy, Computational Cost, and Flexibility

The simulation of power system dynamics poses a computationally expensive task. Considering the growing uncertainty of generation and demand patterns, thousands of scenarios need to be continuously assessed to ensure the safety of power systems. Physics-Informed Neural Networks (PINNs) have recently emerged as a promising solution for drastically accelerating computations of non-linear dynamical systems. This work investigates the applicability of these methods for power system dynamics, focusing on the dynamic response to load disturbances. Comparing the prediction of PINNs to the solution of conventional solvers, we find that PINNs can be 10 to 1000 times faster than conventional solvers. At the same time, we find them to be sufficiently accurate and numerically stable even for large time steps. To facilitate a deeper understanding, this paper also presents a new regularisation of Neural Network (NN) training by introducing a gradient-based term in the loss function. The resulting NNs, which we call dtNNs, help us deliver a comprehensive analysis of the strengths and weaknesses of the NN-based approaches, how incorporating knowledge of the underlying physics affects NN performance, and how this compares with conventional solvers for power system dynamics.


Introduction
Time-domain simulations form the backbone of many power system analyses such as transient or voltage stability analyses. However, even the simplest set of governing Differential-Algebraic Equations (DAEs) that describes the system dynamics sufficiently accurately can impose a significant computational burden during the analysis. Reducing this computational cost while maintaining a sufficiently high level of accuracy is of paramount importance across all applications in the power systems industry.
Since, generally speaking, there is no closed-form analytical solution for DAEs [1], we resort to numerical methods to approximate the dynamic response. Refs. [2,3] provide a good overview of general solution approaches and the modelling in the power system context, and [4,5,6] summarise important developments, mostly relying on model simplification, decompositions, pre-computing partial solutions, and parallelisations.
A new avenue to solve ordinary and partial differential equations emerged recently through so-called Scientific Machine Learning (SciML), a field that combines scientific computing with Machine Learning (ML). SciML has been receiving a lot of attention due to the significant potential speed-ups it can achieve for computationally expensive problems, such as the solution of differential equations. More specifically, the authors in [7], already 25 years ago, introduced the idea of using artificial Neural Networks (NNs) to approximate such solutions. The idea is that NNs learn from a set of training data to interpolate the solution for data points that lie between the training data with high accuracy. Ref. [8] has revived this effort, now named Physics-Informed Neural Networks (PINNs), which has developed into a growing field within SciML as [9] reviews. The key idea of PINNs is to directly incorporate the domain knowledge into the learning process. We do so by evaluating whether the NN output satisfies the set of DAEs during training. If it does not, the parameters of the NN are adjusted in the next training iteration until the NN output satisfies the DAEs. This approach reduces the need for large training datasets and hence the associated costs for simulating them. Ref. [10] introduced PINNs in the field of power systems.
Our ultimate goal is to develop PINNs as a solution tool for time-domain simulations in power systems. This paper takes a first step and identifies the strengths and weaknesses of such a method in comparison with existing solution methods with respect to the application-specific requirements on the solution method. Stott elaborated nearly half a century ago that, among others, sufficient accuracy, numerical stability, and flexibility were important characteristics that need to be weighed against the solution speed [2]. In an ideal world, we are looking for tools that are highly accurate, numerically stable, and flexible, and at the same time very fast. Several approaches have been proposed to deal with this trade-off, aiming at being faster (at least during run-time) while maintaining accuracy, numerical stability, and flexibility to the extent possible. Some of the promising ones are based on pre-computing parts of the solution of DAEs. For example, Semi-Analytical Solution (SAS) methods adopt this approach [11,12,13]. We can push this idea of pre-computing the solution even further: PINNs, and NNs in general, pre-compute (learn) the entire solution; hence, the computation at run-time is extremely fast. Related works in [14,15,16] introduce alternative NN architectures and problem setups, primarily driven by considerations on the achieved accuracy. In contrast, our focus lies on assessing PINNs from the perspective of a numerical solution method in which accuracy has to be weighed against other numerical characteristics, namely speed, numerical stability and flexibility.
The contributions of this work are the following:
1. We apply Physics-Informed Neural Networks (PINNs) to multi-machine systems and show that PINNs can be 10 to 1'000 times faster than conventional methods for time-domain simulations, while achieving sufficient accuracy.
2. We demonstrate that the trade-off between speed and accuracy for PINNs, and NNs in general, does not directly relate to power system size but rather to the complexity of the dynamics. Hence, NNs can solve larger systems equally fast as small ones, if the complexity of the dynamics is comparable. This is contrary to conventional methods, where the solution time is closely linked to the system size.
3. We examine further numerical properties of NNs for solving DAEs. Besides speed, one of their key benefits is that NNs do not suffer from numerical instability as they solve without any iterative procedure. We also discuss the challenges of flexibility in different parameter settings and we outline concrete directions for future work to resolve them.
Section 2 describes the construction of a NN-based approximation for DAEs and how to incorporate physical knowledge in dtNNs and PINNs. Section 3 presents the case study and the training setup. Section 4 shows the results, on which basis we discuss the route forward in Section 5. Section 6 concludes.

Methodology
This section lays out how we train a NN that shall be used in time-domain simulations, how the physical equations can be incorporated transforming the NN to a dtNN and a PINN, and how the resulting approximation is assessed.

Approximating the solution to a dynamical system
A dynamical system is characterised by its temporal evolution being dependent on the system's state variables $x$, the algebraic variables $y$ and the control inputs $u$:
$$\frac{d}{dt} x = f_{\mathrm{DAE}}(x, y, u) \tag{1a}$$
$$0 = g_{\mathrm{DAE}}(x, y, u) \tag{1b}$$
For clarity and ease of implementation, we express (1a) and (1b) as
$$M \frac{d}{dt} x = f(x, u) \tag{2}$$
by incorporating $y$ into $x$ and adding $M$, a diagonal matrix that distinguishes whether a state $x_i$ is differential ($M_{ii} \neq 0$) or algebraic ($M_{ii} = 0$). We will use a NN to define an explicit function $\hat{x}(t)$ that shall approximate the solution $x(t)$ for all $t \in [t_0, t_{\max}]$, i.e., for the entire trajectory, starting from the initial condition $x(t_0) = x_0$.

Neural network as function approximator
We use a standard feed-forward NN with $K$ hidden layers that implements a sequence of linear combinations and non-linear activation functions $\sigma(\cdot)$. In theory, a NN with a single hidden layer already constitutes a universal function approximator [17] if it is wide enough, i.e., the hidden layer consists of enough neurons $N_K$. In practice, restrictions on the width and the process of determining the NN's parameters might limit this universality as [18] elaborates. Still, a multi-layer NN in the form of (3) provides us with a powerful function approximator:
$$z_0 = [t, x_0, u]^\top \tag{3a}$$
$$z_k = \sigma(W_k z_{k-1} + b_k), \quad k = 1, \dots, K \tag{3b}$$
$$\hat{x} = W_{K+1} z_K + b_{K+1} \tag{3c}$$
The NN output $\hat{x}$ is the system state at the prediction time $t$. The input $z_0$ is composed of the prediction time $t$, the initial condition $x_0$ and the control input $u$. The weight matrices $W_k$ and bias vectors $b_k$ form the adjustable parameters $\theta$ of the NN.
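As an illustration of the structure described above, the following minimal sketch evaluates a feed-forward network with K hidden layers and a linear output layer using only the Python standard library. The tanh activation, the random initialisation, the helper names (`init_layer`, `affine`, `build_network`, `predict`) and the layer sizes are assumptions made for this example; they are not taken from the implementation in [23].

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (W, b): W is a list of n_out rows with n_in entries each."""
    scale = 1.0 / math.sqrt(n_in)
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def affine(W, b, z):
    """Compute W z + b."""
    return [sum(w * v for w, v in zip(row, z)) + b_i for row, b_i in zip(W, b)]

def build_network(n_inputs, n_states, K, N_K, seed=0):
    """K hidden layers of width N_K followed by one linear output layer."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [N_K] * K + [n_states]
    return [init_layer(a, c, rng) for a, c in zip(sizes[:-1], sizes[1:])]

def predict(layers, t, x0, u):
    """Single explicit evaluation of the state estimate at prediction time t."""
    z = [t] + list(x0) + list(u)           # input z_0 = [t, x_0, u]
    for W, b in layers[:-1]:               # hidden layers: affine map + tanh (assumed)
        z = [math.tanh(v) for v in affine(W, b, z)]
    W, b = layers[-1]                      # output layer: affine map only
    return affine(W, b, z)

# Hypothetical sizes: 2 states, 1 control input, K = 3 layers of N_K = 16 neurons.
layers = build_network(n_inputs=1 + 2 + 1, n_states=2, K=3, N_K=16)
x_hat = predict(layers, t=0.5, x0=[0.1, 0.0], u=[0.2])
```

Only the output layer depends on the number of states, and the cost of `predict` does not depend on the value of `t`.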
For the training process, we compile a training dataset $\mathcal{D}_{\mathrm{train}}$ that maps $z_0 \rightarrow x$ for a chosen input domain $Z$ and contains $N = |\mathcal{D}_{\mathrm{train}}|$ points. For our purposes, the input domain is a discrete set of the prediction time, e.g. from 0 s until 10 s with a step size of 0.2 s, and a set of different initial conditions and control inputs, e.g. different power disturbances. The output domain is the rotor angle and frequency at each of the prediction time steps and for each of the studied disturbances.
During training we adjust the NN's parameters $\theta$ with an iterative gradient-based optimisation algorithm to minimise the so-called loss $\mathcal{L}$ for $\mathcal{D}_{\mathrm{train}}$:
$$\min_{\theta} \; \mathcal{L}(\mathcal{D}_{\mathrm{train}}) \tag{5}$$
We do not aim for optimality of (5) [...]
The simplest loss function for such a problem is to define the loss as the mismatch between the NN prediction $\hat{x}$ and the ground truth or target $x$, and measure it using the L2-norm. To account for different orders of magnitude (for example, the voltage angles in radians are often much larger than frequency deviations expressed in p.u.) and levels of variations of the individual states $x$, we first apply a scaling factor $\xi_{x,i}$ to the error computed per state $i$. A physics-agnostic choice of $\xi_{x,i}$ could be to use the state's standard deviation in the training dataset; for more details please see Section 3.2. We then apply the squared L2-norm for each data point $j$ and take the average across the dataset $\mathcal{D}$ to obtain the loss
$$\mathcal{L}_x(\mathcal{D}) = \frac{1}{|\mathcal{D}|} \sum_{j=1}^{|\mathcal{D}|} \sum_{i} \left( \frac{\hat{x}_i^{j} - x_i^{j}}{\xi_{x,i}} \right)^2 \tag{6}$$
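The scaled loss described above can be sketched in a few lines. The function names and the toy numbers (three data points with one angle and one frequency state) are hypothetical; the scaling uses the per-state standard deviation, which the text names as one possible physics-agnostic choice.

```python
import math

def std_per_state(targets):
    """Population standard deviation of each state across the dataset."""
    n = len(targets)
    n_states = len(targets[0])
    means = [sum(row[i] for row in targets) / n for i in range(n_states)]
    return [
        math.sqrt(sum((row[i] - means[i]) ** 2 for row in targets) / n)
        for i in range(n_states)
    ]

def scaled_mse_loss(predictions, targets, xi):
    """Average over data points of the squared L2-norm of the scaled error."""
    total = 0.0
    for x_hat, x in zip(predictions, targets):
        total += sum(((p - q) / s) ** 2 for p, q, s in zip(x_hat, x, xi))
    return total / len(predictions)

# Toy data: state 0 is an angle in rad, state 1 a frequency deviation in p.u.
targets = [[0.0, 0.000], [1.0, 0.002], [2.0, 0.004]]
predictions = [[a + 0.1, w + 0.0002] for a, w in targets]

xi = std_per_state(targets)
loss_scaled = scaled_mse_loss(predictions, targets, xi)
loss_unscaled = scaled_mse_loss(predictions, targets, [1.0, 1.0])
```

Without scaling, the angle error dominates the loss almost entirely; with the scaling, both states contribute equally in this toy case because their errors are the same fraction of their spread.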

dtNNs
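As an intermediate step between standard NNs and PINNs, in this subsection we add a new regularisation term to the loss function (6). We do so to avoid the previously mentioned over-fitting and improve the generalisation performance of the NNs. To the best of our knowledge, this paper is the first to introduce a regularisation term based on the update function $f(x)$ from (2). Using the tool of Automatic Differentiation (AD) [19], we can compute the derivative of the NN, i.e., the time derivative of the approximated trajectory, $\frac{d}{dt}\hat{x}$, and compute a loss analogous to (6) (with a scaling factor $\xi_{dt,i}$):
$$\mathcal{L}_{dt}(\mathcal{D}) = \frac{1}{|\mathcal{D}|} \sum_{j=1}^{|\mathcal{D}|} \sum_{i} \left( \frac{\frac{d}{dt}\hat{x}_i^{j} - \frac{d}{dt}x_i^{j}}{\xi_{dt,i}} \right)^2$$
where the target derivative $\frac{d}{dt}x^{j}$ follows from evaluating the right-hand side of (2) at the ground truth state.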
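To make the role of AD concrete, the sketch below differentiates a tiny one-neuron network with respect to time using forward-mode AD with dual numbers, and then evaluates a derivative-based loss. This is an illustration only: the `Dual` class, `tiny_net` and its weights are hypothetical, and frameworks such as PyTorch use reverse-mode AD rather than the forward mode shown here.

```python
import math

class Dual:
    """Number val + dot*eps with eps**2 = 0; dot carries the derivative w.r.t. t."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def dtanh(z):
    """tanh with derivative propagation (chain rule)."""
    t = math.tanh(z.val)
    return Dual(t, (1.0 - t * t) * z.dot)

def tiny_net(t, w1=0.8, b1=0.1, w2=1.5, b2=-0.2):
    """One hidden neuron: x_hat(t) = w2 * tanh(w1 * t + b1) + b2."""
    return w2 * dtanh(w1 * t + b1) + b2

def dt_loss(times, dx_targets, xi_dt=1.0):
    """Mean squared mismatch between d/dt x_hat and the target derivative."""
    total = 0.0
    for t, dx in zip(times, dx_targets):
        out = tiny_net(Dual(t, 1.0))       # seed dt/dt = 1
        total += ((out.dot - dx) / xi_dt) ** 2
    return total / len(times)

out = tiny_net(Dual(0.3, 1.0))
analytic = 1.5 * 0.8 * (1.0 - math.tanh(0.8 * 0.3 + 0.1) ** 2)
```

Here `out.dot` agrees with the hand-derived derivative `analytic`, and `dt_loss` is zero when the targets equal the network's own derivative.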

PINNs
As [7,8] introduced generally, and [10] for power systems, we can also regularise such a NN by comparing the derivative of the NN $\frac{d}{dt}\hat{x}$ with the update function evaluated based on the estimated state $f(\hat{x})$:
$$\mathcal{L}_f(\mathcal{D}_f) = \frac{1}{|\mathcal{D}_f|} \sum_{j=1}^{|\mathcal{D}_f|} \sum_{i} \left( \frac{\left[ M \frac{d}{dt}\hat{x}^{j} - f(\hat{x}^{j}, u^{j}) \right]_i}{\xi_{f,i}} \right)^2 \tag{9}$$
This physics loss does not require the ground truth state $x$ or its derivative. Quite the contrary, this loss can be queried for any desired point without requiring any form of simulation. We can therefore evaluate a dataset $\mathcal{D}_f$ of randomly sampled or ordered collocation points that map to 0 to essentially assess how well the NN approximation follows the physics: any point where this physics loss equals zero is in line with the governing physics of (2). However, (9) defines a mapping that is not bijective; hence, $\mathcal{L}_f(\mathcal{D}_f) = 0$ does not imply that the desired trajectory is perfectly matched, only that a trajectory complying with (2) is matched. As an example, an exact prediction of the steady state of the system will yield $\mathcal{L}_f(\mathcal{D}_f) = 0$ even though the target trajectory in $\mathcal{D}_{\mathrm{train}}$ is different.
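The non-bijectivity can be illustrated on a toy system that is not part of the case study: a first-order lag dx/dt = u - x with x(0) = 0, whose steady state is x = u. In the sketch below the two candidate predictors and their time derivatives are written out by hand instead of being obtained from a NN through AD; all names are hypothetical.

```python
import math
import random

def f(x, u):
    """Toy update function (assumption): dx/dt = u - x, steady state x = u."""
    return u - x

def physics_loss(x_hat, dx_hat, collocation, xi_f=1.0):
    """Mean squared residual d/dt x_hat - f(x_hat, u) over collocation points."""
    total = 0.0
    for t, u in collocation:
        residual = dx_hat(t, u) - f(x_hat(t, u), u)
        total += (residual / xi_f) ** 2
    return total / len(collocation)

# Collocation points (t, u) are sampled without running any simulation.
rng = random.Random(1)
collocation = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 1.0)) for _ in range(200)]

# Candidate 1: the true trajectory from x(0) = 0 and its time derivative.
exact = (lambda t, u: u * (1.0 - math.exp(-t)), lambda t, u: u * math.exp(-t))
# Candidate 2: a constant prediction of the steady state.
steady = (lambda t, u: u, lambda t, u: 0.0)

loss_exact = physics_loss(*exact, collocation)
loss_steady = physics_loss(*steady, collocation)
data_error_steady = max(abs(steady[0](t, u) - exact[0](t, u)) for t, u in collocation)
```

Both candidates drive the physics loss to zero (up to rounding), yet the steady-state candidate deviates from the target trajectory, which is why the data-based terms remain necessary.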

Combined loss function during training
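To obtain a single objective or loss value for the training problem (5), we weigh the three terms as follows:
$$\mathcal{L} = \mathcal{L}_x(\mathcal{D}_{\mathrm{train}}) + \lambda_{dt}\, \mathcal{L}_{dt}(\mathcal{D}_{\mathrm{train}}) + \lambda_f\, \mathcal{L}_f(\mathcal{D}_f) \tag{10}$$
where $\lambda_{dt}$ and $\lambda_f$ are hyper-parameters of the problem. Subsequently, we refer to a NN trained with $\lambda_{dt} = 0$, $\lambda_f = 0$ as "vanilla NN", with $\lambda_{dt} \neq 0$, $\lambda_f = 0$ as "dtNN", and with $\lambda_{dt} \neq 0$, $\lambda_f \neq 0$ as "PINN".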
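A minimal sketch of the weighting and of the resulting naming convention is given below. The function names are hypothetical, and the case of a physics term without a derivative term is not one of the three flavours studied here, so it is reported separately.

```python
def combined_loss(loss_x, loss_dt, loss_f, lambda_dt, lambda_f):
    """Weighted sum of the data, derivative and physics loss terms."""
    return loss_x + lambda_dt * loss_dt + lambda_f * loss_f

def flavour(lambda_dt, lambda_f):
    """Name of the NN type implied by the two regularisation weights."""
    if lambda_dt == 0 and lambda_f == 0:
        return "vanilla NN"
    if lambda_dt != 0 and lambda_f == 0:
        return "dtNN"
    if lambda_dt != 0 and lambda_f != 0:
        return "PINN"
    return "physics term only (not studied here)"

total = combined_loss(loss_x=1.0, loss_dt=2.0, loss_f=3.0, lambda_dt=0.5, lambda_f=0.1)
```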

Accuracy metrics
To compare across the different methods and setups, we monitor the loss $\mathcal{L}_x$ in (6) as the comparison metric throughout the training and evaluation process and as an accuracy metric for the performance assessment. To get a more detailed picture, we also consider the loss value of single points, i.e., before calculating the mean in (6). However, the loss is dependent on the chosen values for $\xi_{x,i}$ and does not provide an easily interpretable meaning. Therefore, we use the maximum absolute error
$$\max \mathrm{AE}_S = \max_{i \in S,\; j \in \mathcal{D}_{\mathrm{test}}} \left| \hat{x}_i^{j} - x_i^{j} \right| \tag{11}$$
as an additional metric for assessment purposes, i.e., based on $\mathcal{D}_{\mathrm{test}}$, but not during training. Whereas a state-by-state metric would capture most details, we opt to compute the maximum absolute error across meaningful groups of states $i \in S$ that are of the same units and magnitudes. This aligns with the engineering perspective on the desired accuracy of a method.
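The group-wise maximum absolute error can be sketched as follows; the grouping into two angle states and one frequency state, as well as the numbers, are hypothetical.

```python
def max_abs_error(predictions, targets, groups):
    """Maximum absolute error per group of state indices, over all data points."""
    result = {}
    for name, indices in groups.items():
        result[name] = max(
            abs(p[i] - q[i])
            for p, q in zip(predictions, targets)
            for i in indices
        )
    return result

# Each row is one test point with states [delta_1, delta_2, delta_omega_1].
predictions = [[0.10, 0.25, 0.001], [0.30, 0.20, 0.004]]
targets = [[0.12, 0.20, 0.002], [0.30, 0.21, 0.001]]
groups = {"delta": [0, 1], "omega": [2]}

errors = max_abs_error(predictions, targets, groups)
```

Grouping states of the same unit keeps the metric interpretable: here the largest angle error is about 0.05 rad and the largest frequency error about 0.003 p.u.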

Case study
This section introduces the test cases and the details of the NN training.

Power system - Kundur 11-bus and IEEE 39-bus system
As a study setup, we investigate the dynamic response of a power system to a load disturbance. We use a second-order model to represent each of the generators in the system. The update equation (2) takes one form for generator buses and another for load buses. In both, $P_{mech,i} = P_{set,i} + P_{dist,i}$ at bus $i$, with $P_{set,i}$ representing the power setpoint and $P_{dist,i}$ the disturbance. The states $x$ are the bus voltage angle $\delta_i$ and the frequency deviation $\Delta\omega_i$ for generator buses, and the bus voltage angle $\delta_i$ for the load buses. The buses are linked through the active power flows in the network defined by the admittance matrix $\bar{Y}_{bus}$ and the vector of complex voltages $\bar{V} = V_m e^{j\delta}$, where the vector $V_m$ collects the voltage magnitudes and $\delta$ the bus voltage angles:
$$P_e = \Re\left( \bar{V} \odot \left( \bar{Y}_{bus} \bar{V} \right)^{*} \right) \tag{14}$$
The $*$ indicates the complex conjugate, $\odot$ the element-wise product, and $P_{e,i}$ corresponds to the $i$-th entry of vector $P_e$, i.e., the active power balance at bus $i$. In Section 4, we demonstrate the methodology on the Kundur 2-area system (11 buses, 4 generators) and the IEEE 39-bus test system (39 buses, 10 generators). For both systems we are using the base power of 100 MVA and $\omega_0$ = 60 Hz. The network parameters and set-points stem from the case description of Kundur
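The active power term that couples the buses can be checked on a two-bus example. The sketch below assumes the injection sign convention and a lossless line with a hypothetical reactance of 0.1 p.u.; for this case the result should match the textbook expression V1 V2 sin(delta_1 - delta_2) / x.

```python
import cmath
import math

def active_power(Y_bus, v_mag, delta):
    """P_e,i = Re(V_i * conj(sum_k Y_ik V_k)) for every bus i."""
    V = [m * cmath.exp(1j * d) for m, d in zip(v_mag, delta)]
    I = [sum(y * v for y, v in zip(row, V)) for row in Y_bus]
    return [(v * i.conjugate()).real for v, i in zip(V, I)]

# Hypothetical two-bus system connected by a lossless line with x = 0.1 p.u.
y = 1.0 / (1j * 0.1)
Y_bus = [[y, -y], [-y, y]]

P_e = active_power(Y_bus, v_mag=[1.0, 1.0], delta=[0.1, 0.0])
expected_p1 = math.sin(0.1) / 0.1
```

Because the line is lossless, the two injections sum to zero.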

NN training implementation
The entire workflow is implemented in Python 3.8 and available under [23].When we use the conventional numerical approaches to carry out the time-domain simulations for this system, the dynamical system is simulated using the Assimulo package [24] which implements various solution methods for systems of DAEs.The training process utilises PyTorch [25] for the learning process and WandB [26] for monitoring and processing the workflow.The implementation builds on [27] for the steps of the workflow.
All datasets comprise the simulated response of the system over a period of 20 s to a disturbance. We test the step response to an instantaneous loss of load $|P_{dist,i}|$ at bus $i$ with a magnitude between 0 p.u. and 10 p.u., where $i = 7$ for the 11-bus system and $i = 20$ for the 39-bus system. We record these data in increments of $\Delta t$ and $\Delta P$.
For the scalings $\xi_{x,i}$ in (6), we calculate the average standard deviation $\sigma$ across all voltage angle differences $\delta_{ij}$ and all frequency deviations $\Delta\omega_i$, here the relevant groups of states $S$. Thereby, we aim for equal levels of error within all $\delta_{ij}$ and $\Delta\omega_i$ states and account for the difference in magnitude between them. The scalings $\xi_{dt,i}$ and $\xi_{f,i}$ are all set to 1.0 to avoid adding further hyper-parameters; more elaborate choices based on system analysis or the database are conceivable. During training and testing, $\xi_{x,i}$ is based on $\mathcal{D}_{\mathrm{train}}$ and $\mathcal{D}_{\mathrm{test}}$ respectively. The regularisation weights $\lambda_{dt}$ and $\lambda_f$ are hyper-parameters. For the latter, we incorporate a fade-in dependent on the current epoch $E$, in which $\lambda_{f,\max}$ is the maximum and $\lambda_{f,0}$ the initial regularisation weight, and a further epoch-scale parameter determines the "speed" of the fade-in. The fade-in has the effect that $\mathcal{L}_x$ and $\mathcal{L}_{dt}$ are minimised first, after which $\mathcal{L}_f$ helps with "fine-tuning" and better generalisation.
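One possible fade-in schedule is sketched below. The functional form (geometric growth from the initial weight, capped at the maximum) and the parameter name `epochs_per_decade` are assumptions chosen for illustration; the text above only fixes the roles of the initial weight, the maximum weight and a speed parameter.

```python
def lambda_f_schedule(epoch, lambda_f_0, lambda_f_max, epochs_per_decade):
    """Assumed fade-in: the weight grows tenfold every `epochs_per_decade`
    epochs, starting at lambda_f_0, and is capped at lambda_f_max."""
    return min(lambda_f_max, lambda_f_0 * 10.0 ** (epoch / epochs_per_decade))

# Hypothetical settings: start at 1e-4, cap at 1.0, tenfold increase per 50 epochs.
schedule = [lambda_f_schedule(e, 1e-4, 1.0, 50) for e in range(0, 301, 50)]
```

With these settings the physics term is negligible during the first epochs and reaches its full weight after 200 epochs, so the data-based terms dominate early training.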
We apply the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm implemented in PyTorch in the training process, a standard optimiser for PINNs as [28] reviews. The set of hyper-parameters comprises $K$, $N_K$, $\lambda_{dt}$, $\lambda_{f,\max}$, $\lambda_{f,0}$, the fade-in speed parameter, and additional L-BFGS parameters. The available implementation [23] lists the range of the tested hyper-parameters and the choices for the different settings. All training and timing were performed on the High Performance Computing (HPC) cluster at the Technical University of Denmark (DTU) with nodes of 2x Intel Xeon Processor 2650v4 (12 cores, 2.20 GHz) and 256 GB memory, of which we used 4 cores per training run.

Results
We first show in this section an assessment of NNs at run-time that highlights their methodological advantages compared to conventional solvers.We then perform a comprehensive analysis of the required training phase and the effect of physics regularisation.

NNs at run-time -opportunities for accuracy and computational cost
The primary motivation for the use of NN-based solution approaches is their extremely fast evaluation. Figure 1 shows the run-time for different prediction times. The NNs return the value of the states at prediction time $t$ between 10 and 1'000 times faster than the conventional solvers depending on three factors: the prediction time, the power system size, and the solver/NN settings.
First, for NNs the run-time is independent of the prediction time as the prediction only requires a single evaluation of the NN. In contrast, the conventional solver's run-time increases with larger prediction times as more internal time steps are required. Second, the power system size strongly affects the solver's run-time as shown by the increase when moving from the 11-bus to the 39-bus system. For the NN, it causes only a negligible change in run-time as only the last layer of the NN changes in size according to the number of states of the system, see (3c). Third, the "solver settings" play an important role; for conventional solvers, the internal tolerance setting governs the evaluation speed, while for the NN its size, i.e., the number of layers $K$ and the number of neurons per layer $N_K$, determines the run-time.
Figure 2 sets the above results in relation to the achieved accuracy. The points represent different disturbance sizes and prediction times and the accuracy is measured as the associated loss. If a solver yielded points in the lower left corner of the plot, it could be called an ideal solver: fast and accurate. Conventional solvers can be very accurate when the internal tolerance is set low enough, but at the price of being slower to evaluate. Allowing larger tolerances accelerates the solution process slightly at the expense of less accurate solutions. However, this trade-off is limited by the numerical stability of the used scheme; for too high tolerances the results would be considered as non-converged. In the case of NNs, their superior speed is weighed against less accurate solutions. The accuracy of NNs is not only controlled by their size but also, very importantly, by the training process. The achievable accuracy is therefore determined before run-time, in contrast to the tolerance of a conventional solver, which is set at run-time. As a final remark related to Figure 2, we need to highlight that, while less adjustable, NNs, in contrast to conventional solvers, do not face issues of numerical stability as their evaluation is a single and explicit function call.
We lastly want to show how the accuracy, here expressed as the maximum absolute error across all voltage angle states $\max \mathrm{AE}_\delta$ for better intuition, relates to the NN size and the power system size. The boxplots in Fig. 3 represent the evaluation of 20 NNs with the same training setup but with different random initialisations of their parameters. We observe that deeper and wider NNs usually perform better on this metric. However, the largest NNs for the 11-bus system ($N_K = 128$ and $K = 4$ or $K = 5$) show a larger variation than the smaller NNs, which means that the initialisation of the NNs affects their performance on the test dataset. This arises in models with a large representational capacity, loosely speaking models with many parameters: multiple parameter sets can lead to a low training loss but not all of them generalise well, i.e., have a low error on the test dataset. The other, at first sight counter-intuitive, observation is that the 39-bus system performs better on the metric than the smaller 11-bus system. This can be attributed to the complexity of the target function, i.e., of the dynamic responses. The 11-bus system exhibits faster and more intricate dynamics for the presented cases; hence, it is more difficult to approximate their evolution. We could therefore achieve the same level of accuracy for the 39-bus system with a smaller NN than for the 11-bus system. In terms of run-time, this would mean that the 39-bus system could be faster to evaluate than the 11-bus system. This characteristic of NNs effectively overcomes the relationship seen for conventional solvers that larger systems cause longer run-times, as we have seen in Fig. 1.

NNs at training time -a trade-off between accuracy and computational cost
The benefits of NNs compared to conventional solvers at run-time become possible by shifting the computational burden to the NN training stage, i.e., the pre-computation of the solution. In this stage, we examine the trade-off between accuracy and the computational cost of the training. This trade-off is influenced by several factors; here, we consider 1) the used training dataset, 2) the type of regularisation, and 3) the optimisation algorithm.
To investigate the influence of the training dataset and the regularisation, we use the 11-bus system with a NN of size $K = 5$ and $N_K = 32$. We consider five scenarios as shown in Table 1 with different numbers of data points $|\mathcal{D}|$ and the three "flavours" of NNs which we introduced in Section 2: vanilla NN, dtNN, PINN. The datasets are created by sampling with different increments of time $\Delta t$ and the power disturbance $\Delta P$. As expected, more data points incur a higher dataset creation cost; however, it also depends on what "kind" of additional data points we generate. When we halve the time increment $\Delta t$, e.g., from scenario A to B or from scenario C to D, the dataset generation cost remains approximately the same. However, this does not hold if we halve the power increment $\Delta P$. When simulating a certain trajectory, it is basically free to evaluate additional points, i.e., reduce $\Delta t$, since interpolation schemes can be used for intermediate points. In contrast, any additional trajectory that needs to be simulated adds to the total cost. Similarly, we can obtain the necessary values for the dtNN regularisation for "free" as this only requires the evaluation of the right-hand side in (2). The PINN regularisation also incurs only negligible dataset generation cost, as it is a mere sampling of the $|\mathcal{D}_f|$ collocation points, here 5151, without the need for any simulation. Therefore, the additional regularisations come at no or negligible cost, whereas generating more data points is only free if they lie on trajectories that are evaluated anyway.
Figure 4a shows the resulting $\max \mathrm{AE}_\delta$ across 20 training runs with different initialisations of the NN parameters. Unsurprisingly, the error metric improves with more data points, i.e., from scenario A to E, and additional regularisation, i.e., from a vanilla NN to a dtNN and a PINN. In scenario E, which has the largest dataset, all three network types perform on a similar level, whereas PINNs otherwise clearly deliver the best performance. Furthermore, the performance becomes more consistent, i.e., shows less variance, towards scenario E. A very sensitive issue is the point at which to stop the training process to prevent over-fitting. In this study, we use the best validation loss as the indicator to determine the "best epoch" and Fig. 4b shows the results. PINNs consistently train for more epochs and only for scenario E do the three NN types train for approximately the same number of epochs.
In Fig. 5 we plot the validation loss over the training epochs. In scenario D, we can clearly see that, while the vanilla NNs and the dtNNs do not improve much further after about 100 epochs, PINNs still see a significant improvement in terms of accuracy. From around this point onward, the physics-based loss $\mathcal{L}_f$ drives the optimisation, as the other training loss terms are already very small. This behaviour partly stems from the fade-in of $\mathcal{L}_f$ but also from the fact that $\mathcal{L}_x$ and $\mathcal{L}_{dt}$ are based on much smaller datasets, except for scenario E, in which the improvement of the accuracy progresses at similar speeds for all three NN types. PINNs therefore offer us the ability to achieve accuracy improvements for more epochs, but we can also terminate the training early, if the achieved accuracy is sufficient, to reduce the computational burden.
By multiplying the number of epochs with the computational cost per epoch, we can estimate the total computational cost of the training. The vanilla NN and dtNN required about 0.17 s and 0.18 s per epoch for scenarios A-D and 0.40 s and 0.43 s for scenario E, while the PINN consistently needs 0.66 s per epoch due to the collocation points. These numbers are very implementation and setup dependent, but show the trend that PINNs have a higher cost per epoch due to the computation of $\mathcal{L}_f$ while the dtNN is only slightly more expensive than a vanilla NN. The total up-front cost, comprised of data generation cost and training cost, then has to be evaluated against the desired accuracy to find an efficient setup. This trade-off is again very dependent on the case study. For the 39-bus system the dataset generation cost is 2.5 times higher while the cost per epoch only increases by a few percent.
Figures 3 and 4a displayed the maximum absolute errors on the test dataset as the accuracy metric, which is a critical metric of any solution approach. However, the accuracy of NNs must also be seen as a distribution across data points, as is visible in Fig. 2. We therefore show in Fig. 6 the resulting distribution of the loss values as a function of the two input variables, i.e., the prediction time $t$ and the disturbance size $\Delta P_i$, for scenarios A, C, and E and the NN types. The plots clearly show that for a majority of points in the test dataset, the predictions are much more accurate than the maximum values. This is true in particular around the training data points, which are clearly visible in scenario A as "indents". The panel for the dtNN and scenario C in Fig. 6a shows an extreme case where the prediction at the available data points is very accurate but the interpolation in between produces high errors. In comparison with the vanilla NN, the additional regularisation of the dtNN leads to a more unbalanced error distribution. In contrast, the PINN shows overall higher levels of accuracy but also more balanced error distributions thanks to the evaluation of the collocation points. We observe two more trends: first, smaller prediction times are associated with higher errors due to the faster dynamics; second, larger disturbances tend to show larger errors as they include larger variations of the output variables. These results show the importance of the dataset and the regularisation for the overall characteristics of a NN-based predictor, which must be considered when assessing the trade-off between training time and accuracy.
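The estimate of the training cost as epochs times cost per epoch can be written out as follows. The per-epoch times are the values reported above for scenarios A-D; the epoch counts are hypothetical placeholders and do not correspond to the results in Fig. 4b.

```python
def training_cost(epochs, seconds_per_epoch):
    """Estimated training time in seconds."""
    return epochs * seconds_per_epoch

# Per-epoch cost in seconds for scenarios A-D, as reported in the text.
COST_PER_EPOCH = {"vanilla NN": 0.17, "dtNN": 0.18, "PINN": 0.66}

# Hypothetical numbers of epochs until the best validation loss is reached.
hypothetical_epochs = {"vanilla NN": 200, "dtNN": 200, "PINN": 1000}

costs = {
    name: training_cost(hypothetical_epochs[name], COST_PER_EPOCH[name])
    for name in COST_PER_EPOCH
}
```

Under these assumed epoch counts, the PINN training is more expensive both because each epoch costs more and because it trains for more epochs; terminating it earlier trades accuracy for training time.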
We lastly touch upon the effect of the optimiser on the training process, in this case the L-BFGS algorithm. The hyper-parameters of the algorithm strongly influence the required training time but also the achieved accuracy, as shown in Fig. 7. The points represent the outcome from random hyper-parameter settings and they clearly show a strong relationship between training time and accuracy. Furthermore, the optimiser's internal tolerance setting (coloured) strongly influences this relationship.

Discussion
The results in the previous section illustrate how NN-based approaches for solving DAEs offer a number of advantages at run-time: 10 to 1'000 times faster evaluation speed, no issues of numerical instability, and, in contrast to conventional solvers, a solution time that does not increase with a growing power system size. These properties come at the cost of training the NNs. To assess an overall benefit in terms of computational time we therefore have to consider the total cost as the sum of the up-front cost $C_{\text{up-front}}$ for the dataset generation and training and the run-time cost $C_{\text{run-time}}$ per evaluation, multiplied by the number of evaluations $n$:
$$C_{\text{total}} = C_{\text{up-front}} + n \, C_{\text{run-time}} \tag{17}$$
Figure 8 shows a graphical representation of (17) for conventional solvers and NNs. It is clear that NN-based approaches need to pass a critical number of evaluations $n_{\text{critical}}$ to be useful in terms of overall cost, unless other considerations like numerical stability or real-time applicability outweigh the cost consideration. The results in Sections 4.1 and 4.2 discussed the various "settings" that affect run and training time; in Fig. 8 they would correspond to the dashed lines. For conventional solvers, changing these settings affects the slope, whereas for NNs they mostly impact the y-intercept, i.e., $C_{\text{up-front}}$; in either case, as expected, a different "setting" will change $n_{\text{critical}}$. Hence, the decision for using NN-based methods largely hinges on whether we expect sufficiently many evaluations $n$. Here, it is important to point out that the NN will be trained for a specific problem setup and a change in the setup, e.g., another network configuration, requires a new training process. In this aspect of "flexibility", conventional solvers have an important advantage over NN-based approaches. Addressing this lack of flexibility is of paramount importance for adopting NN-based simulation methods and we see three routes forward for this challenge: 1) Reducing the up-front cost $C_{\text{up-front}}$ by tailoring, for example, the learning algorithms, the used NN architectures, and regularisation schemes to the applications; this can largely be seen in the context of actively controlling the trade-off between accuracy and training time. 2) Finding use cases with large $n$, i.e., highly repetitive tasks. 3) Designing hybrid setups, similar to SAS-based methods, in which repetitive sub-problems are solved by NNs and conventional solvers handle computations that require a lot of flexibility.
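The break-even point follows from equating the total cost of both approaches. The sketch below assumes that the conventional solver has no up-front cost; all cost values are hypothetical and only serve to show the arithmetic.

```python
import math

def total_cost(n, c_up_front, c_run_time):
    """Total cost after n evaluations: up-front cost plus n run-time costs."""
    return c_up_front + n * c_run_time

def critical_evaluations(c_up_front_nn, c_run_nn, c_run_solver):
    """Smallest n for which the NN is no more expensive than the solver.
    Returns None if the NN is not faster per evaluation."""
    if c_run_solver <= c_run_nn:
        return None
    return math.ceil(c_up_front_nn / (c_run_solver - c_run_nn))

# Hypothetical costs in seconds: 600 s of data generation and training,
# 0.1 ms per NN evaluation, 10 ms per solver run (a factor of 100).
n_critical = critical_evaluations(c_up_front_nn=600.0, c_run_nn=1e-4, c_run_solver=1e-2)
```

With these numbers the NN pays off after roughly 60'000 evaluations; a lower up-front cost or a larger run-time advantage reduces this threshold.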

Conclusion
This paper presented a comprehensive analysis of the use of Physics-Informed Neural Networks (PINNs) for power system dynamic simulations. We show that PINNs (i) are 10 to 1'000 times faster than conventional solvers, (ii) do not face issues of numerical instability, unlike conventional solvers, and (iii) achieve a decoupling between the power system size and the required solution time. However, PINNs are less flexible (i.e., they do not easily handle parameter changes) and require an up-front training cost. Overall, this makes PINN-based solutions well-suited for repetitive tasks as well as tasks where run-time speed is crucial, such as screening.
Besides the comparison between conventional and NN-based methods, this paper conducts a deeper analysis of the parameters that affect the performance of the NN solutions. In that respect, we introduce a new NN regularisation, called dtNN, as an intermediate step between NNs and PINNs. We show that PINNs achieve overall higher levels of accuracy and more balanced error distributions thanks to the evaluation of the collocation points.
The test dataset $\mathcal{D}_{\mathrm{test}}$, which shall serve as a ground truth, uses $\Delta t$ = 0.05 s and $\Delta P$ = 0.05 p.u., resulting in $|\mathcal{D}_{\mathrm{test}}|$ = 401 × 201 = 80601 points. For the training datasets $\mathcal{D}_{\mathrm{train}}$ used in Section 4.2 we create datasets with $\Delta t \in \{0.2, 1.0, 2.0\}$ s and $\Delta P \in \{0.2, 1.0, 2.0\}$ p.u. The validation datasets $\mathcal{D}_{\mathrm{validation}}$ for those scenarios are offset by $\Delta t / 2$ and $\Delta P / 2$.
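The dataset sizes follow directly from the sampling increments over the 20 s simulation period and the disturbance range of 0 p.u. to 10 p.u. A small sketch of this arithmetic, with a hypothetical helper name, is given below; it assumes that both end points of each range are included in the grid.

```python
def grid_points(t_max, dt, p_max, dp):
    """Number of time samples, disturbance samples and total grid points,
    with both end points of each range included."""
    n_t = round(t_max / dt) + 1
    n_p = round(p_max / dp) + 1
    return n_t, n_p, n_t * n_p

test_grid = grid_points(20.0, 0.05, 10.0, 0.05)

# Grids with equal increments in time and disturbance (other pairings are possible).
training_grids = {inc: grid_points(20.0, inc, 10.0, inc) for inc in (0.2, 1.0, 2.0)}
```

The test grid reproduces the 401 × 201 = 80601 points stated above, and the increments 0.2, 1.0 and 2.0 yield 5151, 231 and 66 points respectively when applied to both axes.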

Figure 1: Run-time as a function of the prediction time $t$ for NNs of different size and a conventional solver with varied tolerance settings. Tests for the 11-bus and 39-bus system with a disturbance $P_i$ = 6.09 p.u.

Figure 2: Evaluation of run-time and accuracy for the 11-bus system for varied solver tolerances and NN sizes ($K$ layers and $N_K$ neurons per layer). Point-wise evaluation for 10 disturbance sizes $P_7$ and prediction times as in Fig. 1.

Figure 3: Maximum absolute error of angle $\delta$ on the test dataset for the 11-bus and 39-bus system with varying NN sizes, i.e., number of layers $K$ and neurons per layer $N_K$.

Figure 5: Validation loss as a function of trained epochs. The shadings signify the range from 20 randomly initialised runs.

Figure 6: Distribution of $\mathcal{L}_x(\mathcal{D}_{\mathrm{test}})$ for different NN flavours (vanilla NN, dtNN, PINN) and scenarios (A, C, E), shown (a) as a function of the prediction time $t$ and (b) as a function of the power disturbance size $P_7$. The shaded areas correspond to 100%, 80%, 50% of the errors and the black line represents the median (for the definition of scenarios A, C, E, see Table 1).

Figure 7: Influence of hyper-parameters of the L-BFGS optimiser on the trade-off between training time and achieved accuracy. The tolerance level of the optimiser has a large influence as shown by the coloured clusters of points.

Figure 8: Total cost of different approaches as a function of the number of evaluations.
4. Having shown that NNs do have significant benefits and desirable properties, we carry out a comprehensive analysis of the performance and training of NNs and PINNs that can be helpful for future applications. In this context, we introduce dtNNs, a regularised form of NNs. dtNNs are an intermediate methodological step between NNs and PINNs as they are regularised by the time derivatives at the training data points.

Table 1: Overview of the scenarios with different training datasets.