Accelerating System Adequacy Assessment using the Multilevel Monte Carlo Approach

Accurately and efficiently estimating system performance under uncertainty is paramount in power system planning and operation. Monte Carlo simulation is often used for this purpose, but convergence may be slow, especially when detailed models are used. Previously published methods to speed up computations may severely constrain model complexity, limiting their real-world effectiveness. This paper uses the recently proposed Multilevel Monte Carlo (MLMC) framework, which combines outputs from a hierarchy of simulators to boost computational efficiency without sacrificing accuracy. It explains which requirements the MLMC framework imposes on the model hierarchy, and how these naturally occur in power system adequacy assessment problems. Two adequacy assessment examples are studied in detail: a composite system and a system with heterogeneous storage units. An intuitive speed metric is introduced for easy comparison of simulation setups. Depending on the problem and metric of interest, large speedups can be obtained.


I. INTRODUCTION
Operational and planning problems in the power system domain often involve the assessment of (sub-)system performance across a range of probabilistically modelled scenarios. For all but the simplest power system models, this cannot be done analytically, and Monte Carlo (MC) simulations are used instead. MC simulations are a powerful general purpose computation method with a long tradition in power system applications [1], but convergence to the correct answer may be slow. A number of different variance reduction methods exist to speed up convergence of Monte Carlo estimates, e.g. [1], [2]. One of these, importance sampling, has recently grown in popularity for power system applications, especially in combination with automatic tuning of model bias parameters using the cross-entropy approach [3], [4]. However, implementing importance sampling typically requires deep insight into the model, and limits the design freedom, e.g. for simulations involving complex decision making or sequential actions. This research was supported by the SMART-SAFE project, funded through the HubNet (Extension) programme (EPSRC EP/N030028/1).
The Multilevel Monte Carlo (MLMC) method was introduced in the context of computational finance to speed up averaging over sample paths, without compromising model detail or accuracy [5]. Initial applications involved the combination of multi-resolution models (geometric sequences), but other applications have subsequently evolved. A good overview of the method and its applications is given in [5]. The MLMC approach has recently been used in a reliability context to speed up the estimation of the average mission time of large systems in [6]. In [7], electrical distribution system risk metrics were estimated using MLMC, using a multi-scale approach to simulate component failures and repairs.
This paper considers how the MLMC framework [5] can be used to accelerate risk calculations, in particular in applications relating to system adequacy assessment of complex systems. The contributions of this work are as follows.
1) A concise overview of the MLMC approach to the estimation of risks is given. It is shown how the structure required for MLMC simulation naturally occurs in adequacy assessment problems, and can often be implemented with minimal changes to the constituent models. Two examples of common model patterns are given. 2) An intuitive speed metric is introduced that allows for fair comparison between Monte Carlo simulation approaches, and across risk measures. 3) Two case studies are presented, each representing one of the common model patterns. The MLMC approach results in large speedups, in one case speeding up simulations by a factor 2000 compared to conventional Monte Carlo sampling. The sensitivity of computational speed to the model stack is investigated.

A. Mathematical problem statement
Power system performance indicators often take the form of risk measures q that are expressed as the expectation 1 of a performance indicator X (a random variable), i.e. q = E [X]. Formally, the random variable X may be seen as a function X : Ω → R that associates a numerical outcome with every system state ω ∈ Ω in a sample space Ω. The probabilistic behaviour of the system, and therefore of X, is defined by associating probabilities with events (sets of states) E ⊆ Ω.
In the context of system adequacy assessment, the probabilistic behaviour of a power system is typically specified using a bottom up model that defines demand levels, component status, generator output levels, etc. This model generates both the sample space Ω (the set of all possible combinations of component states) and the associated probabilities. The function X deterministically evaluates any specific state ω ∈ Ω and computes a numerical performance measure for that state. The risk measure q = E [X] is then the (probability weighted) average of the function X over all states.
For even moderately complex systems, it is not possible to compute the quantity of interest q = E [X] analytically, nor can it be computed by enumeration of all states in Ω. In such cases, it is common to resort to Monte Carlo simulation, in which power system states ω (i) , i = 1, 2, . . . are generated using the probabilistic bottom-up model and analysed to provide relevant outcomes X(ω (i) ). It should be noted that at any time, multiple outcomes X (a) , X (b) , . . . can be measured simultaneously, at little to no extra cost. In the mathematical analysis that follows, only a single risk measure q = E [X] is discussed, but the methods can trivially be applied in parallel.

B. Conventional Monte Carlo
A brief summary of conventional Monte Carlo simulation is given in this section, as a point of reference for following sections. In conventional Monte Carlo simulation, the quantity q = E [X] is approximated by the Monte Carlo estimator where {X (1) , . . . , X (n) } represents a random sample 2 from X, with each X (i) independent and identically distributed to X. Note that we distinguish the random variable X (i) that represents the i-th random draw from X, and its realisation x (i) in a particular experiment or simulation run. The MC estimate for a simulation run is thus given by We proceed to use the generic expression (1) to reason about the convergence of the result. The error ∆Q MC obtained in this approximation is The MC estimatorQ is unbiased, and, as a result of the central limit theorem, for a sufficiently large sample size n, ∆Q MC is normally distributed, so that The variance ofQ MC follows from the MC estimator (1): As a result, the standard error σQ For quantification of the computational efficiency of an MC simulation, we denote by τ the average time required to generate a single realisation x (i) . The time spent to generate a sample of size n is then Using this relation, the variance (5) can be expressed as

C. Multilevel Monte Carlo
For multilevel Monte Carlo (MLMC), we assume to have at our disposal a hierarchy of models M 1 , . . . , M L that generate random outputs X 1 , . . . , X L , which approximate X with increasing accuracy. Specifically, we consider the case where the top level model M L is the model of interest, i.e. X L ≡ X and q = E [X L ]. The lower level models M 0 , . . . , M L−1 are used to speed up the calculations.
The material in this section is generic, and can be found using slightly different notation in e.g. [5]. The basis for the MLMC method is the trivial identity that is the telescopic sum: The quantity of interest q is decomposed into a crude estimate plus iterative refinements. In MLMC, each of the terms r 0 , . . . , r L is independently estimated using (1). This results in the MLMC estimator An additional superscipt has been added to the random model output X (k,i) l to denote the level pair k it is associated with. The model outputs are distributed according to X (k,i) l d = X l (equality in distribution) and they are assumed to be mutually independent, except for the pairs (X (l,i) l , X (l,i) l−1 ), which are jointly sampled from a common distribution in such a way that the marginal distributions are X l and X l−1 , respectively.
The MLMC estimator is unbiased and asymptotically normally distributed, by virtue of its constituent MC estimators. Its variance follows from the MLMC estimator (9a) and the mutual independence of sampled values: Here, the superscipt (l) on the simulation outputs is maintained, because the covariance term depends on the joint sampling process of the pairs (X . Clearly, the variance is minimised if the sample pairs are highly correlated. For a given set of models M 0 , . . . , M L , the challenge is to optimally choose the samples sizes n l . Defining the average time to generate a single value y (i) l as τ l , the total time taken to produce an MLMC estimate is given by The optimal sample counts n l can now be determined by minimising the variance (10) with respect to n 1:L while keeping t ML constant. Using (12) to substitute n 0 and setting dσ 2 QML /dn l = 0 for l = 1, . . . , L results in optimal sample counts (ignoring their discrete nature) With this optimal choice of n l , the computational effort spent on each level pair l is proportional to σ Y l √ τ l (see (12)), and the total variance (10) can be expressed as a function of computational time as

D. Measuring simulation speed
By comparing the expressions for the variance of the conventional and multilevel MC approaches, we can investigate the potential speedup resulting from the MLMC approach. Let us consider the timest MC andt ML required to converge to a given varianceṽ = σ 2 QMC = σ 2, * QML . Then, combining (5) and In practice, the variance of the lowest level is similar to that of the direct MC simulator, σ Y0 ≈ σ X , and the cost of evaluating the highest level pair is at least that of a direct evaluation of the highest level, i.e. τ L ≥ τ . However, considerable speedups are possible if σ Y l √ τ l ≪ σ X √ τ for all l. Intuitively, this occurs when each simplified model M l−1 is much faster than the next level M l , but returns very similar results for the majority of samples. Examples where this occurs naturally in the context of power system adequacy assessment will be discussed in Sections IV and V. In order to compare the compuational efficiency of various implementation, we require an operational definition of 'computational speed'. Monte Carlo simulations are often run with the goal to estimate the quantity q with a certain relative accuracy, expressed using the coefficient of variation c q = σ Q /q. We note that both (7) and (14) can be brought into the form This implicitly defines the computation speed s q as .
This definition may be compared with the 'figure of merit' used in [8]. The inclusion of the quantity q 2 in (17) has a number of advantages, provided that q = 0. First, the speed has dimensions 1/time, independent of the measure q. Second, speeds corresponding to different metrics are directly comparable. For example, when z LOLE < z EENS , this indicates that the LOLE estimator is the limiting factor in achieving convergence to a given coefficient of variation. And finally, the speed metric and the implied computational distance are easily interpretable in terms of simulation outcomes. For example, in order to achieve a coefficient of variation of 1% ('distance' 10,000) using a speed of 10 s −1 , a simulation run of 1000 s is required.
In the course of a simulation run, (17) can be used to estimate the computational speed, replacing q and σ Q by their empirical estimates. The speed z q for MC and MLMC estimation follow from (5) and (10) as and

A. Joint sample spaces
The core of the MLMC algorithm is the joint generation of sample pairs (X (l,i) l , X (l,i) l−1 ), used in (9b), in such a way that they are maximally correlated. The random variables X l and X l−1 have sample spaces Ω l and Ω l−1 , respectively, which must be combined into a joint sample space Ω ′ l . We highlight two common model patterns that naturally achieve this. 1) Pattern 1: component subsets: One common occurrence in system adequacy studies is that the lower level model M l−1 omits components that are present in the higher level model M l . As a result, the sample space Ω l can be written as a Cartesian product where A l is the sample space of components present in M l but not in M l−1 . We may then identify Ω ′ l and Ω l . In practical terms this means that samples can be generated at the higher level l and unused elements are discarded for the simpler models M l−1 . An example of this design pattern is explored in Section IV.
2) Pattern 2: identical randomness: It is also easy to conceive of scenarios where M l−1 and M l−1 have identical sample spaces, so that This occurs when both models are driven by the same set of random inputs, but the higher level model performs more complex processing. An example of this model pattern is given in Section V.

B. Direct evaluation of expectations
Occasionally, the base model M 0 is sufficiently simple to permit direct computation of r 0 = E [X 0 ], either analytically or using a numerical approximation procedure. In those cases, the long run efficiency is enhanced by evaluating r 0 directly instead of using its MC estimate. The standard deviation σ Y0 is then equal to 0, or a value commensurate with the accuracy of the numerical approximation of r 0 . Although direct evaluation of the lowest level is nearly always preferred, there may be cases where the evaluation of E [X 0 ] is a comparatively time-consuming operation and the optimal trade-off is more complex.
In the examples that follow in Sections IV and V, direct evaluation is always possible, and results in faster convergence of the overall MLMC estimator.
The use of an analytical result at the lowest level also highlights a connection between the MLMC method and the control variate approach [5]. The control variate similarly makes use of a simplified model for which an explicit solution can be calculated. It can therefore be considered as a special case of a bilevel MLMC procedure where the value E [X 0 ] is known and the output X 0 is scaled for optimal convergence. The control variate approach was used in [2] to speed up composite system adequacy assessment -a problem that is also addressed in Section IV.

C. Implementation
Simulations were implemented in Python 3.7 and were run on an Intel i5-7360U CPU under macOS 10.14.6. A generic multilevel sampler was developed with specialisations for particular simulation studies. No effort was made to optimise the execution speed of individual models, because the aim of this paper is not to maximise execution speed per se, but to investigate the relative speed between sampling strategies.
All MLMC simulations started with an exploratory run in which a sample with fixed size n (0) is taken at each level set Y l , in order to determine initial estimates of the evaluation cost τ l and varianceσ 2 Y l . This initial run is followed by a sequence of follow-up runs, each parameterised by a target run time t * . Given t * , optimal sample sizes at each level were determined using (13) and the most up to date estimates of evaluation timesτ l and variancesσ 2 Y l . For all results in this paper, 10 runs with an estimated run time of 60 seconds (each) were used, for a total runtime of approximately 600 seconds.
One practical concern with determining optimal sample sizes using (13) is that the values of σ 2 Y l are estimated using relatively small data sets. In power system risk assessment, the simulation outputs X l often have long-tailed distributions, so that there is a high probability that thatσ 2 Y l ≪ σ 2 Y l (or evenσ 2 Y l = 0). If the estimated value is used naively in (13), this leads to undersampling of Y l , thereby exacerbating the problem because fewer samples are generated that can correct the estimate of σ 2 Y l . To mitigate this risk, the variance estimators were adjusted as follows. First, a conservative estimate for the variance of X was obtained as It was then assumed that the decrease in true variance between subsequent levels was bounded by a factor α, resulting in updated estimatesσ for those pairs l where E [Y l ] is estimated by sampling. For the simulations, the value of α was heuristically set to 0.1. Finally, in simulations, multiple risk measures q (a) , q (b) , . . . were estimated in parallel. In determining optimal sample sizes, one of these was selected as the 'target measure' to optimise for, so that its mean and variance estimates were inserted in (13).

IV. COMPOSITE SYSTEM ADEQUACY ASSESSMENT
The first case study is a system adequacy assessment of the single area IEEE Reliability Test System (RTS) [9]. A twolevel MLMC approach is used, where the upper level, i.e. the study of interest, is a hierarchical level 2 (HL2) study [1]: a composite system adequacy assessment that takes into account transmission line outages and constraints. The lower level HL1 is a single node assessment that omits the transmission system. This is in accordance with the subset model pattern in Section III-A1.

1) Composite system adequacy assessment (HL2):
The RTS model defines outage probabilities of generators and transmission lines, which were modelled as independent two state Markov models. Maintenance and transient outages were not considered. Load levels were sampled by uniformly selecting an hour from the annual demand trace and assigning loads to each node in proportion to the maximum nodal demands. Therefore, at the upper level (l = 1), a sampled system state ω where the matrix M (i) = DA(A T DA + 1/|N |) −1 relates bus injections and line flows, using the sample-dependent linenode incidence matrix A ≡ A (i) and the diagonal matrix D of inverse line reactances. The element-wise constant 1/|N | ensures invertibility, eliminating the need for a designated slack bus. In cases where line outages resulted in multiple islands, problem (26) was formulated and solved for each island independently and the curtailments were summed to obtain the total system curtailment. Linear optimisation was performed using scipy.optimize.linprog, with the revised simplex method.

B. Results
For all runs, an initial exploratory run with n (0) = 100 was used, followed by 10 runs of approximately 60 seconds. The target risk measure for sample size optimisation was EPNS. Unless stated otherwise, thermal line ratings were scaled to 80% of the nominal values, to tighten network constraints. Throughout, Monte Carlo estimates of risk measures are given with the relevant number of significant digits, followed by the estimated standard error in parentheses. Thus, 1.71(13)×10 −3 stands for an estimate of 0.00171 with a standard error of 0.00013. Table I compares the results of three different estimators. The top row is the conventional Monte Carlo estimator that directly performs the HL2 study. The middle row represents a two-level MLMC approach where HL2 sampling is combined with HL1 sampling, immediate leading to significant speedups of 2.5 (for LOLP) and 10 (for EPNS). In the third configuration (bottom row), further speedups are obtained by eliminating sampling of the lower level model, and computing the lower level estimates r LOLP,0 = E [X LOLP,0 ] and r EPNS,0 = E [X EPNS,0 ] directly by convolution using 1 MW discretisation steps.
An interesting observation is that, for the regular MC sampler, the speed of LOLP estimation (0.31 s −1 ) is larger than that of EPNS estimation (0.17 s −1 ). However, the MLMC sampler sees much more substantial speedups for EPNS estimation than for LOLP estimation. This is only partially caused by the EPNS-focused sample size optimisation. The other factor is that the discontinuous LOL performance measure (28) is less amenable to successive approximation [5]. Table II gives insight into the multilevel structure of the regular MLMC estimate. For both LOLP and EPNS, the refinement term r 1 is substantially smaller than the crude estimate r 0 . More importantly, sampling from the HL1 model is substantially faster (0.023 ms per evaluation) than the HL2-HL1 difference term (5.4 ms per evaluation), due to the linear program (26) involved in the latter. The MLMC algorithm  Table III shows the impact on convergence speed of varying the thermal line ratings between 80% and 100% of the nominal values. Higher line ratings cause fewer constraints, which results in a slight reduction in speed for the regular MC sampler. On the other hand, the MLMC sampler experiences very large speedups as the difference between the results from the HL1 and HL2 models becomes smaller, so that fewer (expensive) HL2 evaluations are required. Once again, the gains in EPNS estimation speed exceed the gains in LOLP estimation speed.

V. DISPATCH OF STORAGE
The second example concerns the assessment of system adequacy in the presence of energy-constrained storage units (e.g. batteries). The energy constraints couple decisions in subsequent time slots, thus necessitating the use of timesequential Monte Carlo simulations. Convergence for timesequential simulations tends to be much slower than for snapshot problems, due to significant correlations in visited system states. An additional complication is deciding an appropriate dispatch strategy for energy storage units. A greedy EENSminimising discharging strategy was recently proposed in [10], as a reasonable default dispatch strategy for adequacy studies.

A. Models
The Great Britain (GB) adequacy study from [10] is reproduced here, with an eye on speeding up estimation of LOLE and EENS risks using the MLMC approach. Individual simulations are run for a sequence of 8760 hours (1 year). The system performance in a simulated year is driven entirely by the net generation margin trace where the sampled state ω (i) consists of the demand trace d t . Annual demand traces are chosen randomly from historical GB demand measurements for 2006-2015 (net demand, [11]). Annual wind traces are similarly sampled from a synthetic data set for hypothetical GB wind power output for the period 1985-2014, derived from MERRA reanalysis data and an assumed constant distribution of wind generation sites with an installed capacity of 10 GW [12]. Conventional generation traces are generated using an assumed diverse portfolio of thermal units, the portfolio of 27 storage units was based on storage units contracted in the GB 2018 T-4 capacity auction. The reader is referred to [10] for further details.
We consider three different storage dispatch models. The resulting storage dispatch (with sign convention that consumption is positive) is denoted by S t,l (ω), and is entirely determined by the net generation margin M t (ω (i) ). All three models are defined on the same sample space Ω, providing an example of the model pattern described in Section III-A2. However, the models differ tremendously in computational complexity, as is clear from the descriptions below.
1) EENS-optimal dispatch: The storage dispatch S t,2 (ω) is computed using the algorithm given in [10]. It is sequential and requires complex logic for each step.
2) Sequential greedy dispatch: The storage dispatch S t,1 (ω) is computed using a heuristic approximation of the EENS-minimising policy. Storage units s are sorted by decreasing time to go (from full) e s /p s , where e s and p s are energy and discharge power ratings, respectively. Then, a sequential greedy dispatch is performed, charging when possible, and discharging only when load curtailment is the alternative. Evaluating this model requires one sequential pass per storage unit, but the simulation steps are trivial.
3) Constant peak-shaving dispatch: For this model, the storage fleet is optimistically approximated by a single storage unit with e = s e s and p = s p s . The historical mean daily demand profiled 1:24 is used to compute a peak-shaving dispatch, usings Because S t,0 is a deterministic load offset, risk measures for this model can be computed by convolution.

4) Risk measures:
The net generation margin M t (ω) and storage dispatch S t,l (ω) result in a curtailment trace as follows The loss of load expectation (LOLE) and expected energy not supplied (EENS) risk measures can be computed using the performance measures

B. Results
For all simulations, an exploratory run with n (0) = 20 was used, followed by 10 runs of 60 seconds, where sample sizes were optimised for the EENS risk measure. In all cases, the crude estimate r 0 = E [Y 0 ] was evaluated using a convolution approach. Results are shown in Table IV, comparing the performance of three MLMC architectures with direct MC simulation. A three-layer architecture using a model without storage as a bottom layer achieved speedups of 10 (LOLE) and 30 (EENS), but much better results were obtained when the daily average dispatch was used as the crude model M 0even when a two-layer MLMC stack was created by omitting the intermediate model (sequential greedy dispatch).
The results show that the MLMC performance is very sensitive to the choice of levels, but robust speedups are available even for sub-optimal model choices. The best performing architecture is further analysed in Table V. It can be seen that the contribution from the final refinement r 2 is minimal, i.e. the heuristic model is very accurate, which is key to the observed speedup of 2113. The MLMC algorithm dynamically adjusted sample sizes to generate more samples evaluating Y 1 = X 1 − X 0 than on the costly evaluation of X 2 (accurate model).

VI. CONCLUSIONS AND FUTURE WORK
This paper has set out how the MLMC approach can be applied to power system risk analysis, and specifically to system adequacy assessment problems. Common model patterns were identified that are particularly amenable to MLMC implementation, and a computational speed measure (17) was introduced to quantify simulation speed in a way that is easily comparable across tools, Monte Carlo methods and risk measures. Two case studies illustrate the potential for speeding up estimation of risk measures, and the ability to apply the method to complex simulations.
In future work, we will consider automatic selection of optimal model stacks, and explore the scope for the application of multi-index Monte Carlo [13] schemes.