Quantitative analysis of power systems resilience: Standardization, categorizations, and challenges

Power systems incur considerable operational and infrastructural damage from high-impact, low-probability events such as natural disasters. It is therefore imperative to quantify the impact of these disruptive events on power system performance so that adaptive actions can be applied effectively. This impact can be evaluated using resilience metrics, which should be able to assess the transitions between the different phases in which a power system resides when subjected to an extreme event. The metrics can also aid in evaluating the effectiveness of power system adaptive strategies. However, challenges exist since developed metrics do not proceed with a standardized framework.


Introduction
Power system infrastructure suffers substantial damage due to natural disasters, creating widespread and prolonged electricity outages for millions of customers [1]. In the United States, estimates show that weather-related power outages cause $25-$70 billion in economic losses annually [2]. The Derecho storm in June 2012 left approximately 4.2 million customers across 11 Mid-Atlantic states without electricity for 10 days [3]. In October 2012, Hurricane Sandy caused electricity outages for more than 9.3 million customers across 20 states in the U.S. [4]. In September 2017, Hurricane Maria struck Puerto Rico, destroying 80% of the island's power infrastructure, which led to 3.6 million residents losing power for several months [5]. In May 2018, a 6.9 magnitude earthquake caused power outages for more than 14,000 customers in Hawaii [6], while in 2019 and 2020, the cost of natural disasters approximated $48 billion, with more than 74 deaths [7]. These devastating impacts on power infrastructure, and subsequently on the socio-economic activities of communities, call for methods and metrics to quantify the resilience of the power grid against disruptive events. As emphasized in the U.S. Presidential Policy Directive 21 [8], these quantitative power grid resilience metrics would also guide utilities and policy makers towards informed decisions for resilience enhancement. This is, however, challenging because there are still different perceptions of resilience and of its systematic and effective quantification.

☆ The work described in this paper was supported by funds from the US Department of Energy under award DE-OE0000895.

Power systems resilience
The concept of resilience was first introduced by Holling as a measure to determine the ability of an ecological system to absorb changes to its state and driving variables [9]. Specifically, resilience is defined as a system's ability to withstand and minimize the impact of disruptions provoked by an external event, as well as the ability of the system to satisfy or maintain its performance after the disruption [10]. Resilience is typically characterized by three main components: (a) the magnitude of shock that the system can absorb and remain within a given state, (b) the degree to which the system is capable of self-organization, and (c) the degree to which the system can build capacity for learning and adaptation [11]. Since then, resilience has been extended to power systems where it has been defined as "The ability to withstand and reduce the magnitude and/or duration of disruptive events [12], which includes the capability to anticipate, absorb, adapt to, and/or rapidly recover from such an event" [13]. Specifically, in power systems, resilience is defined as the ability of the grid to prepare for and adapt to changing operating conditions, as well as withstanding and recovering rapidly from major disruptions caused by naturally occurring threats and incidents or deliberate cyber-physical attacks [8,14].

The resilience trapezoid
Resilience can be analyzed in terms of the different phases in which the system resides when exposed to a disruptive event [15], as shown in Fig. 1. Although interchangeable terminologies can be found in the literature for these phases, the concept of the resilience trapezoid remains the same. As illustrated by the resilience trapezoid in Fig. 1, the disruption transition phase runs from the time of event occurrence to the end of the disruptive event impact ($t_e$ to $t_d$), the outage phase is that in which the system is degraded following the end of the disruptive event until restoration efforts commence ($t_d$ to $t_s$), while the recovery transition phase runs from the commencement of restoration to full or satisfactory functionality ($t_s$ to $t_r$). Hence, enhancing grid resilience involves conducting several preventive, corrective and restorative actions in these phases.
As shown in Fig. 1, in the pre-event phase, preventive actions are applied before the disruptive event to prepare the system response. During the disruption phase, corrective actions are utilized in an effort to mitigate the impact of the external shock, and initiate the procedures for restoring the degraded system. In the recovery phase, restorative actions are applied to quickly restore power to disconnected customers, and repair/replace damaged infrastructure.
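The phase boundaries described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an idealized piecewise-linear performance curve (linear drop, flat outage, linear recovery). The class name `Trapezoid`, its method names, and the label "post-restoration" are hypothetical and are not taken from the reviewed literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Time points of the resilience trapezoid (symbols follow Fig. 1)."""
    t_e: float  # disruptive event occurs
    t_d: float  # end of the disruptive event impact
    t_s: float  # restoration efforts commence
    t_r: float  # full or satisfactory functionality is reached

    def __post_init__(self):
        if not (self.t_e <= self.t_d <= self.t_s <= self.t_r):
            raise ValueError("expected t_e <= t_d <= t_s <= t_r")

    def phase(self, t: float) -> str:
        """Return the phase in which the system resides at time t."""
        if t < self.t_e:
            return "pre-event"
        if t < self.t_d:
            return "disruption transition"
        if t < self.t_s:
            return "outage"
        if t < self.t_r:
            return "recovery transition"
        return "post-restoration"  # hypothetical label for t >= t_r

    def performance(self, t: float, p_normal: float, p_degraded: float) -> float:
        """Idealized P(t): an assumption for illustration, not a measured curve."""
        if t < self.t_e or t >= self.t_r:
            return p_normal
        if t < self.t_d:  # linear degradation between t_e and t_d
            frac = (t - self.t_e) / (self.t_d - self.t_e)
            return p_normal - frac * (p_normal - p_degraded)
        if t < self.t_s:  # flat degraded level during the outage phase
            return p_degraded
        frac = (t - self.t_s) / (self.t_r - self.t_s)  # linear recovery
        return p_degraded + frac * (p_normal - p_degraded)


tz = Trapezoid(t_e=2, t_d=4, t_s=6, t_r=10)
print(tz.phase(5), tz.performance(5, p_normal=100.0, p_degraded=40.0))
```

Real performance curves are rarely piecewise linear; the sketch only makes the ordering of the time points and the phase each interval maps to explicit.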

Resilience actions
One way to integrate these preventive, corrective, and restorative actions into power systems operation is via smart grid technologies such as advanced telecommunication and control techniques that could increase preventive and corrective operational flexibility, as well as distributed energy resources (DERs), e.g., solar/wind energy and energy storage, for restorative actions. For instance, distributed energy resources local to customers could act as generation or flexible load resources during disaster-related outages and assist in restoration efforts by reducing reliance on long-span transmission lines [16]. In addition, in Ref. [17] network reconfiguration options have been used to aid recovery and hence optimize system resilience, while in Ref. [18] transportable energy storage systems have been proposed to mitigate large-area blackouts and generation scheduling in microgrids. In Refs. [19-28], discussions are directed towards enhancing power grid resilience using intelligent control and communication methods in microgrid facilities. In Ref. [29], the proposition to improve power systems resilience includes integrating disruptive events into planning decisions, while resilience enhancement using distributed energy resources is proposed in Ref. [30].

Resilience capabilities and dimensions
The phases of the resilience trapezoid can also be directly associated with the different capabilities of a resilient system, which are the withstanding, absorptive, adaptive and restorative capabilities [31]. The withstanding capability is the ability of the system to maintain an acceptable level of essential functionality under disruptions [32,33], and can be evaluated by comparing the normal (baseline) functionality of a system to its functionality in disruptive states. The absorptive capability is the ability of the system to absorb the impact of the disruptive event and hence minimize the system damage, given that the system could not withstand the disruptive event [34]. To define this capability, the minimum acceptable level of system functionality or the normal state is defined. This capability is active in the disruption transition phase and can be assessed in the outage phase. It is important to note that the withstanding capability is associated with long-term exposure to disruption, while the absorptive capability is demonstrated only in the short disruption transition period. The restorative capability is the ability of the system to rapidly recover to normal or satisfactory functionality [31]. Rapidity is essential in this phase, as faster resource allocation towards system recovery is preferred. The speed with which system failures are identified also influences rapidity. The adaptive capability of the system is its ability to learn from disruptive events and modify system configuration, personnel training, and functions, to enhance system flexibility against future disruptions. This can be assessed by comparing system resilience indicators before the disruption and after restoration. In addition, the planning capability, associated with the pre-event phase, is introduced in Ref. [35] as the ability of the electric power utility to implement measures to reduce the effects of potential hazards on power grid performance. Common terms which have been used in different stages of the resilience trapezoid and associated with the resilience capabilities in reviewed studies are also presented in Fig. 2.

Fig. 1. The resilience trapezoid, with power system performance $P(t)$ at time $t$: $t_0$, $t_e$, $t_d$, $t_s$, $t_{r*}$, $t_{r**}$, and $t_r$ are, respectively, the time at original state, disruptive event occurrence, post-disruption state, initiation of recovery actions, initial system recovery, start of infrastructure restoration, and full system restoration state. In the pre-event phase, the system operates at normal conditions. As the disruptive event strikes, the system absorbs some shock and goes into the alert state, then proceeds to an emergency state with further degradation, lasting through the outage phase, which is an abnormal state. In this state, the system applies corrective actions and emergency resources towards critical load restoration, also known as self-recovery when the emergency resources are pre-integrated into system operations. After prioritized restoration of critical loads, recovery efforts continue with repair and restoration of damaged infrastructure.
Moreover, there are resilience dimensions associated with these discussed capabilities, namely robustness, redundancy, resourcefulness, and rapidity, also known as the 4Rs of resilience [36]. Robustness is the ability of the system to withstand disruption up to a given level without loss of functionality. Hence the system robustness is associated with the withstanding capability at the pre-event phase. Redundancy is the extent to which components and subsystems can be substituted to satisfy the suffered loss of functionality. It is associated with the absorptive capability at the disruption transition and outage phases. Resourcefulness is the ability of the system to identify system failures, prioritize and mobilize resources when conditions threaten the system, or towards meeting target recovery. Hence, resourcefulness is associated with the absorptive and restorative capabilities and can be assessed between the disruption and restorative transition phases. Rapidity, as discussed, basically is the ability to meet recovery priorities in a timely manner in order to contain losses and maintain functionality.
Specifically, in order to effectively quantify resilience, quantitative metrics are utilized to evaluate these capabilities and dimensions by measuring the impact of different operational resilience and infrastructural enhancement strategies (e.g., grid hardening) on power systems when resident in the different performance phases described. Although a number of studies have presented overviews of resilience [37-48], a gap remains: no existing work focuses on the comprehensive analysis of quantitative power system resilience metrics (PSRMs). In addition, no existing work provides a framework for categorization and a review methodology for quantitative standardization as a baseline to compare the functional effectiveness of PSRMs. This is a major issue for power system resilience, as highlighted in Ref. [49]: "There is presently no standard for the design of resilient distribution systems". This paper bridges this gap by providing attribute categorizations and a standardized, comprehensive review and analysis of power system metrics for resilience quantification, while proposing insights into improving their design and development. Hence, this review focuses on quantitative PSRMs with the aim of providing adequate information to aid the selection of design-appropriate metrics for power systems analysis, while identifying and recommending improvements.

Review methodology
For a comprehensive review, the authors searched various standard databases for publications relevant to resilience research. The steps taken were: defining the criteria for selected publications, defining databases and selecting the publications based on the criteria, data analysis and discussion of selected publications. The selections were limited to publications in English, in all databases visible mainly on Google Scholar, published until 2020, and with "Power system/grid resilience" as the keyword in the title, abstract or main body. Pioneer and authoritative papers were included to discuss and review power system resilience and related relevant topics. The selected publications were then filtered to consider power system quantitative metrics in the original publications in which they were developed. The authors then analyzed the methods and frameworks employed by the filtered publications in order to elucidate the different types and attributes of the metrics, providing broad categorizations and standardization discussed further. Innovative papers that adapt metrics from other studies are mentioned appropriately to help drive the review, but not included in the tables. However, all selected publications of original and adapted quantitative metrics are included in the statistical analysis. The methodology for review is based on the Axiomatic Design Process (ADP) which is elaborated next.

The Axiomatic Design Process
The fundamental hypothesis behind this process is that there are basic principles that govern good design practice. In particular, the ADP is the logical process towards an objective through a series of domains [50,51]. The ADP integrates a design sequence which consists of four domains: (1) the Service Domain, (2) the Functional Domain, (3) the Physical Domain, and (4) the Process Domain. The Service Domain represents customer needs. For an electric power utility, these needs are resilience objectives that the power system has to meet to provide customer satisfaction. These objectives are then transformed into requirements in the Functional Domain. Having recognized these Functional Requirements (FRs), Design Parameters (DPs) are then defined in the Physical Domain to specify the recognized FRs. In the ADP, the next step would be to define process variables for the processes running in the system under analysis. Moreover, the interaction between FRs and DPs is considered the major design process [31]. To integrate the ADP into our methodology, the Independence Axiom, illustrated in equation (1), must be satisfied.

$$\{FR\} = [A]\{DP\} \qquad (1)$$

where [A] is the design matrix characterizing the design structure. Specifically, the process where FRs are mapped to their corresponding DPs, as in Table 1, should fundamentally satisfy the Independence Axiom, which states that the independence of FRs must always be maintained [50,52]. We present the mapping process evaluating the relationship between the FRs and DPs in Table 1, where the "✓" in the design matrix indicates the non-zero matrix elements, meaning a relationship exists between an FR and a DP. In addition, the Independence Axiom is satisfied since the design matrix furnished in this paper is of the triangular (decoupled) form.¹ In this work, the previously discussed resilience capabilities are adapted as the power system objectives towards resilience enhancement and thus customer satisfaction, hence recognizing these capabilities as the service (functions) that the power system has to provide to its customers. These objectives are then transformed into functional requirements. For instance, for the ability to withstand disruption, following the definition of the withstanding capability in Section 1.1.3, the FRs would include the interaction of the real (current) system performance with the target system performance, where the DPs are the real system performance, and its standardization/comparison to the target system performance. Table 1 further summarizes the FRs and DPs (left) with the specification of the design matrix (right). Thus, the design parameters are utilized in the review process in order to maintain generalization of PSRMs and a standard for assessment of reviewed metrics. However, this does not imply that metrics which do not meet all the design criteria are sub-par, since metrics are developed for specific purposes. Instead, it highlights functional requirements that, without loss of generality, have not been adequately addressed in the literature.

¹ The design matrix should be in either diagonal (uncoupled) or triangular (decoupled) form [53]. The former means that the relationship between FRs and DPs perfectly satisfies the Independence Axiom, while the latter will also guarantee the Independence Axiom given that DPs are specified in an appropriate sequence such that dependence between FRs is minimal.
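The structural check on the design matrix can be illustrated with a short Python sketch. It is a minimal example, assuming the matrix is given as a square list of lists in which any non-zero entry plays the role of a "✓". The function name `classify_design_matrix` is hypothetical, and the check takes the FR/DP ordering as given: it does not search for a reordering that would make a matrix triangular, nor does it verify that the diagonal entries are non-zero.

```python
def classify_design_matrix(A):
    """Classify a square design matrix as 'uncoupled', 'decoupled', or 'coupled'.

    A[i][j] != 0 means that functional requirement i depends on design parameter j.
    """
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("design matrix must be square")
    # Any non-zero entry strictly above / strictly below the main diagonal?
    upper = any(A[i][j] != 0 for i in range(n) for j in range(i + 1, n))
    lower = any(A[i][j] != 0 for i in range(n) for j in range(i))
    if not upper and not lower:
        return "uncoupled"  # diagonal form
    if not upper or not lower:
        return "decoupled"  # triangular form
    return "coupled"        # Independence Axiom not satisfied in this ordering


# Hypothetical 3x3 example: lower-triangular, hence decoupled.
example = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]
print(classify_design_matrix(example))
```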

Paper contributions
In summary, this review paper presents a lead-in discussion of the main concepts of power system resilience, and focuses on furnishing power system quantitative metrics with unified notations towards standardization for research and industry applications. Furthermore, to answer pressing questions of metric validation in power system analysis, this paper presents a review methodology which explains service objectives expressed as the functional requirements for a resilient power system and identifies appropriate design parameters which demonstrate the functional requirement. Since this work provides a review of power systems and not a specific system design, the design parameters are not further mapped to the process variables.
Hence in this review, the ADP is employed to recognize the major features of a resilient system and develop criteria for assessing reviewed power system resilience metrics using the DPs. The reviewed metrics are presented with design parameters they satisfy in Table 2, while categorizations of reviewed metrics, in the next sections, are presented in Table 3. Furthermore, this work presents statistical analysis of the reviewed papers on quantitative PSRMs, furnishing information on the venues and timelines of publication as well as the power system level/ stage in which the metrics are applicable.
The rest of this paper is organized as illustrated in Fig. 3 and as follows: Section 3 discusses the different attributes and categories of PSRMs, while Sections 4, 5, and 6 present in-depth analyses of the metrics in these categories, namely distribution-level, transmission-level, and generic metrics, with details on the different techniques and enhancement strategies employed to develop such metrics. Section 7 summarizes, for researchers and utilities alike, sources for obtaining real historical outage data for resilience analysis. Section 8 discusses the existing gaps and challenges while making recommendations, and conclusions are drawn in Section 9.

Standardized notation
An additional contribution of this paper is the standardization of the parameters used to develop resilience metrics, which have varied across different studies. The following notations are used in the paper: bold-faced letters represent matrices and vectors. The letter $R$ denotes power systems resilience, and $P(t)$ denotes the performance state of the system, which is the power system performance at any time $t$ along the resilience trapezoid. The system performance is a suitable metric associated with the resilience level of the system [15]. In particular, it is the state of the quantity (e.g., active power and load demand, cost of loss of functionality) used as the system's resilience indicator, and is suggestive of the resilience level at any point in time. Based on the performance state of the system, $t$ can be equal to $t_0$, $t_e$, $t_d$, $t_s$, $t_{r*}$, $t_{r**}$, or $t_r$, where these times respectively represent the time at original system state, the time of the disruptive event occurrence, the time at post-disruption state, the time when recovery actions are initiated, the time to initial system recovery, the time that infrastructure restoration begins, and the time at final system restoration state, with $T$ denoting the time to full system restoration as shown in Fig. 1. In the paper, we refer to the transmission/distribution lines as links, and the term nodes refers to the power system's buses and load points.

Attributes of resilience metrics
The quantitative resilience metrics reviewed in this paper are assigned the following attributes that specify the nature, scope and methodology utilized to develop the metrics. All attributes of the reviewed resilience metrics are summarily tabulated in Table 3.

Stochastic vs. deterministic metrics
Stochastic metrics incorporate the impacts of uncertainties (e.g., component failure and restoration time) in the metrics calculation, while the deterministic metrics calculate the metrics without considering the uncertainty of parameters and events.
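The distinction can be made concrete with a small Python sketch that evaluates the same quantity, energy not supplied, in both ways. It is an illustration only: the exponential distribution for the restoration time, the function names, and all numerical values are assumptions chosen for the example and do not come from any reviewed metric.

```python
import random


def ens_deterministic(lost_load_mw, outage_hours):
    """Energy not supplied (MWh) with a fixed, known outage duration."""
    return lost_load_mw * outage_hours


def ens_stochastic(lost_load_mw, mean_outage_hours, n_samples=10_000, seed=0):
    """Expected energy not supplied (MWh) when the restoration time is uncertain.

    The outage duration is sampled from an exponential distribution
    (an assumption for illustration) and the result is a Monte Carlo average.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    total = 0.0
    for _ in range(n_samples):
        outage_hours = rng.expovariate(1.0 / mean_outage_hours)
        total += lost_load_mw * outage_hours
    return total / n_samples


print(ens_deterministic(5.0, 8.0))  # single value from fixed inputs
print(ens_stochastic(5.0, 8.0))     # sample mean over random restoration times
```

A stochastic metric would typically report more than the mean (for example, a percentile or a risk measure); the sketch stops at the expectation to keep the contrast with the deterministic case visible.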

Cost vs. energy vs. time-based metrics
Cost-based metrics quantify resilience based on costs such as those associated with recovering the system performance, lost opportunity costs due to power outages in the system, costs of energy not supplied, and the value of lost load. Energy-based metrics assess resilience by measuring the power and/or energy that is lost or retained post-disaster. Time-based metrics measure resilience based on time, quantifying how fast the system is affected by an extreme event or how quickly the system recovers.
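As a hedged illustration of how the three families relate, the following Python sketch derives an energy-based, a cost-based, and a time-based quantity from one sampled performance curve. The function names, the trapezoidal integration, the use of a single value-of-lost-load constant, and the sample data are all assumptions made for the example.

```python
def energy_not_supplied(times, perf, p_target):
    """Energy-based: trapezoidal integral of the shortfall max(p_target - P(t), 0)."""
    shortfall = [max(p_target - p, 0.0) for p in perf]
    return sum(
        0.5 * (shortfall[k] + shortfall[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )


def outage_cost(ens_mwh, voll_per_mwh):
    """Cost-based: energy not supplied priced at a value of lost load."""
    return ens_mwh * voll_per_mwh


def recovery_duration(times, perf, p_target):
    """Time-based: span from the first sample below target to the first
    later sample back at target. Returns None if the system never degrades
    or never recovers within the samples."""
    start = None
    for t, p in zip(times, perf):
        if start is None and p < p_target:
            start = t
        elif start is not None and p >= p_target:
            return t - start
    return None


# Hypothetical hourly samples of served load (MW) against a 100 MW target.
times = [0, 1, 2, 3, 4, 5]
perf = [100, 100, 40, 40, 70, 100]
ens = energy_not_supplied(times, perf, 100)
print(ens, outage_cost(ens, 1000.0), recovery_duration(times, perf, 100))
```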

Infrastructural vs. operational metrics
Infrastructural metrics, also known as planning metrics, assess the impacts of planning decisions and infrastructure hardening strategies (e.g., replacing aging components and integrating automated switching technologies) on improving the resilience of a given system against a disturbance that may happen in the future. Operational metrics quantify the impacts of operational actions (e.g., DER dispatch, system reconfiguration) that aid in maintaining and/or restoring the system performance given an imminent disruptive event.

Dynamic vs. static metrics
Dynamic metrics capture the time-dependent performance and evolution of a disrupted system or its components, while static metrics are time invariant and do not consider the time-dependent functions of a system [31].

Metrics utilizing real data vs. simulated data
The resilience metrics reviewed in this paper are evaluated using either real historical data (e.g., line failure data, weather data), or simulated and/or synthetic data and are highlighted to aid power system resilience research.
In the following sections, we discuss and analyze developed power system resilience metrics, which are categorized into: 1) distribution-level system metrics (DSMs), 2) transmission-level system metrics (TSMs), and 3) generic system metrics (GSMs). The metrics in the first two categories are specifically proposed for quantifying resilience in distribution and transmission systems, respectively, as these metrics have unique qualities in terms of component characteristics, configurations and topology, applicable to each of these systems. The generic resilience metrics are those that are proposed to quantify the resilience of power systems without specification of a particular level and hence presumably can be applied at any level. First, however, we provide a statistical analysis of the common performance indicators utilized in quantitative PSRMs, the time evolution of the reviewed PSRMs, and the time evolution by journal in which these PSRMs have been published, detailed as follows.

Statistical analysis
In this section, we analyze the statistics associated with the publications of quantitative power system resilience metrics over the past

Recurring performance indicators
In Table 4, we furnish information about the system performance indicators that have been identified in the course of this review, and their distribution as commonly used in resilience quantification of DSMs, TSMs, and GSMs. These are system connectivity parameters, failure/recovery parameters, active power, energy cost, voltage magnitude, and line thermal limits. The system connectivity can be evaluated using the different measures discussed as follows. The network diameter is the largest geodesic distance between any pair of nodes, and represents the length of the shortest path between the farthest components in the power system. The average path length is the average geodesic distance over all possible pairs of nodes, and represents the number of components that must be traversed to connect a power source to a load while considering the entire network. The degree distribution, which is the fraction of network nodes with a given node degree, represents the number of laterals arising out of each feeder node, aiding a more heterogeneous network. The betweenness centrality of a node in the network, represented as a graph, is the number of shortest paths passing through that node, and represents the relative importance of each component in the network. The clustering coefficient of the network represents the probability that two incident nodes are completed by a third node to form a triangle, and determines which components tend to be connected to adjacent components.
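Several of these connectivity measures can be computed directly from an adjacency structure. The sketch below is a minimal pure-Python illustration for a connected, undirected, unweighted graph; the function names and the five-node example network are hypothetical, and betweenness centrality is omitted for brevity.

```python
from collections import Counter, deque
from itertools import combinations


def bfs_distances(adj, source):
    """Geodesic (hop-count) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (network diameter, average path length); assumes a connected graph."""
    nodes = list(adj)
    lengths = []
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        lengths.extend(dist[v] for v in nodes[i + 1:])
    return max(lengths), sum(lengths) / len(lengths)


def degree_distribution(adj):
    """Fraction of nodes having each node degree."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {deg: c / len(adj) for deg, c in sorted(counts.items())}


def average_clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical network: a triangle (1, 2, 3) with a two-link lateral (3-4-5).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(path_statistics(adj))      # diameter and average path length
print(degree_distribution(adj))
print(average_clustering(adj))
```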

Peer review journals and venues of publication
We present the evolution of the reviewed metrics according to the journals in which they have been published over time, as illustrated in Fig. 4. For instance, all red circles are publications in the Journal of Reliability Engineering and System Safety in different years within the decade, while the journal with the largest publication percentage for resilience metrics is the IEEE Transactions on Smart Grid in the year 2016. In addition, the years 2019 and 2020 have seen an increase in quantitative PSRM publications, mainly in the IEEE Transactions on Power Systems, Risk Analysis, and Reliability Engineering and System Safety. In the following sections, we discuss and analyze PSRMs in their different categories.

Time evolution of reviewed metrics
In addition, we present the overall time evolution of metrics over the last decade, as illustrated in Fig. 5. The figure shows quartile box plots for the distribution system metrics (DSMs), the transmission system metrics (TSMs), and the generic system metrics (GSMs) for the periods 2011-2015, 2016-2020, and 2010-2020, from left to right. The main goal of the plots is to show the variance of the number of papers studying PSRMs in each of the power system levels. The analysis shows that DSMs have received increasing attention in the literature through the decade, while less attention has been given to the TSMs.

Distribution-level resilience metrics
This section discusses PSRMs that have been developed for utilization in the power distribution system (PDS). We further compartmentalize these metrics according to employed techniques or parameters, even though metrics can fit more than one compartment, to aid readability and research.

Employing system topology
In Ref. [54], a resilience measure is proposed based on the concept of "topological resilience". The topological resilience, which is quantified by determining the probability of distribution components remaining functional after a disruptive event, increases if the functional probability of components is above a certain threshold and a path can be connected through functional components to critical loads. The functional threshold indicates the fraction of components the operator can bear to have damaged. With power flow feasibility being the most influential factor, the resilience of each possible feasible topological configuration of the distribution system is defined and aggregated towards the overall resilience of the system. The overall resilience is defined mathematically by a stochastic, static, energy-based, and operational metric as in equation (2).
where $R = [R_1, R_2, \ldots, R_U]$ is a vector that contains the resilience scores/values of the $U$ feasible configurations of the distribution network, $w_u$ are the normalized weights assigned to resilient configurations, $r = \max[R_1, R_2, \ldots, R_U]$, and $R_u$ is an ascending order of vectors containing all the composite resilience values of the networks that do not have the maximum resilience. An increase in the overall resilience value indicates an increase in the number of paths that connect functional components to critical loads. Similarly, by considering feasible paths and modeling the distribution network, Ref. [55] develops a stochastic resilience metric using the Choquet integral computation method, which aggregates the contribution of individual factors associated with nodes and links of the power distribution network, including redundancy of paths, probability of available sources, and central point dominance, in order to identify the feasible paths to restore a load. In Ref. [56], the spatial distribution of the impacts of disruptive events is assessed, where the system performance as the system fails and recovers is characterized by network efficiency and the largest connected component. Network efficiency has been proposed in the literature as a measure of how well a system exchanges information and has also been used in resilience evaluation [57], while the largest connected component, which is the number of nodes in the largest connected subgraph of the network, is also used to assess system performance given a range of vulnerability scenarios. In Ref. [58], islanding is utilized proactively to minimize the adverse effects of disruptive events in a resilience-driven reconfiguration using anomaly-ridden distribution-PMU data. The reconfiguration employed is based on the spanning tree optimization algorithm that maximizes critical demand met, while the component-based resilience metric employed is time-based, deterministic, based on the system topology, and mathematically defined as:

where $w_1$ and $w_2$ are system-specific weights, $bc_n$ is the betweenness centrality of the node being assessed for its resilience, $l_{g,n}$ represents the geodesic path between a node and a generator, $l_{max}$ is the maximum of all path lengths in a given network, while $P_c$ and $P_n$ are the real power demands of the critical loads and of all loads, respectively, at and downstream of the assessed node. This metric notably integrates the topology of the distribution system into resilience analysis; however, it addresses only the disruption phase, and the other resilience phases are not considered in the analysis.

Fig. 5 (caption text). In the first plot, we observe the focus on developing GSMs, and the gradual peaking interest in DSMs. In the following 5 years, we observe that the DSMs had peaked in attention in the literature over GSMs. Overall, through the last decade, attention in the literature has been significant for the GSMs and DSMs, followed by the TSMs, which have generally received less attention.
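The two topology-based performance indicators mentioned for Ref. [56], network efficiency and the largest connected component, can be sketched in pure Python. The example assumes the common definition of global efficiency as the average of the inverse geodesic distances over all ordered node pairs, with unreachable pairs contributing zero; whether Ref. [56] uses exactly this form is not stated here. The function names and the four-node feeder are hypothetical.

```python
from collections import deque


def _bfs(adj, source):
    """Hop-count distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Average of 1/d(u, v) over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        for v, d in _bfs(adj, u).items():
            if v != u:
                total += 1.0 / d
    return total / (n * (n - 1))


def largest_component_size(adj):
    """Number of nodes in the largest connected subgraph."""
    seen, best = set(), 0
    for u in adj:
        if u not in seen:
            component = _bfs(adj, u)
            seen.update(component)
            best = max(best, len(component))
    return best


def remove_links(adj, failed):
    """Return a copy of the network without the failed undirected links."""
    out = {u: set(nbrs) for u, nbrs in adj.items()}
    for u, v in failed:
        out[u].discard(v)
        out[v].discard(u)
    return out


# Hypothetical radial feeder 1-2-3-4, before and after losing link (2, 3).
feeder = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
damaged = remove_links(feeder, [(2, 3)])
print(global_efficiency(feeder), largest_component_size(feeder))
print(global_efficiency(damaged), largest_component_size(damaged))
```

Tracking both quantities as links fail and are repaired gives a simple performance trajectory $P(t)$ of the kind the resilience trapezoid describes.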

Employing microgrids and/or distributed resources
In [59], a resilience metric is developed to evaluate the ability of the system to restore critical loads by employing microgrids. This stochastic, energy-based and operational metric is developed based on two restoration levels: (1) the restoration of critical loads and (2) the restoration of power system infrastructure (e.g., damaged poles and lines). Given t r* to be the time to restoration of the critical loads, the system resilience is defined in the time period [t d , t r* ] as: where C is the set of critical loads restored by microgrids, W c is the weight of a critical load c, and P c (t) is the active power of load c at time t. This metric as defined in equation (4) is also adapted in a stochastic posthurricane framework proposed in Ref. [60] in order to improve the resilience of networked microgrids using mobile emergency resources, as well as in Ref. [61] for pre-hurricane resource allocation inclusive of electric buses. Similarly, in Ref. [62], a resilience metric is introduced to evaluate restoration of critical loads in a PDS and is defined as the cumulative service time of distributed generators to loads which are weighted by priority. A great advantage of these measures is the ability to simultaneously assess the operational and planning attributes of the PDS, hence, assessing the system resilience attained by short-term operational strategies such as resource allocation, and the long-term planning strategies such as location and hardening of distribution lines. In Ref. [63], the resilience metric in Ref. [59] is extended to consider multiple microgrids and also the total switching time of the microgrids from grid connected to islanded mode. A stochastic, operational, cost-based resilience metric is developed and optimized as a mathematical formulation. 
Stochasticity here arises from the use of the Bat algorithm to handle the discrete and non-linear characteristics of the formulation, and also from the uncertainty of the distributed generators subject to weather conditions. The post-degradation objective minimizes the time slots needed for load restoration.
where G represents distributed energy resource costs, Λ is a constant value to convert system performance to a dollar value, L is the set of links i connected to node n, ω n denotes the weighting factor of the node n, P dn,t is the total load demand on node n, and T sw is the switching time for microgrids to transition between islanded and grid-connected mode.
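The critical-load metric of Ref. [59] described above can be illustrated with a short numerical sketch. It assumes the metric takes the form of the priority-weighted energy served to the restored critical loads over [t d , t r* ], with each P c (t) sampled on a uniform time grid and integrated by the trapezoidal rule. The function name, load profiles, and weights are hypothetical and are not taken from the cited work.

```python
from typing import Dict, List


def weighted_energy_served(profiles: Dict[str, List[float]],
                           weights: Dict[str, float],
                           dt_hours: float) -> float:
    """Priority-weighted energy (kWh) served to critical loads.

    profiles[c] holds P_c(t) in kW, sampled every dt_hours from t_d to t_r*.
    Assumed form: R = sum over c of W_c * integral of P_c(t) dt.
    """
    total = 0.0
    for load, power in profiles.items():
        # Trapezoidal integration of the sampled power profile.
        energy = sum((a + b) / 2.0 * dt_hours
                     for a, b in zip(power, power[1:]))
        total += weights[load] * energy
    return total


# Hypothetical example: two critical loads picked up by a microgrid.
profiles = {
    "hospital": [0.0, 50.0, 50.0, 50.0],    # kW at t = 0, 1, 2, 3 h
    "water_pump": [0.0, 0.0, 20.0, 20.0],
}
weights = {"hospital": 10.0, "water_pump": 5.0}
resilience = weighted_energy_served(profiles, weights, dt_hours=1.0)
print(resilience)  # 1400.0
```

Under this assumed form, restoring a high-priority load earlier raises the metric more than restoring a low-priority load of the same size.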
In [64], the resilience of microgrids against windstorms is analyzed using fragility curves of overhead lines and a windstorm profile; vulnerability and degradation metrics are developed, and restoration efficiency is measured. A resilience metric is then proposed that spans the transition from disruption to restoration and is defined as in equation (6): where Δt is the time the system awaits infrastructure recovery. The advantage here is the level of detail afforded by the component-level modeling, which yields a standardized [0,1] metric that is directly proportional to system resilience. Similarly, vulnerability and degradation indicators are utilized in Ref. [65] to evaluate the resilience of distribution networks focusing on critical load impact under disruptive weather conditions. The proposed resilience index is defined as: where P s is the comprehensive loss of critical load in scenario s at time t. In Ref. [66], the resilience of electric infrastructure systems is assessed by prioritizing the demand to be restored according to criticality (priority, urgent, routine). Recovery therefore proceeds in three stages: the first two recover the priority and urgent loads, and the final stage recovers routine loads. The framework takes into account the contribution of temporary services such as distributed generators (DG) to restoration, so the different stages of recovery include the use of these DG resources. The resilience is then evaluated by combining the four resilience dimensions, namely robustness, redundancy, resourcefulness, and rapidity, together with the system adaptive capacity (readjust-ability), into a weighted resilience average as shown in equation (8).
where γ is the time scale factor, t δn is the slack time (the maximum acceptable time post-disruption before recovery begins) defined for each load type n, which can be priority (p), urgent (u), or routine (r), P n I = P(t 1 ) n − (P(t d ) n + P(DG) n ), where t 1 is the time at which restoration of urgent loads begins, P n II = P n 1 − P n I , and P n 1 is the performance level when urgent loads are recovered. The first term in equation (8) is the weighted system robustness, active in the disruption transition phase, and formulated using a monotonic modified sigmoid function bounded in [0,1]. The second term is the weighted system redundancy, which is formulated as the ratio of the capacity of the DGs to the pre-disruption system capacity. The third term is the system resourcefulness, which is the ability of the system to rally its resources towards prioritized restoration, where a value moving from 0 towards 1 implies more effectively utilized resources. The fourth term is the rapidity of system recovery, where the timescale γ represents customer sensitivity to power outage, and the recovery time and slack time are defined for the different load types n. The last term is the system adaptability, which represents the capacity of the system to re-adjust its operations in the event that the system does not adequately absorb the impact of the disruption. In Ref. [67], the resilience of the distribution grid against earthquakes is enhanced through the optimization of the capacity and location of battery energy storage. A resilience index, mathematically defined in equation (9), is proposed as an objective function to be maximized subject to battery storage constraints.
where ∑_{t e}^{t r* − t e} P is the total system energy demand for the system emergency duration, Pr(t e ) = 1/24 is the probability of earthquake occurrence in each hour of the day, π s is the probability of a scenario s, N i is the number of networked islands in the distribution system, α is the parameter that indicates the battery energy storage in a network island in a given scenario, and P dch is the discharge of the battery energy storage in a given scenario and time slot, given that the earthquake occurred at t e . This metric essentially reports the discharge of the battery energy storage relative to the energy demand of critical loads in the emergency duration over different scenarios. In Ref. [68], an outage management scheme is proposed for smart grids with multiple microgrids for resilience enhancement in the event of disruptions. This hierarchical scheme consists of two stages: the first stage involves the scheduling of the available microgrid resources based on a proposed model predictive control-based algorithm, while in the second stage, the system operator coordinates the inter-microgrid unused capacity to meet unserved demand from the first stage. The performance of the proposed scheme is then evaluated using a resilience index proposed as follows: where N is the number of microgrids, T is the number of time slots in a day, and P n,t is the curtailed load from the second stage scheduling of microgrid resources supplying unserved loads. However, this index focuses on the disruption transition phase and does not integrate the other dimensions of resilience. In Ref. [69], analysis and discussions are further provided on strategies used by microgrids for resilience enhancement as well as the use of microgrids as a resilience resource.

Employing failure and recovery assessment
In [70], a spatio-temporal resilience evaluation which focuses on the dynamic nature of failure-recovery states is proposed. For instance, the allocation of repair crews following a failure is dynamic. They develop a metric that defines resilience of a system component as the probability that the component is either functioning or exhibiting infant (fast) recovery. This time-based metric is illustrated in equation (11).
where Pr{X i (t) = 0} is the probability that a component is in normal operation, hence characterizing the ability to resist failure at time t, and Pr{D i (t) < d, X i (t) = 1} is the probability of infant recovery, characterizing the rapidity of recovery given the occurrence of component failure.
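The two probabilities described above can be estimated empirically. The sketch below assumes they are approximated by relative frequencies over observed component states at time t, and that the component resilience is their sum, as the verbal definition of Ref. [70] suggests. The function name, the sample data, and the 12-hour threshold are hypothetical.

```python
from typing import List, Tuple


def component_resilience(samples: List[Tuple[int, float]], d: float) -> float:
    """Empirical Pr{X = 0} + Pr{D < d, X = 1}.

    Each sample is (x, duration): x = 0 for normal operation, x = 1 for a
    failure; duration is the failure duration in hours (ignored when x = 0).
    """
    n = len(samples)
    normal = sum(1 for x, _ in samples if x == 0)
    # A failure that lasts less than the threshold d is an infant recovery.
    infant = sum(1 for x, dur in samples if x == 1 and dur < d)
    return (normal + infant) / n


# Hypothetical observations of one component class at time t.
samples = [(0, 0.0), (0, 0.0), (1, 2.0), (1, 30.0),
           (0, 0.0), (1, 5.0), (1, 48.0), (0, 0.0)]
resilience = component_resilience(samples, d=12.0)
print(resilience)  # 0.75
```

Here four of the eight observations are normal and two of the four failures recover within the threshold, so the estimate is 6/8.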
In equation (12), the second term corresponds to the expected percentage of aging (slow) recoveries at time t, k = (t − t e + d) and s specifies the three failure scenarios. Given any of the three scenarios, λ (s) z (t e ) is the probability of a scenario to occur during the period (t e − dt e , t e ], z is a sub network considered in region Z, and P r { D (s) is the probability for failures that last a duration longer than k. This metric indicates regions and time durations of least resilience based on fast (infant) and slow (aging) recovery states. If a node remains in a failed state for less than the threshold value, d, the recovery is said to be infant. Infant recovery indicates higher distribution system resilience. This measure is assessed using empirical methods and real component failure data from Hurricane Ike. Another major advantage of this metric is its capability to recognize the effects of cascades in failure and recovery, which makes it reflective of real events and supports an accurate evaluation of PDS resilience.
In [72], the authors define planning and operational resilience measures, where the former is specified by time-varying component failure rates, with which resilience has an inverse relationship. The latter is specified by the customer interruption time and by the system- and customer-level disruptions that do not recover after a specified threshold, hence following the infant recovery described in Ref. [71]. Therefore, the metric combines planning and operational measures to obtain a dynamic and stochastic metric for assessing PDS resilience. The authors incorporate the uncertainty and dynamics of failures and recoveries, spatio-temporal methods, and customer cost of failures as shown in equation (13).
where ϒ is a factor that normalizes resilience value to [0,1], d is a threshold on disruption duration, E s(t d ) is the expectation over a random system state, S(t d ), R inf (t d |S(t d )) is the infrastructure failure rate of the system given S(t d ), R serv |S(t d ) is the conditional expectation of the service cost due to delays in restoring failures given S(t d ), and the integral is over the expected cost at time t due to a delayed recovery in a service area. They validate their metric using real data from Hurricane Sandy and illustrate that slow recovery after disruption is considerably influenced by lower-level components such as fuses.
A resilience metric similar to that of Ref. [73] is used in Ref. [74], where the loss of load/curtailed load is utilized in the resilience evaluation of integrated electricity and natural gas transportation system planning under disruptive conditions. In Ref. [75], the resilience of integrated energy systems is quantified based on the system loss in functionality. The proposed method introduces a loss matrix, whose scenario-based elements are the undelivered system services during the operational periods of analysis for internal and external failure modes. The loss matrix is then transformed into a consequence matrix, in which the penalty costs of undelivered services are assigned to the matrix elements. The elements of this consequence matrix are then normalized to obtain the proposed resilience matrix where element-wise resilience is defined as: where P max i in each scenario is the penalty cost if all functional services during operational temporal period i are lost, and P i,j is the penalty cost of lost functional services in operational temporal period i and failure mode j.
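The normalization step from the consequence matrix to the resilience matrix can be sketched as follows. The sketch assumes each element is normalized as 1 − P i,j / P max i , so that 1 means no lost service and 0 means all services in period i are lost; this form is an assumption consistent with the definitions above, not necessarily the exact expression of Ref. [75]. All names and numbers are hypothetical.

```python
from typing import List


def resilience_matrix(penalty: List[List[float]],
                      p_max: List[float]) -> List[List[float]]:
    """Element-wise resilience, assumed as 1 - P_ij / P_max_i.

    penalty[i][j] is the penalty cost of lost services in operational
    period i and failure mode j; p_max[i] is the cost if all services in
    period i are lost.
    """
    return [[1.0 - p_ij / p_max[i] for p_ij in row]
            for i, row in enumerate(penalty)]


# Hypothetical consequence matrix: 2 periods (rows) x 3 failure modes.
penalty = [[0.0, 20.0, 50.0],
           [10.0, 40.0, 80.0]]
p_max = [100.0, 80.0]
matrix = resilience_matrix(penalty, p_max)
for row in matrix:
    print([round(value, 3) for value in row])
```

In this example the last failure mode in the second period loses every service, so its element is 0.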

Employing availability and risk indices
In [76], system availability analogous to fault tolerance is adapted in defining resilience metrics of data center power supply as: (1) resilience of a subset of subsystems whose single failure leads the system availability to be less than the minimum for a tier IV design, and (2) resilience of two subsystems whose double failure results in the system availability being less than the minimum for a tier IV design. A tier IV design requires the data center power supply system availability to be no less than 0.9999. Hence, lower values of these metrics imply better availability, operation time and hence resilience. In Refs. [35,77], a deterministic, static resilience metric that assesses the interdependence between critical infrastructures (CI) is developed based on the availability concept in reliability theory. In this metric, energy storage units are integrated into the nodes of a communication CI in order to assess its electricity dependence on the power grid CI. This metric is mathematically formulated as follows: where U N/S is the unavailability (1 − Availability) of the system without storage, ξ s is the storage capacity of a storage unit s, and μ is the sum of repair rates related to a direct transition from a failed state into an immediate operational state. In Ref. [78], risk-based metrics, value-at-risk and conditional-value-at-risk, are adapted from risk-averse financial planning to quantify the resilience of the distribution system. The value-at-risk metric specifies the minimum value of system performance which cannot be exceeded with a certain probability, while the conditional-value-at-risk metric measures the conditional expectation of the loss in system performance given that it exceeds the loss associated with the value-at-risk metric. 
These metrics then quantify the operational resilience of the power distribution system by providing insight into impacts of present and future disruptions, target system performance and potential improvements attained by system enhancement strategies.
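The two risk measures can be illustrated on a sample of simulated performance losses. The sketch below uses the standard empirical definitions: value-at-risk as the α-quantile of the loss sample, and conditional-value-at-risk as the mean of the losses at or above that quantile. It does not reproduce the optimization model of Ref. [78]; the loss values are hypothetical.

```python
import math
from typing import List, Tuple


def var_cvar(losses: List[float], alpha: float) -> Tuple[float, float]:
    """Empirical value-at-risk and conditional-value-at-risk of a loss sample."""
    ordered = sorted(losses)
    # Smallest loss whose empirical CDF value is at least alpha.
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    var = ordered[k]
    # Convention used here: the tail includes losses equal to the VaR.
    tail = [x for x in ordered if x >= var]
    cvar = sum(tail) / len(tail)
    return var, cvar


# Hypothetical lost load (MWh) in eight simulated disruption scenarios.
losses = [0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 40.0]
var, cvar = var_cvar(losses, alpha=0.75)
print(var, round(cvar, 3))  # 8.0 20.333
```

The gap between the two values shows why the conditional measure is preferred for high impact low probability events: it reflects the single 40 MWh scenario that the quantile alone ignores.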

Employing smart technologies
In [79], the resilience impact of automated fault location, isolation and service restoration devices is assessed, and resilience indices are proposed for each phase of the resilience trapezoid. The indices include the expectations of maximum load loss, load interruption rate, automatic restoration time, load restored by automation, repair time, and energy not served, defined respectively as R 1 to R 6 over the realized scenarios, S. The expectations of maximum load loss and load interruption rate are representative of the system robustness and absorptive capacity, and are evaluated in the disruption transition phase. The energy not served describes the degradation in system performance over the period of analysis as a result of the disruptive event. It is the magnitude of the valley in the resilience trapezoid. The automatic restoration time, load restored by automation and repair time are active in the recovery phase. However, the proposed metrics could benefit from a weighted aggregation into a single resilience metric for the power distribution system, especially since they are multivariate. In Ref. [80], the impact of smart grid technologies, renewable energy uncertainty, and restoration budget is considered in the optimization of critical load supply while satisfying topological and operational constraints. To this end, a resilience metric is proposed using load prioritization, system resistance to the disruptive event, and recovery of the distribution system in the period of analysis, as defined in equation (17).
where n and t are the indices of buses and time respectively, n int is the number of interrupted buses, n̂ int is the number of uninterrupted buses, P n,t is the load at bus n at time t, P cmax is the total power of restored loads which is maximized as an objective function subject to load priority, and T is the study period. The first term in equation (17) is the resistance of the distribution system to the disruption; it signifies the absorptive capability of the system as the ratio of the demand supplied after the disruption to the total demand of the system. The second term is the restorative capability of the system, where the ratio of the prioritized restored loads to the disruption-interrupted loads represents the recovery of the distribution system. The third term adapts the metric to the study period. This metric therefore addresses the two major resilience phases, disruption and recovery.

Transmission-level resilience metrics
This section discusses PSRMs that have been developed for utilization in the power transmission system. We further compartmentalize these metrics according to employed techniques or parameters, even though metrics can fit more than one compartment, to aid readability and research.

Employing operational and infrastructural indicators
In [81], dynamic and time-based power system metrics, R 1 , R 2 , R 3 , R 4 , are developed for quantifying resilience with respect to planning and operational actions of the power system for different phases of the resilience trapezoid. R 1 estimates the slope of resilience degradation during the disruptive event, hence representative of the rapidity of resilience degradation. R 2 estimates the level of resilience degradation, and by physical representation, it is simply 'how low' resilience drops in the resilience trapezoid. R 3 estimates the time the system remains in the degraded state after the disruptive event. R 4 is to the recovery phase what R 1 is to the degradation phase, as it assesses the rapidity of system recovery. The resilience indicators for the operational and planning metrics are the amount of connected generation capacity and load demand during the event, and the number of transmission lines that are online, respectively. In addition, a metric R that utilizes the area of the resilience trapezoid is developed to quantify the absolute resilience of the system. These metrics are formulated as below, where the superscript ψ ∈{o, i} is used to represent the variable associated either with operational actions or planning strategies: Hence, with respect to Fig. 1, R 1 and R 2 are associated with the disruption transition phase, R 3 with the outage phase, and R 4 with the recovery phase. Together, these metrics provide a systematic methodology for assessing the resilience capabilities at every dynamic phase as well as the absolute system resilience. The utility can therefore assess the system resilience dynamics from disruption progression through to system recovery.
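The four phase metrics and the area metric can be illustrated on an idealized trapezoid. The sketch below assumes a piecewise-linear performance curve: a linear drop from the pre-event level to the post-disruption level, a flat degraded state, and a linear recovery. The slope, depth, and duration expressions follow the verbal definitions above; the field names and the numbers are hypothetical rather than the notation of Ref. [81].

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Trapezoid:
    p0: float     # pre-event performance, e.g. % of load connected
    p_pd: float   # post-disruption (degraded) performance
    t_oe: float   # onset of the event (h)
    t_ee: float   # end of the event, degraded state begins (h)
    t_or: float   # start of restoration (h)
    t_er: float   # end of restoration (h)


def trapezoid_metrics(tz: Trapezoid) -> Dict[str, float]:
    r1 = (tz.p_pd - tz.p0) / (tz.t_ee - tz.t_oe)   # degradation slope (<0)
    r2 = tz.p0 - tz.p_pd                           # depth of degradation
    r3 = tz.t_or - tz.t_ee                         # time in degraded state
    r4 = (tz.p0 - tz.p_pd) / (tz.t_er - tz.t_or)   # recovery slope
    mean_level = (tz.p0 + tz.p_pd) / 2.0
    # Area under the performance curve from event onset to full recovery.
    area = (mean_level * (tz.t_ee - tz.t_oe)
            + tz.p_pd * r3
            + mean_level * (tz.t_er - tz.t_or))
    return {"R1": r1, "R2": r2, "R3": r3, "R4": r4, "area": area}


# Hypothetical event: load connected drops from 100% to 40% in 6 h,
# stays degraded for 12 h, then recovers over 12 h.
metrics = trapezoid_metrics(Trapezoid(100.0, 40.0, 0.0, 6.0, 18.0, 30.0))
print(metrics)
```

For this example the sketch gives R1 = −10 %/h, R2 = 60 %, R3 = 12 h, R4 = 5 %/h, and an area of 1740 %·h. The same function can be evaluated separately for the operational and the infrastructure indicator.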
In [82], a deterministic, static and energy-based resilience metric is developed for high voltage transmission systems incorporating changes in economic, environmental and social indicators as shown in equation (19). They approach resilience quantification from a societal context as a broader perspective and then a sub-problem is formulated from the engineering perspective for high voltage transmission systems. This metric aggregates over all resilience indicators, the weighted system degradation from the disruptive event, over time. The resilience indicators considered are: economic indicators of electricity cost and investment cost, environmental indicators which include fire and ice, and social indicators include blackouts and human behaviors.
where w z is the weighting coefficient associated with the change of resilience indicators, P z (t) is the resilience indicator performance, and z is the number of resilience indicators.

Employing failure and recovery assessment
In [83], the power grid resilience is enhanced by utilizing the concept of defensive islanding as a preventive measure during weather emergencies, which aims to improve resilience by isolating vulnerable components whose failure can cause cascading events. They use fragility curves, which express weather-dependent failures of power system components, hence providing an advancement beyond the traditional methods towards weather emergency scenarios. They employ a stochastic severity risk index in order to determine the links that are at higher risk of failure. This risk metric is then utilized to implement a risk-based and adaptive defensive islanding algorithm that aims to mitigate the effects of cascading failures when a disruptive event occurs. This islanding method improves the power system resilience by identifying high-risk components that could cause cascading effects, and utilizing that information to split the power system network into stable and self-adequate islands. This metric is also utilized in Ref. [84] to analyze the online spatio-temporal progression of extreme events in power system regions.
In [73], a resilience metric is proposed to quantify the adaptability of the power system to extreme events, and is also applied to enhance power system operation through resilience-constrained economic dispatch. This metric considers weather-induced line outages, common cause outages, and hidden outages to measure system resilience as mathematically defined below: where N is the number of evaluated nodes, P n is the load curtailment resilience indicator, R is evaluated over P n ∈ [0,100] to encompass larger blackouts with larger impact, and Pr(P ≥ P n ) is the complementary cumulative distribution function of the blackout size distribution of a system. A stochastic, time-based, and planning resilience metric for multiple transmission line outages is developed in Ref. [85]. This metric is extended for resilience analysis at the component level. The resilience indicators utilized are the bus voltages and the active power output of the generators. They perform a resilience analysis on the IEEE RTS-79 test system and propose enhancement strategies based on the resilience results.

Employing system topology
In [86], several grid resilience metrics are developed given that the grid, similar to a graph, can be represented by the Laplacian matrix. These metrics are then classified by considering several features of grid resilience under two concepts: 1) grid connectivity and robustness and 2) grid operational functionality. In the first set, three metrics are developed, including the algebraic connectivity metric R 1 , the grid sensitivity metric R 2 , and the grid resistance metric R 3 . R 1 reflects the algebraic connectivity of the grid after any changes in its network topology compared to the previous state of the grid. For instance, if a link or node goes down, the algebraic connectivity of the graph decreases, and vice versa. R 2 quantifies the response of the grid to any changes in its topology. For instance, the larger the network (components), the less sensitive the grid is to changes in its topology, and the smaller R 2 , the more robust the grid. R 3 quantifies the opposition of the power grid to configuration changes, for instance, the removal of a transmission line. It is inversely proportional to the effective grid conductance. The algebraic connectivity metric is mathematically represented as: where x is the system state, the topology of the system is represented by the Laplacian matrix with eigenvalues [γ 1 , γ 2 , …, γ n ], and γ 2 is the second smallest eigenvalue of the Laplacian matrix, which defines the algebraic connectivity of the grid. The grid sensitivity metric is mathematically defined as: where L + is the Moore-Penrose inverse of the Laplacian matrix, Trace(L + ) = ∑_{n=1}^{N} γ n is the sum of eigenvalues for a given grid topology, γ n is the eigenvalue of the Laplacian matrix given that n nodes of the grid are affected, and N is the total number of nodes. The grid resistance metric is mathematically represented as: In the second set, three metrics are also defined. 
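The spectral quantities behind the first set of metrics can be computed directly from the graph Laplacian. The sketch below assumes an unweighted Laplacian (every line has unit weight) and uses NumPy; it reports the second smallest eigenvalue and the trace of the Moore-Penrose inverse before and after a line outage. It illustrates the ingredients of R 1 and R 2 only and does not reproduce the exact normalization of Ref. [86]. The four-bus ring network is hypothetical.

```python
import numpy as np


def laplacian(n_nodes, lines):
    """Unweighted graph Laplacian of a grid given as (from, to) bus pairs."""
    lap = np.zeros((n_nodes, n_nodes))
    for i, j in lines:
        lap[i, i] += 1.0
        lap[j, j] += 1.0
        lap[i, j] -= 1.0
        lap[j, i] -= 1.0
    return lap


def algebraic_connectivity(lap):
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending
    # order, so index 1 is the second smallest eigenvalue.
    return float(np.linalg.eigvalsh(lap)[1])


def pinv_trace(lap):
    # Trace of the Moore-Penrose inverse of the Laplacian.
    return float(np.trace(np.linalg.pinv(lap)))


# Hypothetical 4-bus ring, then the same grid with line (3, 0) out of service.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
degraded = ring[:-1]

conn_before = algebraic_connectivity(laplacian(4, ring))
conn_after = algebraic_connectivity(laplacian(4, degraded))
trace_before = pinv_trace(laplacian(4, ring))
trace_after = pinv_trace(laplacian(4, degraded))
print(round(conn_before, 4), round(conn_after, 4))    # 2.0 0.5858
print(round(trace_before, 4), round(trace_after, 4))  # 1.25 2.5
```

Losing one line of the ring lowers the algebraic connectivity and doubles the pseudo-inverse trace, which is the qualitative behavior the connectivity and sensitivity metrics are meant to capture. A susceptance-weighted Laplacian could be substituted by replacing the unit weights.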
The three metrics in the second set are the grid flexibility metric R 4 , the outage recovery value metric R 5 , and the outage capacity recovery metric R 6 . R 4 reflects the level of system resourcefulness, enabling a faster recovery process. For instance, a system with a sufficient number of generating units being accessible to the load points will have more efficient corrective actions for system stabilization. R 5 quantifies the amount of customer interruption cost that can be retrieved after each corrective action. A resilient system should have low outage recovery costs reflective of the number of outaged customers. R 6 quantifies the rapidity of restoration of the interrupted performance by a recovery action. It signifies the power capacity that could be restored by implementing the recovery process within a certain time. The grid flexibility metric is defined as the ratio of the system performance after each recovery action to the normal system performance: where P^{t|e j}_{dn,h} is the active power demand at node n after the recovery action h, in response to the disruptive event e j , and P^{T}_{d} is the target active power demand of the system in its normal and pre-disaster operating condition. The outage recovery value metric is defined in equation (25), where C dn is the value of the lost load at node n. The outage capacity recovery metric, similar to equation (39), is defined as: where P^{t d |e j}_{dn} is the active power load at node n at the end of the disruption time t d . These metrics are considered deterministic because the metric parameters, such as the high impact low probability contingency and the time to recovery, are predetermined. In Ref. [87], several indices are developed which are aggregated and weighted towards defining physical and cyber resilience of the electric power grid at the transmission level. The former consists of four components, as in equations (27)-(30), which are weighted and aggregated towards a physical resilience metric using the analytical hierarchical process. 
The source-path-destination index R p1 , the MW availability index R p2 , the MVAr availability index R p3 , and the loss of load index R p4 are defined in equations (27)-(30), respectively. In equation (27), k n is the number of paths connecting generator n to the destination substation, V n is the vulnerability index of n due to the repetitive occurrence of transmission lines in k, H n is the Hops index that reflects the vulnerability of n due to the number of transmission lines connecting n and a substation, and C n is the average cost, calculated as line impedances, between n and a substation. In equation (28), A MWn is the MW availability of n, A Fn is the availability factor, and P MW is the total MW load. In equation (29), n r is the total number of reactive reserves available in the substation, A MVAr is the MVAr availability, and P MVAr is the total MVAr load.
where P s c is the critical load supplied and P c is the total critical load at each substation.
On the cyber side, the cyber resilience R c is defined as the weighted sum of attackability and security as follows: where k n and k a are the number of network and attack paths respectively, SM n is the weighted security mechanism (e.g., authentication, monitoring, access control), and I is the impact value based on transmission line utilization if its control device is taken out of service. The weights are determined from stakeholder requirements using the analytical hierarchical process.
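The weighted aggregation of the four physical indices can be sketched as follows. The sketch assumes that analytical hierarchical process weights are approximated by the normalized row geometric means of a pairwise comparison matrix, a common approximation to the principal-eigenvector method, and that the physical resilience is the weighted sum of the index values. The comparison matrix and the index values are hypothetical and do not come from Ref. [87].

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights by normalized row geometric means.

    pairwise[i][j] states how much more important criterion i is than
    criterion j (reciprocal matrix, ones on the diagonal).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_index(weights: List[float], indices: List[float]) -> float:
    return sum(w * r for w, r in zip(weights, indices))


# Hypothetical stakeholder judgments for R_p1..R_p4: the first index is
# twice as important as the second and four times the third and fourth.
pairwise = [
    [1.0, 2.0, 4.0, 4.0],
    [0.5, 1.0, 2.0, 2.0],
    [0.25, 0.5, 1.0, 1.0],
    [0.25, 0.5, 1.0, 1.0],
]
weights = ahp_weights(pairwise)
physical_resilience = weighted_index(weights, [0.8, 0.9, 0.7, 0.95])
print([round(w, 3) for w in weights], round(physical_resilience, 5))
```

Because this comparison matrix is perfectly consistent, the geometric-mean weights (0.5, 0.25, 0.125, 0.125) coincide with the eigenvector weights; for inconsistent judgments the two methods can differ slightly.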

Employing reliability and risk indices
In [88], transmission grid resilience is assessed by extending the common reliability indices, including loss of load probability and expected unserved energy, into resilience evaluation using topological network information such as node degree and thermal ratings of transmission lines as performance indicators for prioritized node removal, after which system resilience is evaluated. However, their metric does not address the major resilience capabilities/dimensions, is only viable in the disruption transition phase, and neither models nor incorporates the magnitude or impact of the disruptive event. Similarly, Ref. [89] adapts the yearly loss of load frequency, expected energy not served and loss of load expectation to assess the resilience of future electricity networks to climate hazards. Furthermore, in Ref. [90], resilience is assessed by utilizing the load point resilience profile, which is a function of the expected probability of interruption, the expected outage duration, and the expected energy not served. The proposed resilience quantification model includes the modelling of the disruptive event, its impact, and optimal restoration. Load point restoration then proceeds with the proposed optimal restoration strategy, a mixed integer linear programming optimization problem for islanded microgrids, with distributed energy resources in islanded segments.
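The two reliability indices that these works extend can be estimated from a simple time series. The sketch below assumes hourly load and available-capacity samples, estimates the loss of load probability as the fraction of hours with a shortfall, and sums the shortfall to obtain the expected unserved energy. It shows only the baseline indices, not the topological extensions of Ref. [88]; the data are hypothetical.

```python
from typing import List, Tuple


def lolp_and_eue(load_mw: List[float],
                 capacity_mw: List[float],
                 hours_per_step: float = 1.0) -> Tuple[float, float]:
    """Loss of load probability and unserved energy (MWh) from time series."""
    # Shortfall is the part of the load that available capacity cannot meet.
    shortfalls = [max(0.0, load - cap)
                  for load, cap in zip(load_mw, capacity_mw)]
    lolp = sum(1 for s in shortfalls if s > 0.0) / len(shortfalls)
    eue = sum(shortfalls) * hours_per_step
    return lolp, eue


# Hypothetical six-hour window in which two lines trip during hours 3 and 4.
load_mw = [80.0, 90.0, 100.0, 95.0, 85.0, 70.0]
capacity_mw = [100.0, 100.0, 60.0, 60.0, 90.0, 100.0]
lolp, eue = lolp_and_eue(load_mw, capacity_mw)
print(round(lolp, 4), eue)  # 0.3333 75.0
```

Averaging these quantities over many sampled disruption scenarios would give the expectations used in resilience studies.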

Generic system resilience metrics
This section reviews and specifies the attributes of the generic resilience metrics for power systems. We further compartmentalize these metrics according to employed techniques or parameters, even though metrics can fit more than one compartment, to aid readability and research.

Employing system topology
In [91], dynamic, time-based and stochastic metrics are developed to assess the importance of components such as links to network systems. In Ref. [92], a resilience metric is developed for networked systems by defining a critical performance term, K, for all nodes and links, showing the performance that the system must maintain at each time step t. Having defined the critical performance term of the system, a stochastic continuous-time resilience metric is developed as the ratio of the average critical performance for all disruptive events over time, to the critical performance of the network when no disruptive events occur. In Ref. [93], a stochastic resilience metric is developed to describe the sensitivity of the network services to disruptive events. This metric describes resilience as the probability density function of network reliability when considering α external failures affecting the network and β specific failure scenarios containing α. These networked system metrics can also be adapted to quantify power systems resilience. For instance, in Ref. [94], a resilience index is developed for power grids based on the possibility of paths for delivering power from generators to loads, considering load importance pre- and post-disruption, with an approach based on optimal power flow and axiomatic design concepts.

Employing availability and risk indices
A resilience metric is also proposed in Ref. [95] based on the availability of system components by multiplying the ratio of availability and the natural logarithm of recovery time pre and post disruption, capturing the performance and time based attributes of the system as illustrated below: where J is the number of disruptive events, P is the system availability. This metric considers the adaptive capability of the system given a number of disruptive events. In Ref. [96], a metric is proposed to quantify the system resilience to changing climate condition over a time period. The metric employs the use of the conditional-value-at-risk metric to capture the impact of disruptive events on the system for unique intensity levels, and the rate of change of system vulnerability under these intensity levels, as given in equation (33).
where P CVar is the system vulnerability assessed based on the conditional-value-at-risk, j = 1, …, N is the number of intensity levels tested, while I j is the intensity level condition under analysis. Hence, this metric captures the degradation/loss in target system performance given different intensity levels of the disruption. An interesting improvement to this metric would be to include the dynamic resilience attribute or analysis of system resilience over time.

Employing operational and infrastructural indicators
Resilience, R, is defined in Ref. [36] for communities in seismic conditions by employing the resilience triangle, a predecessor of the trapezoid. It is mathematically defined in equation (34), as a deterministic metric that quantifies the magnitude of expected degradation in system performance, P(t), over time.
They acknowledge the dynamic nature of P(t), and the need to factor in the probabilities of disruptive event occurrence. The advantage of this metric is its applicability to different systems. However, unrealistic assumptions, such as the instantaneous impact of the disruptive event and the instantaneous application of restoration, could hinder effective utilization. For smart grids, Ref. [14] describes metrics for assessing two phases of operation: the pre-fault phase (reliability) and beyond the pre-fault phase (resilience). The latter is characterized by resilience and is determined by both cascading and independent failures. Hence, resilience analysis commences from the time of failure occurrence and is defined as the ratio of the restored to lost performance in equation (35).
where t ∈{t s , t r } and the performance criteria chosen are the line flow and voltage violations. They compare the resilience achieved by utilizing different recovery strategies and state that their techniques are generally applicable to power systems and critical infrastructure. In Refs. [97-99], a similar resilience measure concept is defined over a set of disruptive events, while in Ref. [98], metrics for time and cost of resilience are also furnished. In addition, this metric is adapted in Ref. [100] to solve the multi-objective (resilience maximization and restoration cost minimization) restoration problem of the power system as an interdependent (water system) infrastructure, as well as in other fields, e.g., transportation system optimization after natural disasters [101]. In [102], a power system metric is introduced as the ratio of the real performance to the desired target performance, P T (t), of the system, as shown in equation (36).
They focus on the technical dimension of resilience, and take into account time-dependent, inter-hazard interactions where past hazards/disruptive events can affect the system state in future events. For inter-hazard analysis, the different ranges of time to full system recovery, which mark periods of hazards in the past, current and future times, are used to compute, respectively, the system resilience, the current potential resilience and the future potential resilience. The first is computed from historical data, the second from current system parameters, and the third from simulated system evolution and improvement in adopted strategies. The metric is also used, albeit for distribution systems, in Ref. [103] to assess the resilience impact of adding microgrids to interdependent gas-power networks, as well as in Ref. [104], to evaluate dynamically the resilience impact of ice disasters on power transmission systems considering their strength and location. The metric is also adapted in the framework proposed in Ref. [105] for measuring resilience at spatial and temporal scales for a community defined based on the socio-cultural, economic, environmental and physical infrastructure such as gas, power, and water distribution networks, etc. Also, Ref. [106] adapts this metric in the resilience assessment of the interdependent traffic-electric power system when subject to hurricane disruptions.
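The ratio of real to target performance can be illustrated on sampled curves. The sketch below assumes the metric is the ratio of the time integral of the real performance P(t) to the time integral of the target performance P T (t) over the analysis window, with both integrals evaluated by the trapezoidal rule; the difference of the two integrals is the impact area. The sample values are hypothetical and not taken from Ref. [102].

```python
from typing import List, Tuple


def trapz(values: List[float], times: List[float]) -> float:
    """Trapezoidal integral of sampled values over (possibly uneven) times."""
    return sum((v0 + v1) / 2.0 * (t1 - t0)
               for v0, v1, t0, t1 in zip(values, values[1:], times, times[1:]))


def performance_ratio(times: List[float],
                      real: List[float],
                      target: List[float]) -> Tuple[float, float]:
    """Return (resilience ratio, impact area) for sampled performance curves."""
    real_area = trapz(real, times)
    target_area = trapz(target, times)
    return real_area / target_area, target_area - real_area


# Hypothetical daily samples: a disruption on day 2, full recovery by day 4.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
real = [100.0, 100.0, 40.0, 70.0, 100.0]
target = [100.0] * 5
ratio, impact_area = performance_ratio(times, real, target)
print(ratio, impact_area)  # 0.775 90.0
```

A ratio of 1 means the real curve never departs from the target, and the impact area grows with both the depth and the duration of the degradation.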
In addition, Ref. [107] extends equation (36) to compute the expected resilience as shown in equation (37), which considers multi-dimensional resilience through sequences of disruptive events. The system performance levels are determined by the amount of flow delivered.
where E[IA] is the expected impact area (damage to performance) from disruptive events and λ is the annual rate of occurrence of disruptive events. However, they focus on current potential resilience, where system target parameters are fixed to the current time settings of the system. This metric, as defined in equation (37), is also adapted in Ref. [108] for optimizing network resilience by slight modifications to the system structure. In addition, this metric is adapted in Ref. [109] to develop an evaluation approach that captures the interactions between attackers and system operators, for the joint impact of physical and cyber attacks on power distribution networks, with the purpose of post-disruption loss minimization. In Ref. [110], the same authors adapt the resilience index in Ref. [109] and extend their evaluation approach to assess the value of timely distribution resources dispatch post disruption. In Ref. [111], a stochastic, dynamic and planning resilience metric that assesses the expected ratio of the real performance to the target performance of a power transmission system is developed. It quantifies the expected annual resilience of a power transmission system, and its stochasticity arises from the modeling of some of its parameters such as P(t). They evaluate the infrastructure resilience under single and multiple disruptive events, modeled by a Poisson process. The expected annual resilience in equation (38) is then reduced to further account for single and multiple disruptive event scenarios.
where T is the yearly horizon (T = 1 year = 365 days), j is the index of events which includes event co-occurrences of different hazard types, J (T) is the total number of event occurrences during T, t j is the occurrence time of the j th event, and AIA J (t ej ) is the area between the actual performance curve and the targeted performance curve, called impact area. The larger the impact area, the lower the system resilience.
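The relationship between impact area and annual resilience can be sketched as follows. This is a simplified reading, not the formulation of Ref. [111]: it assumes a constant target performance level, non-overlapping events, and a Poisson event count that is independent of the per-event impact areas. All function names and numbers are hypothetical.

```python
def annual_resilience(impact_areas, target_level, horizon_days=365.0):
    """One minus total impact area over target area, for a constant target."""
    target_area = target_level * horizon_days
    return 1.0 - sum(impact_areas) / target_area


def expected_annual_resilience(rate_per_year, mean_impact_area,
                               target_level, horizon_days=365.0):
    """Expectation of the quantity above for a one-year horizon.

    Assumes the number of events is Poisson with the given annual rate and is
    independent of the impact areas, so E[sum of areas] = rate * mean area.
    """
    return 1.0 - rate_per_year * mean_impact_area / (target_level * horizon_days)


# Hypothetical year: two events with impact areas in MW-days, 100 MW target.
observed = annual_resilience([300.0, 430.0], target_level=100.0)
expected = expected_annual_resilience(2.0, 365.0, target_level=100.0)
```

Both calls evaluate to about 0.98 for these illustrative inputs, and a larger impact area lowers the result, in line with the statement above.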
In Ref. [112], power system resilience metrics are developed by modeling the power-water system interdependence and considering the weighted sum of bus voltage magnitude, line thermal limits, thermoelectric cooling water demand satisfaction, overall load supply satisfaction, and water distribution system pump load supply satisfaction as resilience indicators. This component level metric considers the water-energy nexus with diverse performance indicators, hence providing a relatively comprehensive resilience metric. In Ref. [113], a stochastic, time-based and dynamic resilience metric is defined and optimized subject to power flow constraints. Two resilience-based metrics are then developed using this optimized metric: the optimal repair time and the resilience reduction worth. The former is defined as the time when a component is restored so as to maximize resilience, while the latter defines the capacity of resilience reduction of a component due to delays in the restoration of that component. The performance measure is the amount of flow received by a demand node at a specified time. The optimized system resilience metric focuses on the effect that the recovery of system components has on the global system resilience, and as such defines resilience at a time, t, during restoration, as the ratio of the total restored amount of performance to the target system performance, over time.
where PL dn (t) is the amount of flow received by demand node n at time t, P(t 0 ) is the system performance at time t 0 , P dn (t) is the power demand at n. Similarly, equation (39) is adapted in Ref. [114] to analyze the stochastic effect of uncertainty of repair time and resources on system restoration after disruption by optimizing the priority allocated to the intensity and time of system repair with the objective of resilience maximization, while in Ref. [115], it is adapted to assess the network resilience of cyber-human-physical complex systems such as supervisory control and data acquisition systems.

Employing resilience capabilities
In this context, Ref. [11] develops a dynamic, time-based metric to evaluate power system resilience through hurricane scenarios. This metric is based on a slight modification of the trapezoid (Fig. 1) in order to account for two stages of recovery: the first stage is marked by initial recovery efforts, while the second stage marks the final restoration of the system.
where S p is the speed recovery factor, and is equal to for t r ≥ t r* , otherwise is equal to ; a is a parameter controlling the decay of the system's resilience attributable to the final recovery time, t δ is the maximum acceptable time post-disaster before recovery begins (slack time), and t r* is the time to complete initial recovery actions. In equation (40), the decay factor and slack time capture the system robustness and resourcefulness towards recovery. This metric is effective because it captures a realistic condition of initial system stabilization, and the subsequent decay in resilience over time after this initial condition.
In Ref. [116], a weighted convex combination of the absorptive, adaptive and restorative capabilities of the system is developed as a resilience metric mathematically defined as:
where t rt is the preferred system time to recovery, which could be informed by expert opinion (stakeholder decisions). In the above metric, the first component, weighted by w 1 , is the absorptive capacity of the system; the second component, weighted by w 2 , is the adaptive capacity; and the third component, weighted by w 3 , is the time to recovery. This metric is relatively comprehensive since it assesses resilience at all phases of the resilience trapezoid. However, as described in Ref. [32], the adaptive capability should be able to reflect the "long-term adaptation" as well as the "short-term coping". Hence the metric could benefit from appropriately adjusting the integrals for the second component to account for short-term adaptability through self reorganization, as well as long-term system adaptability against future disruptions.
Having reviewed metrics developed for quantifying power systems resilience, Table 2 standardizes the reviewed metrics using the DPs developed, while Table 3 summarizes the reviewed metrics into the categories and attributes discussed.
The next section discusses available sources for obtaining real input data needed for resilience analysis.

Common inputs for resilience assessment
Accurate inputs are crucial to the resilience quantification processes. In this section, this review discusses common inputs in resilience evaluation of the power system starting with the fragility models of power infrastructure to the data inputs which can enhance realistic resilience assessment.

Fragility curves
Frequently, there is a need to assess power system resilience by utilizing component analysis. In addition, some components could be more crucial to the functionality of the system, and therefore metrics are often developed to assess this criticality [91]. In this context, to assess power systems resilience, the impact of disruptive events on the system components has to be identified. Toward this end, a relationship is established between the intensity (force) of the disruptive event and the level of damage (failure) expected from the component as a response to this force. This relationship is actualized through the utilization of fragility curves [64,117], which are defined by fragility functions. A fragility function describes the failure probability of a component as a function of the potential intensity of the disruptive event. For power system components, the fragility curve typically follows a lognormal distribution [16] and should be site-specific [118]. In order to obtain the fragility curve of a component, data such as wind speed or damage level are needed. To obtain these data, two main methods exist, based on statistical and simulation models [119][120][121].
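A lognormal fragility function of the kind described above can be sketched in a few lines. The median capacity and the logarithmic standard deviation used here are hypothetical values chosen only for illustration; real curves should be site-specific and fitted to data.

```python
import math


def lognormal_fragility(intensity: float, median: float, beta: float) -> float:
    """Failure probability at a given hazard intensity.

    Evaluates a lognormal cumulative distribution function, where `median` is
    the intensity at which failure probability is 0.5 and `beta` is the
    standard deviation of the natural log of the component capacity.
    """
    if intensity <= 0.0:
        return 0.0
    z = (math.log(intensity) - math.log(median)) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical transmission line: median failure wind speed of 50 m/s.
wind_speeds = [20.0, 35.0, 50.0, 65.0]
curve = [lognormal_fragility(v, median=50.0, beta=0.2) for v in wind_speeds]
```

The failure probability rises monotonically with wind speed and passes through 0.5 at the assumed median.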

Statistical models
These models employ statistical methods [122] such as generalized linear models, generalized additive models, and accelerated failure time models. An overview of these models is presented in Ref. [119] where different statistical fitting methods are compared.

Simulation models
These models use simulations in order to better imitate realistic event scenarios [72,102,111,119,123,124]. For example, to better mirror the effects of wind speed on a transmission line, computer or physical simulations could be executed by exposing the line to different wind speeds in order to observe its damage levels. More details on fragility functions and on power system fragility curves are further discussed in Refs. [16,123,125-128].
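The wind-speed experiment described above can be mimicked with a small Monte Carlo sketch. It assumes a deliberately simple, hypothetical line model in which a line fails whenever the wind speed exceeds a randomly sampled capacity; the capacity distribution and its parameters are illustrative and are not taken from the cited studies.

```python
import math
import random


def simulate_failure_fraction(wind_speed, n_trials, rng,
                              median_capacity=50.0, beta=0.2):
    """Fraction of simulated lines whose sampled capacity is below the wind speed.

    Capacities are drawn from a lognormal distribution (an assumption), so the
    resulting empirical points approximate a lognormal fragility curve.
    """
    mu = math.log(median_capacity)
    failures = sum(
        1 for _ in range(n_trials) if rng.lognormvariate(mu, beta) < wind_speed
    )
    return failures / n_trials


rng = random.Random(42)  # fixed seed so the experiment is repeatable
speeds = [30.0, 40.0, 50.0, 60.0, 70.0]  # m/s
empirical_curve = [(v, simulate_failure_fraction(v, 5000, rng)) for v in speeds]
```

Each pair in `empirical_curve` is one point on an empirical fragility curve; a parametric curve could then be fitted to these points.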

Data for resilience evaluation
For power system resilience quantification purposes, data could be obtained from the following groups of sources: power utilities, system operators, and weather and government agencies. Because disruptive events have a low probability of occurrence, and because in some cases these data are too sensitive for public release (e.g., equipment failure data), they are difficult to obtain. To bridge this gap, this section provides a summary of sources that provide real data pertinent to power system resilience quantification.

Failure data
In Ref. [72], real failure and outage data (e.g., failure duration, number of affected customers) were obtained from the service area of a power utility which lost functionality due to disruptions caused by Hurricane Ike. Similar data can also be found in Refs. [71,129] for disruptions caused by Hurricane Sandy. In Ref. [121], an investor-owned power utility that covers a three-state service area in the Gulf Coast region provides statistical data for the number of outages, number of affected customers, and number of damaged transformers, poles, overhead lines, switches, and underground lines caused by nine hurricanes. In addition, statistical data pertaining to the number and duration of unscheduled outages, and the number of affected customers, are provided for the state of Arizona in Ref. [130]. In Ref. [131], an online platform reports and visualizes utility level power outage information every 15 min. In Ref. [132], data related to the damage level of the components (e.g., generators, transmission lines, transformers, substations), as well as recovery process cost and time data, are provided for the Chilean power grid, which experienced vast failures caused by a magnitude 8.8 earthquake in 2010. In Ref. [133], failure data, hurricane exposure, and hurricane damage statistics are provided for Texas investor-owned utilities.

Disruptive event data
In Ref. [134], windstorm speed data from 1851 to 2011 is provided. However, for weather-related data, the National Oceanic and Atmospheric Administration (NOAA) and the National Hurricane Center dominate. In addition, these bodies provide a number of tools for disaster analysis, including nowCOAST, a GIS web-mapping portal to real-time coastal information, and Digitalcoast, which provides vulnerability maps and GIS data. In Ref. [135], a comprehensive database from 1851 to date, including raw hurricane observations and historical weather maps, is provided by the Hurricane Research Division at NOAA. In Ref. [136], a historical hurricane track visualization tool that provides data such as maximum sustained winds, pressure and dynamics of hurricanes is available as open source. In response to wildfire threats, utilities have invested significantly in wildfire monitoring systems and analytical tools, which generally rely on observations from remote automated weather stations to evaluate current weather conditions [137]; these observations are disseminated [138] and retrieved [139,140] from many sources. Other resources for obtaining real input data include the Multidisciplinary Center for Earthquake Engineering Research, the United Nations Office for Disaster Risk Reduction, the Federal Emergency Management Agency, the Transmission Availability Data System, and the Residential Energy Services Network.
Furthermore, these data could be utilized in analysis and resilience assessment by integration into power system failure models, such as fragility curves and component failure models, in order to test the system response to external forces introduced by the disruptive events [134]. In general, five different methods are utilized for deriving these failure models, which provide input data for resilience metrics development and assessment: the analytical, experimental, empirical, judgmental, and hybrid methods [16,141]. The analytical methods are utilized to develop data models [16,142] when there are insufficient records of disaster-related component failures [16], with the Markov approach dominating among the analytical methods [142]. Experimental methods involve the deliberate failing of system components, which could be cost intensive for large-scale system analysis. The empirical methods employ data from observations and field measurements [71,143,144]. Logically, they are employed when there are substantial amounts of failure records, and hence may be better suited to reliability analysis than to resilience analysis. Further, judgmental methods utilize the opinions of experts in the field of the required data, but could be affected by bias and uncertainty given limited knowledge of these low probability events. The hybrid method combines data acquisition characteristics from the previously mentioned methods. Analytical methods are more often utilized in resilience analysis, considering that these disruptive events are high impact but have a low probability of occurrence. In addition, they are cost-effective compared to experimental methods and avoid the bias and uncertainty associated with judgmental methods. It is worth mentioning that analytical techniques are also preferred for small-scale system configurations because of their simplicity and low computational burden [145].
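To give a flavor of the Markov approach mentioned above, the sketch below evaluates the availability of a single component modeled as a two-state (up/down) continuous-time Markov chain. Constant failure and repair rates are assumed, and the rate values are hypothetical; this is a textbook two-state model, not the specific formulation of Ref. [142].

```python
import math


def availability(t, failure_rate, repair_rate, up_at_start=True):
    """Probability that the component is up at time t.

    Closed-form solution of a two-state Markov chain with constant rates:
    the probability decays exponentially from its initial value towards the
    steady-state availability repair_rate / (failure_rate + repair_rate).
    """
    total = failure_rate + repair_rate
    steady = repair_rate / total
    initial = 1.0 if up_at_start else 0.0
    return steady + (initial - steady) * math.exp(-total * t)


# Hypothetical rates during an extreme event, in events per hour.
failure_rate = 0.01
repair_rate = 0.09
profile = [availability(t, failure_rate, repair_rate) for t in (0.0, 10.0, 50.0, 500.0)]
```

With these rates the availability starts at 1 and settles near 0.9, the steady-state value implied by the assumed rates.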

Gaps and challenges
In this section, the gaps and challenges associated with quantifying power systems resilience are enumerated. In discussing these gaps and challenges, the major findings and implications of these findings are furnished, while recommendations are made where applicable. Hence, this section not only recognizes the gaps and challenges evident in studies on power system resilience, but in discussing shortcomings and making recommendations to meet these gaps, the authors identify future directions that can serve to enhance resilience analysis and assessment.

Failure detection and self-restoration
According to Table 2, which summarizes the reviewed metrics, few metrics address the rapidity of detecting failures [107], self-restoration of the power system to initial stabilization [66], and the effects of stakeholder decisions [55]. Rapid failure detection can greatly speed up recovery. When a failure is accurately identified, the system response can be directed to avoid common-cause failures, and recovery strategies become more effective. For instance, the rapid detection of line failures caused by vegetation contact can enable quick reconfiguration strategies and avoid overloading of available paths. As a critical infrastructure, the self-restoration of the power system is a priority. Self-restoration addresses the system effort towards restoration with little or no external interference and is often referred to as the self-healing ability of the system. Quantifying it can greatly contribute to a deeper understanding of the effects of long-term planning strategies and of the capacity of the system to serve critical loads. For instance, when a utility invests in efficient reconfiguration techniques or even DERs, the effect on self-restoration after a disruptive event would be getting the system to an initial (operational) recovery level, e.g., power supply to critical loads, before the utility has to dispatch repair crews for infrastructure recovery.

Integrating the effects of stakeholder decisions
The effects of stakeholder decisions on power system resilience cannot be overemphasized. Power system stakeholders have important decisions to make in the face of a high impact disruptive event. For instance, stakeholders make decisions about resource locations and allocations, the sensitivity settings of reclosers, and de-energization when wildfire threats occur [146,147]. Therefore, the effects of stakeholder decisions should be considered while quantifying the system resilience. An example is the February 2021 snowstorm outage in Texas, which has been blamed on a stakeholder financial structure of deregulation and free markets that prioritizes cheap prices over a resilient grid, leaving utilities with no incentive to prepare for winter. "I think we unfortunately found out that the electric grid and the coupled natural gas infrastructure were not well prepared to deal with this rare but not unprecedented event," noted Overbye [148,149]. The effect was a massive reduction in system performance over the timeline of a week, while costs went up 41,000% per unit of energy [150,151]. Another instance is the October 2018 wildfire threat in northern California, which led the utility PG&E to shut off power to a sizeable number of customers due to heightened wildfire risks from high winds [152]; no wildfires occurred, and the shutoff landed the utility a couple of lawsuits. However, just one month later, in November, a wildfire occurred when utility stakeholders decided not to shut off power, allegedly because management bonuses are tied not to safety but to the lack of customer complaints [153]. These instances underscore the necessity of integrating stakeholder decisions into power grid resilience analysis.

Multi-facet interaction between resilience indicators
Apart from network measures, which consider different resilience indicators, the reviewed metrics generally consider a single resilience indicator without considering its dynamics with other resilience indicators. For instance, hardening and increasing transmission line capacity could be a good infrastructure improvement; however, the strategy may also increase the risk of congestion if, for instance, hardened high capacity lines fail during a disruptive event, hence further reducing the operational resilience. Another example is the minimization of energy cost as a resilience indicator. This could imply worse resilience if the minimal costs come from cutting off smaller, more expensive, but more operationally flexible generators, as opposed to larger, cheaper generators which may not be easy to reschedule or to ramp up or down during high impact emergencies.
In addition, for most cases where real data is not integrated, the models have often assumed thresholds, recovery times, degradation times, and other intricate parameters. The authors suggest that the accuracy of models could greatly be increased with adequate modeling that factors the availability of resources, extent of system damage, accessibility of damage to repair crew, economic costs, and budgets of the power system and other system specific parameters.

Modeling component failure
We observed that component fragility curves have been utilized in determining component failures in only a few of the reviewed papers. This could be attributed to the fact that, to the best of our knowledge, only a few fragility curve types have been developed: the hurricane-windspeed fragility of transmission lines [16,142,154], and the impact of extreme heat waves and drought on generation units [155]. However, methods exist that can relate the effects of other influential disruptive event variables to the fragility of a variety of power system components. In this paper, we suggest the use of techniques offered by artificial intelligence in order to account for these relationships, which could be non-linear. In order to obtain data that correlates the event variables with component failure, the authors suggest utilizing failure dates. The values of event variables on the days of failure occurrence can be used as training data towards predicting the target value of component failure. For example, the real-time wind speed (for hurricanes) or temperature, land use, and terrain (for wildfires) can be among the data collected and related to the component degradation with time. When the disruptive event is on the cyber side, we suggest the use of system connectivity maps, communication packet analysis, and artificial intelligence [156] towards component failure analysis [157,165], where analytical techniques could be used in power system testbeds to model component failure due to the high likelihood of data unavailability.
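The failure-date idea can be illustrated with a deliberately simple learner. The sketch below fits a one-variable logistic regression, used here only as a stand-in for the broader artificial intelligence techniques suggested above, to hypothetical daily records of peak wind speed and whether a component failed on that day. The data, the scaling constants, and the function names are all invented for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(failure) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical records: daily peak wind speed (m/s) and failure label (1 = failed).
wind = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
failed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Scale the input so gradient descent is well conditioned (constants are arbitrary).
scaled = [(v - 40.0) / 10.0 for v in wind]
w, b = fit_logistic(scaled, failed)


def predict(wind_speed: float) -> float:
    """Estimated failure probability at a given wind speed."""
    return sigmoid(w * (wind_speed - 40.0) / 10.0 + b)
```

In practice, additional event variables (temperature, land use, terrain) would enter as further features, and a non-linear model could replace the logistic form.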

Spatio-temporal assessment
We encourage quantitative PSRMs that integrate the spatio-temporal properties and impacts of disruptive events. Since high impact events such as hurricanes and wildfires have not only a temporal aspect but also a spatial aspect, the authors recommend the implementation of spatially and temporally detailed models towards resilience quantification. Toward this end, statistical and heuristic methods have been utilized to illustrate the spatio-temporal aspects; however, the use of Geographical Information Systems (GIS) resources is highly encouraged. Likewise, by using GIS enabled resources, factors such as slope and topography, which are influential to the intensity of high impact events, could be adequately accounted for. Without these factors, the authors believe the models may be less efficient in real event scenarios, or simply generate a gross imbalance of utility assessments with respect to the event impact and progression. In addition, while we identify system resourcefulness, redundancy and time variations as design parameters for addressing uncertainty in disruptions, we also encourage the aggregation of uncertainty scenarios towards resilience quantification. Scenario generation and reduction algorithms have been implemented in Ref. [84]. From the review methodology, the authors identify the metrics which mostly address these criteria as [81,86], and [92].

Minimum requirements of a quantitative resilience metric
Related to differing resilience definitions, quantitative metrics need to have specified minimum requirements in order to be referred to as resilience metrics. For instance, if a metric addresses the disruption transition phase, but neither the outage nor the recovery phase, does it qualify as a resilience metric? In this review, our opinion towards achieving standardization is that any resilience metric should address the four main phases of the resilience trapezoid in order to be recognized as a quantitative power system resilience metric. Otherwise, appropriate nomenclature should be used to define such metrics. For example, if a metric is defined to assess the resistance of the power grid to a disruptive event, then such a metric should be referred to as a "Robustness" metric as opposed to a "Resilience" metric.

System functionality vs. performance as resilience indicators
Specifically, a loss of functionality implies a loss in performance; however, a loss in performance does not necessarily imply a loss of functionality. For instance, a system with a target of 100 MW that delivers 80 MW has suffered a loss in performance, whereas a system that delivers no power at all has suffered a loss in functionality. Through our review, we found that these concepts require clarification when developing resilience measures.

Challenges identified
Here, we discuss the perceived challenges in resilience quantification. Some of these challenges can be related to the reason why the above discussed gaps exist.

Resilience definition and quantification
Subtle disparities exist in defining power systems resilience. For example, in Refs. [158,159] resilience is defined as the system's resistance to an external attack, in Refs. [14,160] resilience is defined based on rapidity of service restoration, in Ref. [73] resilience is calculated as a measure of system adaptability to extreme events, while in Ref. [161] resilience is defined based on risk management approaches. Hence some metrics quantify resilience based on only certain phases (adaptive, absorptive, outage, or restoration), raising the question: Should a metric be referred to as a resilience metric if it does not comprehensively inform all phases of the resilience trapezoid? This challenges power utilities, regulatory authorities, and stakeholders in developing metrics that could provide standard quantification protocols for power systems resilience and aid in effectively exchanging resilience related information. This necessity is fundamental for developing resilience requirements and standards for power system planning and operation.

Smart grid technologies integration
The majority of existing resilience metrics fail to capture the impact on power systems resilience of grid modernization technologies, including advanced control and telecommunication methods, distributed energy resources, demand side resources, flexible loads, outage management systems, unmanned aerial vehicles, and smart switching. For example, if distributed energy resources (e.g., solar generation) that could locally supply the electricity demand are integrated into the system, then system dependency on the main grid is reduced, generally leading to higher power system resilience levels. Although some efforts have been made to include some of these technologies in resilience metrics [35,77,162], these studies are still in their early stages.

Scarcity of historic data
The scarcity of real data poses a great challenge to developing metrics and models that could more accurately capture power systems resilience. In this regard, historic data from previous disruptive events, including outage and failure data, number of affected customers, and damage levels, are in great demand to calibrate resilience metrics and models. However, historic data are difficult to obtain because: (1) disruptive events have a low probability of occurrence, and (2) the majority of these data contain sensitive information, including the location of critical power system components, and are therefore unavailable for public usage. Often, real data are replaced with data derived from statistical approaches.

Integrating interdependencies
The vast majority of critical infrastructure systems (water, gas, transportation, communication) are highly dependent on electric power and vice versa, e.g., gas-powered generators. In many cases, this interdependence, if properly managed, could offer additional energy flexibility to power systems [163]. Hence, it is important for resilience metrics to capture these interdependencies between the power grid and the critical infrastructure it serves. For instance, during the Texas snowstorm outages, because the power grid was down, the water distribution system and the internet service providers were also down, as these systems were dependent on the power grid [164]. Hence, another great question posed to the power system resilience community is: Should the effect of disruptive events on interdependent systems be integrated when developing, analyzing, and evaluating power system resilience?

Conclusion
This paper presents a comprehensive analysis of power systems resilience and proposes a categorization scheme for quantitative power system resilience metrics. The proposed scheme classifies these metrics broadly under distribution-level, transmission-level, and generic system metrics, and also discusses these metrics with respect to their attributes. Furthermore, this paper standardizes the reviewed metrics and generalizes them to functional requirements of a resilient system by developing an Axiomatic Design Process for functional requirements and design parameters towards satisfying resilience objectives. The review methodology presented in this paper aims to contribute toward the development of standardized power system resilience metrics. In addition, this paper furnishes statistics associated with the publications of quantitative power system resilience metrics over the past decade, as well as several sources and methods for obtaining real/simulated input data for utilization in resilience quantification of power systems. Finally, the paper discusses the gaps and challenges identified in the review, highlighting their implications while making recommendations towards improving power system resilience analysis. One of the major highlights is the resilience quantification question posed to the resilience community: Should a metric be referred to as a resilience metric if it does not comprehensively inform all phases of the resilience trapezoid?

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.