Abstract

A structured collection of tools for engineering resilience and a research approach to improve the resilience of a power grid are described in this paper. The collection is organized by a two-dimensional array formed from typologies of power grid components and business processes. These two dimensions provide physical and operational outlooks, respectively, for a power grid. The approach for resilience research is based on building a simulation model of a power grid which utilizes a resilience assessment equation to assess baseline resilience to a hazards’ profile, then iteratively selects a subset of tools from the collection, and introduces these as interventions in the power grid simulation model. Calculating the difference in resilience associated with each subset supports multicriteria decision-making to find the most convenient subset of interventions for a power grid and hazards’ profile. Resilience is an emergent quality of a power grid system, and therefore resilience research and interventions must be system-driven. This paper outlines further research required prior to the practical application of this approach.

1. Introduction

Power grids play an important role for modern society [1]. A failure in a power grid demonstrates a lack of engineered and engineering resilience to one or more hazards. A failure in a power grid may result in follow-up failures in the grid and other infrastructures [2]. Bo et al. [3] mapped and summarized 23 major blackouts from 1965 to 2012, representing major failures in power grids. Among the surveyed literature, no additional major blackouts were found, which is partially validated by the list of billion-dollar weather and climate disasters [4] in the US. Statistical studies show that major outages happen more often than can be concluded from statistics on minor and intermediate outages [5, 6].

Therefore, according to theoretical distributions and history of disasters, the world will experience major outages in the future. The probability of an outage in a specific power grid may be reduced by the application of resilience research to improve the resilience of power grid infrastructures.

This paper describes an approach for resilience research to improve the resilience of a power grid. It is based on mapping existing tools for resilience enhancement in a matrix-based classification for the follow-up targeted resilience research. This paper is limited by the tools arising in the literature review and so does not provide the complete list of existing tools. The literature does not provide a way to rank the tools for their effectiveness so no prioritization is available.

2. Methods

A literature review was performed to identify academic articles according to two schemas: functional and summary. The functional schema facilitated the search for tools for resilience enhancement, definitions of resilience, and experience of blackouts. The summary schema enabled the identification of summaries and reviews of resilience assessment frameworks.

Search strings for advanced search in the Scopus database are shown in Table 1. The primary focus of this article is to identify and classify the tools for resilience enhancement. Other search topics provide context.

The functional review assumes that the search strings provide source papers for an incomplete yet sufficient study coverage of the topics. The functional search queries selected source papers based either on matching titles (TITLE) or on keyword values (KEY). Scopus allows an advanced search that produces a results page with a set of filtering tools in the left panel. Using the database’s filtering capabilities, the initial list of results was cleared of non-English papers (e.g., German or Chinese), non-article documents (peer review is expected for most articles), and irrelevant topics (e.g., medicine or biology). The toolbox cites 54 papers, reduced from 174 papers.

The summaries’ review assumes that a reasonably complete review of resilience assessment frameworks, for the purpose of this research, is possible via study of existing review papers on this topic. Papers for review were selected by the relevance of their titles and abstracts.

3. Results

3.1. Power Grid

A power grid is a system that produces, transports, and consumes electrical energy. Generators convert fuels and other energy sources into electrical power. Step-up and step-down transformation stations border transmission lines with substations in between. The UK power grid uses several voltages, with 400/275 kV in the transmission grid and 132/33 kV in distribution grids; other countries have a similar architecture. Accompanied by a visualization (see Figure 1), the components of a power grid are listed below:
(i) Producer is a part of a power grid that produces electrical energy regardless of the amount of energy or stability of production. A hydropower plant, fossil-fuel plant, wind generator farm, and a microgrid are examples of producers of electricity.
(ii) Step-up/down substation is required to transform voltage, e.g., from a producer's 20 kV to the 400/275 kV of the transmission grid and from 400/275 kV to the 132/33 kV of the distribution grid.
(iii) Power line is an overhead or underground electricity transmission line.
(iv) Substation is a station that transitions and controls the power flow.
(v) Consumer is a part of a power grid that consumes electrical energy regardless of the amount of energy or stability of consumption. A factory, a distribution grid, and a microgrid are treated as consumers of electricity.
(vi) Control is a hardware, software, or organizational part of a utility company. While control is not shown in Figure 1, it is a critical part of a power grid.

3.2. Hazards

Mukherjee et al. [9] analyzed billion-dollar blackouts in the US between 2000 and 2016, while Bie et al. [10] provided statistics on the major blackouts in the world. While the lists of causes are slightly different, there is a major difference in the distributions of blackouts by causes between these two regions (see Table 2). For both the US and the world, the first three causes are responsible for over 85% of blackouts: 86.9% and 88.67%, respectively. Additional research is required to explain the difference in causes and magnitudes.

Type and intensity are the basic characteristics of hazards when considering the resilience of engineered systems such as a power grid. Hazards of different types (e.g., precipitation or an earthquake) and intensities (e.g., 300 mm in 24 hours or magnitude 4 on the Richter scale) differ greatly but can have similar effects, e.g., outage of a power grid component. The impact of a hazard is at least infrastructure-, technology-, and asset-specific. Hazards and their likelihoods are specific to geographic regions, so despite both being island nations, the likelihood of earthquakes is much smaller in the UK than in Japan. A country-specific hazards’ profile must be addressed during resilience research and for engineering the resilience of a national power grid. For example, the National risk register of civil emergencies, 2017 edition [11] contains a list of hazards for the UK. An analysis of global risks [12] is produced annually by the World Economic Forum, Black Sky hazards [13] were defined by the Electric Infrastructure Security Council, and academia also produces lists of hazards [14].

3.3. Resilience

In 2015, Fisher [15] mentioned the existence of 70 definitions for resilience, although a list and analysis of definitions were absent. During the literature review, 42 instances of definitions for resilience were collected. These were grouped by similarity, and the group selected for this black sky study was the one that includes definitions from governmental or international organizations likely expressing the need for strategic decision-making, such as the US White House [16], the UK Cabinet Office [17], or the UN Office for Disaster Risk Reduction (UNISDR) [18].

The definition by UNISDR [18] is selected for this paper: resilience is the ability of a system, community, or society exposed to hazards to resist, absorb, accommodate to, and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions. The intended meaning is expressed in the analysis below.

This definition lists abilities, e.g., resist, recover; however, it does not provide a rationale for the completeness of this list. This definition states that this ability belongs to an entity, e.g., system; we treat the power grid infrastructure as a ‘system’. This definition states that resilience is a function of hazards. This definition also states that resilience is conditional, e.g., exposed to hazards, effects of a hazard, and suggests either a secondary condition or an insight into how resilience should be measured, via timely and efficient manner. Resilience is manifested via preservation and restoration of essential basic structures and functions, noting that not all the entity’s elements need to be restored.

From the literature, the following assumptions, abstractions, and explications were made for this paper:
(i) A system has a normal and a disrupted state; a hazard changes a normal state of a system to a disrupted state. Resilience is the ability of an entity to execute the opposite transition, namely, from a disrupted state to a normal state, or to prevent a transition into a disrupted state.
(ii) A shorter time period in a disrupted state indicates a more resilient system, as does the ability to make a faster transition from a disrupted state to a normal state. Also, limited disruption to critical parts of the entity and fast recovery of critical parts reflect greater resilience to the scale of disruption caused by hazards.
(iii) Infrastructure is a complex system constructed from many alternative assets utilizing different technologies that work on different physical principles. For the purposes of this paper we assume a predetermined power grid configuration. Therefore, the manifestation of resilience is the preservation or recovery of basic functions. Basic functions are available to the customers in volume through time in locations.
(iv) Time, volume, and location are possible input data for resilience assessment, although location is mostly disregarded in this paper because the paper focuses on the resilience of a whole infrastructure instead of its parts and because theoretical means to address physical and likely multilayered virtual subgrids are lacking.

In a normal state, an infrastructure delivers a normal volume of basic functions, while an infrastructure in a disrupted state, in comparison to expected delivery in a normal state, delivers a reduced volume. A normal output is delivered by an infrastructure working in normal conditions, while the same infrastructure under the pressure of a hazard delivers a smaller volume. The nearer the output volume under pressure by a hazard is to the normal output volume, that is, the smaller the difference between the two outputs, the greater the resilience; see Figure 2.
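The comparison above can be illustrated with a minimal sketch. The function name and the volumes (in arbitrary units) are hypothetical and are not taken from the reviewed literature; the ratio is only one simple way to express how near the disrupted output is to the normal output.

```python
def output_ratio(disrupted_volume: float, normal_volume: float) -> float:
    """Share of the normal output still delivered under a hazard (0 to 1)."""
    if normal_volume <= 0:
        raise ValueError("normal_volume must be positive")
    # Clamp the ratio so that it always stays within the range 0 to 1.
    return max(0.0, min(disrupted_volume / normal_volume, 1.0))


# Hypothetical figures: two grids with the same normal output of 100 units.
grid_a = output_ratio(disrupted_volume=90.0, normal_volume=100.0)
grid_b = output_ratio(disrupted_volume=60.0, normal_volume=100.0)
print(grid_a > grid_b)  # grid A stays nearer to its normal output
```

Under this reading, the grid whose ratio stays closer to 1 during the hazard is the more resilient one.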

3.4. Resilience Assessment

Hosseini et al. [19] provide a classification of approaches to resilience assessment. The general measure approach is used in this study because it is quantitative and generic. Yodo and Wang [20] listed quantitative resilience assessment frameworks for engineering systems, and this is used as the initial source of ‘general measure’ approaches.

A system produces output over time, which can be visualized via an area chart with volume on the Y axis and time on the X axis. This generates three main branches of resilience assessment, namely, (i) volume-focused, which utilizes the Y axis, (ii) time-focused, which utilizes the X axis, and (iii) volume-over-time-focused, which utilizes both the X and Y axes. Most frameworks in Yodo and Wang’s paper [20] belong to the last category.

Quantitative resilience assessment frameworks use mathematical equations. To ease understanding of the idea behind each branch and its equation, representative examples were selected from Yodo and Wang’s paper. The volume-focused framework refers to [21], the time-focused framework to [22], and the volume-over-time framework to [23]. The corresponding mathematical equations are listed in Table 3, while Figure 3 provides a supplementary visualization.
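The three branches can be sketched on a single output series. The functions below are generic illustrations of each branch and are not the equations of [21], [22], or [23]; the function names, the hourly series, and the normal output of 100 units are assumptions made for this example.

```python
from typing import Sequence


def volume_focused(delivered: Sequence[float], normal: float) -> float:
    """Y axis only: worst delivered volume as a share of normal output."""
    return min(delivered) / normal


def time_focused(delivered: Sequence[float], normal: float) -> int:
    """X axis only: number of time steps spent below normal output."""
    return sum(1 for volume in delivered if volume < normal)


def volume_over_time(delivered: Sequence[float], normal: float) -> float:
    """Both axes: delivered volume summed over time, relative to normal."""
    return sum(delivered) / (normal * len(delivered))


# Hypothetical hourly output: hazard at hour 2, full recovery by hour 5.
series = [100, 100, 40, 60, 80, 100, 100, 100]
print(volume_focused(series, 100))    # 0.4
print(time_focused(series, 100))      # 3
print(volume_over_time(series, 100))  # 0.85
```

The same disruption therefore yields three different readings, depending on whether the depth of the drop, its duration, or the lost area under the curve is measured.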

3.5. Business Processes

One or more organizations maintain a power grid and stable supply of electric energy. Just as a power grid may be described by components (see Figure 1, [7]), an organization may be described by business processes (see Figure 4, [24, 25]). The business processes of an organization influence the abilities of the organization to maintain the resilience of their products, in particular the operate business processes that focus on adding value to the customer. If operate processes are able to ensure that the power grid (the engineered system) is resilient, then the capabilities of people, technology, policies, and systems (the engineering system) that underpin these processes are resilient. Short descriptions [26] of business processes are provided below.

Manage processes consist of a subset of processes formulating organizations’ vision and mission, methods to achieve those, and methods for effective utilization of resources. Set directions focus on formulating the vision and mission of the organization. Formulate strategies focus on identification and definition of strategies to reach the vision and mission. Direct business focuses on use of resources in actions to reach the vision and mission.

Operate processes consist of a subset of processes that deliver value to the customer. Develop product focuses on designing the product or service that would add value to the customer. Get order focuses on finding customers and setting contracts with the customers. Fulfil order focuses on fulfilling the demand of the market. Support product process focuses on increasing value of the product or service.

Support processes consist of a subset of processes ensuring that operate processes are running. Manage finance focuses on having a sufficient cash flow. Support personnel focuses on ensuring sufficient human resources for operate and other processes. Manage technology focuses on creating an environment without bottlenecks due to a lack of technology. Corporate learning focuses on increasing the quality of human resources.

The criticality of the different classes of business process is not addressed; however, ‘value-adding processes’ have direct influence on operations. Some non-value-adding processes, for example, manage technology, may also have a direct bearing on the ability of operate processes to function.

3.6. Toolbox

Components of a power grid (see Figure 1) and business processes of an organization operating a power grid (see Figure 4) are sufficiently descriptive for high-level resilience modelling of a grid under operation. Together, these categories form a two-dimensional conceptual space. Tools to improve the resilience of a power grid may be assigned to relevant cells in the two-dimensional grid based on which component of the power grid and which business process are targeted by the tool. The tools identified in the literature review have been reviewed and allocated to the grid (see Table 4). A dash in a cell indicates the absence of tools to improve resilience associated with the business process (row) and power grid component (column). Gaps may present opportunities for new tools to improve resilience or may indicate areas where resilience is either not a priority or has not yet been addressed.
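The two-dimensional allocation can be sketched as a mapping from (business process, component) cells to tool abbreviations. The data structure and function names are assumptions made for this illustration; only four tools are loaded (blackout drill, backup power, load shedding, manage vegetation), using the categorization stated in their descriptions in this section, so empty cells in this sketch say nothing about the full table.

```python
from collections import defaultdict

# (tool, components, business processes); a four-tool subset for illustration.
TOOLS = [
    ("BD", ["control"], ["corporate learning"]),
    ("BP", ["consumer"], ["fulfil order"]),
    ("LS", ["control", "consumer"], ["fulfil order"]),
    ("MV", ["power line"], ["manage technology"]),
]


def build_toolbox(tools):
    """Map each (process, component) cell to the tools allocated to it."""
    cells = defaultdict(list)
    for name, components, processes in tools:
        for process in processes:
            for component in components:
                cells[(process, component)].append(name)
    return cells


def cell_text(cells, process, component):
    """Render a cell: sorted tool abbreviations, or a dash if it is empty."""
    names = cells.get((process, component), [])
    return ", ".join(sorted(names)) if names else "-"


toolbox = build_toolbox(TOOLS)
print(cell_text(toolbox, "fulfil order", "consumer"))  # BP, LS
print(cell_text(toolbox, "get order", "producer"))     # - (empty in this subset)
```

A tool that targets several components or processes simply appears in several cells, which mirrors how one tool can occupy more than one position in the grid.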

Tools for improving resilience of a power grid are listed and briefly described below. These tools were suggested or mentioned in journal papers reviewed for this research. Names of these tools are taken from the papers with minor changes where needed to improve clarity. Tools are alphabetically ordered. The core idea of each tool is taken from its respective paper, and a short description is adjusted for the current paper or created when none is explicitly given. Each tool is described according to the following template: Name (abbreviation) [source] Short description. Statement on categorization.

Some of the tools are mentioned in multiple papers, yet usually a single citation is provided below indicating a reasonable reference. Because of the specifics of the literature review, it is assumed that the list of tools is sufficiently representative, yet it cannot be considered complete. Some tools are dependent on or are alternatives to other tools. This has not been addressed in the respective short descriptions. Other researchers may allocate tools into different categories, as a valid deductive or cogent inductive logical and ontological foundation has been neither found nor created, and the current allocation is intuitive.

Assist customers (AC) [27] provides customers with survivability features during blackouts, such as backup generators, power storage, or control components. This service is clearly associated with the end-user's part of the power grid, and it might be related to an idea for a new consultation service for the industry. It is associated with the consumer component and product development and product support business processes.

Backup power (BP) [27–29] improves resilience, mostly in the form of gas-fired generators. A backup power source is installed on the customer side and is an alternative means to the order fulfilment process. It is associated with the consumer component and fulfil order business process.

Blackout drill (BD) [28] is a preparatory measure for better disaster management. Blackout drills are associated with the control component and corporate learning business process.

Contingency plan (CP) [30] is a tool such as cutting off consumers by a set of criteria to minimize economic loss. Other types of contingency plans might be deduced for grid operators and consumers. Contingency plans are likely to be related to the control component and strategy formulation and directing business processes.

Control switch (CS) [29] may improve resilience, e.g., after receiving a signal from a smart meter, control software switches to a backup power source or changes the topology of a grid. This tool is associated with the control component and order fulfilment business processes.

Controlled islanding (CI) [31–33] may improve resilience of a power grid. The core idea is based on splitting a grid into islands of balanced production-consumption, usually to avoid cascade failure, minimize lost load, and speed up restoration. Controlled islanding is related to the control component (though producers and consumers might be affected as well, and redesign of the grid would affect most of the physical components) and order fulfilment business processes.

Crew staging (CRS) [34] is a preparatory measure to improve resilience. Crew staging is likely to be associated with the control component and support personnel business process.

Damage assessment (DA) [27, 34–36] provides the size and extent of the damage and resources required. Damage assessment is likely to be associated with the control component and direct business and manage technology business processes.

Damage prediction (DP) [34] is a preparatory measure to improve resilience. Damage prediction is likely to be associated with control component and direct business and manage technology business processes.

Demand-side management (DSM) [37] is a process of managing energy consumption. Demand-side management is related to control and consumer components and order fulfilment business process.

Disaster forecasting (DF) [30] notifies the industry about a disaster in advance, so that damage might be minimized. Disaster forecasting is related to the control component and business direction and technology management business processes.

Disaster management group (DMG) [38] is used to manage the impacts of power outages. Groups in Germany are formed from local fire brigades, administrative departments, and a disaster management authority [38]. The UK has similar groups in energy [39] and other sectors. Disaster management group is a part of the control component and business directing business process.

Distributed automation (DIA) [29] enhances the resilience of distribution system via accurate and in-time control. Distributed automation is associated with the control component and order fulfilment business process.

Distributed generation (DIG) [29] enhances the resilience of distribution system via local sources of energy. Distributed generation is associated with the producer component and order fulfilment business process.

Gas-fired generator (GG) [40] is used for distributed generation. It is associated with the producer component and order fulfilment and technology management business processes.

Integrate black-start resources (IB) [41] may improve resilience. It is associated with the producer component and business direction business process.

Load shedding (LS) [42] is an emergency control action to avoid cascade failure by cutting a subset of customers. Load shedding is associated with control and consumer components and order fulfilment business process.

Maintenance scheduling (MS) [43–45] may improve resilience of the power grid. Maintenance is associated with the control component and technology management business process.

Manage vegetation (MV) [41] is performed mostly by cutting trees near overhead power lines. It is associated with the power line component and technology management business process.

Microgrid (MG) [46–52] is a group of interconnected consumers and producers of energy resources that act as a single controllable entity with respect to the grid. Microgrid may be associated with producer, consumer, and control components and order fulfilment and technology management business processes.

Mobile emergency generator (MEG) [53] is used for temporary distributed generation during emergencies. It is associated with the producer component and fulfil order business process.

Mobile unit transformer (MUT) [54] is used during failures of stationary transformers or when they are unable to process the load. It is related to the substation component and fulfil order business process.

Modelling using IEEE bus test systems (MO) [33]: many authors have mentioned that IEEE bus models are useful for evaluating changes to power grids for resilience. IEEE bus is associated with substation and control components and technology management business process.

Mutual assistance framework (MAF) [27] has been set up in Europe to aid restoration after major power disasters with access to spare parts and workforces. The framework is associated with the control component and business direction business process.

Network reconfiguration (NR) [55–59] is an automatic and dynamic change of topology of the power grid. Network reconfiguration is likely associated with the control component and order fulfilment business process.

Optimal reactive reserve (OR) [60] that meets demand spikes under heavily loaded conditions allows the avoidance of voltage instability problems. Optimal reactive reserve is likely associated with the producer component and strategy formulation business processes.

Outage management system (OM) [61] can dramatically decrease the durations and sizes of power outages. Outage management system is associated with the control component and business direction business process.

Overhead structure reinforcement (OS) [62] improves resilience of the grid, which can be achieved by use of robust materials, optimization of tower foundations for soil type and weather conditions. Overhead structure reinforcement is associated with the power line component and technology management business process.

Performance prediction of renewable-based resources (PP) [63] improves resilience of the grid by overcoming uncertainty and variability of renewable-based production of electricity. Performance prediction is associated with producer and control components and order fulfilment business process.

Phasor measurement unit (PMU) [64–68] is a device that provides synchronized, real-time, dense, and highly accurate measurement of current and voltage phasors. PMU is associated with the control component and technology management business process.

Post-incident investigation practices (PI) [35] are used to study major outages post factum and improve recovery for the future. These investigation practices are associated with the control component and business direction and technology management business process.

Power system stabilizer (PSS) [69] dynamically provides supplementary feedback signals aiding power system control, thus adding to the grid's resilience. PSS is associated with the control component and order fulfilment and technology management business processes.

Prepositioning of mobile emergency generators (pMEG) [53] is a technique for optimizing the location of mobile emergency generators. It is associated with the control component and fulfil order business process.

Real-time statistical analysis (RSA) [70] may identify signals of an approaching blackout; thus utilization of this analysis followed by blackout preventive actions aids the grid's resilience. The real-time analysis is associated with the control component and business direction and order fulfilment business process.

Reallocate transmission routes (RAT) [41, 71]: a grid with elevated or reallocated substations (and power lines) might be less prone to extreme weather and floods thus more resilient. Reallocation of substations is associated with substations and power lines components and strategy formulation and technology management business processes.

Redundant transmission routes (RED) [71]: construction of redundant transmission lines improves resilience of the power grid. Transmission routes are associated with substations and power line components and strategy formulation and technology management business processes.

Reinforce structure (RS) [41] increases resilience of the grid. It is associated with all power grid components and technology management business process.

Research and development of transformers for resilience (R&D) [35]: continuous R&D of transformers for resilience improves resilience of the power grid. This activity is associated with the substation component and technology management business process.

Resilience-constrained hourly unit commitment (RCUC) [72]: this technique optimizes the loading of generators. It is associated with the producer and control components and fulfil order business process.

Restoration drills (RD) [35] increase resilience of the power grid. Restoration drills are associated with the control component and corporate learning business process.

Restoration management (RM) [34, 73] may be used to increase resilience of the power grid. It is likely to be associated with the control component and business direction business process.

Restoration priority (RP) [74] may reduce lost load, thus improving resilience of the power grid. Restoration priority is likely to be associated with the control component and business direction business process.

Satellite technology (ST) [75] may be used to detect power outages in real time, thus improving resilience of the power grids. It is associated with the control component and technology management business process.

Supervisory control and data acquisition (SCADA) [6, 76, 77] is a data collection tool for control and management of the grid. SCADA is associated with the control component and technology management business process.

Smart grid (SG) [10, 65, 78–84] is a more resilient version of the power grid. A smart grid utilizes advanced data collection and analysis. It is associated with the control component (yet has physical and virtual sensors in all other components) and order fulfilment and technology management business processes.

Smart meter (SM) [10, 40, 55, 61, 84] is a device that feeds data for dynamic control of the grid (e.g., production, load). Smart meters are installed in all components of the grid but are associated with the control component and order fulfilment business process.

Standards’ development (SD) [35] for utility cyber control systems is a strategy for increasing resilience of the grid; this can be generalized to development of power grid standards for resilience. Standards’ development is associated with the control component and technology management and corporate learning business processes.

State estimation (SE) [68, 77] provides a real-time state of the power system, which is relevant for reactive and proactive control of the power grid. State estimation is associated with the control component and technology management business process.

Transportable energy storage system (TESS) [85] is proposed for post-disaster restoration of large distribution grids via initialization of microgrids and consists of an energy storage, means of transportation, control of transportation, and application scheme.

Undergrounding power lines (UL) [41, 71] could enhance resilience via replacement of overhead power lines, as underground power lines are less prone to wind-induced damage, lightning, and vegetation contact. The higher installation and repair costs could be offset against the cost of damage or interruption to service. Undergrounding is associated with the power line component and technology management business process.

4. Discussion

4.1. Simulation Modelling

Volume- and time-driven resilience assessment frameworks do not provide insights into points of failure, triggers, probabilities of failures or triggers, and asset conditions. This information is highly relevant for linking both hazards and tools to power grid components. It is needed for resilience research, and a resilience assessment framework that does not address these factors has limited application. However, resilience assessment frameworks that do address these factors are subsector-specific and, moreover, asset-specific and therefore lack a systemic and cross-sectoral understanding of resilience. They would also require interdependency studies and cross-sector resilience-driven projects.

The need for generic and the need for specific resilience assessment frameworks contradict each other. This contradiction can be removed by separating the resilience calculation into two stages. At the sector- or asset-specific first stage, the system output is calculated by means of a real system or a simulation model. At the second stage, these sector-specific outputs are processed by the same function into generic resilience values.
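A minimal sketch of this two-stage separation follows. Both first-stage functions are toy stand-ins with hypothetical names, units, and hazard rules, not models of real infrastructure; the second-stage ratio is likewise only one possible generic function and is not an equation from the reviewed frameworks.

```python
from typing import List, Tuple

Series = List[float]


def power_grid_stage(hazard_steps: int) -> Tuple[Series, Series]:
    """Stage 1 stand-in for a power grid model: energy per step (toy rule)."""
    expected = [50.0] * 10
    # Assumption: the hazard halves the output for its first `hazard_steps` steps.
    delivered = [25.0 if t < hazard_steps else 50.0 for t in range(10)]
    return delivered, expected


def other_sector_stage(hazard_steps: int) -> Tuple[Series, Series]:
    """Stage 1 stand-in for another sector with different units and behavior."""
    expected = [200.0] * 10
    # Assumption: the hazard stops the output completely while it lasts.
    delivered = [0.0 if t < hazard_steps else 200.0 for t in range(10)]
    return delivered, expected


def generic_resilience(delivered: Series, expected: Series) -> float:
    """Stage 2: one sector-independent function, delivered / expected volume."""
    return sum(delivered) / sum(expected)


grid = generic_resilience(*power_grid_stage(hazard_steps=4))
other = generic_resilience(*other_sector_stage(hazard_steps=2))
print(grid, other)  # both values are dimensionless and directly comparable
```

Because the second stage sees only delivered and expected output, the first stage can be replaced by any sector-specific model without changing how resilience values are compared.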

Resilience is an emergent quality of a system to a hazard. Due to the constructional and behavioral complexity of the system of interest, the power grid, this emergent quality cannot be predicted without a system. A hazard and its impact on the power grid from the producers to the consumers increase the complexity of the resilience research, while the number of potential tools for resilience engineering and the number of their combinations make it a challenging research domain.

It is impossible to replicate most hazards in a real power grid, and many threats, e.g., cyberterrorism or electromagnetic interference, are highly expensive to emulate even on a very limited scale, if doing so is legal at all. This gives physical experimentation very limited application as a method for resilience research. Mental experimentation has limited use for a conceptual study of resilience that attempts to quantify the improvement from alternative interventions, and it is an unreasonable method for resilience engineering of complex systems using a toolbox. Post-event studies of major outages may provide some important information and insights, and the lack of observational capabilities on site during an emergency can be overcome with a multitude of sensors logging power grid supply and consumption. However, access to this data is usually limited for organizational reasons, each outage is a unique case that cannot be replicated, and the evaluation is ex-post, which limits its usability for resilience engineering. Inductive reasoning is only partially applicable to resilience research, as it relies on a strong and cogent selection of facts, which is limited for the reasons above; the same holds for formalization for follow-up logical and mathematical reasoning. Theoretical reasoning would be a method of predicting results, but a theory for engineering resilience does not exist; therefore, it is an inapplicable method at this moment. A model may selectively address the constructional and behavioral complexity of a system and provide insights into the resilience value of alternative subsets of tools without the costs or barriers of real-world experimentation, which makes modelling, and especially computational modelling, the most suitable method for ex-post and ex-ante resilience research.
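How a model could be used to compare alternative subsets of tools can be sketched as follows. Everything numeric here is hypothetical: the baseline value, the per-tool effects, and the costs are invented for illustration, and the `simulate` function is a placeholder that simply adds tool effects. That additivity is a strong simplifying assumption which a real simulation model would replace, since resilience is emergent. Ranking by gain under a budget stands in for a fuller multicriteria decision.

```python
from itertools import combinations

BASELINE = 0.70  # hypothetical baseline resilience of the unmodified model

# Hypothetical (effect, cost) per tool; real values would come from simulation.
TOOLS = {"LS": (0.05, 1.0), "MG": (0.10, 4.0), "UL": (0.08, 6.0)}


def simulate(subset):
    """Placeholder for a simulation run: adds tool effects, capped at 1.0."""
    return min(1.0, BASELINE + sum(TOOLS[tool][0] for tool in subset))


def rank_subsets(budget):
    """Return (gain, cost, subset) for every affordable subset, best first."""
    results = []
    for size in range(1, len(TOOLS) + 1):
        for subset in combinations(sorted(TOOLS), size):
            cost = sum(TOOLS[tool][1] for tool in subset)
            if cost <= budget:
                gain = simulate(subset) - BASELINE
                results.append((round(gain, 2), cost, subset))
    # Higher gain first; the cheaper subset wins ties.
    return sorted(results, key=lambda item: (-item[0], item[1]))


best = rank_subsets(budget=7.0)[0]
print(best)
```

With n tools there are 2^n - 1 non-empty subsets, so exhaustive enumeration of this kind is only feasible for small toolboxes or with a fast model.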

Banks et al. [86] classify simulation models as static or dynamic, deterministic or stochastic, and discrete or continuous. A model for resilience research and engineering must be dynamic, since static models have very limited capabilities in addressing behavioral aspects; stochastic, as complex multidomain deterministic rules are very challenging to define; and discrete, because continuous models are mostly based on differential and integral calculus, which, where applicable, has highly specialized applications, e.g., thermodynamics. Dynamic stochastic discrete models appear most suitable; however, multiple formalisms for simulation modelling fit these characteristics, and multiple formalisms might be used within a complex computational model, with some submodels being static, deterministic, or continuous.
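To make the three characteristics concrete, the following minimal Python sketch shows a toy model that is dynamic (its state evolves over time), stochastic (failure and repair are random), and discrete (time advances in steps). The function name, the probabilities, and the capacity are illustrative assumptions and are not taken from the surveyed literature.

```python
import random


def simulate_supply(steps, failure_prob, repair_prob, capacity=100.0, seed=0):
    """Toy dynamic, stochastic, discrete-time model of a single asset.

    All parameter values are illustrative assumptions.
    Returns the volume delivered at each time step.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    working = True
    delivered = []
    for _ in range(steps):
        if working and rng.random() < failure_prob:
            working = False  # the hazard disables the asset
        elif not working and rng.random() < repair_prob:
            working = True   # the asset is repaired
        delivered.append(capacity if working else 0.0)
    return delivered


trace = simulate_supply(steps=24, failure_prob=0.1, repair_prob=0.5)
```

The resulting series of delivered volumes is the kind of volume-over-time data on which a resilience assessment equation could operate.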

Some architectural considerations can be made for modelling- and toolbox-driven resilience research. It can be safely assumed that resilience research would involve massive simulation modelling, and a colossal number of interactions between elements is expected. A state, or a change of state, of the elements preceding an element may result in a change of state of that element. High-level and lightweight computation is the preferred option unless required otherwise. It is beneficial to represent a power grid as a network of assets producing, transforming, redirecting, or consuming electric energy, because this representation fits the typology of assets, is efficiently computable within vector, matrix, or tensor algebras, and is applicable and partially transferable to the modelling of infrastructures from other sectors, which is especially relevant for multisectoral resilience research.
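The network representation can be sketched with an adjacency matrix, in which the state of the preceding assets determines the state of the assets they feed. The four-asset grid, its names, and the propagation rule below are hypothetical and serve only as an illustration; a realistic model would also carry volumes and capacities.

```python
# Hypothetical four-asset grid: generator -> substation -> two consumers.
ASSETS = ["generator", "substation", "consumer_a", "consumer_b"]

# FEEDS[i][j] == 1 means that asset i feeds asset j (adjacency matrix).
FEEDS = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]


def propagate(available, sources):
    """Return a 0/1 list stating which assets are energized.

    available[i] is 1 if asset i is undamaged; sources holds the indices
    of the producing assets. An asset is energized if it is undamaged and
    at least one asset feeding it is energized.
    """
    n = len(available)
    energized = [1 if i in sources and available[i] else 0 for i in range(n)]
    for _ in range(n):  # at most n sweeps are needed to reach a fixed point
        nxt = list(energized)
        for j in range(n):
            fed = any(FEEDS[i][j] and energized[i] for i in range(n))
            if fed and available[j]:
                nxt[j] = 1
        if nxt == energized:
            break
        energized = nxt
    return energized


# Damaging the substation (index 1) cuts off both consumers.
state = propagate([1, 0, 1, 1], sources={0})
```

The same array form would, in principle, allow the inner loops to be replaced by vectorized matrix operations in a numerical library.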

Interfaces for control and information exchange between model and submodel are critical, as is parallelization of modelling and computation to reduce computation time. A hazard may affect the behavior of consumers, which is one of the viable research objectives and is commonly addressed with agent-based modelling. Toolbox-induced changes in a model assume the same domain language to describe the assets within the model and the tools from the toolbox; an ontology can address both the structure of the language and the database of assets and tools.

According to Kelly et al. [87], five modelling approaches are most commonly used for integrated environmental assessment and management. Each of these approaches has been used for modelling complex systems: system dynamics [88, 89], Bayesian networks [90, 91], coupled component models [92], agent-based models [93, 94], and knowledge-based models [95]. In addition to the summary of these approaches (see Tables 1 and 3 in [87]), Kelly et al. [87] provided a heuristic for the selection of one of these approaches under standard application (see Figure 1 in [87]). However, as is shown in review papers [96–107], a more comprehensive heuristic covering a larger spectrum of modelling approaches would benefit resilience research and engineering.

4.2. Toolbox

The toolbox is a collection and description of tools to improve the resilience of a power grid. A grid has a certain level of resilience to a hazard, and when the tools available to the grid are changed, its resilience to the same hazard may stay the same, increase, or decrease. A change is based on the intervention of a subset of tools into the grid, and tools within a subset might be utilized with different intensities. Simulation modelling is the most convenient method to evaluate the resilience of a changed grid.

Power grid components and organizational business processes are used to structure the toolbox, which can currently be represented as a two-dimensional array, as can a subset of tools selected for a change. A network-based model of a grid may also be represented with arrays. Therefore, on a theoretical level, this approach allows the use of vector operations, one of the most convenient and efficient methods for mass computation. Vector, matrix, and tensor algebras provide a mathematical apparatus for this and for more complex computations, e.g., incorporating dynamics or multilayered virtual subgrids; combinatorics and set theory could support scenario design.
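The two-dimensional structure of the toolbox and the combinatorial construction of scenarios can be sketched as follows. The component names, process names, and tool names are hypothetical placeholders, not the typologies used in this paper, and the intensity of each tool is left out for brevity.

```python
from itertools import combinations, product

# Hypothetical labels for the two dimensions of the toolbox.
COMPONENTS = ["generation", "transmission", "distribution"]
PROCESSES = ["planning", "operation", "maintenance"]

# Each (component, process) cell holds the tools classified under it.
toolbox = {cell: [] for cell in product(COMPONENTS, PROCESSES)}
toolbox[("transmission", "planning")].append("redundant_line")
toolbox[("distribution", "operation")].append("islanding")
toolbox[("generation", "maintenance")].append("spare_parts_stock")


def empty_cells(tb):
    """Cells without any tool, i.e., candidate gaps in the toolbox."""
    return [cell for cell, tools in tb.items() if not tools]


def scenarios(tb, max_size):
    """Yield every subset of tools with at most max_size members."""
    tools = sorted(tool for cell_tools in tb.values() for tool in cell_tools)
    for size in range(max_size + 1):
        yield from combinations(tools, size)


all_scenarios = list(scenarios(toolbox, max_size=2))
```

The empty subset stands for the unchanged grid and gives the baseline against which every other scenario is compared. The number of subsets grows combinatorially with the number of tools, so an exhaustive enumeration of this kind is only feasible for small toolboxes or bounded subset sizes.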

Qualitative resilience assessment frameworks [19] could be used to direct attention to certain ‘cells’ in the toolbox, because these frameworks are based on best practices for resilience.

Examples of such frameworks are the semiquantitative approach proposed by Shirali et al. [108], qualitative frameworks that could be transformed into a semiquantitative index, and the subsets of resilience indicators proposed by van der Merwe et al. [109].

The toolbox shows the areas currently lacking tools to improve resilience. While this could be partially explained by an insufficiently complete and systematic literature review, it also indicates a lack of interest in these areas; if that lack of interest is unjustified, then the toolbox shows gaps in the current state of resilience research and engineering.

While the current two-dimensional classification provides a valuable categorization for the tools to improve resilience, it might be insufficient because it does not address the constructional complexity of a power grid and its components. The component dimension might be replaced or extended with a system’s representation (see Figure 2.3 in [110]), systems engineering processes (see Figure 1.1 in [110]), a domain ontology [111], or a systems holarchy [112]. The latter is more suitable for grid modelling, but a strong relationship between the holarchy and the toolbox must be established beforehand. Other classifications might be used as well.

Similar toolboxes could be created for resilience research and engineering of other sectors of infrastructure. Multisectoral toolboxes would require additional research, probably resulting in a higher dimensionality of classification matrices and generic functional-constructional descriptors of tools.

4.3. Resilience Value

Most of the general quantitative resilience assessment frameworks in [19, 20] utilize volume, time, or volume-over-time data to calculate the resilience value. These variables and different operators from different branches of mathematics form alternative mathematical equations for the calculation of the resilience value. Each equation is supported by at least one line of argumentation which could be probed and developed further. Other ideas may, and likely would, result in other equations, for example:

(1) The minimum resilience value is taken as the resilience of the system, and the associated volume might not be the lowest delivery. The resilience value is calculated for each time value at the first step, and the minimum resilience value is selected at the second step.

(2) The lowest (minimum) delivery volume defines the resilience of the system. The minimum value is selected for the equation together with the associated time, and this time is used to select the baseline value.

(3) This equation can be further extended by incorporating more statistical values:
(a) mean (the average) of the delivered and baseline volume during the disrupted state,
(b) median (the middle value of an ordered list) of volume,
(c) mode (the most common value),
(d) filtering out outliers in the calculation of averages (or of the minimum, as above),
(e) using other statistical measures and statistical analysis techniques, e.g., quartiles with the associated box-and-whisker plots.

(4) Statistics may provide additional insights. For example, skewness may indicate whether the system fails or recovers faster, while kurtosis may indicate whether the system has extreme drops of performance or how fast the hardest part of the system is processed. While these can be easily deduced from a visualization, a quantitative statistical technique is instrumental for automatic optimization of the infrastructure.
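The difference between these variants can be illustrated with a minimal Python sketch. It assumes that the resilience value is a ratio of delivered to baseline volume, which is only one possible reading of the equations discussed above; the function names and the data are hypothetical.

```python
from statistics import mean, median


def min_ratio_resilience(delivered, baseline):
    """Variant (1): ratio at every time step, then the minimum ratio."""
    return min(d / b for d, b in zip(delivered, baseline))


def lowest_delivery_resilience(delivered, baseline):
    """Variant (2): time of the lowest delivery, then the ratio at that time."""
    t = min(range(len(delivered)), key=lambda i: delivered[i])
    return delivered[t] / baseline[t]


def mean_resilience(delivered, baseline):
    """Variant (3a): mean delivered volume over mean baseline volume."""
    return mean(delivered) / mean(baseline)


def median_resilience(delivered, baseline):
    """Variant (3b): median delivered volume over median baseline volume."""
    return median(delivered) / median(baseline)


# Hypothetical volumes over four time steps during the disrupted state.
baseline = [100, 100, 200, 100]
delivered = [100, 50, 60, 100]

r_min_ratio = min_ratio_resilience(delivered, baseline)       # 60 / 200
r_lowest = lowest_delivery_resilience(delivered, baseline)    # 50 / 100
r_mean = mean_resilience(delivered, baseline)                 # 77.5 / 125
r_median = median_resilience(delivered, baseline)             # 80 / 100
```

With these data, the four variants give 0.3, 0.5, 0.62, and 0.8, respectively, for the same event, which shows why the choice of equation needs its own line of argumentation.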

An equation can be used on natural (e.g., litres) and normalized (e.g., percent) data. Some considerations might require nonlinear normalization of the input data or resilience values (e.g., onto a logarithmic scale, for reasons similar to those that make the Richter magnitude scale logarithmic); in this case, an equation might undergo adjustments. Nonlinear normalization might be the answer to the intensity aspect of the hazards’ profile, but this requires additional research.
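As a hedged illustration of the difference between linear and nonlinear normalization, the sketch below maps a delivered volume onto the interval from 0 to 1 in both ways. The logarithmic mapping and the `floor` parameter (the smallest volume that is still distinguished) are assumptions made for this example, not a normalization proposed in the literature reviewed here.

```python
import math


def linear_normalize(volume, baseline):
    """Delivered volume as a fraction of the baseline volume."""
    return volume / baseline


def log_normalize(volume, baseline, floor=1.0):
    """Map a volume onto [0, 1] on a base-10 logarithmic scale.

    floor is an assumed lower bound; volumes below it are treated as floor.
    The baseline must be larger than floor.
    """
    clipped = max(volume, floor)
    return math.log10(clipped / floor) / math.log10(baseline / floor)


linear_value = linear_normalize(10.0, 1000.0)  # 0.01
log_value = log_normalize(10.0, 1000.0)        # one third
```

On the logarithmic scale, a drop from 1000 to 10 units is scored as one third rather than one percent, so deep losses are compressed and small residual deliveries remain visible.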

The significance of each tool in isolation with respect to resilience improvement is questionable unless sound logical and pragmatic reasoning is provided; papers usually lack this. If the toolbox is applied for resilience research, then one or more equations would be selected, combined, and developed. Simulation modelling would support any tool and related equations, subject to the limitations of the platform.

4.4. Decision-Making

Several preconditions must be met before the application of a resilience assessment framework, as resilience values are calculated for strategic decision-making and operationalization of the decisions. If the resilience value is insufficient, then a change might be introduced into the system, resulting in a new resilience value of the changed system. The difference between the old and new resilience values represents the impact of the change on resilience, and this delta resilience might be compared to the delta sustainability or the monetary investment.
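The comparison of delta resilience with the monetary investment can be sketched as below. The subset names, the resilience values, and the costs are hypothetical, and ranking by delta resilience per unit of cost is only one possible way of relating the two quantities.

```python
def delta_resilience(baseline_value, changed_value):
    """Impact of a change: new resilience value minus the old one."""
    return changed_value - baseline_value


def rank_interventions(baseline_value, candidates):
    """Rank candidate subsets by delta resilience per unit of cost.

    candidates maps a subset name to (resilience value, cost); costs are
    assumed to be positive. Returns (name, delta, delta per cost) tuples,
    best first.
    """
    scored = []
    for name, (value, cost) in candidates.items():
        delta = delta_resilience(baseline_value, value)
        scored.append((name, delta, delta / cost))
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Hypothetical results of two simulated interventions.
ranked = rank_interventions(
    baseline_value=0.5,
    candidates={
        "redundant_line": (0.75, 5.0),
        "islanding": (0.625, 1.0),
    },
)
```

In this example, the subset with the smaller absolute improvement ranks first because it is much cheaper, which illustrates why the delta resilience alone is not sufficient for a decision.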

Overall, resilience engineering is likely a continuous activity, and the resilience of a complex system can be addressed at different stages relative to disruptive events; see Table 5.

Multiple methods can be used to define the criteria to consider whether the output of each resilience assessment framework indicates a ‘good’ level of resilience. If resilience value is used for decision-making, it is multicriteria decision-making, and thus this task is about naming complementary criteria for the multicriteria decision-making. Three approaches to multicriteria decision-making are listed below from the least to the most comprehensive.

Single-criterion indicators: with the resilience value as a supply/demand ratio, the criterion is a threshold value for the volume of delivery or the capacity margin.

Multicriteria n-lemmas: an example of such a concept is the well-known cost-quality-time triangle with the heuristic rule that, for any system, it is possible to meet only two of the three qualities. A similar triangle is associated with the resilience of the electricity system [113]: the sustainability, security of supply, and affordability triangle, where resilience is a part of security and decarbonization is a part of the sustainability indicators.

Evaluation frameworks: HM Treasury provides guidance on how to appraise and evaluate policies, projects, and programmes; see The Green Book [114].
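The first two approaches can be sketched in Python. The threshold check corresponds to a single-criterion indicator, while the Pareto filter is a generic multicriteria technique brought in here for illustration; it is not a method prescribed by [113] or [114]. The subset names and scores are hypothetical, and every criterion is assumed to be scaled so that higher is better.

```python
def meets_threshold(resilience_value, threshold):
    """Single-criterion indicator: is the resilience value sufficient?"""
    return resilience_value >= threshold


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(options):
    """Keep the options that are not dominated by any other option."""
    return {
        name: scores
        for name, scores in options.items()
        if not any(dominates(other, scores)
                   for other_name, other in options.items()
                   if other_name != name)
    }


# Hypothetical scores: (security of supply, sustainability, affordability).
options = {
    "subset_a": (0.9, 0.4, 0.3),
    "subset_b": (0.6, 0.7, 0.8),
    "subset_c": (0.5, 0.6, 0.7),
}
front = pareto_front(options)
```

Here `subset_c` is removed because `subset_b` is better on all three criteria, while the choice between `subset_a` and `subset_b` remains a trade-off for the decision-maker.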

5. Conclusions

A structured collection of tools for engineering resilience and an approach for resilience research to improve the resilience of a power grid infrastructure are described in this paper. The collection is a two-dimensional array formed from a classification of components of a power grid and a typology of business processes. These two dimensions provide a physical and an operational outlook for a power grid. The approach for resilience research is based on building a simulation model of a power grid which utilizes a resilience assessment equation to assess its baseline resilience to a hazards’ profile, then iteratively selecting a subset of tools from the collection and introducing these as interventions in the power grid simulation. Calculating the difference in resilience associated with each subset supports multicriteria decision-making to find the most convenient subset of interventions for a power grid and hazards’ profile.

This highlights the importance of a portfolio of strategies that takes account of the variety of natural disasters relevant to the regions or geographical areas of the power grid (in addition to general hazards) and that compensates for design decisions which can compromise resilience.

The outlined approach of iteratively testing subsets of tools assumes simulation modelling. The simulation model should be in line with the structured description of the elements of the collection and the possible paths of impact of hazards. Hazard is a mandatory element of resilience experience; however, resilience of a system to a single hazard is less relevant than resilience of a system to a hazards’ profile. Matrix algebra, set theory, and combinatorics allow automatic construction of scenarios within this approach. A volume, time, or volume-over-time resilience assessment framework could be selected or designed for the simulation modelling architecture. The resulting resilience value, in combination with other factors, could be used for multicriteria decision-making on the quality of the subset.

Resilience is an emergent quality of a power grid system, and therefore resilience research and interventions must be system-driven. Usually, multiple infrastructures are utilized by social or production systems, and a hazard often affects multiple infrastructures, as is well illustrated by the assessment of volcanic hazards by Wilson et al. [115], which also states the importance of a hazards’ profile for a country. Moreover, interdependencies [1, 2] between infrastructures may impact the recovery and resilience of a single infrastructure. Therefore, simulation-based search/optimization for a resilient infrastructure would benefit from a simultaneous search/optimization for multisectoral resilience; e.g., Najafi et al. [52] described resilience improvement of a power-water distribution system. However, simulation-based resilience research and engineering require in-depth single- or multitopic analysis of hazards, infrastructure components, vulnerabilities, tools, regions, simulation modelling techniques, and search or optimization algorithms for the follow-up model-driven resilience research and engineering.

Data Availability

Published articles provide the data used to support the findings of this study. The selection rules for these articles are included within the article.

Conflicts of Interest

The manuscript has no conflicts of interest with the source of funding or otherwise.

Acknowledgments

We kindly acknowledge EPSRC grant funding reference EP/N010019/1 regarding the ENCORE project on engineering complexity resilience.