Production, Manufacturing and Logistics
Optimizing system resilience: A facility protection model with recovery time

https://doi.org/10.1016/j.ejor.2011.09.044

Abstract

Optimizing system resilience is concerned with the development of strategies to restore a system to normal operations as quickly and efficiently as possible following potential disruption. To this end, we present in this article a bilevel mixed integer linear program for protecting an uncapacitated median type facility network against worst-case losses, taking into account the role of facility recovery time on system performance and the possibility of multiple disruptions over time. The model differs from previous types of facility protection models in that protection is not necessarily assumed to prevent facility failure altogether, but more precisely to speed up recovery time following a potential disruption. Three different decomposition approaches are devised to optimally solve medium to large problem instances. Computational results provide a cross comparison of the efficiency of each algorithm. Additionally, we present an analysis to estimate cost-efficient levels of investments in protection resources.

Highlights

► We propose a bilevel model to protect median systems against worst-case losses.
► The model identifies which facilities to harden to speed up recovery following disruption.
► The model considers the possibility of multiple disruptions over a planning horizon.
► We propose three decomposition algorithms to optimally solve medium to large instances.
► We examine how changes in the protection budget impact overall system efficiency.

Introduction

In this paper, we consider the problem of reducing the impact of component failures on service and supply systems. Our specific aim is to decide which components to harden or protect, not necessarily with the intent of fully preventing component failures, but more precisely to speed up the recovery of the system following a possible worst-case disruption pattern.

System failure is a highly relevant and timely issue in the design and operation of modern, well-functioning infrastructure networks (Murray and Grubesic, 2007). In practice, component failures may occur for any number of reasons, including equipment breakdowns, industrial accidents (e.g., fires) and even deliberate sabotage or attack (e.g., a terrorist strike). Such failures make a system partially or wholly inoperable for a given length of time and may entail significant direct or indirect costs. This is particularly so in relation to attack and subsequent disruption of critical infrastructure networks (Church et al., 2004, Kamien, 2006) such as electric power grids, transportation hubs and public health facilities.

In many cases, preventive steps can be taken such that when and if failures occur, the ensuing downtime of affected components is reduced. The obvious benefit of this is the improved speed with which the system can be restored to full operational status, thus limiting the overall cost that may be incurred during system downtime. At the extreme end, it is sometimes possible to effectively reduce recovery time to zero, in which case a component becomes fully protected from failure and its attendant costs.

General measures designed to avoid disruption and reduce recovery time include adding built-in redundancies, expanding capacity, installing structural reinforcements and barriers, and carrying out preventive maintenance, monitoring and inspection (Parry, 1991). In areas prone to flooding, for example, a variety of measures are often taken to prevent failures and/or speed up recovery: pumps and backup power generators can be put in place; vital road links can be repositioned on elevated terrain; levees and storm drain systems can be built or expanded.

In what follows, we propose a model for protecting an uncapacitated median type system against worst-case losses, incorporating both facility recovery time and the possibility of multiple disruptions over time. The model, referred to as the Fortification and r-Interdiction Median problem with facility recovery Time (FRIMT), adopts a defender-attacker or fortification-interdiction type framework (Brown et al., 2006) where a planner seeks to allocate protection resources among facilities in order to reduce the length of time customers must be assigned to more distant facilities than their closest one following a worst-case attack by an interdictor. The length of time a facility is out of service is referred to as the facility’s recovery time. Protection is assumed to take place once at the beginning of the planning horizon as opposed to being implemented over time. The interdictor, however, has the ability to strike a fixed number of facilities over a given time horizon. Our model is formulated as a bilevel mixed integer linear program (BMILP), where the upper level models the planner’s protection decisions and the lower level models the interdictor’s optimal response to a given protection strategy.
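To make the role of recovery time concrete, the following minimal sketch evaluates the demand-weighted distance accumulated over a planning horizon for a given set of disruptions. It is an illustration only and not the formulation used in the paper: it assumes discrete time periods, that a struck facility is offline for a fixed number of periods, and that every customer is reassigned to the closest operating facility. The function name `horizon_cost` and all data are hypothetical.

```python
def horizon_cost(dist, demand, recovery, strikes, horizon):
    """Total demand-weighted distance over `horizon` discrete periods.

    dist[i][j]  : distance from customer i to facility j
    demand[i]   : demand of customer i in each period
    recovery[j] : periods facility j stays offline after a strike (assumed fixed)
    strikes     : list of (period, facility) disruption events
    """
    n_fac = len(recovery)
    down_until = [0] * n_fac  # facility j is offline while t < down_until[j]
    by_period = {}
    for t, j in strikes:
        by_period.setdefault(t, []).append(j)

    total = 0
    for t in range(horizon):
        for j in by_period.get(t, []):
            down_until[j] = max(down_until[j], t + recovery[j])
        open_fac = [j for j in range(n_fac) if t >= down_until[j]]
        if not open_fac:
            raise ValueError(f"no facility operating in period {t}")
        # Each customer is served by the closest operating facility.
        for i, w in enumerate(demand):
            total += w * min(dist[i][j] for j in open_fac)
    return total


# Hypothetical data: 2 customers, 2 facilities, 6 periods.
dist = [[1, 4], [5, 2]]
demand = [10, 20]

# Facility 1 is struck in period 1 and needs 3 periods to recover.
print(horizon_cost(dist, demand, [3, 3], [(1, 1)], 6))  # 480
# Same strike, but protection has cut the recovery time of facility 1 to 1 period.
print(horizon_cost(dist, demand, [3, 1], [(1, 1)], 6))  # 360
```

In this toy instance the undisrupted cost is 300, so shortening the recovery time from three periods to one reduces the excess cost caused by the strike from 180 to 60.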

It should be noted that the use of an interdiction framework does not necessitate the existence of an intelligent attacker. The attacker subproblem is merely used as a device to estimate a worst-case damage scenario for any feasible protection strategy. The model is more widely applicable to problems involving natural disasters when the impacts of disruption are severe enough, as is often the case with critical infrastructure systems, to warrant a highly risk-averse decision making criterion based on minimizing the maximum possible damage.
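Schematically, the risk-averse criterion described above can be written as a min-max problem. The notation here is assumed for illustration and is not taken from the paper: Z denotes the protection allocations that respect the budget, S the disruption patterns available to the interdictor over the horizon, and D(z, s) the demand-weighted distance accumulated over the horizon when pattern s is applied to a system protected according to z.

```latex
\min_{z \in Z} \; \max_{s \in S} \; D(z, s)
```

The inner maximization plays the role of the lower-level (interdictor) problem and the outer minimization that of the upper-level (planner) problem.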

Recent work on interdiction and fortification based optimization models has tackled a number of issues related to our current problem. In Church et al. (2004) a simple interdiction model is formulated in order to identify the r most critical facilities within an existing p-median network whose combined failure would result in the largest increase in total demand-weighted distance. Church and Scaparra (2007) build upon this by introducing fortification into the problem. A BMILP is developed in which q facilities may be protected in order to minimize the maximum possible damage. Two different solution approaches for this model are discussed in Scaparra and Church (2008a, 2008b). Variations on this basic protection model have subsequently been proposed that include different model facets such as facility capacities (Scaparra and Church, 2010), a random number of possible losses (Liberatore et al., 2010, Liberatore and Scaparra, 2011), the propagation of disruption over large areas (Liberatore et al., 2012), capacity expansion and security budget constraints (Aksen et al., 2010), and network flow systems (Cappanera and Scaparra, 2011).
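The basic r-interdiction median idea can be illustrated by brute force on a tiny instance. The sketch below is not the integer programming model of Church et al. (2004); it simply enumerates every set of r facilities and reports the one whose loss causes the largest total demand-weighted distance. It assumes r is smaller than the number of facilities, and the function names and data are hypothetical.

```python
from itertools import combinations


def weighted_distance(dist, demand, open_fac):
    """Demand-weighted distance when each customer uses its closest open facility."""
    return sum(w * min(dist[i][j] for j in open_fac) for i, w in enumerate(demand))


def most_critical_facilities(dist, demand, r):
    """Enumerate all r-subsets of facilities and return (worst cost, subset).

    Assumes r < number of facilities, so at least one facility stays open.
    """
    n_fac = len(dist[0])
    worst_cost, worst_set = None, None
    for attack in combinations(range(n_fac), r):
        open_fac = [j for j in range(n_fac) if j not in attack]
        cost = weighted_distance(dist, demand, open_fac)
        if worst_cost is None or cost > worst_cost:
            worst_cost, worst_set = cost, attack
    return worst_cost, worst_set


# Hypothetical data: 3 customers (rows) and 3 facilities (columns).
dist = [[1, 5, 9],
        [6, 2, 7],
        [8, 4, 3]]
demand = [30, 10, 20]

print(weighted_distance(dist, demand, range(3)))   # 110 with all facilities open
print(most_critical_facilities(dist, demand, 1))   # (230, (0,))
print(most_critical_facilities(dist, demand, 2))   # (400, (0, 1))
```

Enumeration grows combinatorially with the number of facilities, which is why the models cited above rely on integer programming rather than exhaustive search.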

All the above protection models can be cast as multi-level defender-attacker models, whose general framework is introduced in Brown et al. (2006). Some practical applications of defender-attacker models to critical infrastructure protection can be found in Brown et al. (2006) for electric power grids, subways, and airports and in Qiao et al. (2007) for water supply networks. Other relevant articles based on game theoretic approaches include Azaiez and Bier (2007), Zhuang and Bier (2007), Jenelius et al. (2010) and Levitin and Hausken (2010).

For the most part, recovery time and the concept of investing protection resources to reduce recovery time have been completely disregarded in the literature, including all of the above. Instead, most models have been constructed as being static, without any thought for the post-recovery phase during which a system is brought back into full operation.

Two exceptions are Losada et al. (2009a) and Holmgren et al. (2007). In Losada et al. (2009a), recovery time and the availability of multiple attack windows are introduced into the interdiction of a median facility network. Fortification of facilities is not considered. Our present work extends Losada et al. (2009a) by incorporating fortification where different amounts of resources may be invested to reduce facility recovery times to varying degrees. Holmgren et al. (2007) consider the problem of evaluating different protection strategies for electric power grids. Protection resources can be allocated for protection and/or recovery. Their model differs from ours in several ways. For example, in Holmgren et al. (2007), all disruptions occur at the same time. Additionally, no solution approach is proposed for optimizing their bilevel model. Instead, protection strategies are evaluated only for a few interdiction scenarios which are defined a priori and only involve one or two vulnerable components. The protection strategies are then compared with each other according to different criteria.

Besides the model itself with its time-dynamic characteristics, one of our other main contributions is the development of several decomposition based methods for optimally solving FRIMT. Bilevel programs such as FRIMT are notoriously difficult to solve optimally. One widely used approach to solve bilevel models is problem reformulation, which usually involves dual transformation of the lower level (Israeli and Wood, 2002, Lim and Smith, 2007, Losada et al., 2009b) or replacement of the lower level by its KKT optimality conditions (Wang et al., 2000, Arroyo and Galiana, 2005). Given the presence of integer variables in the lower level of FRIMT, however, such approaches cannot be readily applied.

The literature on bilevel problems with integer variables in the lower level suggests that general solution approaches are inefficient at solving instances with a realistic number of variables (e.g., Moore and Bard, 1990). Other general approaches presented in Caroe and Tind (1998) and Sherali and Fraticelli (2002) have not been tested on a wide variety of problems, so their practical efficiency remains unclear. Gabriel et al. (2010) suggest a general method based on Benders decomposition and solve instances of moderate size. However, their solution approach does not guarantee the optimality of a solution. Other general and problem specific approaches for solving bilevel models can be found in Labbé et al. (1998), Scaparra and Church (2008a), Garg and Smith (2008), Taşkın et al. (2009) and O’Hanley and Church (2011). A comprehensive solution methodology for mixed-integer nonlinear bilevel programs can be found in Gümüş and Floudas (2005).

To solve FRIMT both optimally and efficiently, we propose three different exact decomposition algorithms. In all three, FRIMT is split into two interlinked subproblems: an upper level Relaxed Master Problem (RMP) and a lower level interdiction SubProblem (SP). Each protection strategy found by the RMP is fed into the SP to determine an optimal interdiction pattern. The result is a feasible bilevel solution to FRIMT. Cuts are then generated based on the solution to the SP and then added to the RMP. The methods each differ with respect to the specific form of the RMP and the type of cuts that are generated.

In the first decomposition method, a Benders type approach is employed whereby Adaptive Benders optimality Cuts (ABCs) are used to progressively tighten an upper bound derived from the RMP. Each ABC computes an exact value for an interdiction pattern found by the SP in previous iterations. The cuts are adaptive in that the dual variables used to define a Benders cut are not fixed but are directly optimized over during solution of the RMP. Unlike previous approaches (Gabriel et al., 2010), ours is guaranteed to converge to a proven optimal solution. In the second approach, the RMP produces no bounds and is simply a collection of feasibility cuts, known as Super-Valid Inequalities (SVIs) (Israeli and Wood, 2002, O’Hanley and Church, 2011). SVIs, unlike regular valid inequalities, remove integer solutions from the feasible space but are guaranteed not to remove all optimal solutions unless one has already been found. The SVIs are used here as a way of forcing the RMP away from clearly dominated solutions. The algorithm terminates when a sufficient number of SVIs has been added to make the RMP infeasible. Lastly, we propose a hybrid decomposition method which relies on the combined use of ABCs and SVIs.
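The master/subproblem loop behind the SVI based approach can be sketched on a deliberately simplified problem. The code below is not the paper's algorithm: it assumes binary fortification without recovery time (a protected facility cannot be attacked), solves both the master and the subproblem by enumeration, and uses a simple covering-type cut ("protect at least one facility of the last attack set") rather than the inequality developed in the paper. The cut is super-valid in this setting because any strategy that protects none of the attacked facilities leaves the same attack available and so cannot beat the incumbent. All names and data are hypothetical.

```python
from itertools import combinations


def worst_attack(dist, demand, r, protected):
    """Subproblem: worst r-facility attack on unprotected facilities (enumeration).

    Assumes r < number of facilities, so some facility always stays open.
    """
    n_fac = len(dist[0])
    targets = [j for j in range(n_fac) if j not in protected]
    best = (-1, ())
    for attack in combinations(targets, min(r, len(targets))):
        open_fac = [j for j in range(n_fac) if j not in attack]
        cost = sum(w * min(dist[i][j] for j in open_fac)
                   for i, w in enumerate(demand))
        best = max(best, (cost, attack))
    return best


def solve_by_svi(dist, demand, q, r):
    """Protect q facilities to minimize the worst-case cost, using covering cuts."""
    n_fac = len(dist[0])
    cuts = []                       # each cut: a set the protection must intersect
    incumbent = (float("inf"), None, None)
    while True:
        # Master: any size-q protection set satisfying every cut (no objective).
        z = next((set(c) for c in combinations(range(n_fac), q)
                  if all(set(c) & cut for cut in cuts)), None)
        if z is None:               # master infeasible: incumbent is optimal
            return incumbent
        cost, attack = worst_attack(dist, demand, r, z)
        if cost < incumbent[0]:
            incumbent = (cost, sorted(z), attack)
        # Covering cut; an empty attack yields an empty cut, which ends the loop.
        cuts.append(set(attack))


# Hypothetical data: 3 customers (rows) and 3 facilities (columns).
dist = [[1, 5, 9],
        [6, 2, 7],
        [8, 4, 3]]
demand = [30, 10, 20]

print(solve_by_svi(dist, demand, q=1, r=1))  # (150, [0], (1,))
print(solve_by_svi(dist, demand, q=2, r=1))  # (130, [0, 1], (2,))
```

As in the description above, the master problem here provides no bound: it only proposes strategies that have not yet been ruled out, and the loop stops once the accumulated cuts make it infeasible.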

The remainder of the paper is organized as follows. In Section 2 we revisit the interdiction problem with recovery time first presented in Losada et al. (2009a) and propose a much more efficient and scalable formulation. In Section 3 we present the bilevel formulation of our model FRIMT. Section 4 provides details of the three decomposition methods proposed to solve FRIMT. In Section 5 we compare the computational performance of the three methods on several problem instances. We also examine how changes in the protection budget impact overall system efficiency with the aim of identifying cost-effective levels of investment in protection. Finally, in Section 6 we provide a summary of the main contributions of the paper as well as discuss some problem insights and suggested areas of future research.


A reformulation for the r-Interdiction Median problem with facility recovery time and frequent disruptions

The r-Interdiction Median problem with facility recovery Time and frequent disruptions (RIMT) was first presented in Losada et al. (2009a). The basic modeling assumptions of RIMT are as follows. A supply system with p uncapacitated facilities provides service to n customers. Each customer is served by his closest operating facility unless the closest facility is out of service due to some kind of failure or disruption. Each facility has a recovery time associated with it, which denotes the time

Bilevel formulation of the fortification problem

Consider the following additional notation. Let q be the protection budget and $m_j$ stand for the reduction in recovery time from $G_j$ for every unit of protection resource invested in facility j. The decision variables used are as follows. Let $z_j \in \mathbb{Z}^+$ be the amount of protection resources invested in facility j, $\tau_j \in \mathbb{Z}^+$ be the total reduction in recovery time of facility j, $s_j \in \mathbb{Z}^+$ be the number of times that facility j is disrupted and $x_{ij} \in \mathbb{Z}^+$ be the number of times that customer i is served by facility j

Decomposition methods

This section provides a detailed description of the proposed exact decomposition methods for FRIMT: Benders decomposition (D-Bend), SVI based decomposition (D-SVI) and a hybrid decomposition (D-H). While the SubProblem (SP) is the same for all decomposition methods, the specific form of the Relaxed Master Problem (RMP), the type of cutting planes and the convergence criteria all differ.

By solving the RMP, a feasible protection strategy $(\hat{z},\hat{\tau}) \in \Omega(Z,\Gamma)$ is found which is used to update the values

Computational performance

We tested the computational performance of the proposed approaches (D-Bend, D-SVI and D-H) on an Intel Core2 Duo T6400 2.0 GHz processor with 4 GB of RAM. The algorithms were implemented in C++ using CPLEX 12.1 (IBM) callable libraries. The computational tests were conducted on the London Ontario dataset with 150 nodes/demand points (Waters, 1977) and the UKCities dataset with 250 nodes/demand points (Liberatore and Scaparra, 2011).

In our initial testing, we solved each problem for a number of

Conclusions

In this article, the important issue of facility recovery time has been incorporated into a model that identifies the optimal allocation of protection resources in an uncapacitated median network in order to hedge against worst-case facility losses. The resulting formulation is a bilevel problem with non-convex regions due to the presence of integer variables in the lower level.

It has been shown that classical Benders decomposition can be adapted to optimally solve moderate to large instances

Acknowledgments

This research was supported by EPSRC Grant EP/E048552/1. This support is gratefully acknowledged. We also thank the referees for their valuable comments. In particular, we thank one of the referees for suggesting a tighter and more elegant version of our Super-Valid Inequality.

References (41)

  • J.M. Arroyo et al. (2005). On the solution of the bilevel programming formulation of the terrorist threat problem. IEEE Transactions on Power Systems.
  • J.F. Benders (1962). Partitioning procedures for solving mixed integer variables programming problems. Numerische Mathematik.
  • G. Brown et al. (2006). Defending critical infrastructure. Interfaces.
  • P. Cappanera et al. (2011). Optimal allocation of protective resources in shortest-path networks. Transportation Science.
  • C.C. Caroe et al. (1998). L-shaped decomposition of two-stage stochastic programs with integer recourse. Mathematical Programming.
  • R.L. Church et al. (2007). Protecting critical assets: The r-interdiction median problem with fortification. Geographical Analysis.
  • R.L. Church et al. (2004). Identifying critical infrastructure: The median and covering facility interdiction problems. Annals of the Association of American Geographers.
  • A. Delgadillo et al. (2010). Analysis of electric grid interdiction with line switching. IEEE Transactions on Power Systems.
  • S.A. Gabriel et al. (2010). A Benders decomposition method for discretely constrained mathematical programs with equilibrium constraints. Journal of the Operational Research Society.
  • A.M. Geoffrion (1972). Generalized Benders decomposition. Journal of Optimization Theory and Applications.