Post-Disaster Supply Chain Interdependent Critical Infrastructure System Restoration: A Review of Data Necessary and Available for Modeling

The majority of restoration strategies in the wake of large-scale disasters have focused on short-term emergency response solutions. Few consider medium- to long-term restoration strategies to reconnect urban areas to national supply chain interdependent critical infrastructure systems (SCICI). These SCICI promote the effective flow of goods, services, and information vital to the economic vitality of an urban environment. To re-establish the connectivity between the different SCICI that has been broken during a disaster, relationships between these systems must be identified, formulated, and added to a common framework to form a system-level restoration plan. To accomplish this goal, a considerable collection of SCICI data is necessary. The aim of this paper is to review what data are required for model construction, the accessibility of these data, and their integration with each other. While a review of publicly available data reveals a dearth of real-time data to assist in modeling long-term recovery following an extreme event, a significant amount of static data does exist, and these data can be used to model the complex interdependencies needed. For the sake of illustration, a particular SCICI (transportation) is used to highlight the challenges of determining the interdependencies and creating models capable of describing the complexity of an urban environment with the data publicly available. Integration of data derived from public domain sources is readily achieved in a geospatial environment; after all, geospatial infrastructure data are the most abundant data source. While significant quantities of data can be acquired through public sources, a significant effort is still required to gather, develop, and integrate these data from multiple sources to build a complete model. Therefore, while continued availability of high-quality public information is essential for modeling efforts in academic as well as government communities, a more streamlined approach to the real-time acquisition and integration of these data is also essential.


Introduction
Critical infrastructure systems provide the backbone for socioeconomic vitality and security of urban areas. These systems are defined by the US Department of Homeland Security (DHS) as follows: Critical infrastructure are the assets, systems, and networks, whether physical or virtual, so vital to the United States that their incapacitation or destruction would have a debilitating effect on security, national economic security, national public health or safety, or any combination thereof (DHS, 2014).
A supply chain interdependent critical infrastructure system (SCICI) is composed of many systems, including but not limited to: transportation, power, communications, and water, which are interdisciplinary in nature. In addition, these SCICI exhibit complex interdependencies that must be captured to create models that are representative of the true system conditions.
Effective modeling of critical infrastructure restoration must incorporate ideas and tools from a wide spectrum of research areas including: simulation-based optimization, structural engineering, human behavior modeling, geographic information systems (GIS), and supply chain management. In general, recent disaster management studies use either a qualitative (Carlson & Doyle, 1999; Haimes, 2005; Amin & Wollenberg, 2005) or quantitative methodology (MacKenzie et al., 2014; Adams & Stewart, 2014). These efforts fail to capture full system complexity because they do not combine qualitative and quantitative methodologies and they ignore the interdependencies that lead to emergent behaviors. In addition, the majority of restoration strategies in the wake of large-scale disasters have focused on short-term emergency rescue and recovery methodologies (Holguín-Veras and Jaller, 2011; Hale and Moberg, 2005; Widener and Horner, 2011). Few consider medium- to long-term restoration strategies that reconnect urban areas to the national SCICI. The medium- to long-term restoration of these systems requires longer timelines and larger financial investments than short-term emergency response, and so a methodology specific to these phases is necessary.
A survey paper by Altay and Green (2006) found that of 110 articles relating to disaster operations management research, 43.6% relate to the mitigation phase, 21.8% focus on preparedness, 23.6% relate to response, and only 10.9% are related to recovery (12 articles). Further, most previous studies focus only on a single aspect of one system within the SCICI (Shinozuka et al., 2007; Ouyang and Dueñas-Osorio, 2011; Rosato et al., 2008), or on emergency response processes (Bruneau et al., 2003; Vugrin et al., 2010; Reed et al., 2009). Disaster recovery studies categorized by disaster management lifecycle do not build a comprehensive framework that identifies the data required to build such a model; instead, they assume that the data are available (Altay and Green, 2006; Alvarez et al., 2014; Kondaveti and Ganz, 2009; Feng and Weng, 2005; Miller-Hooks et al., 2012). Operations research-style quantitative research typically focuses on game theory or inventory/sourcing models (MacKenzie et al., 2014).
To map restoration strategies of the SCICI in the aftermath of a disaster, one must first build a comprehensive framework that realistically models the SCICI in a normal environment. This requires that a large amount of data be integrated across many disciplines. One tool that is useful for this research is geographic information systems (GIS) technology. GIS can be used to examine the interdependency among critical infrastructure systems (Sinton, 1992; Ramachandran et al., 2015a) or depict geographic correlations within critical infrastructure elements (Burrough, 1990; Goodchild and Haining, 2004; Greene, 2002; Ramachandran et al., 2015b). But a multi-dimensional approach to this modeling has yet to be considered (Mitchell, 2005; Zeiler, 2010; Openshaw, 1994).
Models required for planning the restoration of SCICI systems must capture real-world complexities and use real-time data to be useful to decision-makers. Geospatial data play a key role in SCICI restoration; thus, there is a need to understand the accessibility issues and inherent uncertainties associated with such data. While federal, state, and local entities routinely use GIS technology with subsets of SCICI data in disaster planning activities, using these data to map infrastructure elements, their interdependencies, and their restoration in the aftermath of an extreme event has seldom been done (Fletcher, 2002). As an important first step, this article documents the use of publicly available data for the creation of complex SCICI models.

Method
The emphasis in modeling critical infrastructure systems has been on developing methodologies and algorithms, rather than on incorporating real-world data. Most studies have taken a one-dimensional approach wherein either the required data are assumed to be complete and available, or synthetic data are generated for analyses when needed. It is difficult to understand all the complex interactions that exist between infrastructure elements and systems based on such approaches. In this study, the transportation infrastructure system within SCICI is used as an example to illustrate its complex interactions with other SCICI systems and to categorize, integrate, and analyze the data required to properly model this system. The transportation (logistics) infrastructure system presented here includes the transport mode (road, rail, air, and water) infrastructure, the freight that is moved through these modes, and the storage of that freight. As with any system that forms a component of the larger SCICI system, a model of this component system must be created with the understanding that it will be integrated into a larger SCICI modeling framework. The construction of a restoration model of any element of SCICI damaged due to a large-scale disaster can be divided into five work-flow phases: acquisition and integration of data, SCICI system modeling, SCICI interdependency determinations, hazard damage simulation, and restoration modeling. A work-flow diagram for the transportation infrastructure system is shown in Figure 1. Each phase requires different types of input data, typically in diverse formats (including non-digital formats) and stored in different databases on different computers. While this presents a challenge to the modeling effort, the identification and integration of these data are essential for creating realistic SCICI system models.
The acquisition and integration of data phase incorporates all data necessary to make a realistic model of the pre-disaster SCICI system for the region under consideration. For the transportation infrastructure system this would consist of: (1) freight data: storage/distribution facilities data, modes of transport and their capacity, and flow data, and (2) infrastructure data: the capacity the infrastructure can sustain and the location of each infrastructure element. Typically these data are not readily available in digital databases, may be proprietary, and/or come from multiple sources, making their integration daunting.
The SCICI system modeling phase combines the data from the previous phase to construct a model of the SCICI system and how it operates to perform the tasks necessary to accomplish the overall SCICI goals. The transportation infrastructure system model incorporates the freight data, system capacities, and available transportation network from the acquisition phase and describes how they work together to move goods throughout the region being considered.
In the SCICI interdependency determinations phase, the interdependencies are mapped between SCICI systems both internally and to the external regional, national, and global supply chain elements. This is crucial to any restoration efforts. Through these interdependencies it becomes possible to detect critical points of failure that can cause a cascade effect damaging many elements upon the failure of a single element.
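As an illustration of how mapped interdependencies can expose critical points of failure, the following minimal sketch propagates a single failure through a small dependency map and ranks elements by the size of the resulting cascade. The element names and links are hypothetical, and the rule that an element fails as soon as any one of its dependencies fails is a simplifying assumption rather than the model used in this study.

```python
from collections import deque

# Hypothetical dependency map: each element lists the elements it needs to function.
DEPENDS_ON = {
    "substation": [],
    "cell_tower": ["substation"],
    "pumping_station": ["substation", "cell_tower"],
    "traffic_signals": ["substation"],
    "road_hub": ["traffic_signals"],
    "bridge": [],
}


def cascade(depends_on, initial_failure):
    """Return the set of elements that fail once `initial_failure` fails.

    Assumption: an element fails as soon as any one of its dependencies fails.
    """
    # Invert the map so that each element points to the elements relying on it.
    dependents = {name: [] for name in depends_on}
    for name, needs in depends_on.items():
        for needed in needs:
            dependents[needed].append(name)

    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for follower in dependents[current]:
            if follower not in failed:
                failed.add(follower)
                queue.append(follower)
    return failed


def rank_critical_points(depends_on):
    """Rank elements by how many elements fail when each one fails alone."""
    sizes = {name: len(cascade(depends_on, name)) for name in depends_on}
    return sorted(sizes.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_critical_points(DEPENDS_ON)
```

In this toy map the substation heads the ranking because every powered element ultimately depends on it; in a real model the map would be derived from the integrated geospatial data rather than written by hand.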
The hazard damage simulation phase gathers information related to the critical points and determines how potential hazards might affect these weak points in the SCICI. This allows for the testing of restoration modeling before the onset of a large-scale disaster. In the event of a disaster, the actual damage itself would be the input data for the restoration optimization model rather than simulated damage.
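A simulated damage state for testing a restoration model could be drawn as in the sketch below. This is not the HAZUS-MH procedure: the element names and per-element damage probabilities are hypothetical placeholders, and each element is assumed to be damaged independently of the others.

```python
import random

# Hypothetical damage probabilities for one hazard scenario (placeholder values).
DAMAGE_PROBABILITY = {
    "bridge_A": 0.6,
    "rail_yard": 0.3,
    "river_port": 0.8,
    "substation": 0.5,
}


def simulate_damage(damage_probability, seed):
    """Draw one simulated damage state as the set of damaged elements.

    Assumption: elements are damaged independently. The seed makes the
    scenario reproducible so that restoration plans can be compared on it.
    """
    rng = random.Random(seed)
    damaged = set()
    for name in sorted(damage_probability):
        if rng.random() < damage_probability[name]:
            damaged.add(name)
    return damaged


scenario = simulate_damage(DAMAGE_PROBABILITY, seed=7)
```

After an actual event, the observed set of damaged elements would replace `scenario` as the input to the restoration model.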
Finally, in the restoration modeling phase, scheduling and work flows are created to return the SCICI system to its pre-event capabilities. Optimization techniques are applied here to develop plans that allow for the reassembly of the transportation system in a relatively efficient manner. In the case of the transportation system at hand, this would involve both reconnecting the transportation modes and restoring the capacity of those connections to pre-event levels.
After identifying the data required to model the SCICI systems, it is necessary to acquire these data. Given the amount of data that must be collected, there are several challenges. Table 1 shows data requirements for mapping the transportation system of SCICI and also identifies several difficulties in acquiring these data. Transportation is restricted here to the transportation of physical goods (as opposed to information, services, electricity, or the like). This is accomplished through one or more modes of transportation (air, rail, pipeline, water, or road). Hence, the data required for these different transportation modes include, but are not limited to: capacity, location, and freight forwarding capabilities. Further, much of the data required to model the transportation of goods is owned by private companies that are generally unwilling to share such information. As a result, acquiring the necessary datasets or resources can be time-consuming and introduce many uncertainties. To account for this, no proprietary data are represented in the following discussion of the different data types.

Freight/Freight Flow Data
Freight data include information about commodities shipped, their weight, manufactured goods versus raw materials, and the value of materials that are transported. In addition, the mode of transportation (rail, road, air, water, or pipeline) used to ship the goods and the holding capacities of each mode for a given area are included in these data. Freight flow data are typically measured in tons of goods transported and recorded as tons/commodity/mode by the National Transportation Atlas Database (NTAD, 2010). The primary source for freight data is the Commodity Flow Survey (CFS) of 2013 (U.S. Census Bureau, 2013). It is a public database that contains information on domestic interstate freight. Data are fed into this database through a variety of sources, but the primary problem with these data is their resolution and completeness (LeBeau, 2006). Data gaps can, in part, be removed by estimating values for a commodity using a gravity model of spatial interactions, which can also be used as a method for determining facility locations (Holguín-Veras and Jaller, 2011; Nan Liu and Vilain, 2004; Pérez Lespier et al., 2015). Origins, destinations, and modes also require estimation due to the gaps in freight data. In general, these data provide enough information to form estimates for missing data (Transportation Research Board, 2003). More accurate data likely exist, but they are proprietary in nature. Since most freight transportation companies are privately owned, the modes used, commodities shipped, routing (including transshipment facilities), and tonnage are either under-reported or not available to the public. In these cases it is necessary to estimate the missing data based on the publicly available data. The data regarding commodities passing through a state are generally available, and from this information the flow of commodities through a particular area can be estimated.
The tonnage transported can be a major factor in assigning priorities within restoration models (e.g. the greater the tonnage transported, the higher the priority that mode of transport has during the restoration process).
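The two ideas above, filling a data gap with a gravity model and ranking modes by tonnage, can be sketched as follows. The functional form T = k * O * D / d^beta is the textbook gravity model; the exponent beta = 2, the "mass" values, the distances, and all tonnages are hypothetical numbers chosen for illustration, not values from the CFS or NTAD.

```python
def gravity_flow(origin_mass, dest_mass, distance_km, k=1.0, beta=2.0):
    """Gravity-model flow estimate: k * O * D / d**beta (beta = 2 is an assumption)."""
    return k * origin_mass * dest_mass / distance_km ** beta


def calibrate_k(observed, beta=2.0):
    """Fit k by least squares through the origin from known flows.

    `observed` holds (origin_mass, dest_mass, distance_km, tons) tuples.
    """
    xs = [gravity_flow(o, d, dist, 1.0, beta) for o, d, dist, _ in observed]
    ys = [tons for _, _, _, tons in observed]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def rank_modes_by_tonnage(flows):
    """Order modes from highest to lowest total tonnage (restoration priority)."""
    totals = {}
    for mode, tons in flows:
        totals[mode] = totals.get(mode, 0.0) + tons
    return sorted(totals, key=lambda mode: totals[mode], reverse=True)


# Hypothetical observed flows: (origin mass, destination mass, distance in km, tons).
observed = [(100, 50, 10, 100.0), (200, 50, 20, 50.0)]
k = calibrate_k(observed)

# Estimate a flow that is missing from the public data.
missing_road_flow = gravity_flow(100, 80, 40, k=k)

# Combine reported and estimated tonnage, then rank modes for restoration.
flows = [("rail", 100.0), ("road", 50.0), ("road", missing_road_flow), ("water", 120.0)]
priority = rank_modes_by_tonnage(flows)
```

With these placeholder numbers the calibrated constant is 2, the missing road flow is estimated at 10 tons, and water is ranked ahead of rail and road. Tonnage is only one possible priority criterion.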

Transportation Infrastructure Capacity Data
Infrastructure capacity data incorporate the holding capacities of infrastructure facilities that aid freight flow, such as cargo hubs. When considering hubs that store goods and commodities, the multimodal nature of modern cargo transportation systems is important. Goods may arrive by river or sea, be stored in a water-hub, be picked up by a truck, and subsequently be stored in a road-hub. There are four main types of hubs considered here: Water-hubs, Rail-hubs, Road-hubs, and Air-hubs.
1) Water-Hubs form the largest and most diversified hubs in the transportation system. They facilitate transportation services for many types of products via barge or ship. They are also multimodal hubs that act as transfer points for many types of products from water modes to other modes such as rail, pipeline, air, or truck. An inherent problem with the data associated with water-hubs is that a variety of information unique to each hub is needed.
2) Rail-Hubs are most commonly rail freight yards. These hubs require a great deal of space for multiple tracks and are therefore most likely to be located on greenfield sites within or near major industrial zones. Rail-hubs generally have very large holding capacities and also act as multimodal hubs.
3) Road-Hubs usually store freight that is very diverse and bulky. They also act as multimodal hubs, shipping and receiving goods from road, rail, air, and water. Road-hubs are generally located just off major interstates to reduce transportation time.
4) Air-Hubs are located at airports connected to major road networks that allow for the rapid flow of people or cargo. These constitute the smallest hub connection due to the relatively high costs involved with air transport.
The data required for these hubs include freight handling data (what equipment is required for loading), information about the facilities required to accommodate ships, trucks and trains (berths, loading bays and freight yards respectively), total capacity data according to type of goods they can store (cold storage, hot storage, hazardous material etc.), and freight flow. Most of the transportation data for road and rail is

Geospatial Data
Geospatial datasets contain the location information associated with various types of data and, as such, form the base into which other data are integrated. The geospatial data include the locations of hubs, warehouses, utilities, infrastructure, and all other objects or materials that could be damaged and in need of repair or replacement from the impact of a large-scale disaster. Most of these data are available or can be derived from geospatial-centered websites like The National Map (TNM) of the U.S. Geological Survey. A shortfall of these data is their static nature. Most geospatial data are updated yearly or over the course of several years, so as new warehouses and hubs are built, the geospatial data will not convey these new sites until the next update cycle. Also, the extraction of these data from geospatially located sources such as orthoimagery can be quite time-consuming and requires specialized personnel. The advantages of these data are their free availability, large area coverage, and accurate overview of ground features.

Restoration Data
Restoration data are records containing information on rebuilding or recovery activity rates. These data include the number of skilled workers available for restoration activities, raw material stockpiles, necessary equipment accessibility, the time required for teams to assemble within a given area, and collaborations between invested agencies: federal, state, and local. These data come, in part, from personal interviews with people experienced in disaster reconstruction and from published agency reports on restoration activities. Typically these data are not available in electronic format and, for the most part, integrating them can be time-consuming. Many of these data are specific to the type of disaster experienced. Nevertheless, elements are often generalizable and can be used in developing restoration estimates for most damage estimates.

Hazard Data
The damage experienced by the transportation sector will, of course, depend on such variables as the type of disaster, its severity, duration, the vulnerability of the infrastructure, and the like. The actual damage experienced must ultimately be the input to any reconstruction optimization model; nevertheless, for the purpose of testing such a model, a damage estimate can be simulated. Such a simulation requires hazard risk evaluation data as well as SCICI survivability estimates. Much progress with such simulations has already been made by FEMA (2003) and can be accessed in the HAZUS-MH software, which provides simulations of some network vulnerabilities to different hazards.

Role of GIS in Data Acquisition and Integration
GIS offers tools that make the acquisition and integration of SCICI system data more tractable. Data layers from The National Map of the U.S. Geological Survey include orthoimagery, elevation, hydrography, transportation, place names, and land cover, and can be downloaded directly into a GIS database (Sugarbaker and Carswell, 2011). The orthoimagery serves as an excellent, if rather memory-intensive, base map on which to hang existing data sets and from which to extract further SCICI data. The orthoimagery projection is used as the default coordinate system into which all other data will be projected. Anything that is visible in the orthoimagery can be extracted by digitization as new SCICI data features (e.g., the locations of culverts, cell towers, electric power lines, bridges, pumping stations, etc.). Further analyses of the orthoimagery also provide the ability to estimate the capacity of these infrastructure elements (e.g., number of road lanes, number of rail tracks, dock lengths, electric line voltages, etc.). In addition, many local and regional government agencies (state departments of transportation, state departments of commerce, city utility districts, etc.) have data that can be integrated into a GIS database. To create the transportation system network, GIS is used to represent real-world features as discrete identifiable objects, from which network analysis models based on graph theory are built, representing transportation elements as vertices and edges.
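The graph representation described above can be sketched in a few lines. The hubs, links, modes, and capacities below are hypothetical (capacity is taken here as a count of lanes or tracks), and the class is an illustrative stand-in for the network analysis tools of a GIS package, not the implementation used in this study.

```python
from collections import deque


class TransportNetwork:
    """Undirected graph: vertices are hubs or junctions, edges are transport links."""

    def __init__(self):
        self.adjacency = {}

    def add_link(self, a, b, mode, capacity):
        """Add a two-way link carrying its mode and capacity as attributes."""
        attributes = {"mode": mode, "capacity": capacity}
        self.adjacency.setdefault(a, {})[b] = attributes
        self.adjacency.setdefault(b, {})[a] = attributes

    def remove_link(self, a, b):
        """Drop a link, for example to represent a damaged bridge or track."""
        self.adjacency[a].pop(b, None)
        self.adjacency[b].pop(a, None)

    def reachable_from(self, start):
        """Breadth-first search for every vertex still connected to `start`."""
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in self.adjacency.get(current, {}):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen


# Hypothetical network; capacities are lane or track counts.
network = TransportNetwork()
network.add_link("river_port", "rail_yard", mode="rail", capacity=4)
network.add_link("rail_yard", "road_hub", mode="road", capacity=2)
network.add_link("road_hub", "airport", mode="road", capacity=4)

before = network.reachable_from("river_port")
network.remove_link("rail_yard", "road_hub")  # simulate a damaged link
after = network.reachable_from("river_port")
```

Comparing `before` and `after` shows which hubs a single damaged link cuts off from the river port, which is the kind of connectivity question a restoration model must answer.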

SCICI Interdependencies
One of the main characteristics of SCICI elements is the multiplicity of interdependencies between them. For example, a water pumping station, in order to function, requires electricity to run the pumps, communication to control how much water needs to be moved, water lines through which the water will move, and roads to access the station. In any attempt to return functionality to a pumping station after a large-scale disaster, it is necessary to know the local interdependencies, such as which electrical lines run into the station, what roads access it, what cell tower communicates with it, and through which lines water moves into and out of the station. Less obvious, but equally important, it is necessary to understand that these connected elements are themselves dependent on far-field elements: which power station feeds electricity to the sector of the pumping station, which substations transform the power into usable voltages, what communication path runs from the controller to the pumping station, what bridges are available to move material and manpower to the pumping station for repair, and whether there are any damaged water lines between this pumping station and those before and after it. The main contribution of acquiring all these data and integrating them into a GIS is the ultimate ability to map out these interdependencies through the SCICI model.
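The distinction between local and far-field interdependencies in the pumping station example can be made concrete with a small traversal. The requirement map below is a hypothetical paraphrase of the example, with invented element names; it is a sketch of the idea, not data from the St. Louis model.

```python
# Hypothetical requirement map for the pumping station example.
REQUIRES = {
    "pumping_station": ["electric_line", "cell_tower", "access_road", "water_main"],
    "electric_line": ["substation"],
    "substation": ["power_station"],
    "cell_tower": ["controller_link"],
    "access_road": ["bridge"],
    "water_main": ["upstream_pumping_station"],
}


def dependencies(requires, element):
    """Split what `element` needs into local (direct) and far-field (indirect) sets."""
    local = set(requires.get(element, []))
    seen = set(local)
    stack = list(local)
    # Depth-first walk through the requirements of each requirement.
    while stack:
        current = stack.pop()
        for needed in requires.get(current, []):
            if needed not in seen:
                seen.add(needed)
                stack.append(needed)
    return local, seen - local


local, far_field = dependencies(REQUIRES, "pumping_station")
```

Here `local` holds the four elements physically attached to the station, while `far_field` holds the substation, power station, controller link, bridge, and upstream pumping station that must also be working before the station can be restored.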

Results
The modeling techniques presented here make use of the high-resolution imagery provided by The National Map to identify both the location of system elements and their proximity to one another. This spatial information identifies the interfaces between the systems and captures the interrelationships that give rise to complex responses. The interrelationships are driven by the system-specific information; in the case of the transportation infrastructure system, these are the freight and infrastructure data. In order to test the efficacy of the integration of these data into the proposed modeling techniques, the St. Louis, Missouri metropolitan region was chosen as a test area. This area is covered by 2268 orthoimagery tiles from The National Map with cell lengths ranging from 0.15 m to 0.6 m. These tiles constitute the base map onto which other data layers are projected. Considerable transportation data (particularly roads and rail lines) are available from state (in this case, Missouri and Illinois) departments of transportation. Much of the rest of the data are extracted from orthoimagery by heads-up digitization or from other sources, as shown in Table 2.
Many of the features that need to be digitized have a three-dimensional structure (e.g., cell towers and electric poles). To reduce the effects of parallax, features extracted from the orthoimagery are preferentially digitized at their base (for example, where a pole and its shadow meet). It should be noted, however, that since these data are extracted from the orthoimagery, only elements that are visible from the air can be digitized. Some elements (such as sewer lines and water mains) can be interpolated based on surface features in high-resolution orthoimagery (manhole covers or fire hydrants), whereas others (buried telephone lines, electric lines, fiber optic cable, and gas mains) are best obtained from other sources, which are often more difficult to access. In addition, where high-resolution imagery is not available (typically outside of urban settings), the level of detail correspondingly decreases.
Although relatively few SCICI databases that can be used for realistic models of disaster restoration are available to the general public, a considerable amount of infrastructure data can be gleaned from public sources, as shown in Figure 3 for South St. Louis, Missouri. This indicates that a large amount of data applicable to SCICI systems is available from public datasets alone. To date, 640 Gigabytes of data have been acquired for review. While this is rather large for real-time processing and model manipulation, the size of the data needed to describe actual infrastructure elements such as bridges, culverts, road networks, electric grid, communication networks, dams, locks, rail networks, water facilities, and docks for the St. Louis metroplex is less than 100 Megabytes. This presents a complicated tangle of infrastructure elements, but with preprocessing it can now be fit into a model that will begin to piece together the interdependencies of the SCICI, which is crucial to their restoration in the wake of a large-scale disaster. A small example of one part of this database is presented in Table 3.
However, even with this rich source of SCICI data, severe limitations still remain. One of these is that orthoimagery data must, by their very nature, be considered a static data source. They are a picture of the SCICI environment at the time of the flyovers, and they are not updated until the next flight cycle occurs, which is generally every 3 to 5 years. Changes made to SCICI between data cycles cannot be incorporated into the model by this method.
Also, labor-intensive digitization of infrastructure elements on such a massive scale introduces many human errors into the data, including features that are missed, erroneously added, misinterpreted, or digitized inaccurately. While this is potentially serious in individual cases, the sheer quantity of the data should permit the proper interdependencies to emerge, which is the ultimate objective. Again, as with the damage assessment simulation, once the techniques for mapping the interdependencies are complete, the infrastructure data will be supplied by individual communities. As electrical grids, water distribution systems, gas lines, etc., become more 'smart' (Amin and Wollenberg, 2005; Gao et al., 2012; Gungor et al., 2011), data can be fed directly into the model from sensors, giving a dynamic, real-time dimension to the analysis.

Conclusion and Future Work
Integrating hazard, human intervention, restoration, geospatial, freight flow, and infrastructure data for each SCICI element helps create a complex model of SCICI. This complexity arises not from the data themselves, but from the interaction of the SCICI processes that these data map (for example, an electric pole is not complex, but what happens to a water pumping station, a warehouse refrigeration unit, and several traffic lights if that pole were to be destroyed can lead to complexity). While separately these processes are complicated, in essence it is their interaction and interdependence that generates nonlinear behavior (complexity).
Table 3: For illustrative purposes, a small sample of the infrastructure database is displayed, specifically a few elements of the telecommunications tower infrastructure. 'FID' and 'OID' refer to internal indices, while 'Air Photo Verified' refers to whether a tower is visible on orthoimagery.
However, all SCICI elements have a common property: they all have complex components that interact with each other. The larger the scale of the SCICI, the more complex its systems, and the more it starts to display unexpected and nonlinear behavior. It is this behavior that can lead to cascading failures throughout many of the SCICI elements when a single unit fails. A major goal for modeling and optimization techniques is to identify such failures in the system and to rapidly repair and even improve complex infrastructure. This research addresses a gap in the literature associated with the acquisition and integration of the different types of data that must be brought together in order to build complex and robust models of supply chain systems. Geospatial infrastructure data are the most abundant of these data, and while much of it is acquirable through public sources, a serious effort is required to gather, develop, and integrate these data. Continued availability of public geospatial data is of paramount importance because no single utility or private firm has access to the various sources of data necessary to model the supply chains that feed its own function. Further, much of the modeling is done in academic communities outside government circles, which precludes access to restricted or classified data.
The bulk of the freight flow transportation data are proprietary, which requires that reasonable assumptions be made regarding data that are not accessible. Nevertheless, this research suggests that there are sufficient data available in the public domain to create a realistic model of the transportation system, and that this model is scalable to the other elements of SCICI.
Future work will increase the quantity and diversity of real-world data and expand the model to the other SCICI elements. Mapping the interdependencies between SCICI elements is essential to the construction of supply chain models because of the complexity of these systems. Further, sophisticated modeling and optimization techniques need to be created to explore the efficiency of restoration schema.

Competing Interests
The authors declare that they have no competing interests.