Airline disruption management: A literature review and practical challenges

Airline disruption management is an active field of research. In recent years, there has been an increase in publications, in particular of works integrating two or more resources (i.e., aircraft, crew, passengers) in the recovery analysis. Given that more than 50% of the papers have been published after the last literature review paper (Clausen et al., 2010), this paper provides a critical review and classification of the literature between 2009 and 2018 regarding airline disruption management, including aircraft, crew, passenger, and integrated recovery. Furthermore, this paper discusses various ways to close the gap between the reality faced in Airline Operations Control Centers (AOCCs) and the capabilities of the state-of-the-art, and defines a set of potential future lines of research. The objective is to minimize fuel costs as well as passenger compensation and inconvenience cost. A rolling horizon is used to decrease computation times. The model is tested on a network of Kenya Airways, an international hub-and-spoke carrier. For the case study the flight schedule of 8 days, consisting of 250 flights, was considered. A full day of operations is solved in less than 60 min.


Introduction
Poor weather conditions, congestion at hub airports, and aircraft mechanical problems are just a few of the causes that prevent airlines from operating their flight schedules as planned. Flight cancellations and departure and arrival delays can occur. These irregularities in operations are called disruptions. Disruptions are very common in the airline industry and greatly impact the realized operational performance. To mitigate the effect of these disruptions, intervention by the airline is necessary to repair flight schedules, aircraft schedules, crew schedules, and passenger itineraries. Consequently, disruptions may result in a significant increase in an airline's operational costs, e.g., additional crew overtime, increased fuel usage, passenger delay compensation, or re-accommodation cost. For a clear overview of the problem, recovery process, resources involved, and objectives considered, the reader is referred to Chapter 10 of the second edition of Belobaba et al. (2015).
According to statistics from EUROCONTROL (Walker, 2017), in the third quarter of 2017, almost 24.0% of all scheduled flights in Europe suffered from delays, which corresponds to around 6500 delayed flights per day. Ball et al. (2010) showed that in 2007, the total delay cost in the airline industry in the United States (US) was $32.9 billion, of which $8.3 billion consisted of additional expenses for fuel, crew, and maintenance. Because of the significant associated costs, the use of efficient and accurate recovery processes is of great importance to the airline industry.
There have been a few publications reviewing the literature regarding airline disruption management. Clarke (1998) presented the first overview of the state-of-the-art information systems and decision support systems used in operations control centers to handle irregular operations. This overview is based on field studies at several airlines. Filar and Prabhu Manyem (2001) reviewed literature in the area of recovery from schedule disruptions, incorporating the perspective of airports. More recently, Kohl et al. (2007) offered an introduction to airline disruption management, provided a description of the planning processes, and delivered a detailed overview of the numerous aspects of airline disruption management. Furthermore, they reported on experiences from project DESCARTES, a development project on airline disruption management supported by the European Commission. In the same year, Ball et al. (2007) described models for aircraft, crew and passenger recovery. Furthermore, they provided a survey on the topic of developing schedules that provide operational robustness as a proactive alternative to schedule recovery. Clausen et al. (2010) provided a comprehensive review of the literature covering airline disruption management, including aircraft, crew, passenger, and integrated (i.e., combining aircraft, crew and passenger in one model) recovery. Furthermore, they provided an overview of model formulations and common network representations.
L.K. Hassan, B.F. Santos and J. Vink, Computers and Operations Research 127 (2021) 105137
This paper updates the previous literature surveys. Three online bibliography databases (Web of Science, SCOPUS, and Google Scholar) were searched for peer-reviewed publications written in English proposing decision support models for airline disruption management. Furthermore, papers published in conference proceedings were also included if the publication is indexed in SCOPUS.
The focus of the survey was on the literature presenting decision-support solutions to be used during operations at the Airline Operations Control Centers (AOCCs). For this reason, the survey excludes papers not presenting a modeling solution. Furthermore, we opted to not include papers addressing airline schedule robustness. The schedule robustness research can be classified as being a disruption mitigation effort, done at a tactical stage during scheduling and before disruptions are known. In this paper, we focus on the disruption management problem that takes place at the control stage, during the operations and when disruptions may occur. Nevertheless, there is also a vast literature on the topic of airline schedule robustness. The reader may refer to, e.g., Ahmadbeygi et al. (2010), Burke et al. (2010), Duran et al. (2015), and Cadarso and Luis (2017) for good reference works on airline schedule robustness.
The following search query was used during the search:
(("airline recovery" OR "aircraft recovery" OR "crew recovery" OR "passenger recovery" OR "schedule recovery" OR "integrated recovery") AND ("airline" OR "aircraft")) OR (("disruption management" OR "irregular operations") AND ("airline" OR "aircraft"))
This query resulted in a total of 110 papers, from the earliest possible start date until June 2020 (Fig. 1). The analysis of these papers provides some interesting insights: (1) there is an increasing interest in solving airline disruption management problems, (2) more than 50% of papers have been published in the last 10 years, after the last literature review paper (Clausen et al., 2010), and (3) since then, there has been an increase in the number of publications that integrate two or more resources in the recovery process (i.e., aircraft, crew, and passengers). Therefore, this paper reviews and classifies the airline disruption management literature from 2009-2020, analyzing methodological trends, such as the integration trend, and discussing the existing gap between the capabilities of the state-of-the-art and the requirements for implementing these tools in practice.
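The boolean structure of the query can be made explicit with a small filter. The sketch below is illustrative only: the function name `matches_query` and the sample titles in the usage note are hypothetical, and plain substring matching is an assumption, since the bibliographic databases apply their own matching rules that are not reproduced here. Because both branches of the query share the same ("airline" OR "aircraft") condition, the query reduces to (topic term) AND (context term).

```python
# Terms taken from the search query; the matching logic is a simplification.
RECOVERY_TERMS = (
    "airline recovery", "aircraft recovery", "crew recovery",
    "passenger recovery", "schedule recovery", "integrated recovery",
)
DISRUPTION_TERMS = ("disruption management", "irregular operations")
CONTEXT_TERMS = ("airline", "aircraft")


def matches_query(text: str) -> bool:
    """Return True if `text` satisfies the boolean query (case-insensitive substring match)."""
    lowered = text.lower()
    has_topic = any(term in lowered for term in RECOVERY_TERMS + DISRUPTION_TERMS)
    has_context = any(term in lowered for term in CONTEXT_TERMS)
    # (A AND C) OR (B AND C) is equivalent to (A OR B) AND C.
    return has_topic and has_context
```

Under these assumptions, a title such as "Disruption management in airline operations" is retained, whereas "Disruption management in railway networks" is rejected because it lacks a context term.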
This paper is divided into two more sections. In Section 2, we critically describe aircraft, crew, passenger, and integrated recovery as presented in the literature. Section 3 concludes the paper and describes various ways to close the gap between the reality faced in AOCCs and the capabilities of the state-of-the-art.

A review of disruption management
The complete airline recovery process is a very large and complex problem that is commonly divided into several sequential stages. These stages are broadly categorized as schedule, aircraft, crew, and passenger recovery, which also define clear boundaries for research in this area. Schedule and aircraft recovery are commonly solved together. In this section, we review the airline disruption literature. We start with an overview of the research efforts until 2009, followed by a detailed analysis of the literature from 2010 until June 2020. The latter is divided according to the resources modeled and classified according to the type of solution methodology, i.e., exact optimization methods, (meta-) heuristics, hybrid methods, multi-agent systems, and other methods.
Exact optimization methods, such as branch-and-bound algorithms (as implemented in commercial linear programming (LP) solvers), dynamic programming, and conic programming, guarantee finding the globally optimal solution. For most optimization problems, exact methods are the method of choice. For NP-hard problems, such as airline recovery, the situation is different, since the computation time of exact methods can grow exponentially with problem size and the problem can become intractable. Even medium-sized problems can require extensive computation time, which makes exact methods unfit for operational use. To overcome these problems, (meta-) heuristics can be used. These methods, such as greedy, genetic, or simulated annealing algorithms, are commonly applied to solve computationally intractable combinatorial optimization problems to a sub-optimum. The effectiveness and quality of solutions depend on the heuristic's ability to adapt to a particular problem, exploit the problem structure, and avoid getting stuck in local optima.
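The growth of the search space can be illustrated with a simple count. The sketch below assumes, purely for illustration, that every flight is either canceled or assigned to one of m aircraft with one of d discrete delay options, and that feasibility constraints are ignored; the number of candidate plans for n flights is then (m·d + 1)^n. The function name and this counting model are hypothetical and are not taken from any of the reviewed papers.

```python
def count_candidate_plans(n_flights: int, n_aircraft: int, n_delay_options: int) -> int:
    """Number of raw decision combinations when each flight is either canceled
    or given one (aircraft, delay option) pair; feasibility is ignored."""
    per_flight = n_aircraft * n_delay_options + 1  # +1 for the cancellation option
    return per_flight ** n_flights


# Doubling the number of flights squares the number of candidate plans.
small = count_candidate_plans(10, 3, 4)
large = count_candidate_plans(20, 3, 4)
```

Even under this crude model, a plain enumeration is only viable for toy instances, which is consistent with the dominance of heuristic methods in the literature reviewed below.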
Some authors have adopted hybrid methods. These are methods involving the combination of exact methods with algorithmic techniques (e.g., mixed-integer linear programming (MILP) with column generation or decomposition techniques) or the combination of different heuristics in a single solution technique. Multi-Agent Systems (MAS), an emerging approach in airline disruption management, are software systems composed of multiple interacting intelligent agents. Here, intelligence may be algorithmic search, reinforcement learning, or procedural approaches among others. MAS typically refers to software agents, but could equally well be humans. In the context of airline disruption management, MAS usually represents the Operational Control Center of the airline by adopting autonomous but interacting agents that try to solve the aircraft recovery, the crew recovery, and the passenger recovery problems at the same time (Castro and Oliveira, 2007). Finally, in the category others, we included techniques like constraint programming or simulation approaches.
Section 2.1 presents the initial efforts on the topic of airline disruption management. Section 2.2 presents the literature focused exclusively on aircraft (and schedule) recovery. Section 2.3 presents the literature covering crew recovery. Passenger recovery is discussed in Section 2.4. Several publications integrate two or more stages of the recovery process. Section 2.5 discusses papers that integrate aircraft and passenger recovery, while Section 2.6 discusses papers covering integrated aircraft and crew recovery. Literature that integrates the full recovery process, that is, schedule, aircraft, crew, and passenger recovery, is presented in Section 2.7. At the end of each subsection, a summary table is presented, describing each publication according to the type of network used to represent the problem (i.e., time-space, connection, and time-band networks, following the description presented by Clausen et al. (2010)), the type of solution technique with a short description, some key functionalities, and the dimensions of the largest case study presented. Some fields in these tables are empty, meaning that no information was presented in the paper or that the respective field is not relevant for that publication.
Teodorović and Dušan (1984) were the first authors to discuss the minimization of passenger delays in the aftermath of schedule perturbations. The authors considered the case where one or more aircraft fail, with the delaying of flights and the swapping of aircraft as recovery options. The authors formulate the problem as a network in which flights are represented by nodes and arcs are used to represent time losses per flight. The objective is to minimize the time lost by passengers. Their methodology utilizes branch & bound methods and is based on the assumption that the airline operates only one aircraft type. Furthermore, maintenance constraints are ignored and the model was tested on a network of eight flights operated by three aircraft. Passengers are explicitly modeled, but the authors assume that all itineraries contain only a single flight leg. Teodorović and Dušan (1990) extended this work by considering airport curfews as constraints and flight cancellations as a possible recovery action. A dynamic programming-based approach is used where the goal is to minimize the total number of canceled flights and the total passenger delay. The model was tested on a network of 14 aircraft and 80 flights. Crew and aircraft maintenance constraints were added in a subsequent work by the same authors, Teodorović and Dušan (1995). In this new study, several disruption types, such as crew unavailability and flight delays, were included as well. The paper presents a heuristic based on the First In, First Out (FIFO) principle and a dynamic programming-based sequential approach. The model determines aircraft and crew rotations while minimizing the total number of canceled flights. The model was tested on 240 generated instances. Four to five different disturbances were arbitrarily generated for each of the 240 numerical instances, so that the developed models were tested on over 1,000 different situations.

Aircraft recovery
The aircraft recovery problem can be formulated as follows: given a flight schedule and a set of disruptions, determine which flights to delay or cancel, and re-assign the available aircraft to the flights such that the disruption cost is minimized. These disruption costs are defined as those costs over which the airline still has control at the time of disruption, such as aircraft operating costs and compensation to be paid to passengers for canceled or delayed flights. These recovery problems are generally formulated as cost minimization models, rather than profit maximization models, because the airline's revenues are already fixed when disruptions occur: tickets have been sold in advance. What remains is the search for the lowest-cost operation to complete the itineraries sold to the passengers. Before 2009, the majority of publications focused on aircraft recovery, in part because (1) aircraft are the most constraining and expensive resource and (2) aircraft recovery is a smaller and simpler problem than crew recovery (which involves complex regulations and pilots' preferences). Despite this, aircraft recovery is still an active research subject, where the efforts have been focused on increasing complexity to better represent real-world networks and on decreasing the computation time.
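To make the formulation concrete, the following minimal sketch enumerates every assignment of flights to aircraft (or cancellation) on a toy instance and keeps the cheapest plan. All names (`Flight`, `recover`), the cost parameters, and the data are hypothetical. The sketch also assumes a constant delay cost per minute, a fixed turnaround time, and no airport continuity, maintenance, or crew constraints, so it illustrates the problem statement rather than any of the reviewed models.

```python
from itertools import product
from typing import NamedTuple


class Flight(NamedTuple):
    name: str
    departure: int  # scheduled departure, minutes after midnight
    duration: int   # block time in minutes


def recover(flights, aircraft_ready, delay_cost=1.0, cancel_cost=500.0, turnaround=30):
    """Brute-force search: each flight is flown by one aircraft or canceled.

    `aircraft_ready` maps an aircraft name to the earliest time it can depart
    (a late value models an aircraft unavailability disruption).
    Returns (minimum cost, {flight name: aircraft name or None}).
    """
    tails = list(aircraft_ready)
    ordered = sorted(flights, key=lambda f: f.departure)
    best_cost, best_plan = float("inf"), None
    for choice in product(tails + [None], repeat=len(ordered)):
        free_at = dict(aircraft_ready)  # next possible departure time per aircraft
        cost = 0.0
        for flight, tail in zip(ordered, choice):
            if tail is None:  # flight is canceled
                cost += cancel_cost
                continue
            actual = max(flight.departure, free_at[tail])  # delay if aircraft is late
            cost += delay_cost * (actual - flight.departure)
            free_at[tail] = actual + flight.duration + turnaround
        if cost < best_cost:
            best_cost = cost
            best_plan = dict(zip((f.name for f in ordered), choice))
    return best_cost, best_plan


# Toy instance: aircraft A was planned to fly F1 and F2 but is unavailable until 09:00.
flights = [Flight("F1", 480, 60), Flight("F2", 600, 60), Flight("F3", 500, 90)]
cost, plan = recover(flights, {"A": 540, "B": 0})
```

In this instance the cheapest plan swaps aircraft: B takes over F1 and F2, while the late aircraft A flies F3 with a 40-minute delay. The enumeration visits (number of aircraft + 1)^(number of flights) plans, which is why it only serves as an illustration.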
Most studies addressing aircraft recovery consider aircraft unavailability and airport disruptions as the main disruption types handled. Two different approaches were followed to implement airport disruptions: either by considering airport disruptions to be binary (e.g., normal operations or closed), for example, Eggenberg et al. (2010), or by considering flow reductions as a percentage of the initial airport capacity, for example, Liang et al. (2018). Of the 27 papers reviewed, 20 (74%) considered aircraft unavailability while 12 (44%) considered airport disruptions. Common recovery actions for papers addressing the ARP include flight delays (93%), flight cancellations (89%), and aircraft swaps (89%).
Since recovery models are a representation of reality, some assumptions are needed to model the disruption problem. A common assumption, followed by almost all papers, is that crew is always available to perform the flights in the recovered schedule (e.g., Sousa et al., 2015; Vos and Santos, 2015). Another common simplification is the exclusion of airport capacity constraints or slot availability (e.g., Liu et al., 2010; Arias et al., 2015). Furthermore, the majority of papers do not include maintenance constraints in their models (e.g., Sousa et al., 2015; Zhao and Tong, 2018), which in reality would limit recovery options. Finally, nearly all studies assume that departure times for all non-disrupted flights are certain and will not change, i.e., that no other disruptions occur.
This section discusses and classifies papers published after 2009 addressing the aircraft recovery problem. The section is divided by solution technique category. An overview of the papers discussed in this section is provided in Table 1.

Exact optimization methods
Aktürk and Atamtürk (2014) were the first to successfully integrate cruise speed control into the Aircraft Recovery Problem (ARP). The authors consider the option of speeding up flights to reduce delays, at the expense of higher fuel costs. Due to the nonlinearity of fuel burn in cruise speed, the authors use a conic quadratic optimization approach to solve the problem, minimizing recovery-related costs such as swap, fuel consumption, and passenger delay costs. Environmental costs and constraints were integrated next to the additional fuel cost of speeding up flights. The paper states that significant cost savings can be achieved with cruise speed control, making it a suitable recovery action to include in aircraft recovery studies. Xu et al. (2015) presented a time-band approximation model to approximate delay cost considering a stochastic flying time. The model is formulated as a MILP model and solved using a commercial LP solver. With data on the actual and planned flying times of 400 flights in one day of Sichuan Airlines operations, the authors create a uniform probability density function that predicts the flying time of flights. The model is tested on a network of generated data with 3 aircraft and 11 flights. Xu and Haiwen (2016) extended the work by presenting a weighted time-band approximation model that incorporates a simplex group cycle approach. Here the model is tested on data from China Airlines.

(Meta-) heuristics
Liu et al. (2010) presented a hybrid heuristic that combined an adaptive evaluated vector (AEV) and an inequality-based multi-objective genetic algorithm (GA) formulation to search for Pareto solutions to daily short-haul recovery problems. The AEV was used to guide the search and the GA was used to provide the multi-objective solution. Although aircraft swap and retiming options are considered, the model does not consider flight cancellations as a recovery method. The presented model is tested on a daily flight schedule of a Taiwanese airline with 7 aircraft (single fleet) during a 1-h airport closure, impacting 39 flights. The heuristic presents results in 3.6 min on average (7.5 min max). Despite the short computation time, this model still takes more than the 2 min run time required during operations, as suggested by Vink et al. (2020). Wu and Cong (2012) developed a model based on flight strings instead of individual flights. They transform these strings into a time-space model that considers maintenance constraints and regulations. The model is solved with a heuristic developed by the authors, called Iterative Tree Growing with Node Combination. The model is tested on a dataset from China Airlines consisting of 170 flights, 5 fleets, 35 aircraft, and 51 airports. In another study, the aircraft recovery problem was transformed into a vehicle routing problem with time windows. The formulation considers aircraft recovery and passenger delivery. In the model, aircraft are vehicles, passengers are commodities and airports are nodes. Each aircraft rotation is considered a route. The model only considers aircraft ferrying and departure delays as recovery options, while in reality more options are available. The problem is solved with a genetic algorithm that is tested on a small network from a regional Chinese airline. For three different disruption scenarios, the GA found a solution within 100 s. Zhu et al. (2015) presented a two-stage stochastic recovery model to deal with the ARP. The first stage is a resource assignment model that minimizes delay and cancellation cost. The second stage re-times the aircraft routings obtained in the first stage, with the objective of minimizing the expected cost of the first-stage resource plan under uncertainty in the aircraft recovery time. The authors use a stochastic algorithm framework combining Greedy Simulated Annealing (GSA) and a simple re-timing strategy. Based on different scenarios of restoration time, the second-stage model can be decoupled into several linear models.
In the same year, Sousa et al. (2015) presented a similar study using Ant Colony Optimization (ACO). The proposed algorithm combines the Aircraft Assignment Problem (AAP) with the ARP, aims to minimize the operational cost, and (re-)schedules flights dynamically by using a rolling time window. Two different experiments, both using real data from a commercial airline, were conducted. On a problem with 100 flights, the ACO outperforms (non-truncated) branch & bound and Depth First Search (DFS) in terms of solution quality, although it takes 40% more time on average. Hu et al. (2017) presented a solution approach for solving a multi-objective recovery problem by combining the ε-constraint method with a neighborhood search. The ε-constraint method is in charge of seeking the Pareto front for the multi-objective ARP, and the neighborhood search algorithm is responsible for improving the locally feasible solutions of the ARP in each iteration of the ε-constraint method. The problem includes three conflicting objectives: the first minimizes the total deviation from the original flight schedule, the second minimizes the maximum flight delay time, and the third minimizes the number of aircraft swapped. The methodology is tested on real-world empirical data for a Boeing 737 fleet consisting of 104 aircraft from a major Chinese airline covering 410 flights. The computation times range between 12 and 20 min, depending on the disruption instance. Zhang (2017) proposed to use feasible lines of flights (LOFs) as the basic variables in the model, where an LOF is defined as a sequence of flights flown by one aircraft within one day. A two-stage heuristic is presented to reduce the number of included LOFs, thereby reducing the run-time. In the first stage, LOFs are scored and selected based on the number of swaps (fewer is better) and the number of flight legs included in the LOF (more is better).
In the second stage, flow balance constraints for the aircraft were aggregated by creating constraints for each airport only. The disruptions included in the model are airport closures and aircraft unavailability due to unplanned maintenance. The approach is tested on five real-life test scenarios. The largest instance included 44 aircraft and 638 flights, with a computation time of 150 s. Šarčević et al. (2018) described a methodology where the artificial bee colony (ABC) algorithm presented by Karaboga (2005) was applied to the aircraft disruption problem. The proposed approach is implemented as part of the Aircraft Manager agent of the multi-agent system MASDIMA developed by Castro et al. (2014). The system is tested on a month's worth of real airline data; however, the dimensions of the case study and the required runtime are not given. Zhao and Tong (2018) presented a weight-table heuristic algorithm for the aircraft recovery problem. The authors only consider disruptions from airport closures due to bad weather conditions. All common disruption recovery options are considered; however, maintenance constraints are not included in the model. A single case study consisting of 6 aircraft and 31 flights is presented. The computation times are not presented. Khaled et al. (2018) proposed a multi-objective integer linear programming model for the tail assignment problem, which minimizes the operating cost and the deviation from the original solution. The recovery problem focuses on long-term disruptions (e.g., airport closures for significant periods of time or multi-day technical problems with aircraft), and the model does not include the possibility of delaying flights. The ε-constraint method is used to generate multiple efficient solutions along the Pareto frontier. The proposed model computes solutions in less than 30 s for the adapted test case involving 111 flights and 10 aircraft.
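The ε-constraint idea used by Hu et al. (2017) and Khaled et al. (2018) can be sketched on a finite list of candidate recovery plans: one objective is minimized while the other is bounded by ε, and ε is varied to trace efficient solutions. The sketch below is a simplified two-objective illustration with hypothetical data and a hypothetical function name. In the reviewed papers an optimization model is solved for each value of ε rather than a list being filtered, and Hu et al. (2017) consider three objectives.

```python
def epsilon_constraint(candidates, epsilons):
    """Trace efficient solutions among `candidates`, given as (objective_1, objective_2) pairs.

    For each bound eps, objective 1 is minimized subject to objective 2 <= eps.
    Ties on objective 1 are broken by objective 2 so that only efficient points are kept.
    """
    front = []
    for eps in epsilons:
        feasible = [c for c in candidates if c[1] <= eps]
        if not feasible:
            continue  # no candidate satisfies this bound
        best = min(feasible, key=lambda c: (c[0], c[1]))
        if best not in front:
            front.append(best)
    return front


# Hypothetical candidate plans as (delay cost, number of aircraft swaps).
plans = [(100, 0), (80, 1), (80, 2), (60, 3), (90, 2), (60, 4)]
front = epsilon_constraint(plans, epsilons=range(5))
```

Each entry of `front` is a plan for which no other candidate is at least as good in both objectives and strictly better in one, which is the trade-off curve presented to the decision maker.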

Hybrid heuristics
Gao et al. (2009) developed a greedy simulated annealing algorithm, combining characteristics of Greedy Randomized Adaptive Search Procedure (GRASP) and Simulated Annealing. The combination of heuristics improves the efficiency of the neighborhood selection and decreases the probability of local optima. The objective of the model is to minimize the total passenger delay time.
One drawback of the model is that the objective function does not take into account all costs incurred by irregular operations; e.g., the cost of ferrying and fleet substitution is not considered. Eggenberg et al. (2010) extended the work of Bard et al. (2001) and presented a column generation algorithm where a time-band network model is used. Each unit (that is, a plane, a crew member or a passenger) is associated with a specific recovery network and the model considers unit-specific constraints. The column generation algorithm ensures global feasibility according to the structural constraints of the problem. The usual multi-commodity approach struggles with considering unit-specific constraints, which the authors overcome with the proposed solution. While the results presented in Table 1 seem promising and the majority of instances solve within 100 s, the authors report that for the most computationally expensive case the run time exceeds 1 h. The case instances are tested with a single fleet type.
A hybrid heuristic was also used by Xiuli and Zhao (2012), who combined a Greedy Randomized Adaptive Search Procedure (GRASP) with Ant Colony Optimization (ACO). Compared to the original GRASP algorithm, it provides a higher global optimization capability. The authors state that the model was tested on a multi-fleet network with 50 aircraft and more than 5 aircraft types. However, no results are presented.
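Several of the heuristics above build on simulated annealing, which accepts occasional deteriorations to escape local optima. The following sketch shows plain simulated annealing on a toy aircraft assignment problem that minimizes total delay. It is not the GRASP-based variant of Gao et al. (2009), and the function names, cooling parameters, and data are assumptions chosen for illustration.

```python
import math
import random


def total_delay(assignment, departures, durations, n_aircraft, turnaround=30):
    """Total delay (minutes) when each aircraft flies its flights in departure order."""
    free_at = [0] * n_aircraft  # earliest next departure per aircraft
    delay = 0
    for i in sorted(range(len(departures)), key=lambda k: departures[k]):
        tail = assignment[i]
        actual = max(departures[i], free_at[tail])
        delay += actual - departures[i]
        free_at[tail] = actual + durations[i] + turnaround
    return delay


def anneal(initial, cost_fn, n_aircraft, steps=2000, t0=100.0, cooling=0.995, seed=0):
    """Simulated annealing over assignments; returns the best assignment found and its cost."""
    rng = random.Random(seed)
    current, current_cost = list(initial), cost_fn(initial)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(steps):
        # Neighbour: move one randomly chosen flight to a randomly chosen aircraft.
        neighbour = list(current)
        neighbour[rng.randrange(len(neighbour))] = rng.randrange(n_aircraft)
        cost = cost_fn(neighbour)
        # Accept improvements always, deteriorations with Boltzmann probability.
        if cost <= current_cost or rng.random() < math.exp((current_cost - cost) / temperature):
            current, current_cost = neighbour, cost
            if cost < best_cost:
                best, best_cost = list(neighbour), cost
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost


departures = [480, 500, 520, 600]
durations = [60, 60, 60, 60]
cost_fn = lambda a: total_delay(a, departures, durations, n_aircraft=2)
best, best_cost = anneal([0, 0, 0, 0], cost_fn, n_aircraft=2)
```

Because the best solution seen so far is stored separately from the current one, the returned cost can never exceed the cost of the initial assignment, even though the search itself may temporarily move to worse solutions.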
Whereas other researchers validated models with a static disruption scenario, Vos and Santos (2015) established a dynamic framework, named Disruption Set Solver (DSS), for aircraft schedule recovery. The framework handles disruptions as they happen and builds on the solutions of previous disruptions. It relies on the combined usage of an efficient aircraft selection algorithm and a linear-programming model that can track the status of individual aircraft on parallel time-space networks. The framework is applied to a set of real disruptive days in the operation of Kenya Airways. In 93.3% of the cases, the DSS found solutions within 10 min. Furthermore, the authors showed that the solution costs are underestimated when computed using a static approach. The iterative fixed-point method for integer programming (presented by Dang and Chuangyin (2015)) was first adopted in this context for the construction of feasible flight routes. Two methods are presented to divide the solution space into independent segments and solve them with distributed computation. Since the segments are independent, the calculation of integer points can proceed in parallel on each processor. The first method attempts to divide the solution space into segments that contain roughly equal numbers of integer points. For long-haul problems, another division method is proposed where the original flight routes are taken as initial points. The algorithm is compared to the solutions obtained using a commercial LP solver. In the majority of cases, the number of partial feasible flight lines that have to be calculated to find an optimized aircraft reschedule is much smaller than the number needed by the LP solver. This makes the method a promising alternative to develop further in the future. Subsequent work extended the approach by considering multiple fleets and by focusing on disruptions caused by airport closures. Liang et al. (2018) developed a framework where a master problem was used to select routes and subproblems were used to generate routes. Airport capacity constraints are explicitly considered in the master problem while maintenance constraints are considered in the subproblems. In the suggested framework, aircraft are allowed to swap their planned maintenance if all constraints regarding maximum flying hours, the maximum number of takeoffs/landings, etc. are satisfied. The approach is based on a column generation framework. The proposed framework is validated and tested on eight real-world scenarios, which are based on the scenarios used as benchmark problems for the Airline Operations Research Competition organized by Sabre Airline Solutions (2016). For all scenarios, a solution was found in less than 357 s. Moreover, the authors modeled flight delay as continuous instead of using discrete intervals. A comparison is presented where the continuous flight delay solutions are compared to discrete flight delay solutions with a 30-min interval between flight delay options. The authors show that the continuous flight delay results in lower disruption cost; however, a comparison with different delay interval lengths is not presented.
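The direction of this result is expected, because every discrete delay option is also available in the continuous model. A small arithmetic sketch makes the effect visible. It assumes, hypothetically, that the minimum feasible delay of each flight is already known and that a discrete model must round it up to the next multiple of the interval; the function name and the example delays are illustrative and do not come from Liang et al. (2018).

```python
import math


def discrete_delay(required_delay: float, interval: float = 30.0) -> float:
    """Smallest multiple of `interval` that is at least `required_delay` (minutes)."""
    return math.ceil(required_delay / interval) * interval


# Hypothetical minimum feasible delays (minutes) for three disrupted flights.
required = [10, 35, 60]
continuous_total = float(sum(required))
discrete_total = sum(discrete_delay(d) for d in required)
```

With a 30-min interval the three flights accumulate 150 min of delay instead of the 105 min that are strictly required. A shorter interval narrows this gap at the cost of more delay options per flight, which is the trade-off left open by the missing comparison.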

Other methods
Given the inherent uncertainty of the ARP, several authors presented (partially) stochastic approaches. Arias et al. (2015) combined constraint programming with a simulation approach to solve the Stochastic Aircraft Recovery Problem. The goals of the model are to restore the original flight schedule as much as possible, minimizing the total flight delay and the number of canceled flights. The robustness of the solutions is assessed by comparing the standard deviation from the simulation results with the variation of the probability distribution that was used for generating the stochastic delays and the expected propagation. The proposed model is tested with real data from a commercial airline with a total of 51 flights, 13 airports, and 11 aircraft. The proposed model can match the optimal solution in 14 cases out of 20. According to the authors, the results suggest that the inherent uncertainty of the ARP makes it a suitable candidate for combining simulation and optimization methods. Guimarans et al. (2015) described a methodology for the Stochastic Aircraft Recovery Problem (SARP), which considers the stochastic nature of air transportation systems. The methodology is based on the Large Neighbourhood Search metaheuristic, combined with a simulation run at different stages to ensure robustness. A Constraint Programming formulation is developed to solve the deterministic ARP. Flight cancellations are not considered as a recovery option; however, aircraft may be ferried. The proposed methodology was tested on several instances with different characteristics, some of which were obtained from real data provided by a Spanish airline. The stochastic recovery problem was also considered in a recent paper by Lee et al. (2020). The authors propose an innovative reactive and proactive approach to solve the ARP. By forecasting systematic delays at hub airports, their study optimizes recovery actions that respond to both realized disruptions and anticipated future disruptions. The authors combine a stochastic queuing model that captures airport congestion with a commercial flight planning tool and a dynamic integer programming solution that models the disruption recovery. A solution based on a look-ahead approximation and sample average approximation is proposed to solve the modeling framework.
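Sample average approximation replaces the expected cost of an action by its average cost over a set of sampled scenarios and then selects the action with the lowest average. The sketch below shows only this generic principle. The function name, the cost model (a planned delay plus a penalty for any additional unplanned delay), and the uniform repair-time distribution are assumptions for illustration, and the look-ahead approximation and queuing model of Lee et al. (2020) are not reproduced.

```python
import random


def saa_choose(actions, cost_fn, sample_scenario, n_samples=1000, seed=0):
    """Pick the action with the lowest average cost over sampled scenarios."""
    rng = random.Random(seed)
    # The same scenario set is used for every action (common random numbers).
    scenarios = [sample_scenario(rng) for _ in range(n_samples)]

    def average_cost(action):
        return sum(cost_fn(action, s) for s in scenarios) / len(scenarios)

    return min(actions, key=average_cost)


def planned_delay_cost(planned_delay, repair_time):
    """Hypothetical cost: 1 per planned minute, 2 per additional unplanned minute."""
    return planned_delay + 2 * max(0.0, repair_time - planned_delay)


# The aircraft repair is assumed to take between 30 and 40 minutes.
chosen = saa_choose(
    actions=[0, 30, 60],
    cost_fn=planned_delay_cost,
    sample_scenario=lambda rng: rng.uniform(30, 40),
)
```

In this toy setting a planned delay of 30 min is selected: announcing no delay incurs the penalty on the full repair time, while announcing 60 min delays the flight longer than any sampled repair requires.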
In recent years, a few papers have been published where simulation-based approaches have been used to solve the ARP. Rhodes-Leader et al. (2019) used a symbiotic simulation system, that is, a simulation approach in which a high-fidelity simulation model and a low-fidelity physical model work together for the benefit of both models (Aydt et al., 2008). In their case, the authors propose an adapted version of the integer programming (IP) model presented by Zhang et al. (2015) to reduce the complexity of the solution space considered for the simulation model. The IP model generates a set of good solutions that are then used as initial solutions in the simulation model to guarantee a faster and more effective high-fidelity simulation system. Table 1 shows the overview and classification of the discussed literature regarding the ARP including case dimensions and CPU times.

Discussion
The complexity of this problem is evident from the fact that only three papers have adopted an exact method and over 80% of publications use heuristic methods to solve the aircraft recovery problem. Still, several relevant advances have been observed in the last decade in terms of the computational efficiency of the solutions proposed. In fact, several authors claim to solve (quasi-) real-world problems in about one minute or less (Gao et al., 2009; Eggenberg et al., 2010; Sousa et al., 2015). Unfortunately, they only consider a single fleet, which does not represent the reality at most airlines. Most other papers do not consider all recovery options common at airlines or do not take maintenance constraints into account, thereby simplifying the problem. Xiuli and Zhao (2012) considered all recovery options and maintenance constraints and included multiple fleets. However, they report neither the number of flights in the case study nor the computation times. In the majority of papers, the delay costs are calculated by using constants to express the average delay cost per minute. Similarly, a constant parameter is used to express the average cancellation cost of a flight. This approach usually underestimates the cost, due to the non-linear relation between goodwill loss and the amount of delay. In recent years, several authors have proposed a simulation-based approach to solve this recovery problem. However, computational times are usually omitted from the discussion.

Crew recovery
The crew recovery problem (CRP) can be formulated as follows: given a flight schedule and a set of disruptions, re-assign to each (recovered) flight the necessary cabin and flight crew such that the disruption costs are minimized. For crew recovery, these disruption costs can include direct crew costs (e.g., remuneration or overtime compensation) and the cost of deadheading crew. For studies that include flight cancellation as a recovery action, cancellation costs can be included in case a flight cannot be staffed. Alternatively, some authors minimize the number of crew schedule changes as a proxy for minimizing the crew recovery costs. The CRP is typically the second problem that is solved in the sequential solution approach. It is considered harder than the ARP since all restrictions dictated by government regulations, union agreements and airline-specific policies have to be taken into account. As shown in Table 2, in the period 2009-2018, there have been six publications on the CRP.
Most studies addressing crew recovery only consider a single disruption type, such as flight delays (Novianingsih et al., 2015) or crew unavailability (Castro and Oliveira, 2009). Only two studies considered both disruption types (Liu et al., 2013; Zhu et al., 2014). Interestingly, only Castro and Oliveira (2009) and Chen and Chiu Hung (2017) considered crew unavailability as a disruption. Of the 6 papers reviewed, most (83%) considered crew deadheading as a recovery action, while 67% included crew swaps. Only half considered flight cancellations as a recovery action. Castro and Oliveira (2009) only considered flight delays and did not consider crew deadheading or crew swaps as a recovery action.
A common limitation of studies that focus on the Crew Recovery Problem is that only flight crew are considered and cabin crew are always assumed to be available. While flight crew is generally the more constraining resource of the two, cabin crew availability will limit recovery options in reality.

Multi-agent systems
Castro and Oliveira (2007) and Castro and Oliveira (2009) were the first to use Multi-Agent Systems (MAS) to represent the Airline Operations Control Center (AOCC) as an organization of agents. In these papers, the authors present a Distributed MAS for integrated disruption management. However, the authors only discuss the application of their modeling framework to a crew recovery case study. The MAS model for integrated recovery is discussed by the authors in later papers, included in Section 2.7. The MAS has several specialized agents that compete to find the best solution for each subproblem. Besides operational cost, the authors introduced a process for quantifying quality costs, which represent the importance that different passengers give to flight delays. The authors solve the crew recovery problem of a real airline, although no case dimensions (e.g., regarding the number of crews) are given.

(Meta-) heuristics
Chang (2012) developed a genetic algorithm (GA) to solve the pilot recovery problem. The GA uses the original infeasible schedule as input and solves the problem while considering maximum flying hours and minimum rest time constraints per day (8-in-24 h rule) and per week (32 h in 7 days rule). The author introduces an object-oriented matrix chromosome structure, where each row consists of CHROMOHEADS, which correspond to a pilot, and each column consists of CHROMOCELLS, which correspond to the flights assigned to that pilot. The mutation rate for the GA equals the sum of the violated hard constraints divided by the number of hard constraints multiplied by the number of cells in a chromosome. The GA was implemented to reach the optimal recovery schedule in a short time. For a problem consisting of 668 flights, 70 crews and a recovery period of 18 days, the algorithm takes approximately 10 min.
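To illustrate the kind of hard constraint such a GA has to evaluate for each pilot, the following minimal sketch checks flying-hour limits over rolling windows. It assumes that the 8-in-24 h and 32 h in 7 days rules cap the flying time in any rolling window of 24 h and 168 h, respectively; this interpretation, the data layout, and all function names are illustrative assumptions and are not taken from Chang (2012).

```python
from typing import List, Tuple

# (departure, arrival) in hours from the start of the recovery period (assumed layout)
Flight = Tuple[float, float]


def flown_hours_in_window(flights: List[Flight], start: float, length: float) -> float:
    """Total flying time that overlaps the window [start, start + length)."""
    end = start + length
    return sum(max(0.0, min(arr, end) - max(dep, start)) for dep, arr in flights)


def max_hours_in_any_window(flights: List[Flight], length: float) -> float:
    """Largest flying time found in any rolling window of the given length.

    The overlap is piecewise linear in the window start, and its maximum is
    attained by a window that starts at a departure or ends at an arrival,
    so only those candidate windows are evaluated.
    """
    candidates = [dep for dep, _ in flights] + [arr - length for _, arr in flights]
    return max((flown_hours_in_window(flights, s, length) for s in candidates),
               default=0.0)


def is_legal(flights: List[Flight], daily_limit: float = 8.0,
             weekly_limit: float = 32.0) -> bool:
    """True if the pilot's flights respect both assumed rolling-window limits."""
    return (max_hours_in_any_window(flights, 24.0) <= daily_limit
            and max_hours_in_any_window(flights, 7 * 24.0) <= weekly_limit)
```

For example, a pilot flying from hour 0 to 5 and again from hour 20 to 24 accumulates 9 flying hours within one 24 h window, so `is_legal([(0, 5), (20, 24)])` returns `False` under these assumptions.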
Intrafleet and interfleet models for the solution of crew recovery problems were developed by Liu et al. (2013). Both models are set covering problems, where the former is a 0-1 set covering problem and the latter is a general set covering problem. Various solution approaches are discussed, and a simulated annealing algorithm is developed for models that are difficult to solve. Regulations are taken into account by only considering legal crew pairings. To limit the problem size, the time window was set to 24 h and a maximum of 6 crews were considered per missed connection. The results show that, although widely used in practice, the intrafleet model can lead to inferior solutions since it limits the solution space. The objective of the algorithm was to cover all flights, so costs were not considered. On average, the interfleet model reduces the objective function by 40%. Novianingsih et al. (2015) presented a custom three-stage solution method. First, all possibilities for crew swaps are identified and executed if possible. Second, if swaps are not possible, a heuristic is used to construct new crew schedules. Third, the solution is then improved by applying an improvement procedure. The model was tested on a one-day network of 214 flights covered by 48 crew pairings. Regulations regarding flying hours were incorporated by only considering legal pairings. Based on the results, the authors assume that their method can solve the crew scheduling problem in polynomial time.
Chen and Chiu Hung (2017) proposed an evolutionary approach for optimizing crew roster recovery problems with rosters for multi-day flight duties. First, crew roster recovery problems are formulated as combinatorial optimization problems with multiple objectives and constraints. Second, a variant of the non-dominated sorting genetic algorithm II method is used to explore Pareto solutions. The study only considers crew unavailability disruptions and crew deadheading and crew swaps as recovery options. As a result, it is assumed that the flight schedule will never change. The approach is tested on real-world bi-weekly pairings, in which there are 270 pairings and 1048 flights. The execution time of the recovery algorithm is approximately 18 min. Zhu et al. (2014) proposed a constraint programming model in which an algorithm based on sequential, least slack, and greedy principles was designed to search the solution space. The objective was to minimize the total recovery cost, and the temporal-spatial requirements, deadheading, and time legalities (8-in-24 h rule) were considered as constraints. The model does not require the crew to be back at their base at the end of the time window. The paper focuses on a two-pilot flight crew with a one-day recovery time window. To reduce the deviations from the original schedule, the authors added a search rule to the algorithm which assigns the original crew to execute flights. A case study shows that the proposed method is feasible for solving the crew re-scheduling problem. Since legal requirements become more complicated and challenging with longer time windows, the authors mention that it would be interesting to see how the efficiency of the model develops on a larger network with severe irregularities. Table 2 shows the overview and classification of the discussed literature regarding the CRP, including case dimensions and computational times.
This problem has received much less attention than the ARP. The reason for this could be the complexity of the problem, compared to the ARP, given the several regulation constraints that have to be considered when managing crew. This complexity is also reflected in the fact that no study considered exact methods and, still, the computation times are considerably larger for these six papers than for the most promising ARP works.

Passenger recovery
Arguably, passenger recovery is the most relevant problem for airline disruption management since high passenger delay cost and continuous flight disruptions will lead to a potential loss of goodwill and long-term reputation damage. Passenger recovery can be formulated as follows: given a recovered flight and crew schedule and a set of disrupted passenger itineraries, re-assign to each disrupted itinerary the (recovered) flights necessary (given seat availability) to accommodate passengers from their current position to their destination while minimizing cost. These passenger recovery costs can include both hard and soft costs. Hard costs are directly incurred when a passenger cannot complete its scheduled itinerary (e.g., compensation for delay and cancellation as stipulated by government regulations). Soft costs are the potential losses of future revenue as a result of passenger inconvenience, possibly causing the passenger to switch to a different airline in the future. These costs are approximations made by the airline and can differ per passenger class or frequent flyer status. Alternatively, these passenger disruption costs are minimized by minimizing the total number of passenger delay minutes.
For the soft cost, nearly all papers that focus on Passenger Recovery (either stand-alone or in combination with Aircraft and/or Crew Recovery) assume linear delay costs, i.e., a 2-h delay is twice the cost of a 1-h delay. Cook et al. (2012) studied the inconvenience experienced by passengers as a function of delay duration. The study has shown that the delay cost as a function of delay duration can be represented as a sigmoid function. Studies that incorporate such a relation generally use a piece-wise linear relation for delay costs if they seek to prevent a nonlinear recovery model.
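The difference between a linear delay cost and a piece-wise linear approximation of a sigmoid can be illustrated with a small sketch. The sigmoid parameters (`max_cost`, `midpoint`, `steepness`) and the breakpoints below are hypothetical values chosen for illustration only; they are not the values estimated by Cook et al. (2012).

```python
import math
from typing import List, Tuple


def sigmoid_cost(delay_min: float, max_cost: float = 100.0,
                 midpoint: float = 120.0, steepness: float = 0.04) -> float:
    """Illustrative S-shaped soft cost per passenger (hypothetical parameters)."""
    return max_cost / (1.0 + math.exp(-steepness * (delay_min - midpoint)))


def breakpoints(knots: List[float]) -> List[Tuple[float, float]]:
    """Evaluate the sigmoid at the chosen delay knots (knots must be increasing)."""
    return [(d, sigmoid_cost(d)) for d in knots]


def piecewise_linear_cost(delay_min: float,
                          points: List[Tuple[float, float]]) -> float:
    """Linear interpolation between consecutive breakpoints, clamped at both ends."""
    if delay_min <= points[0][0]:
        return points[0][1]
    for (d0, c0), (d1, c1) in zip(points, points[1:]):
        if delay_min <= d1:
            return c0 + (c1 - c0) * (delay_min - d0) / (d1 - d0)
    return points[-1][1]


if __name__ == "__main__":
    pts = breakpoints([0, 60, 120, 180, 240])
    for delay in (60, 120):
        print(delay, piecewise_linear_cost(delay, pts))
```

With these illustrative parameters, the cost of a 2-h delay is well above twice the cost of a 1-h delay, which a single linear cost coefficient cannot represent. In a mixed-integer model, a piece-wise linear function of this kind is typically encoded with one weight variable per breakpoint and an SOS2 or binary-variable construction, which keeps the recovery model linear.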
As shown in Table 3, in the period 2009-2020, there has been only a single publication addressing passenger recovery as a stand-alone recovery problem. In that work, McCarty and Cohn (2018) presented a two-stage stochastic model to deal with the rerouting of passengers, re-accommodating passengers as soon as a delay is known and before the length of the delay is realized. In the first stage, passengers are preemptively assigned to new itineraries as soon as it is known that a flight will be delayed and in anticipation of the delay's impact. The second stage further modifies itineraries for passengers who miss connections after the delay has been realized. Benders decomposition is used to solve the problem within reasonable computation times. The presented method is tested on a case study using a real-life flight schedule with 15 generated delay variations of a single flight. The case study consists of 1144 flights and, in the different test instances, there are 50, 100, or 200 passengers on the delayed flight. For the 15 test instances, the final destination of each passenger on the delayed flight is randomly selected. On average, all test instances were solved within 115 s.

Aircraft and passenger recovery
As mentioned in Section 1, there has been a trend towards integrating more than one resource in recovery models. Sequential optimization approaches do not fully capture the interdependencies between aircraft, crew, and passengers and therefore usually result in sub-optimal recovery solutions. The papers in this section attempt to overcome these downsides by simultaneously solving the aircraft and passenger recovery. The overview of the papers addressing both aircraft and passenger disruptions is presented in Table 4.
Of the studies addressing aircraft and passenger recovery, the majority consider aircraft unavailabilities (81%) and flight delays (75%) as disruption types, while less than half consider airport disruptions (44%). Of the 16 papers reviewed, all considered flight delays as a recovery action and the majority of papers considered flight cancellations (88%), aircraft swaps (94%), and/or passenger itinerary changes (88%) as well. This means that two studies, including Santos et al. (2017), do not explicitly model passengers and their itinerary recovery. Hu et al. (2015) presented an integrated integer programming model based on an approximated reduced time-band network and a passenger transiting relationship. The authors extend their earlier work to model multi-fleet aircraft routing. The objective is to minimize the total cost associated with the reassignment of aircraft and passengers to flights. One assumption the authors make is that all passenger itineraries consist of a single flight leg. A feasibility study is conducted to find the conditions under which aircraft and passenger recovery are possible. The authors test the model on 10 scenarios with real data of a Chinese airline with over 180 aircraft, 113 fleets, and over 620 flights. All scenarios take less than 172 s to solve with a maximum optimality gap of 8.74% compared to the LP relaxation.

Exact optimization methods
Using a mixed-integer non-linear programming model, Arikan et al. (2016) modeled the aircraft recovery problem and the passenger recovery problem. The authors employ several recovery actions such as re-timing departures, canceling passenger itineraries, and flight planning (cruise speed control). The goal of the model was to minimize passenger related costs and fuel costs.
Due to the non-linearity of the cost associated with fuel consumption, an LP model is no longer applicable. However, the authors reformulate the non-linear model as a conic quadratic mixed-integer programming model, similar to Aktürk and Atamtürk (2014). The authors used a time-space network representation to model the aircraft and passenger itineraries. The paper shows the impact of cruise speed control on the airline disruption problem and the ability to reduce cost, showing that cruise speed control is a feasible recovery technique. In a later paper, the authors mentioned that the proposed formulation is not flexible, in the sense that it cannot (easily) be extended with other entity types, such as aircraft crew and passengers, and recovery actions. In the same paper, the authors propose a more generalized network structure, which will be discussed in Section 2.7.
A recent study extended the set of traditional recovery actions by considering flight planning. The same time-space network representation from Bratu and Stephane (2006) is utilized. Departure time decisions are incorporated by creating copies of flight arcs, while the cruise speed control alternatives are incorporated by generating a second set of flight copies for different cruise speed alternatives for each departure time alternative. This approach requires a discretization of the cruise speed options and increases the size of the generated network. Due to the intractability of the original formulation, the authors propose an approximation model that deals with larger airline networks. The model is steered away from solutions that would result in passenger disruptions by explicitly assigning costs to delaying flights that carry connecting passengers. A case study was performed on data from a major European airline with about 250 daily flights in a hub-and-spoke network. The computation time is limited to 120 s. Based on the airline's historical data, 60 scenarios are considered. The authors conclude that their enhanced recovery models reduce total costs and passenger-related delay costs for the airline, compared to existing approaches. Santos et al. (2017) presented an integer linear programming model that incorporates airport limitations in terms of bay availability, taxiway capacity, and runway separation. The objective is to minimize fuel costs as well as passenger compensation and inconvenience cost. A rolling horizon is used to decrease computation times. The model is tested on a network of Kenya Airways, an international hub-and-spoke carrier. For the case study, the flight schedule of 8 days, consisting of 250 flights, was considered. A full day of operations is solved in less than 60 min.
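The arc-copy construction described above for departure time and cruise speed alternatives can be sketched as follows. The data structure, the split of the block time into a fixed part and a cruise part, and the example delay and speed options are assumptions made for illustration; they do not reproduce the formulation of the reviewed paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class FlightArc:
    flight_id: str
    departure_min: int   # departure time in minutes after midnight
    arrival_min: int     # arrival time in minutes after midnight
    speed_factor: float  # 1.0 = planned cruise speed (hypothetical scaling)


def arc_copies(flight_id: str, sched_dep: int, cruise_min: int, fixed_min: int,
               delays: List[int], speed_factors: List[float]) -> List[FlightArc]:
    """Create one arc per (departure delay, cruise speed) combination.

    Only the cruise part of the block time is scaled by the speed factor;
    the fixed part (taxi, climb, descent) is assumed to be unaffected.
    """
    arcs = []
    for delay, factor in product(delays, speed_factors):
        dep = sched_dep + delay
        block = fixed_min + round(cruise_min / factor)
        arcs.append(FlightArc(flight_id, dep, dep + block, factor))
    return arcs


if __name__ == "__main__":
    # Three departure alternatives and two speed alternatives give six arc copies.
    for arc in arc_copies("F1", 600, 100, 20, [0, 15, 30], [1.0, 1.05]):
        print(arc)
```

The number of arcs per flight equals the number of delay options multiplied by the number of speed options, which illustrates why the discretization increases the size of the generated network.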

(Meta-) heuristics
In 2009, the French Operational Research and Decision Support Society (ROADEF) organized an OR challenge regarding disruption management for commercial aviation, which was proposed by Amadeus. This challenge resulted in several publications. Bisaillon et al. (2011) formulated a large neighborhood search (LNS) heuristic that combined fleet assignment, aircraft routing, and passenger assignment. The heuristic cycles through three phases: construction, repair, and improvement. These phases iteratively destroy and repair parts of the solution. The model constructs aircraft routes and passenger itineraries for the recovery period to minimize operating cost and impact on passengers. The first two phases produce the initial solution while taking into account the operational and functional constraints. The third phase considers large schedule changes and tries to improve the solution while maintaining feasibility. This work won the ROADEF 2009 challenge. Sinclair et al. (2014) improved the work of Bisaillon et al. (2011) by making changes in each of the three phases to find better final solutions. In the construction phase, the aircraft that caused the highest cost when canceled were prioritized. In the repair phase, the focus was on re-booking passengers with disrupted itineraries as well as covering flights that were canceled in the construction phase with spare aircraft. In the improvement phase, the authors attempt to accommodate disrupted passengers by delaying flights. The improved model was tested on the ROADEF 2009 dataset. The algorithm found 17 best solutions for 22 instances in five minutes and 21 best solutions in 10 min.
The experiments of Zegordi and Hessameddin (2010) showed that their ant colony optimization (ACO) algorithm can build a revised schedule in less than 26 s for the same problem described in Jafari and Niloofar (2010). According to the authors, the method was implemented at an airline. The algorithm does not consider scenarios where aircraft from different flight rotations recover each other, thereby limiting the solution space. Jozefowiez et al. (2013) presented a three-phase heuristic. In the first phase, the disruptions are integrated in the schedule. Each disruption is solved by a separate algorithm: flight legs are removed and passenger itineraries are canceled to return a feasible solution. The second phase attempts to re-assign disrupted passengers with the same origin and destination to itineraries, using a shortest path algorithm. In the third phase, new flight legs are added to the schedule in an attempt to recover the remaining disrupted passengers. Passengers are grouped by itinerary and a prioritization is made based on the size of the group. This work was also one of the finalists of the ROADEF 2009 Challenge. Although it did not perform as well as Bisaillon et al. (2011), the algorithm did not keep iterating for the full 10 min but reached a feasible solution for all cases in less than 4 min. Zhang et al. (2016) developed a three-stage sequential heuristic framework to solve the integrated aircraft and passenger recovery problem. In the first stage, the flight schedules and aircraft rotations are recovered. The next two steps iteratively solve the flight rescheduling problem and the passenger recovery problem. A time-space network representation is used together with a mixed-integer programming formulation of the model. The proposed algorithm is tested on the same data sets used by the ROADEF 2009 challenge. The algorithm can beat the finalists of the challenge on all datasets. Hu et al.
(2016) proposed a mathematical model based on the flight connection network and the passenger reassignment relationship. To solve the problem, a heuristic based on a Greedy Randomized Adaptive Search Procedure (GRASP) is adopted. The heuristic is tested through experiments based on generated and real datasets. For all test instances, a solution was found within 100 s. The authors compare the results of the heuristic to a sequential solution approach and show that their heuristic is able to find higher quality solutions. However, the solution costs are not compared to a global optimum, so the (near-) optimality of solutions is not presented.
In a recent paper, Yang and Tianshun (2019) presented a multiobjective genetic algorithm to solve the aircraft and passenger recovery problems. The authors considered passenger preferences when assessing the options of accepting an itinerary change or demanding a ticket refund. The objectives considered were the minimization of the costs incurred by the airline and the minimization of the utility loss experienced by the passengers. The authors study the effectiveness and efficiency of the proposed algorithm with a couple of case studies. Although the effectiveness is clearly demonstrated, the authors conclude that the efficiency of the algorithm decreases as the number of delayed aircraft increases.

Hybrid heuristics
Jafari and Niloofar (2010) presented an assignment model for solving the aircraft recovery problem and reassigning disrupted passengers simultaneously, using sequential recovery stages within the time window. The objective is to minimize the sum of aircraft assignment costs, delay costs, cancellation costs, and disrupted passenger costs. The proposed approach utilizes a wide range of recovery actions. The model used aircraft rotations and passenger itineraries instead of flights. The study did not consider maintenance constraints. Jafari and Niloofar (2011) extended the work. Due to the high complexity of the algorithm, the method was only tested on disruptions with 13 aircraft of 2 fleet types. The authors do not demonstrate that the method is computationally efficient, nor do they show that the model can deal with disruptions that reflect operations of a larger airline. Zegordi and Hessameddin (2010) solved the same problem with an ACO heuristic, which was discussed in the previous section. Mansi et al. (2012) proposed a heuristic based on exact methods and an oscillation strategy. In the first phase, the heuristic solves a relaxation of the problem to find a feasible solution for aircraft and passengers close to the initial schedule. If no feasible solution is obtained, a dynamic programming algorithm refines the alternatives and generates a feasible solution. In the second phase, the oscillation strategy alternately destroys and constructs aircraft routes and passenger itineraries and assigns them to aircraft and passengers simultaneously. This work received the second prize in the challenge. A later study extended the work of Sinclair et al. (2014) and Bisaillon et al. (2011) by presenting a post-optimization column generation heuristic that reduces the model size to improve solutions within reasonable run-times. By defining dual variables after solving the LP relaxation, the reduced costs of the variables are calculated.
The variables with negative reduced cost are considered when re-solving the LP problem. The model was tested on the ROADEF 2009 Challenge dataset and found the best-known solutions for all scenarios. The authors suggest future research should focus on implementing a rolling-time horizon with the column-generating algorithm. Vink et al. (2020) extended the work of Vos and Santos (2015) by considering passengers' itineraries and aircraft maintenance requirements when solving the ARP. The authors modeled passengers' delay costs by precomputing a delay cost matrix for both direct and connecting passengers. Maintenance constraints are directly considered and parallel time-space networks are used to track the route of each aircraft. The problem is formulated as a MILP problem that is dynamically solved. That is, a recovery solution is produced every time new information about disruptions is made available. The authors claim that, for such a tool to be usable in operations, a solution has to be found within 2 min. To cope with this requirement, the authors propose a selection algorithm, which iteratively solves the MILP by considering selections of sub-sets of the fleet. The selection algorithm proves to be efficient, providing an initial solution within a couple of seconds and producing a near-optimal solution within 22 s on average. Table 4 presents an overview of the papers discussed in this section. As can be seen, 14 papers were published between 2009 and 2020. Of these, 43% used exact optimization solution methods while the remaining 57% used heuristic methods. It is worth noting that six of these 14 papers use the dataset provided in the ROADEF challenge, showing the impact of this challenge on the literature.
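The reduced-cost filter used in the post-optimization column generation heuristic described above can be illustrated with a minimal sketch. For a minimization LP, the reduced cost of a column j is c_j minus the sum over rows i of the dual value of row i times the coefficient a_ij. The dictionary-based data layout, the function names, and the example numbers are hypothetical and are not taken from the reviewed paper.

```python
from typing import Dict, List


def reduced_cost(cost: float, column: Dict[str, float],
                 duals: Dict[str, float]) -> float:
    """Return c_j - sum_i pi_i * a_ij for one column of a minimization LP."""
    return cost - sum(duals.get(row, 0.0) * coef for row, coef in column.items())


def columns_to_keep(costs: Dict[str, float],
                    columns: Dict[str, Dict[str, float]],
                    duals: Dict[str, float], tol: float = 1e-9) -> List[str]:
    """Names of the columns whose reduced cost is negative (beyond a tolerance)."""
    return [name for name in columns
            if reduced_cost(costs[name], columns[name], duals) < -tol]


if __name__ == "__main__":
    # Hypothetical dual values of two flight-covering constraints.
    duals = {"flight_A": 10.0, "flight_B": 4.0}
    columns = {
        "route_1": {"flight_A": 1.0, "flight_B": 1.0},
        "route_2": {"flight_A": 1.0},
    }
    costs = {"route_1": 12.0, "route_2": 11.0}
    print(columns_to_keep(costs, columns, duals))
```

In this example only `route_1` has a negative reduced cost (12 - 14 = -2), so it is the only column that would be passed on when the LP is solved again.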

Discussion
The ACO approach by Zegordi and Hessameddin (2010) seems promising since it considers all relevant recovery options and maintenance constraints while still managing to solve a real-life case in 26 s with a 61% optimality gap. It is unknown how the computation times scale with problem size. Also promising is the work of Hu et al. (2015) and Hu et al. (2016). The authors are able to solve several real-life instances in under 100 s with small optimality gaps. Unfortunately, the authors do not currently consider maintenance constraints. Recently, Vink et al. (2020) proposed another interesting approach. The authors discuss an operational tool that solves the disruption problems in a few seconds while explicitly considering connecting passengers' delay costs.
Finally, it is interesting to observe that two recent papers, including Arikan et al. (2016), considered changing flights' cruise speeds as a recovery option. This functionality increases the computational complexity of the models presented, but it reflects the option available to airlines to change their flight times to recover from disruptions.

Aircraft and crew recovery
In this section, publications on simultaneous aircraft and crew recovery will be discussed. To the best of the authors' knowledge, there are no pre-2009 papers that present solutions for combined aircraft and crew recovery. Aguiar et al. (2013) were the first to suggest a solution method for this problem, as will be discussed below. In fact, Table 5 shows that only four papers covering both aircraft and crew recovery have been published to date. None of the papers addressing aircraft and crew recovery consider all common disruption types (flight delays, aircraft unavailabilities and airport disruptions) in their models and case studies. Aguiar et al. (2013) was the only study that considered aircraft unavailabilities. Furthermore, none of the studies consider crew unavailabilities as a disruption. Of the 3 papers reviewed, all considered flight delays and cancellations as a recovery action. Maher (2016) did not consider aircraft swaps as a recovery action, while Zhang et al. (2015) was the only study that regards utilizing reserve aircraft as a possible recovery action. None of the studies considers reserve crew as a recovery action.

(Meta-) heuristics
Le and Mei Long (2013) extended earlier work to include crew recovery. As in the previous work, the authors use flight strings to represent a sequence of flights. An iterative tree growing algorithm with nodes combination method is proposed to reduce the computation time. The authors consider maintenance requirements and pilot union regulations. A case study using data from a Chinese airline is presented.
In the same year, Aguiar et al. (2013) used and compared several meta-heuristics, such as hill-climbing, simulated annealing, and a genetic algorithm, to solve the aircraft and crew recovery problem. For the aircraft recovery, a multi-objective approach was developed. Hill-climbing, simulated annealing, and a genetic algorithm were used to solve the ARP. The genetic algorithm outperformed the other heuristics, although all heuristics performed well. The solution of the ARP serves as the input for the crew connecting problem. To solve the CRP, hill-climbing and simulated annealing algorithms were developed and tested on data from TAP Portugal. For the crew connecting problem, the simulated annealing algorithm performed best in terms of crew cost. None of the results are compared with the global optimum, so although feasible solutions are given, the quality of those solutions cannot be determined. Zhang et al. (2015) proposed a two-stage heuristic for the integrated aircraft and crew recovery problem. In the first stage, the aircraft recovery with partial crew considerations model is built. This model is based on the traditional multi-commodity network model for the aircraft schedule recovery problem. In the second stage, the crew schedule recovery with partial aircraft consideration model is built. The authors propose a new multi-commodity model for the crew schedule recovery. The two stages are run iteratively until no improvement is found. The proposed algorithm is compared to the integrated model of Abdelghany et al. (2008) and a sequential algorithm. The algorithm improves on the solutions of the other two algorithms for all scenarios. Although the algorithm had a higher run-time, it never exceeded 72 s.

Hybrid heuristics
Maher (2016) proposed a column-and-row generation framework that extends the existing branch & price (B&P) models and reduces the problem size. The model employs departure delays and cancellations as recovery techniques. The proposed model is compared to a column generation model. On average, the column-and-row generation model had a 27% lower run-time.
The author tested the model on both a point-to-point and a hub-and-spoke network with 262 and 442 flights, respectively. Table 5 shows the overview and classification of the papers regarding aircraft and crew recovery. It is curious to observe that the integration of these two resources has not received much attention, although the resources are closely related for several airlines. Nevertheless, the few works published present a comprehensive analysis of the problem. Le and Mei Long (2013) and Aguiar et al. (2013) included all common recovery options and maintenance constraints in the model formulation, as did Zhang et al. (2015). No computational times are presented by Le and Mei Long (2013), but both other papers seem to produce a recovery solution in a few seconds. Unfortunately, neither paper presents a comparison with the global optimum; therefore, the solution quality of the heuristics cannot be assessed. Maher (2016) considered the generation of new crew duties as a recovery solution but did not consider a multiple fleet formulation of the model. For the hub-and-spoke formulation, the solutions were found in 2-15 min for all scenarios.

Integrated recovery
Both from a mathematical and computational perspective, the integration of all recovery stages (aircraft, crew, and passengers) is a difficult task. The purpose of this integration is to minimize the total disruption cost. This is achieved by weighing the disruption cost related to aircraft, crew, and passengers simultaneously to find the recovery solution that overall results in the lowest cost for the airline. To the best of the authors' knowledge, the first proposal of a truly integrated approach was the PhD thesis of Lettovsky (1997), where the author formulated the 'Airline Integrated Recovery' problem, which consists of aircraft routing, crew assignment, and passenger flow. The thesis presents a linear mixed-integer mathematical problem that captures the availability of the aforementioned resources. A decomposition scheme is presented where the 'Schedule Recovery Model' master problem controls the three sub-problems known as the 'Aircraft recovery model', 'Crew recovery model', and 'Passenger flow model'. The solution is derived by applying Benders' decomposition. A limitation is that the model only considers the cockpit crew and not the cabin crew. This subsection presents the fully integrated recovery papers published between 2009 and 2018, with the overview presented in Table 6.
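The benefit of weighing aircraft, crew, and passenger costs simultaneously can be illustrated with a small enumeration. The sketch below is only a toy: the recovery options, their costs, and the function names are hypothetical and do not come from any of the reviewed papers. It compares a sequential approach, which fixes the cheapest aircraft option first and the cheapest crew option second, with an integrated approach that evaluates the joint cost of every combination.

```python
from itertools import product

# Hypothetical stand-alone costs of each recovery option.
aircraft_cost = {"delay": 4.0, "swap": 6.0, "cancel": 15.0}
crew_cost = {"keep": 0.0, "reserve": 5.0}

# Hypothetical passenger cost, which depends on the joint (aircraft, crew) choice.
passenger_cost = {
    ("delay", "keep"): 12.0, ("delay", "reserve"): 9.0,
    ("swap", "keep"): 2.0, ("swap", "reserve"): 3.0,
    ("cancel", "keep"): 20.0, ("cancel", "reserve"): 20.0,
}


def total(aircraft_option, crew_option):
    """Total disruption cost of one combined recovery decision."""
    return (aircraft_cost[aircraft_option]
            + crew_cost[crew_option]
            + passenger_cost[(aircraft_option, crew_option)])


def sequential():
    """Fix the cheapest aircraft option, then the cheapest crew option."""
    a = min(aircraft_cost, key=aircraft_cost.get)
    c = min(crew_cost, key=crew_cost.get)
    return (a, c), total(a, c)


def integrated():
    """Evaluate all combinations and keep the one with the lowest total cost."""
    best = min(product(aircraft_cost, crew_cost), key=lambda ac: total(*ac))
    return best, total(*best)


print(sequential())   # (('delay', 'keep'), 16.0)
print(integrated())   # (('swap', 'keep'), 8.0)
```

The sequential choice is one of the combinations the integrated search evaluates, so the integrated cost can never be higher. The price is the size of the joint search space, which is what makes realistic integrated models hard to solve.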
Almost all papers addressing integrated recovery consider airport disruptions. Other typical disruption sources considered are flight delays (50%) and aircraft unavailability (30%). Castro et al. (2014) is the only work to consider crew unavailability as a disruption source. In terms of recovery actions, except for Ogunsina et al. (2019), all papers reviewed considered flight delays, cancellations, aircraft swaps, crew deadheading, and passenger itinerary changes. Most also considered crew swaps (80%) and reserve crew (80%).  can be considered the most complete in terms of disruption types handled and recovery actions included, since it considers all disruption types except crew unavailability and, in addition to all recovery actions mentioned above, also includes reserve aircraft, aircraft ferrying, and cruise speed control.    developed a new flight network representation for the integrated recovery problem, based on the flow of each entity (aircraft, crew, and passenger) through the network. With the proposed flight network, the problem size is kept within limits so that real-time solutions can be provided, since it does not require discretization of departure times and cruise speed decisions. Similar to Aktürk and Atamtürk (2014), the authors implemented aircraft cruise speed control and proposed a conic quadratic mixed-integer programming formulation. The model explicitly models passengers, thereby evaluating the passenger delay costs more realistically. The authors test the model on a network of a major U.S. airline. The effect of the pre-processing methods, the cruise speed control, the passenger delay function, the severity of the disruptions, and the length of the recovery horizon on the optimality gap and run-time is evaluated.

(Meta-) heuristics
Zhu et al. (2016) proposed a model for the integrated recovery problem based on a sampling-based algorithmic framework. The recovery process is divided into two parts. In the first part, a multi-stage IP model reconstructs the flight schedule and fleet assignment. The second part creates IP models for crew schedule recovery and passenger re-accommodation. All feasible reconstruction solutions in the current time period are obtained by relaxing crew and passenger constraints. By optimizing the crew recovery and passenger re-accommodation heuristically based on random samples of the reconstruction solutions for future time stages, the upper and lower bounds of each solution are estimated. The algorithm is tested on the flight network of a Chinese airline with 250 flight legs, 65 aircraft in six families, and 85 crews.  presented an integrated optimization approach that resembled the one used by Lettovsky (1997), where they distinguish between four different phases: schedule recovery, aircraft rotations, crew assignment, and passenger assignment. In the first phase, the schedule is repaired by flying, canceling, delaying, or diverting flights. Then, in the second phase, aircraft are assigned to the new schedule. Third, the crew is assigned to the aircraft rotations. In the last phase, the passenger recovery ensures that all passengers arrive at their final destination. The authors tested the model with data from a regional US carrier that operates a hub-and-spoke network with 800 daily flights. The results of the proposed integrated model are compared to the results of a sequential approach. Whereas the proposed approach always finds a feasible solution, the sequential model only finds a feasible solution in 75% of the cases. The results show that the costs of the integrated approach are always equal to or lower than the cost of the sequential approach. The computation time of the proposed solution ranges between 20 and 30 min.
Currently, the network is rebuilt with every disruption. The authors note that, by building the network in advance, the computation time could be reduced. Maher (2015) presented a column-and-row generation approach to solve the integrated recovery model. The framework is based on general column generation and Benders' decomposition, which improves the run-time and quality of the solution. Using the Big M method, costs are assigned to the objective function when disrupted passengers are not assigned to a flight that recovers the itinerary. By using the Big M method, infeasibilities due to conflicting constraints are prevented, while as many passengers as possible are recovered. Due to the integration of passengers, the run-time increases. Solution times range between 500 and 2700 s depending on the scenario. Petersen et al.

Multi-agent systems
Following their previous work, Castro et al. (2014) presented a 'Multi-Agent System for Disruption Management' (MASDIMA), together with an analysis of related work and a comparison with MASDIMA. The proposed MAS is capable of autonomously monitoring the operations of the airline and deciding whether an event requires action or not. The MAS is adaptive to the environment and includes learning capabilities. Furthermore, MASDIMA allows for human-in-the-loop inclusion, which improves user acceptance of the solutions by reacting to and learning from user preferences. According to the authors, the main advantage of their approach is that it generates integrated solutions (i.e., solutions that include all parts of the problem) that are also more balanced (in terms of the objectives of each part).
In a recent paper, Ogunsina et al. (2019) proposed an automated learning approach to solve the integrated airline disruption problem. Although they did not present an implementation, the authors described a framework in which an agent uses a multidimensional Markov chain model to assess the propagation of disruptions. Based on this assessment, the agent recommends recovery solutions to a human controller, who then selects one solution to be applied. The agent would learn from the selection made by the human controller to improve the automated generation of future recovery solutions. The paper also discusses two different methods for dimensionality reduction that can be used for training in the proposed data-driven agent-based approach.
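How a Markov chain can describe the propagation of a disruption may be easier to see with a small numerical sketch. The example below is a deliberate simplification and not the model of Ogunsina et al. (2019): it uses a single-dimensional chain with three hypothetical delay states and an assumed transition matrix between consecutive flight legs of one aircraft.

```python
STATES = ["on_time", "short_delay", "long_delay"]

# Hypothetical transition probabilities between consecutive flight legs:
# row i gives the distribution of the next leg's state given current state i.
P = [
    [0.90, 0.08, 0.02],
    [0.30, 0.50, 0.20],
    [0.10, 0.30, 0.60],
]


def step(dist, matrix):
    """Propagate a state distribution over one flight leg."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, matrix, legs):
    """Propagate a state distribution over several consecutive legs."""
    for _ in range(legs):
        dist = step(dist, matrix)
    return dist


# The first leg of the rotation suffers a long delay with certainty.
start = [0.0, 0.0, 1.0]
for legs in range(1, 4):
    dist = propagate(start, P, legs)
    print(legs, [round(p, 3) for p in dist])
```

With these assumed probabilities, the chance of being on time rises from 0.10 after one leg to 0.24 after two legs, which gives a rough picture of how quickly a disruption is absorbed along an aircraft rotation.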

Discussion
The overview and classification of the discussed papers on integrated recovery can be found in Table 6. Maher (2015) does not include maintenance constraints and the model formulation does not allow for multiple fleet types. The model formulations of  and  are fit for use in an AOCC. Unfortunately, the computation times for the given case studies are too long for operational implementation.
The conclusion from this overview is that there are few papers considering the full integration of the three resources usually considered in the airline disruption management problem. In part, this comes from the difficulty of solving this integrated problem in a reasonable time. All studies presented in the literature report computation times of several minutes, even for case studies smaller than most realistic problems.

Conclusion and directions for further research
In this paper, we reviewed the recent literature in the field of airline disruption management and recovery methodologies that can be used as decision-support solutions at the Airline Operations Control Centers (AOCCs). We identified the functionalities of the models presented and the characteristics of the largest case studies solved in the papers found in the literature in the last decade. We dedicated separate sections to the different recovery scopes and solution methods used. Papers presenting airline schedule robustness and resilience were left out of this survey, despite the rich literature on the topic and the interest to airlines. The last literature review on the topic was presented by Clausen et al. (2010), so future works could address this gap and complement this paper with an overview of the solutions proposed in the literature to mitigate disruptions at the scheduling stage.
Two interesting trends have been observed in the literature review presented in this paper. The first is that in recent years more works present an integrated approach, explicitly modeling crew and/or passenger recovery as part of the aircraft recovery problem. This is a relevant development for the deployment of the proposed decision support tools in practice. Airline operation controllers require an integrated solution when solving disruptions in practice. The second trend relates to the increasing number of functionalities included in the approaches proposed to better represent the real-world operational context. With the increase in computing power, several authors have included more detailed operational aspects in their models, such as the consideration of multiple aircraft fleets, the modeling of passengers' itineraries, or the introduction of cruise speed control as a recovery technique. This focus on the detailing of the functionalities of the models will increase the accuracy and added value of the disruption models presented.
These two trends come, however, at the expense of higher computational requirements, posing additional challenges in the development of very efficient solution techniques. This is particularly the case since most airline operations controllers demand operational disruption models to provide good solutions at the fleet level in one or two minutes (Vink et al., 2020). This is perhaps the major challenge for researchers working on the airline disruption management problem in the coming decade. A promising research line is the adoption of data-driven or machine learning (ML) techniques. ML techniques can be considered either as a stand-alone, end-to-end solution technique (e.g., using reinforcement learning), as a support approach that provides additional information to an optimization algorithm (e.g., ML can be used to explore the solution space or provide effective problem relaxations), or as a solution technique used alongside traditional optimization techniques. For instance, future research could consider ML algorithms to approximate the lower bound of the optimization problem to create cutting planes that could tighten the relaxed version of the disruption management model. Another idea to leverage the use of ML algorithms is to have a supervised learning algorithm running alongside the optimization algorithm, either to improve the algorithm configuration or to help with the definition of promising neighborhoods considered by the optimization algorithm. A good example is the selection algorithm presented by Vink et al. (2020), which could be replaced by an ML algorithm. Please refer to Bengio et al. (2018) for a recent overview of ML approaches to solve combinatorial problems.
Moreover, one aspect that is emerging and will be a future trend is the consideration of the dynamic nature of the disruption problem (Vink et al., 2020). Future research, to be relevant for practical implementations, should solve the disruption problem the same way it is solved in practice: recovery solutions have to be found every time there is new information about disruptions, and previous decisions can be revoked if doing so improves the solution and there is still time to change them. On the one hand, such an approach will make the modeling approach more complex, requiring the adaptation of sequential decision models with lag decisions reconsidering previous actions. On the other hand, this can be used to break down the problem into sequential smaller problems, speeding up the computation times (but possibly compromising the optimality of the solutions found).
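The dynamic setting described above can be sketched as an event-driven loop that re-plans whenever new disruption information arrives. The example below is a minimal illustration under strong assumptions: the per-flight rule inside `replan` is a hypothetical stand-in for a real recovery model, times are minutes after midnight, and the flight names, event format, and function names are invented for this sketch.

```python
def replan(schedule, plan, ready, now):
    """Re-plan every flight that has not departed yet under the current plan."""
    new_plan = dict(plan)
    for flight, scheduled in schedule.items():
        if plan[flight] > now:  # still time to change this decision
            # Depart at the scheduled time, or as soon as the aircraft is ready.
            new_plan[flight] = max(scheduled, ready.get(flight, scheduled), now)
    return new_plan


def run(schedule, events):
    """Process disruption updates in time order, revising earlier decisions."""
    plan = dict(schedule)
    ready = {}  # latest known aircraft-ready time per flight
    for now, flight, ready_time in sorted(events):
        ready[flight] = ready_time
        plan = replan(schedule, plan, ready, now)
    return plan


schedule = {"F1": 600, "F2": 660}
events = [
    (500, "F1", 700),  # at 500: aircraft of F1 reported ready only at 700
    (540, "F1", 630),  # at 540: repair goes faster, ready at 630 instead
    (640, "F2", 720),  # at 640: aircraft of F2 reported ready only at 720
]
final_plan = run(schedule, events)
print(final_plan)  # {'F1': 630, 'F2': 720}
```

The second event revokes the 100-minute delay first imposed on F1 and replaces it with a 30-minute delay. By the time the third event arrives, F1 has already departed and its decision can no longer be changed. Each call to `replan` only looks at the information available at that moment, which is what keeps the individual problems small but may also cost optimality over the whole day.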
Another aspect that requires more attention is the consideration of proactive disruption management, that is, anticipating future disruptions while solving current disruptions. In fact, with the exception of the recent paper by Lee et al. (2020), all research found on the topic of airline disruption management follows a reactive approach in which the flexibility to accommodate future disruptions is neglected. This could be a very interesting topic of research, appreciated by practitioners. This line of research can combine the use of robust scheduling techniques at the disruption mitigation stage with disruption management techniques at the control stage. Furthermore, it also opens the door to the implementation of data analytics to simulate systematic disruptions and the use of reinforcement learning techniques to learn how to make optimal decisions when anticipating future consequences.
In addition to these findings, it is important to highlight the impact that the ROADEF 2009 Challenge had in boosting interest in the airline disruption management topic. The publication of open data attracted the attention of many researchers. It is, therefore, important to promote the publication of research data to accelerate research in this domain, as well as the use of existing research data for benchmarking the proposed solutions.
The main conclusion that can be drawn from this literature review is that the airline disruption management problem is still a growing field of research. Several practical and methodological challenges can be identified and stimulate future research. Moreover, in a time of higher data accessibility and higher distributed computational power, the research opportunities will go beyond the traditional operations research domain and will include developments in the data and computer science domain.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.