An Integrated Problem of p-Hub Location and Revenue Management with Multiple Capacity Levels under Disruptions

This paper considers an integrated hub location and revenue management problem in a star-star shaped airline network, in which one capacity level is chosen for each hub from a set of available capacities and disruptions are taken into account. We propose a two-stage stochastic programming model to maximize the profit of the network, which accounts for the cost of installing the hubs at different capacity levels, the transportation cost, and the revenue obtained by selling airline tickets. To provide flexible solutions, a hybrid two-stage stochastic programming-robust optimization model is developed that maximizes a weighted sum of the average-case and worst-case profits. Furthermore, a sample average approximation approach is used for solving the stochastic programming formulation, and a genetic algorithm approach is applied to both formulations. Numerical experiments are conducted to verify the mathematical formulations and compare the performance of the two approaches.


Introduction
In 2017, strong demand for air passenger service drove the expansion of the world's airline network to more than 20000 unique city pairs, as well as 9% net profit growth, in excess of $38.0 billion [1]. However, the capacity of the infrastructure failed to grow at a similar rate as air traffic demand continued to soar [2]. Therefore, scarce capacity in the airline network has become a major issue in the aviation industry.
The integration of hub location and revenue management could be a potential solution, since it builds and uses capacity effectively to achieve competitive profit. In the relevant research, the hub location problem lies at the heart of network design planning in transportation systems [3]. This research area focuses on locating the hubs from a set of nodes, activating a set of links, and routing the capacities of the network while optimizing a cost-based objective function. In fact, these capacities correspond to the seats on the flights that are operated in this network. In revenue management, airline tickets are considered instead of seats. Revenue management is a backbone of the airline business: it controls the passenger demand for capacitated and perishable tickets with multiple fare classes to maximize the revenue [4]. Capacity control, the core of revenue management, allocates the seats to different fare classes before the flight departs with the objective of maximizing the revenue [5]. This means that the installed capacities can be allocated well by solving a revenue management problem, and the resulting profit can in turn be used to reoptimize the installed network by solving the hub location problem. Therefore, the hub location problem and the revenue management problem are closely related.
Building sustainable, flexible, and affordable infrastructure is also key to resolving the crisis of inadequate capacity in the network. Reference [6] addresses a capacitated single allocation hub location problem and determines the installed capacity of each hub from a finite set of capacity levels with different set-up costs. This strategy can largely improve the utilization of the infrastructure and reduce traffic delays. Besides, disruptions, ranging from low-level disobedience of crew instructions to major disruptions involving aircraft diversion, passenger deplanement, equipment failure, jet fuel supply reliability, digital disruption, and adverse weather, could degrade the capacity [1,7]. For example, airport authorities and regulators restrict the quota of available airport departure and arrival slots in adverse weather [8]. This could result in flight delays and reduced profitability. Hence, the impact of disruptions should be considered.
Journal of Advanced Transportation
The uncertainty is another important factor to be considered while building the capacity. Stochastic programming is a widely used modeling paradigm to deal with uncertainty. However, the probability distribution function of the underlying stochastic parameters is hard to obtain. Robust optimization can be an effective alternative by computing over a convex uncertainty set of the parameters and bounding the maximum allowable deviation of the parameters from their nominal values to realize the worst-case optimization [9]. However, its result is more conservative. Therefore, [10] combines the benefits of both approaches and proposes a weighted sum of average and worst case profits to provide the flexibility by putting relative emphasis on average-case or worst-case value. Our paper also uses this idea to address uncertain demand.
Facing unbalanced growth between capacity and demand in the aviation industry, our research is motivated by the above solutions. We present a two-stage stochastic programming formulation to maximize the profit. In the first stage, the locations of the hubs, the capacity levels of the hubs, the connections between the hubs and the non-hubs, and the protection levels of the tickets for different fare classes are decided. The booking limits of the tickets for different fare classes are obtained in the second stage. Moreover, we provide a hybrid two-stage stochastic programming-robust optimization model to maximize a weighted sum of the average-case and worst-case profits. To solve the problem, a genetic algorithm (GA) approach is applied. For the stochastic programming model, we also use a sample average approximation approach. Numerical experiments demonstrate the effectiveness of these two formulations.
Our contribution is to provide a novel insight into resolving the unbalanced growth between demand and capacity in the airline industry. We present a two-stage stochastic programming model to describe the interaction between hub location and revenue management and provide a flexible strategy for the infrastructure by considering a set of capacities available for each hub under disruptions. In addition, a hybrid two-stage stochastic programming-robust optimization model is investigated to maximize a weighted sum of the average-case and worst-case profits. Together, these models address this issue.
The rest of this paper is organized as follows. Section 2 discusses relevant literature. Section 3 presents the integrated hub location and revenue management problem with multiple capacity levels under disruptions. Section 4 gives two mathematical models. Section 5 explains the research approaches in detail. Section 6 provides numerical experiments and analyzes the results. Section 7 discusses the conclusions.

An Integrated Hub Location and Revenue Management Problem.
The hub location problem was first proposed by [11]. This research focuses on the selection of a set of nodes at which to place hub facilities and of the links connecting the origins and the destinations via the hubs. The classical hub location problem mainly focuses on minimizing the cost under the assumption that all demands are satisfied. However, from a profit point of view, some origin-destination (O&D) pairs need not be served if they are not profitable. As such, the hub location problem that considers the trade-off between the revenue and the cost to maximize the profit has been discussed. The revenue can be classified as coming (1) from captured flows, e.g., [12][13][14][15]; (2) from an integration of pricing and hub location, e.g., [16,17]; and (3) from an integration of revenue management and hub location, e.g., [18,19]. In this paper, we address an integrated problem of hub location and revenue management.
Research on revenue management was introduced by Littlewood in 1972. Revenue management evolved from five streams: pricing, capacity control, overbooking, auctions, and forecasting [20]. Our study is relevant to capacity control. Early publications provide effective analyses of, and approaches to, revenue management on a single flight leg ([21,22]). Research on the revenue management problem has moved forward to larger networks since network revenue management was proposed by [23]. This stream can be classified by network topology. On the one hand, for multiple legs without a specific network structure, some research ([24,25]) decomposes the multiple legs into many single-leg problems and then solves each single-leg revenue management problem. On the other hand, research on the hub-and-spoke network has also received significant attention ([26][27][28]). Reference [26] considers a network revenue management problem in a hub-and-spoke network consisting of a hub and spokes. Reference [27] studies the capacity allocation in a two-airline alliance under competition in a combined hub-and-spoke network. Reference [28] addresses an airline revenue management problem that considers a competitor's behavior under simultaneous price and capacity competition in a hub-and-spoke network. Reference [29] explores the impact of the structural properties in a hub-to-hub network revenue management problem. From the above literature, we conclude that the network structure plays an important role in airline revenue management. For reviews on revenue management, we refer the readers to, e.g., [30,31].
Not until [18] was the integration of revenue management and hub location in the airline industry proposed. In this research, the authors determine the booking limits for different fare classes and design an uncapacitated single allocation hub network with hub-stop and non-stop O&D pairs. They propose a two-stage stochastic programming formulation to maximize the profit, including the revenue from ticket sales and the cost of transportation and hub installation. In the first stage, the locations of the hubs and the central hub, the links between the hubs and the non-hubs, and the protection levels for different fare classes are determined. In the second stage, the booking limits of the tickets are decided depending on the protection levels. They propose a hybrid optimization method that combines GA with a caching technique and an exact solution method. Compared to pure GA, their method performs better in computational time and solution quality on instances with up to 20 nodes. Reference [19] continues the study of the integrated hub location and revenue management problem in a star-star hub network. They formulate a two-stage stochastic program to maximize the profit. They propose two hybrid optimization methods consisting of ordinary GA and modified caching hybrid GA. Their results show the efficiency of their methods on instances with up to 25 nodes. In our paper, we also consider the integration of the hub location problem and the revenue management problem. The above studies have explored the interaction between the two problems, but the discussion only focuses on an uncapacitated hub network.

Hub Location with Multiple Capacity Choices.
The hub location problem can be classified as capacitated or uncapacitated depending on whether the hubs have limited capacity [32]. Complete reviews of the hub location problem can be found in [3,33]. Our paper focuses on a capacitated hub location problem (CHLP), which was first proposed in [34]. In their hub network, all traffic can be routed via either one or two hubs and the hubs are fully interconnected. Direct connection is not allowed between two non-hubs. They present a mixed integer linear programming (MILP) formulation with fewer variables and constraints to minimize the cost of transportation and of installing the hubs. Beyond that, a variant of the CHLP in which a set of capacities is assumed to be available for each hub is discussed by [6].
Earlier studies address the CHLP with multiple capacity choices for the single or the multiple allocation case. The difference between single allocation and multiple allocation is whether each non-hub is linked to a single hub or to more than one hub. Regarding the capacitated single allocation hub location problem (CSAHLP), reference [35] considers a CSAHLP with multiple capacity choices and a single commodity. Four MILP formulations for this problem are presented by introducing new variables and modifying the formulations of [34,36,37]. To identify a model that can solve the problem optimally, they compare the lower bounds provided by the linear relaxations of these formulations. Reference [38] introduces balancing requirements into the problem of [6]. The balancing requirements state that the difference between the maximum and the minimum number of spokes connected to the hubs should not exceed a maximum value. They propose two MILP formulations to minimize the setup cost and the routing cost. Their results show that the superior model can be identified from the lower bounds provided by the linear relaxations of these two formulations for instances with up to 50 nodes. Later, the dynamic hub location problem with multiple capacity levels was introduced. Reference [39] considers multiple periods in a multi-capacity single allocation hub location problem. In their assumptions, a hub can be closed or resized and a non-hub can become a hub in each period. They present four quadratic MILP formulations, including two flow-based and two path-based models. The objective is to minimize the cost of the connection from the non-hub to the hub, the transportation from one hub to another hub, the distribution from the hub to the non-hub, opening the hub, resizing the hub, and closing the hub. Their results show that the path-based formulations outperform the flow-based formulations for almost all instances.
Reference [40] addresses a multi-period capacitated hub location problem for both single allocation and multiple allocation cases. In each time period, opening new hubs and expanding the capacity of existing hubs are allowed. They use a MILP formulation to minimize the cost of transportation, operation on hubs and hub links, capacity establishment, and hub installation over the planning horizon. They also enhance the model by proposing valid inequalities. Their instances with up to 64 nodes can be solved in reasonable time, and the results confirm the importance of the time dimension and the multi-period nature of hub networks. A further step forward for research on the multi-capacity hub location problem is to relate it to the level of congestion. Reference [41] considers a CHLP with service time and congestion for both single allocation and multiple allocation cases. They assume that service time includes travelling time through the network and handling time at hubs. Congestion at the hubs can lead to an increase in handling time. Direct connection is allowed between two non-hubs. A MILP formulation with service time and congestion is used to minimize the cost of transportation and of installing the hubs with different capacities. They also model the direct connection between two non-hubs in the multiple allocation case. They analyze the impact of different factors including the service time limit, the congestion factor, the fixed cost, and the capacity. The above literature only discusses building a multi-capacity hub network from a cost perspective. Their assumption is that all the nodes and the links should be served. However, if an O&D pair cannot generate a good profit, then there is not enough incentive to serve it.
To the best of our knowledge, our paper fills a gap in the existing literature and advances the research on the unbalanced development between capacity and demand in the airline industry by providing competitive profit, flexible infrastructure, and reduced impact of disruptions. The notation and the formulation of our problem are presented in the next section.

Problem Statement
In this section, we start by presenting our problem in the first subsection. We use the network terminology in [42]. Afterwards, the assumptions and the parameters are described in the remaining subsections. Our problem is inspired by the integrated research of hub location and revenue management developed by [19], the research on hubs with multiple capacity levels in [41], and the research on disruptions in [7].

Problem Definition.
In a star-star network, the node set includes both non-hubs and potential hubs. A central hub 0 is given. In one star topology, the center is the central hub 0 and the spokes are the hubs; the connections run between the hubs and the central hub 0, and we call them the primary (trunk) links of this star network. The other star topology implements a hub-to-non-hub distribution paradigm; its connections are the secondary (spoke) links. In our problem, any O&D pair from an origin to a destination can pass through either one hub or two hubs. Figure 1 illustrates this star-star hub network.
The fixed installation cost of the central hub 0 is given. Once a hub is located, exactly one level with capacity Γ must be chosen for it from a set of multiple capacity levels, and a corresponding fixed cost is incurred. The flow on the primary link between a hub and the central hub 0 and the flow on the secondary link from a non-hub to a hub are each limited by a maximum link capacity. Unit transportation costs are defined separately for the primary and secondary links. The primary-link cost is lower because economies of scale reduce the unit cost on the primary link between the hub and the central hub: it equals the secondary-link cost multiplied by a discount factor in [0, 1]. In our paper, the number of flights represents the flow in this network, which depends on (1) the aircraft type operated on each type of link, since only one aircraft type with a given seating capacity is available on the primary link and another aircraft type with its own seating capacity serves the secondary link, and (2) the protection level of the tickets.
There are several fare classes for each flight. The protection level of the tickets and the demand for each fare class determine the booking limit of airline tickets. Once the protection level is set, the impact of future adjustments can be ignored. We represent the uncertainties by stochastic parameters, and the uncertainty is captured by a finite set of scenarios. Each scenario occurs with a given probability, and these probabilities sum to 1. Each scenario is relevant to the demand and disruption variation: it specifies the uncertain demand for airline tickets in each fare class and the probability of a disruption at each hub.
In our problem, a deterministic model can hardly describe the changes in demand. As such, we present a two-stage stochastic program for our problem. The first-stage decision concerns (1) the locations of the hubs as well as their capacity levels, (2) the connections between the nodes on both the primary and secondary links, and (3) the protection levels of the tickets for different fare classes. In the second stage, we decide the booking limits of airline tickets. The objective is to maximize the profit, which contains several components: (1) the revenue from ticket sales, (2) the installation cost of the central hub and of the hubs with specific capacity levels, and (3) the transportation cost on both primary and secondary links. By taking the profit into consideration, our problem measures the trade-off between the revenue from selling airline tickets and the cost of building a capacitated hub network with multiple hub capacity levels.
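The two-stage structure described above can be illustrated with a small numeric sketch for a single O&D pair. The rule that the booking limit in each scenario is the smaller of the realized demand and the protection level is an assumption made for illustration, and the function name, arguments, and figures are hypothetical rather than taken from the paper.

```python
def expected_profit(prices, protection, scenarios, fixed_cost, transport_cost):
    """Expected profit of one O&D pair over a finite scenario set.

    prices[k]     : ticket price of fare class k
    protection[k] : first-stage protection level of fare class k
    scenarios     : list of (probability, demand_by_class) pairs
    """
    expected_revenue = 0.0
    for prob, demand in scenarios:
        # Second stage (assumed rule): sell up to the realized demand,
        # capped by the protection level fixed in the first stage.
        booking = [min(d, q) for d, q in zip(demand, protection)]
        expected_revenue += prob * sum(p * b for p, b in zip(prices, booking))
    # First-stage costs are paid regardless of the scenario.
    return expected_revenue - fixed_cost - transport_cost


# Two equally likely demand scenarios, two fare classes (illustrative numbers).
scenarios = [(0.5, [10, 40]), (0.5, [30, 80])]
profit = expected_profit(prices=[900.0, 600.0], protection=[20, 60],
                         scenarios=scenarios,
                         fixed_cost=10000.0, transport_cost=5000.0)
print(profit)
```

The first-stage quantities (protection levels and costs) are shared by all scenarios, while the booking limits are recomputed per scenario, which is the separation the two-stage model relies on.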
[Figure 1: Star-star hub network. Legend: central hub 0, hub, non-hub, primary (trunk) link, secondary (spoke) link.]

Assumptions.
We itemize the following assumptions:
(i) The number of hubs is predefined as p.
(ii) The location of the central hub 0 is predefined.
(iii) Each non-hub is connected to at most one hub, and direct connection between two non-hubs is not allowed. Similarly, each hub connects to the central hub 0, without direct connection between two hubs.
(iv) A hub can process its own outbound flow and distribute its own inbound flow.
(v) Each hub is capacitated. The capacity of a hub is the maximum flow that it can process.
(vi) All flows traverse at least one hub and at most two hubs (not including the central hub).
(vii) Only one capacity level is installed at each hub. The installation cost depends on the capacity level chosen for the hub.
(viii) The uncertain demand for each fare class is independent.
(ix) Every hub except the central hub 0 is assumed to be at risk of disruptions.
(x) The arrival sequence of the demand is not considered.
(xi) Passengers who buy tickets must appear at the time of departure. Cancellations and no-shows are not considered.
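The structural assumptions (i) and (iii) can be checked mechanically for a candidate network. The following is a minimal sketch; the data layout (a set of hub ids and a dictionary mapping each non-hub to its hub, or to None if unserved) and the function name are hypothetical choices for illustration.

```python
def is_star_star_feasible(hubs, allocation, p):
    """Check assumptions (i) and (iii) for a candidate network.

    hubs       : set of node ids chosen as hubs
    allocation : dict mapping each non-hub id to its hub id, or None if unserved
    p          : required number of hubs
    """
    if len(hubs) != p:                      # assumption (i): exactly p hubs
        return False
    for node, hub in allocation.items():
        if node in hubs:                    # a hub is not allocated as a spoke
            return False
        if hub is not None and hub not in hubs:
            return False                    # spokes attach only to installed hubs
    # A dict stores one value per key, so each non-hub has at most one hub.
    return True


print(is_star_star_feasible({1, 2}, {3: 1, 4: 2, 5: None}, p=2))
```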

Parameters and Decision Variables.
Before presenting our model, we continue to use the notation in [19] as follows:
Indices.
: index for hub.
: index for scenario.
: index for customer class.
: index for capacity level.
Sets.
: set of nodes.
p: the number of hubs to be built.
: set of scenarios.
: set of fare classes.
: set of capacity levels available for each hub.

Parameters.
0 : distance on the primary link from a hub to central hub 0.
: distance on the secondary link from a non-hub to a hub.
: uncertain demand per O&D pair from the origin to the destination at a given fare class under a given scenario.
: the probability of each scenario for the O&D pair between the origin and the destination at a given fare class.
: ticket price on the O&D pair from the origin to the destination at a given fare class.
0 : capacity on the primary link from a hub to central hub 0.
: capacity on the secondary link from a non-hub to a hub.
1 : the largest seating capacity available for each aircraft on the primary link.
2 : the largest seating capacity of each aircraft on the secondary link.
Γ : capacity of a hub at a given capacity level.
: installation cost of a hub at a given capacity level.
: installation cost of central hub 0.
: a very large integer.

Variables.
: booking limit of airline tickets for each fare class on the O&D pair from the origin to the destination under a given scenario.
: protection level of the tickets for each fare class on the O&D pair from the origin to the destination.
: binary decision variable equal to 1 if a node becomes a hub and 0 otherwise.
: binary decision variable equal to 1 if a non-hub is routed to a hub and 0 otherwise.
V : binary decision variable equal to 1 if a capacity level is chosen for a hub and 0 otherwise.
With these sets of variables and parameters, we can obtain the mathematical models in the next section.

Mathematical Formulations
In this section, we present two models of the integrated hub location and revenue management problem with considerations of multiple capacity levels and network disruptions: (1) a two-stage stochastic programming model in Section 4.1 and (2) a hybrid two-stage stochastic programming-robust optimization model in Section 4.2.

Stochastic Programming Model.
In this section, we give nonlinear and linear formulations to describe our problem.
First, a nonlinear model can be expressed as the following deterministic equivalent program [43]:
[Equations (1)-(12): objective function and constraints of the nonlinear model]
The objective function (1) is to maximize the profit, which is obtained by subtracting the cost from the revenue. The first term computes the revenue of airline tickets with different fare classes over all O&D pairs. The second term denotes the transportation cost on the primary links between the hubs and the central hub. The third term represents the transportation cost on the secondary links between the non-hubs and the hubs. The fourth term represents the installation cost of the hubs with the chosen capacity levels. The last term represents the installation cost of the central hub. Constraint (2) indicates that each non-hub is allocated to at most one hub: when the allocation variables of a non-hub sum to 1, the non-hub is allocated to one hub, and when they sum to less than 1, the non-hub is left unconnected, since not all nodes need to be connected in the network when connecting them is less profitable. Constraint (3) fixes the number of hubs. Constraint (4) guarantees that a non-hub can be allocated only to an installed hub. Constraints (5) and (6) relate the booking limit of airline tickets for each fare class to the uncertain demand and the protection level under each scenario. Constraints (7) and (8) [...] (9); second, the capacity of each hub must accommodate the flows from both primary and secondary links. The left-hand side of constraint (10) represents the leftover capacity of the hub after the disruptions, and the right-hand side represents all the flows via the hub. Constraints (11) and (12) impose non-negativity on the integer decision variables.
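As a reading aid, the three allocation constraints described above can be sketched as follows. The symbols are assumptions introduced here for illustration and may differ from the paper's notation: $x_{ij}$ equals 1 if non-hub $i$ is routed to hub $j$, $y_j$ equals 1 if node $j$ is a hub, $N$ is the node set, and $p$ is the number of hubs. Only the verbal descriptions of constraints (2) to (4) are taken from the text.

```latex
\begin{align*}
\sum_{j \in N} x_{ij} &\le 1, && i \in N
  && \text{(each non-hub is allocated to at most one hub)} \\
\sum_{j \in N} y_{j} &= p
  && && \text{(exactly $p$ hubs are installed)} \\
x_{ij} &\le y_{j}, && i, j \in N
  && \text{(allocation only to an installed hub)}
\end{align*}
```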
The above model is a mixed-integer nonlinear program (MINLP) because of the nonlinear terms in objective (1) and in constraints (7) and (10). Due to the computational complexity of MINLP, we propose a linear reformulation by using a linearization technique in which we introduce auxiliary nonnegative variables, additional variables, and big-M parameters.
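For illustration, one standard way to linearize the product of a binary variable and a bounded nonnegative variable with a big-M parameter is shown below. The symbols $b$, $x$, $w$, and $M$ are placeholders introduced here, and whether the paper's reformulation takes exactly this form is an assumption.

```latex
% w stands for the product b * x, with b binary and 0 <= x <= M
\begin{align*}
w &\le M\,b, \\
w &\le x, \\
w &\ge x - M\,(1 - b), \\
w &\ge 0 .
\end{align*}
```

With $b = 0$ the first and last inequalities force $w = 0$; with $b = 1$ the second and third force $w = x$, so the four linear inequalities reproduce the product exactly.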

Hybrid Two-Stage Stochastic Programming-Robust Optimization Model.
To provide flexible decisions for this integrated problem, we combine the benefits of stochastic programming and robust optimization. Consequently, we provide a hybrid stochastic programming-robust optimization model based on the nonlinear formulation in Section 4.1. The results can place relative emphasis on the average-case and worst-case profits by introducing a pair of weights, each in [0, 1] and summing to 1.
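The weighted objective can be sketched for a fixed first-stage decision whose profit under each scenario is already known. The function and argument names below are hypothetical, and the numbers are illustrative only.

```python
def hybrid_objective(profits, probabilities, w_avg, w_worst):
    """Weighted sum of the average-case and worst-case profits.

    profits[s]       : profit of the fixed decision under scenario s
    probabilities[s] : probability of scenario s
    w_avg, w_worst   : weights in [0, 1] that sum to 1
    """
    if not (0.0 <= w_avg <= 1.0 and abs(w_avg + w_worst - 1.0) < 1e-9):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    average = sum(p * v for p, v in zip(probabilities, profits))
    worst = min(profits)
    return w_avg * average + w_worst * worst


profits = [100.0, 60.0, 20.0]
probabilities = [0.5, 0.3, 0.2]
for weights in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(weights, hybrid_objective(profits, probabilities, *weights))
```

Setting the first weight to 1 recovers the pure stochastic programming objective, while setting the second weight to 1 recovers the pure worst-case objective.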

Solution Approach
In this section, we introduce two approaches: (i) sample average approximation (SAA) and (ii) genetic algorithms. The details can be given in Sections 5.1 and 5.2.

Sample Average Approximation.
SAA can be effective in dealing with stochastic programming problems. The basic idea of SAA is to realize uncertain parameters by generating random samples and to approximate the expected value by the corresponding sample average function. The sample size can be much smaller than the number of scenarios in the true problem, and SAA can converge exponentially fast to the true problem as the sample size increases [44]. SAA has been applied extensively in network design ([45][46][47]). In our paper, we apply the SAA scheme to solve the linear model in Section 4.1 by randomly generating a sample of a given size. The linear model in Section 4.1 is then approximated by the following SAA problem:
[Equations of the SAA problem]
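The SAA idea can be sketched on a deliberately small stand-in problem: choosing one protection level for one fare class. This is not the paper's linear model. The unit cost per protected seat, the uniform demand on [3.75, 6.25], and all names below are assumptions made for illustration.

```python
import random


def saa_best_protection(candidates, price, unit_cost, n_samples,
                        low=3.75, high=6.25, seed=0):
    """Pick the candidate protection level with the best sample-average profit."""
    rng = random.Random(seed)
    # One shared random sample of demands replaces the true distribution.
    demands = [rng.uniform(low, high) for _ in range(n_samples)]

    def sample_average_profit(q):
        # The expectation of price * min(demand, q) is approximated by a mean.
        revenue = sum(price * min(d, q) for d in demands) / n_samples
        return revenue - unit_cost * q

    best = max(candidates, key=sample_average_profit)
    return best, sample_average_profit(best)


best_q, value = saa_best_protection(
    candidates=[4.0, 4.5, 5.0, 5.5, 6.0],
    price=600.0, unit_cost=300.0, n_samples=5000)
print(best_q, value)
```

Using the same sample for every candidate keeps the comparison between candidates stable, which is one common design choice when evaluating SAA objectives.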

Genetic Algorithms.
As the size of the network increases, finding optimal results becomes computationally challenging. Many studies have used GA to solve the capacitated hub location problem in reasonable computational time ([48][49][50][51][52]). The GA, introduced by [53], is inspired by natural biological evolution. In the initial generation, the individuals are generated randomly. In each population, individuals are selected according to their fitness. Offspring are obtained by applying the crossover and mutation operators to two parent individuals. This process favors individuals that are better suited to the problem.
Here, we introduce a GA approach in [19]. First, this GA can enumerate all the possible individuals by the values of and . For the MINLP model in Section 4.1, the near optimal solutions can be obtained by CPLEX based on these individuals.
We briefly describe the procedure of this approach as follows:
Step 0 (initial population generation). Randomly generate a population of chromosomes.
is the number of nodes. We name this matrix as matrix .
Step 0.2. The p highest entries on the diagonal of the matrix are chosen. These entries are set to 1 and the other, unselected entries on the diagonal are set to 0. A diagonal entry equal to 1 means that the corresponding node is chosen as a hub; a diagonal entry equal to 0 means that it is not.
Step 0.3. We consider the overlaps between all columns whose diagonal entry equals 1 and all rows whose diagonal entry equals 0. Within these overlaps, the highest entry in each row is set to 1 and the other entries in the same row are set to 0. An entry equal to 1 within these overlaps means that the node in that row is a non-hub routed to the hub in that column. All remaining entries of the matrix are set to 0. If several overlap entries in the same row share the highest value, we choose the first one from the left edge of the matrix.
Step 0.4. A 0-1 matrix is obtained by these steps.
[...]
Step 2.2 (crossover). The crossover operator is used to produce the offspring. The offspring are generated by combining the parents with a crossover probability according to the following rules:
[Equations (42) and (43)]
Here the two parents are each represented by a square matrix, and the combination uses a random matrix of the same dimension whose entries are chosen uniformly over the interval [0, 1]. Both offspring are formed as combinations of the two parents based on equalities (42) and (43) with the same random matrix.
Step 2.3 (mutation). The new offspring can be mutated with a mutation probability. The mutation operator helps to obtain a global flexible solution and to recover good genetic codes that may have been lost in the above steps. We consider three mutation operators: Elimination, Transposition, and Conversion. In each iteration, only one of these mutation operators is randomly selected:
(i) Elimination. This operator relocates a hub. The highest entry on the diagonal of the matrix is replaced by 1 minus its value.
(ii) Transposition. This operator changes the links between the non-hubs and the hubs by transposing the matrix.
(iii) Conversion. This operator establishes a new network structure. A square submatrix of smaller dimension is chosen from the bottom-right corner of the matrix. Its dimension is a random integer obtained by multiplying the number of nodes by a random number drawn from the uniform distribution on [0, 0.5]. A new matrix of the same dimension, with entries uniformly distributed between 0 and 1, is generated to replace this submatrix.
Step 3 (stopping criterion). If a stopping criterion is fulfilled, terminate the iterations and output the best solution from the population. The procedure stops when either of the following two conditions holds:
(i) The best solution has not changed after 2 iterations.
(ii) The maximum running time of 8 hours has been reached.
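The decoding of a chromosome (Steps 0.2 and 0.3) and the crossover can be sketched as follows. The crossover formula is an assumption: since the equations did not survive in the text, an arithmetic combination of the two parents with one shared random matrix is used, which matches the verbal description but may differ from the paper's exact rules. Function names and the example matrix are hypothetical.

```python
import random


def decode(chromosome, p):
    """Turn a real-valued square matrix into a 0-1 hub/allocation matrix."""
    n = len(chromosome)
    # Step 0.2: the p largest diagonal entries mark the hubs.
    by_diagonal = sorted(range(n), key=lambda j: chromosome[j][j], reverse=True)
    hubs = sorted(by_diagonal[:p])
    binary = [[0] * n for _ in range(n)]
    for j in hubs:
        binary[j][j] = 1
    # Step 0.3: each non-hub row is linked to the hub column holding its
    # largest entry; max() keeps the leftmost column on ties.
    for i in range(n):
        if i not in hubs:
            best = max(hubs, key=lambda j: chromosome[i][j])
            binary[i][best] = 1
    return binary


def crossover(parent_a, parent_b, rng):
    """Assumed arithmetic crossover using one shared random matrix r."""
    n = len(parent_a)
    r = [[rng.random() for _ in range(n)] for _ in range(n)]
    child_1 = [[r[i][j] * parent_a[i][j] + (1 - r[i][j]) * parent_b[i][j]
                for j in range(n)] for i in range(n)]
    child_2 = [[(1 - r[i][j]) * parent_a[i][j] + r[i][j] * parent_b[i][j]
                for j in range(n)] for i in range(n)]
    return child_1, child_2


chromosome = [[0.9, 0.2, 0.1],
              [0.3, 0.4, 0.8],
              [0.6, 0.5, 0.7]]
network = decode(chromosome, p=2)
for row in network:
    print(row)
```

In the example, nodes 0 and 2 have the two largest diagonal entries and become hubs, and node 1 is routed to node 2 because 0.8 exceeds 0.3 among the hub columns of its row.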

Computational Experiments and Discussion
In this section, we conduct numerical experiments for our problem to verify the proposed mathematical models and to evaluate the performance of the used approaches. We use the SAA and the GA to solve the stochastic programming in Section 4.1. For the hybrid model in Section 4.2, we only apply the GA to obtain the results. First, we give all the parameters in Section 6.1. Then the results can be analyzed in Section 6.2.

Test Bed.
For the parameter setting, the numbers of nodes and hubs in this airline network take four pairs of values: (5, 2), (10, 2), (10, 3), and (10, 5). We consider two cases for the capacity levels. (1) Case 1: every hub has two capacity levels to choose from, 0.5 × 10^7 and 0.1 × 10^8. For the capacity level 0.5 × 10^7, the installation cost of the hub is drawn from a uniform distribution on (0.1 × 10^7, 1.2 × 10^6). For the capacity level 0.1 × 10^8, the installation cost is drawn from a uniform distribution on (1.2 × 10^6, 1.4 × 10^6). (2) Case 2: three capacity levels can be selected for each hub, 0.5 × 10^7, 0.1 × 10^8, and 1.5 × 10^7. The corresponding installation costs are drawn from uniform distributions on (0.1 × 10^7, 1.2 × 10^6), (1.2 × 10^6, 1.4 × 10^6), and (1.4 × 10^6, 1.6 × 10^6), respectively. The installation cost of the central hub is drawn from a uniform distribution on (1.2 × 10^6, 2.4 × 10^6). Two fare classes are considered: business class (class 1) and economy class (class 2). For the primary link, the transportation cost per unit distance per flight at the business class is drawn from the interval (7, 14), and the cost at the economy class equals the business-class cost multiplied by a factor drawn from (0.5, 1). We consider the discount factor as 0.2 on the secondary link, so that each secondary-link cost equals the discount factor times the corresponding primary-link cost. We assume that the distance from the non-hub to the hub is 10000 and from the hub to the central hub 0 is 10000. The capacity on the primary link is drawn from a discrete uniform distribution on (1, 3). Meanwhile, the capacity on the secondary link is drawn from (3, 6). The aircraft capacities used on the primary link and on the secondary link are 100 and 200, respectively. The ticket price at the economy class is drawn from a continuous distribution on (600, 3600). The price at the business class is 1.5 higher than the one at the economy class. The demand for each fare class is drawn from the interval between low and high demands, [3.75, 6.25]. 80% of the demand comes from the economy class and 20% from the business class.
For the scenarios, we set the number of scenarios to 10. All scenarios are equally probable, each with a probability of one over the number of scenarios. In addition, the probability of crossover is 0.
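As an illustration of how one instance of this test bed could be generated, the following Python sketch samples the parameters listed above. It is not the authors' Matlab code: the function name `sample_test_bed`, the dictionary keys, and the seed handling are hypothetical. The business-class price and the secondary-link costs are left out on purpose, because their exact formulas are not restated in this section.

```python
import random


def sample_test_bed(num_hubs, num_levels=2, seed=0):
    """Draw one set of test-bed parameters (hypothetical helper)."""
    rng = random.Random(seed)

    # Capacity levels and installation-cost ranges as listed in the test bed.
    capacities = [0.5e7, 0.1e8, 1.5e7][:num_levels]
    cost_ranges = [(0.1e7, 1.2e6), (1.2e6, 1.4e6), (1.4e6, 1.6e6)][:num_levels]

    # One installation cost per hub and per capacity level.
    install_cost = [
        [rng.uniform(low, high) for low, high in cost_ranges]
        for _ in range(num_hubs)
    ]

    # Primary-link transportation cost per unit distance per flight.
    cost_business = rng.uniform(7, 14)
    cost_economy = rng.uniform(0.5, 1) * cost_business

    return {
        "capacities": capacities,
        "install_cost": install_cost,
        "cost_business": cost_business,
        "cost_economy": cost_economy,
        # Discrete uniform draws; randint includes both end points.
        "link_capacity_primary": rng.randint(1, 3),
        "link_capacity_secondary": rng.randint(3, 6),
        "price_economy": rng.uniform(600, 3600),
        # Equal scenario probabilities for the 10 scenarios.
        "scenario_probs": [1 / 10] * 10,
    }
```

Calling `sample_test_bed(num_hubs=2, num_levels=2)` would correspond to Case 1 with two hubs, and `num_levels=3` to Case 2. Fixing the seed makes a sampled instance reproducible across runs.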

Computational Results and Analysis.
In this section, we analyze the results of applying the approaches of Section 5 to a series of instances. Computational experiments are run on a MacBook Pro with an Intel Core i7 3GHz CPU and 8GB RAM, running the macOS High Sierra 10.13.3 operating system. All experiments are coded in Matlab R2017b, which calls YALMIP R20171121 to run CPLEX 12.8.0. For every instance, one run is conducted for each approach, according to the settings in Section 6.1. The results are analyzed as follows.
First, we compare the results of the SAA and the GA for the stochastic programming problem in Table 1. We consider the instances with two capacity levels and with three capacity levels. In Table 1, the results for two capacity levels are shown in rows 4 to 7, and the results for three capacity levels are reported in rows 9 to 12. The first column lists the number of nodes and the number of hubs. The columns under SAA give the profit and the computation time of this algorithm, and the columns under GA give the same quantities for the GA. Computation time is measured in seconds. In terms of profit, the SAA achieves slightly larger values than the GA in 6 of the 8 instances. This is reasonable: for the GA, the value of the best individual within a population is taken as the solution of our problem, whereas the SAA considers the mean value over all the samples. Even if one of the samples takes the worst value, the profit is not sharply curtailed, and the worst sample has less influence on the mean as more samples are considered. In terms of running time, the SAA performs better than the GA in 5 of the 8 instances, which indicates that the SAA is superior to the GA in computation time in most instances. Overall, Table 1 shows that the SAA performs better than the GA in most instances.
Second, we apply the GA to the hybrid problem of Section 4.2 for the instances with two capacity levels and report the results in Table 2. All the computations are completed within 8 hours. In Table 2, the first column gives the number of nodes and the number of hubs. The columns under the weights report the profit for five pairs of weights. In every column, the profit grows as the network becomes larger, which reflects that the profit depends on the instance size. In particular, for the same number of nodes, a network with more hubs obtains a larger profit. This is reasonable because hubs facilitate transport and create more business opportunities, so more profit can be obtained in the hub network. In each row, the profit decreases as the first weight increases and the second weight decreases. The maximum is obtained at (0, 1) and the minimum at (0.8, 0.2). These results show that the second weight has a greater influence on the profit than the first weight. More specifically, the worst-case value plays a more essential role in the profit than the average-case value. Hence, this analysis shows that the decision maker can make flexible decisions by putting relative emphasis on either the worst-case or the average-case value in the profit, according to practical considerations.
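The remark in the SAA comparison that the worst sample has less influence on the mean as more samples are considered can be checked with a small Python sketch. The function names and the profit values are hypothetical and chosen only for illustration; they are not taken from Table 1.

```python
def sample_average(values):
    """Mean of the sampled profits, each sample weighted equally."""
    return sum(values) / len(values)


def worst_sample_influence(values, shortfall):
    """Drop in the sample average when the worst sample loses `shortfall`."""
    worst_index = min(range(len(values)), key=lambda i: values[i])
    degraded = list(values)
    degraded[worst_index] -= shortfall
    return sample_average(values) - sample_average(degraded)


# Hypothetical profits: identical samples, so only the sample count differs.
few = [100] * 10
many = [100] * 100
print(worst_sample_influence(few, 50))   # 5.0
print(worst_sample_influence(many, 50))  # 0.5
```

With equally weighted samples, a shortfall in a single sample lowers the average by the shortfall divided by the number of samples, so a tenfold increase in the sample count reduces its effect tenfold.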
In addition, we collect the worst-case values obtained by applying the GA to the hybrid model in Table 3. A single run has been carried out for the instances with two capacity levels. The worst-case values Ψ are shown under five pairs of weights. In each column, Ψ increases strongly as the instance size increases. For instances with the same number of nodes but different numbers of hubs, Ψ is larger in the network with more hubs. In every row, the values of Ψ are the same across the different weights. This is reasonable because Ψ depends on the network size, the scenario, and the capacity level in our problem. If these parameters are fixed, the value of Ψ does not change under any choice of the weights.
For illustrative purposes, we plot all the results from Tables 1 and 2 in Figure 2. This figure comprises four subfigures representing the profits in four instances: (1) 5 nodes and 2 hubs, (2) 10 nodes and 2 hubs, (3) 10 nodes and 3 hubs, and (4) 10 nodes and 5 hubs. Three markers distinguish the approaches: the marker "◼" represents the solutions computed by the SAA for the stochastic programming model in Table 1, a second marker shows the profits obtained by the GA for the stochastic programming model in Table 1, and a third marker indicates the results computed by the GA for the hybrid model in Table 2. In each subfigure, the horizontal axis denotes the approach and the weights, and the vertical axis indicates the profit. We observe that, in each subfigure, the maximum profit is obtained by the SAA for the stochastic programming model and the minimum profit by the GA for the stochastic programming model. The values for the hybrid model lie between the values computed by the SAA and by the GA for the stochastic programming model. The values of the SAA are greater than the ones with the weights (0, 1). In fact, the values with the weights (0, 1) are very restrictive because they are the worst-case values from a robust optimization model. As such, the values of the SAA are more restrictive in solving the proposed instances. In contrast to the case under the weights (0, 1), the values at (0.8, 0.2) come from a hybrid model that gives less weight to the worst-case value, so the results are less restrictive. We also find that the values obtained by the GA for the stochastic programming model are lower than the values under the weights (0.8, 0.2). Hence, the hybrid model outperforms the stochastic programming model in these instances.
[Figure 2, panel title: Profit with node=10 and hub=3]

Conclusions
This paper addressed an integration of single allocation capacitated p-hub location and revenue management in air transportation. Hubs with multiple capacity levels and disruption were considered. We modeled this problem as a two-stage stochastic programming formulation and as a two-stage hybrid stochastic programming-robust optimization model to maximize the profit in a star-star network. In the first stage, the location of the hubs, the links between the hubs and the non-hubs, the capacity level of each hub, and the protection level of tickets for different booking classes were determined. The booking limit of the tickets was obtained in the second stage. We used a set of discrete scenarios to represent uncertain demand and network disruption at the hubs. The SAA and the GA were applied to solve the stochastic programming model. For the hybrid model, we also used the GA. In the numerical experiments, we obtained computational results for both the stochastic programming model and the hybrid model. The performance of the SAA and the GA for the stochastic programming model was compared: in most instances, the SAA outperformed the GA in both computing time and solution quality. We also analyzed the factors influencing the results for the hybrid model. The network size and the weights played a role in the profit, and the network size played a role in the results of Ψ. Furthermore, we showed that the hybrid model outperforms the stochastic programming model in the tested instances. Future studies could extend this work in two directions: (1) an extension to a competitive hub location problem that considers market competition, and (2) the development of an exact algorithm to reduce the computation time.

Data Availability
The data used to support the findings of this study are included within the article.