A Lagrangian decomposition scheme for choice-based optimization

We consider a family of mixed-integer linear programming (MILP) problems that explicitly includes choice models using the random utiity paradigm. In order to use behavioral models published in the literature, the speciﬁ-cation relies on a simulation-based approximation to linearize the choice model. The price to pay is the large size of the approximation. To enhance the tractability of the problem, we propose a Lagrangian decomposition methodology inspired by scenario decomposition and scenario grouping in the stochastic programming framework. In addition, we develop an algo-rithm that exploits the properties of choice-based optimization in order to generate feasible solutions to the original problem from the solution of the Lagrangian problem. Hence, at each iteration of the subgradient method, we provide both an upper and a lower bound to the original problem. This enables the calculation of the duality gap to assess the quality of the generated solutions. Computational results show that the decomposition method provides solutions with optimality gaps below 0.5% and restricted duality gaps within low computational times. We show that scenario grouping leads to high quality feasible solutions and lower duality gaps.


Introduction
Choice-based optimization problems enable planners to make decisions while taking into account individual's choice behavior. They are receiving increasing attention because they allow to explicitly capture the interplay between these decisions and the expected demand provided that the decisions are explanatory variables of the discrete choice model. In this paper, we focus on parametric choice models rooted in the random utility theory. They assume that each individual associates a utility with every alternative and chooses the alternative with the highest utility. Optimization problems that consider these models can be found in facility location (e.g., Benati, 1999, Haase andMüller, 2013), revenue management (e.g., Talluri andVan Ryzin, 2004, Shen andSu, 2007), network pricing (e.g., de Palma et al., 2005, Gilbert et al., 2014b and routing and pricing for same-day deliveries (e.g., Prokhorchuk et al., 2019, Ulmer, 2020, to name a few. As utilities are random variables, the expected demand of each alternative is derived from the associated probability functions. These probabilities are highly non-convex and non-linear in the explanatory variables of the choice model, and they are not always available in closed form. Furthermore, choice-based optimization problems come with the computational burden associated with the individual representation of demand from a planning viewpoint (e.g., capacity allocation). This makes only small to moderate size problems solvable to optimality (e.g., Andersson, 1998, Benati andHansen, 2002). This is also the case in Pacheco Paneque et al. (2021), where a mixed-integer linear formulation of choice models is proposed. To overcome the stochastic nature of the choice model, we simulate realizations of the probability distribution associated with its random components, called scenarios. The individual choice probabilities, and therefore the expected demand, are approximated by means of the sample-average approximation principle on the scenarios. This comes at the cost of large mixed-integer linear programming (MILP) formulations. Although small to medium-size instances can be optimally solved with general-purpose MILP solvers, practically relevant problems might involve a larger number of individuals and a considerable number of scenarios to enhance precision.
Each scenario can be seen as an independent behavioral situation where individuals consider to make a choice. These scenarios provide the problem with a decomposable structure that can be exploited to address the computational complexity of the exact method. In this paper, we propose a heuristic method inspired by scenario decomposition in the stochastic programming (SP) framework for the general choice-based optimization problem introduced in Pacheco Paneque et al. (2021). In scenario decomposition, copies of the first-stage decisions of multi-stage SP problems are introduced for each scenario. The so-called nonanticipativity constraints, which impose that these copies should be the same across scenarios, are dualized. Then, the subproblems associated with each scenario are independently solved (i.e., Lagrangian decomposition or variable splitting in combinatorial optimization). The main advantage of this approach is that the interplay between the decisions and the expected demand is preserved in the subproblems. The bound yielded by scenario decomposition can be improved by grouping scenarios at the expense of solving larger subproblems. To this end, we test different strategies based on similarity measures (i.e., Euclidean distance).
We enhance the proposed method with an algorithm that solves a restricted MILP formulation of the original problem. This formulation uses information from the solutions of the subproblems at each iteration of the subgradient method applied to solve the Lagrangian dual. The algorithm exploits the properties of the choice-based optimization problem and allows to efficiently generate feasible solutions. Computational experiments show that the proposed methodology outperforms the considered commercial solver with respect to the best feasible solution found for a given time budget. At the same time, the generation of feasible solutions at each iteration of the subgradient method enables the calculation of the duality gap. This gap allows to assess the quality of the generated solu-tions, and therefore the performance of the method, as optimal solutions cannot be obtained for large instances.
This paper is organized as follows. Section 2 reviews the related work and discusses our main contributions. Section 3 introduces the general problem and its MILP formulation. Section 4 explains how the problem is decomposed via scenario decomposition and outlines the algorithm to generate feasible solutions. Section 5 describes the revenue maximization problem upon which the proposed methodology is tested and includes the numerical experiments. Finally, Section 6 concludes with some final remarks and future research avenues.

Related work
Because of the probabilistic nature of choice models, the formulations resulting from their integration in optimization models are typically non-linear and nonconvex. Researchers have therefore focused on the development of reformulations and efficient algorithms to solve practical problems. In Section 2.1, we provide an overview of solution methodologies in various contexts. Recent literature on scenario decomposition and scenario grouping is reviewed in Section 2.2. To conclude, Section 2.3 outlines the main contributions and discusses the positioning of the paper. Benati and Hansen (2002) introduce a logit-based model for the optimal location of new facilities. The resulting formulation is a hyperbolic sum integer programming (IP) model. They develop a branch-and-bound algorithm with a concave relaxation of the problem as a dual bound. Computational results show that only problems of moderate size (50 nodes) can be solved to optimality. For the same problem, Freire et al. (2016) propose a branch-and-bound algorithm that embeds a greedy algorithm to solve a relaxation of the original problem. Outer approximation and submodular cuts for maximum capture facility location problems with random utilities (2018) propose a branch-and-cut algorithm based on outerapproximation and submodular cuts. This approach has been recently enhanced in Mai and Lodi (2020) with a cutting-plane algorithm that requires less cuts thanks to the clustering of demand points. The last three algorithms are tested on a real instance for the location of park-and-ride facilities in New York City, with 59 candidate locations and more than 82000 customers. In general, the branch-and-cut algorithm is better in terms of number of solved instances but the cutting-plane algorithm is remarkably faster. Gilbert et al. (2014a) consider the toll setting problem on a transportation network where users are assigned to paths according to the logit model. The resulting optimization model is non-linear and non-convex, and may have several local optima. An exhaustive algorithmic study of this problem is carried out in Gilbert et al. (2015). To solve the problem, a mixed-integer approximation scheme that provides starting points from which a local search converges to near-optimal solutions is implemented. Numerical experiments on dense circular networks (20 nodes, 90 arcs, 10 origin-destination pairs and up to 10 toll arcs) show that nearoptimal solutions can be obtained. In Gilbert et al. (2014b), the path assignment is performed according to a mixed logit model, whose probabilities do not have a closed form. A similar solution method is considered to solve the problem. The large duality gaps obtained for some of the tested instances (for a given time budget) confirm the fact that the combinatorial approximations of the problems are hard to solve.

Solution methodologies
Other works consider linear reformulations of the original non-linear models. Three of such reformulations for the problem on the optimal location of new facilities are compared in Haase and Müller (2014). Numerical experiments with up to 400 customer zones and 50 candidate locations show that the linearization by Haase (2009) outperforms the other two. This formulation is later strengthened in Freire et al. (2016). Outer approximation and submodular cuts for maximum capture facility location problems with random utilities (2018) show that the proposed branch-and-cut algorithm outperforms the existing exact approaches, including the discussed linear reformulations. In the context of school location, Haase and Müller (2013) propose a linear IP formulation under a mixed logit model. The performed experiments show that real instances (up to 113000 students and 26 candidate locations) are solved (close) to optimality within a few minutes. Lin et al. (2020) formulate the optimal location of self-service lockers as a multi-ratio linear-fractional 0-1 programming model and provide two solution methodologies: an MILP reformulation for small-scale problems (networks with up to 100 nodes) and a quadratic transform with linear alternating algorithm for large-scale problems that outperforms the MILP approach. Gallego et al. (2004) introduce a dynamic programming (DP) model for a choice-based general network revenue management problem. They show that its optimal value can be closely approximated by the optimal solution of an appropriately constructed linear programming (LP) model. Since then, researchers have focused on various approximations of the underpinning DP formulation, offering provable bounds on the optimal expected revenue instead of performance guarantees on the optimality gap (Strauss et al. (2018)). Yan et al. (2008) address the problem on fleet routing and flight scheduling with both stochastic market demands (via scenarios) and market shares (via a logit model). To solve the twostage SP model, they propose two heuristic algorithms that fix a market demand scenario and approximate the resulting mixed-integer non-linear programming (MINLP) model by fixing the decisions that are explanatory variables of the logit model.

Scenario decomposition and scenario grouping
Scenario decomposition is applied to the deterministic equivalent formulation that results from considering a finite number of scenarios of a multi-stage SP problem. Carøe and Schultz (1999) developed an algorithm for multi-stage SP problems with mixed-integer variables in all stages. They introduce copies of the first-stage variables (which do not depend on the random data) for each scenario and relax the non-anticipativity constraints in a Lagrangian manner. This method is incorporated as a bounding procedure within a branch-and-bound algorithm to achieve convergence. Numerical experiments with up to 10 scenarios show that the Lagrangian dual provides considerably better lower bounds than the LP relaxation and that feasible solutions within 0.2% of the optimum are found.
Scenarios can be gathered into groups such that a copy of the first-stage variables is introduced for each group. Escudero et al. (2013) are the first to explore such an idea. They propose a cluster-based Lagrangian decomposition procedure for the two-stage SP mixed 0-1 problem. Each cluster is a set of scenarios built at random where the non-anticipativity constraints are implicitly considered. Computational experiments on instances with up to 500 scenarios show that this technique outperforms traditional Lagrangian decomposition for single scenarios both in the quality of the bounds and computational effort. This idea has been extended to the multi-stage SP problem (Escudero et al., 2016 for the binary case, Gade et al., 2016 for the integer case). Scenario grouping has also been considered together with scenario reduction techniques in van Ackooij et al. (2018) to dynamically update the groups during the iterative solution process. Crainic et al. (2014) evaluate multiple grouping strategies for the two-stage network design problem. They are defined by the type of splitting (cover or partition), grouping (at random or based on a similarity/dissimilarity measure inspired by the k-means clustering algorithm), and scenario characteristics according to which the similarity/dissimilarity is measured. Numerical experiments for up to 32 scenarios on instances with 10 nodes, up to 83 arcs and up to 50 commodities show that the covering strategy is the one reporting the highest quality solutions. More recently, Ryan et al. (2020) introduce an optimization problem for grouping scenarios to maximize the bound improvement. This technique provides stronger initial relaxation bounds when compared with random and k-means clustering (at the cost of an increase in computational time). The main advantage is that it can be incorporated into any general scenario decomposition algorithm as a preprocessing step.

Contributions
The works analyzed in Section 2.1 illustrate the complexity of solving choicebased optimization problems, especially for large instances. In this paper, we rely on the mixed-integer linear formulation of choice models introduced in Pacheco Paneque et al. (2021). The resulting MILP model is sensitive from a computational point of view. Indeed, the exact method fails at solving instances with a large number of scenarios, alternatives and/or individuals, which usually arise in real-life problems. To tackle such instances, we introduce a heuristic solution method that takes advantage of the simulation-based linearization in the MILP model by exploiting the principles of scenario decomposition and scenario grouping.
The proposed approach solves a general choice-based optimization problem. This problem aims at deciding on (some of) the characteristics (e.g., price) of a given set of alternatives such that an objective function determined by the planner (e.g., revenue) is optimized (see Section 3). Notice that these decisions are the same across scenarios. As seen in Section 2.2, they play the role of first-stage variables in multi-stage SP problems. To the best of our knowledge, this is the first time that a scenario-based decomposition method is used to solve a choicebased optimization problem.
Our method is also general with respect to the choice model. This is an important feature because we observe that, in most cases, the proposed methods are specific to the assumed choice model, which is typically the logit model because of the simplicity of its closed-form probability formula. However, advanced choice models like the mixed logit model overcome the main limitations of the logit model and have shown a better prediction power. We numerically test our method on a revenue maximization problem under a mixed logit model. The results show that it outperforms by a considerable margin a general-purpose MILP solver for a given time budget and that the duality gaps are as well restricted.

Problem definition and mathematical model
The optimization problem considered in this paper embeds a discrete choice model whose parameters have been exogenously estimated to represent the expected demand. Let N be the number of individuals in the sample (population) and J the number of alternatives that can be chosen by the individuals according to the choice model. For each alternative i, we assume a capacity denoted by c i ≥ 1, which indicates the maximum number of individuals who can choose it. We denote the number of scenarios by R. These scenarios are generated outside of the optimization problem (known as exterior approach in the SP framework).
In a choice model, U in represents the utility that individual n obtains from choosing alternative i. It consists of a systematic component and a random component that captures everything that has not been explicitly included. The behavioral assumption is that each individual wants to maximize their own utility. For each scenario r, we introduce a deterministic utility U inr that has the same systematic component as U in but replaces the random component by a realization of its probability distribution (scenario).
To formulate the problem, we denote the K decisions associated with alternative i and individual n by x in1 , . . . , x inK . These variables are assumed to be bounded, i.e., x ink ∈ [a ink , b ink ], ∀i, n, k, and might be binary. The binary variables y inr represent the availability of alternatives, i.e., they are equal to 1 if the capacity of alternative i has not been reached for individual n in scenario r. The auxiliary variables z inr prevent an unavailable alternative from obtaining the highest utility. They take value U inr if alternative i is available to individual n in scenario r, and a given low value otherwise. The behavioral assumption of the choice model is expressed in terms of z inr , i.e., individual n chooses alternative i in scenario r if i = argmax 1≤j≤J z jnr . The variables U nr = max 1≤j≤J z jnr capture the highest utility for individual n in scenario r, and the binary variables w inr indicate the choice, i.e., they are equal to 1 if U nr is achieved at alternative i. The expected demand of alternative i is then obtained by The objective function relates the planner's decisions to the expected demand. We denote it by f(x, D R ), where x is the J × N × K-dimensional vector containing the variables x ink and D R is the J-vector that contains the demands D R i . We assume it is linear in x and the choice variables. Without loss of generality, we assume f(x, D R ) is to be maximized. We define Model 2 for the general choice-based optimization problem considered in this paper.
Equations (2b) define U inr as the sum of the systematic component that de-pends on the planner's decisions and a constant term denoted by d inr . This term includes the r-th scenario associated with alternative i and individual n and other elements of the systematic component that are constant to the optimization problem. As x ink , ∀i, n, k, is bounded and d inr , ∀i, n, r, is constant, we can derive bounds on U inr : ℓ inr ≤ U inr ≤ m inr . Constraints (2c)-(2f) provide the linear formulation of the variables z inr , where ℓ nr = min 1≤j≤J ℓ jnr and M inr = m inr − ℓ nr . Constraints (2g)-(2h) provide the linear formulation of U nr , where m nr = max 1≤j≤J m jnr and M nr = m nr − ℓ nr . Constraints (2i) ensure that one and only one alternative is chosen per individual and scenario. Constraints (2j) prevent an unavailable alternative from being chosen. Constraints (2k)-(2l) handle capacity constraints on the alternatives. We assume that the access of individuals to the alternatives for each scenario is modeled by an exogenously given priority list. This list determines the access order of individuals, i.e., if individual n does not have access to the alternative in that scenario, neither does individual n + 1, and consequently the upcoming individuals. Constraints (2k) forbid the access to an alternative for a scenario when its capacity has been reached, whereas constraints (2l) ensure its availability otherwise. Constraint (2m) represents the set of linear constraints that identify the requirements of x. This mathematical model lacks efficiency for solving large instances with a general-purpose MILP solver, as showed in the numerical experiments performed in Pacheco Paneque et al. (2021). Indeed, the individual representation of the expected demand and the simulation-based linearization of the choice model, which involves the presence of the so-called big-M constraints, i.e., (2c)-(2f) and (2g)-(2h), contribute to the complexity of the formulation.

Scenario decomposition method
We introduce a heuristic solution method that exploits the properties of the choicebased optimization problem described in Section 3. It involves a subgradient procedure (see Section 4.3) that generates an upper bound and a feasible solution (lower bound) to Model 2 at each iteration. Section 4.1 describes the decomposition based on scenario groups and outlines the scenario grouping strategies. Section 4.2 presents the algorithm to generate feasible solutions from the Lagrangian solution.

Decomposition by scenario groups
Let S be the number of scenario groups (indexed by s) and R s the set that contains the scenarios belonging to group s. Notice that the scenarios are partitioned into groups, i.e., each scenario belongs to one and only one group. Both S and R s , ∀s, are determined according to one of the grouping strategies described below. The copy of x ink associated with group s is denoted by x s ink . Constraints (3) represent the non-anticipativity constraints imposing that the copied variables x s ink must be equal across groups. Constraints (3b) are redundant, but allow to obtain tighter upper bounds, as shown by preliminary experiments performed on the instances described in Section 5.2. Note that other characterizations of the non-anticipativity constraints are possible, such as relating all copies with each other (e.g., Escudero et al., 2013).
The Lagrangian relaxation of constraints (3) with associated multipliers α s ink ∈ R, ∀i, n, k, s, yields independent subproblems for each group s. As the objective The subproblem associated with group s is formulated in Model 5. It is essentially Model 2 for a reduced number of scenarios whose objective function is modified. As shown in Pacheco Paneque et al. (2021), the computational complexity of Model 2 grows exponentially with respect to R. Hence, for a given number of individuals, it is usually more efficient to solve multiple problems with a small number of scenarios each rather than a single problem containing multiple scenarios. This trade-off is analyzed in Section 5.4.
The Lagrangian problem associated with Model 2 is defined as follows: It provides an upper bound on z MILP for any values of α. Notice that α 0 ink (obtained when s = 1) refers to α S ink , ∀i, n, k. We consider various scenario grouping strategies that are inspired by the strategies introduced in Crainic et al. (2014). In our case, the descriptive statistics for each scenario r are given by the J × Ndimensional vector that contains the realizations associated with r.

Random (RAN)
This strategy serves as a benchmark to evaluate the more refined strategies described below. To balance the number of scenarios per group, we define σ groups with ⌈R/S⌉ scenarios and S − σ groups with ⌊S/R⌋ scenarios, where σ is the remainder in the Euclidean division of R by S.

Similar (SIM)
This approach is inspired by the k-means clustering algorithm. In this case, the distance between scenarios is defined by the Euclidean distance between the associated descriptive vectors. In addition to creating the scenario groups, this strategy determines the number of groups by assuming a lower (S) and an upper bound (S) on S and selecting S ∈ {S, . . . , S} such that the difference between the error associated with S and S − 1 is maximized. Since this strategy might generate very unbalanced groups with respect to the number of scenarios, we set ⌈R/S⌉ as the maximum number of scenarios in a group. As soon as this value is reached, the group is no longer updated. The scenarios are assigned to groups according to the difference between their distance to the nearest and the farthest group. Algorithm 1 outlines the pseudocode of this strategy.
Similar without dissimilar scenarios (SIM-D) This strategy first creates similar groups using SIM and generates additional dissimilar groups that contain one scenario from each similar group. The selected scenario is the one that is closest to its mean. To balance the number of scenarios, we also set a maximum of ⌈R/S⌉ scenarios per group. As soon as this value is reached, a new dissimilar group is created.

Dissimilar (DIS)
As opposed to SIM, scenarios are assigned to the groups that are the farthest away from the mean. They are ranked in increasing order of the difference between the distance to the nearest and to the farthest group. The selected number of groups S ∈ {S, . . . , S} is the one with the largest average of the distances between each group mean and the scenario closest to that mean. Update S * = S and store the associated scenario groups g s , ∀s = 1 . . . S * ; 20 Return S * and the associated scenario groups g s , ∀s = 1 . . . S * ;

Generation of feasible solutions
The decomposition technique described in Section 4.1 does not necessarily yield a feasible solution to Model 2. A feasible solution provides a lower bound to the optimal solution of Model 2. It allows to compute the duality gap at each iteration of the subgradient method. This gap is defined according to the relative difference between the decomposition's upper bound and the generated lower bound.
To efficiently generate a feasible solution, we solve the restricted MILP formulation to Model 2 that fixes the x-variables to the values these variables obtain in the Lagrangian solution. When the x-variables are fixed, the resulting problem can be iteratively solved with Algorithm 2. It iterates over the scenarios and the individuals in the order provided by the priority list. For each individual and sce-nario, it generates the associated choice according to the discounted utility and updates the occupancy level of the chosen alternative.
Notice that S values from the Lagrangian solution are available to each variable x ink . Letx s ink , ∀i, n, k, s, be the value associated with the copied variable x s ink . We evaluate all combinations that can be characterized fromx s ink , ∀i, n, k, s, as shown in Algorithm 3. This results into S J×N×K combinations, where J represents the number of alternatives, N the number of individuals and K the number of decisions. For each combination, Algorithm 2 is performed, and the feasible solution with the largest objective function value is selected. Despite Algorithm 3 being efficient (see Section 5.3), evaluating all combinations might prompt a considerable computational burden. In this case, Algorithm 3 can be simplified by only evaluating one combination per group, which generates S feasible solutions. Update z LB = z LB current and keep the associated feasible solution; 6 Return the best lower bound z LB and the associated feasible solution;

Subgradient method
To obtain the best possible upper bound to Model 2, we need to solve the Lagrangian dual The Lagrangian function z UB (α) is non-differentiable. However, subgradient directions can be easily generated. The subgradient method constructs a sequence {α ν } ν using where ν denotes the iteration, γ ν is a positive scalar called step size and d ν is a vector representing the direction of motion called step direction. The step direction d ν can be directly defined as the subgradient direction. The J × N × K × S-dimensional vector v defined by v s ink = x s ink − x s+1 ink , ∀i, n, k, s, and evaluated at the Lagrangian solution, is a subgradient of z UB (α) at any value of α. Nevertheless, preliminary experiments showed that the angle between v ν and v ν−1 is obtuse in multiple occasions. This leads to a next Lagrangian multiplier α ν+1 that is close to the previous one, which slows down the convergence of the procedure. This effect is known as zigzagging of kind I, and can be overcome by deflecting the step direction (Camerini et al., 1975): where ζ ν ∈ R ≥0 is a suitable scalar called deflection parameter (notice the negative sign because the Lagrangian dual is to be minimized). By defining this parameter as with 1 ≤ τ < 2, the step direction is forced to always form an acute angle with the preceding direction, which eliminates the zigzagging of kind I. Notice that in the absence of zigzagging of kind I (i.e., v ν d ν−1 ≥ 0), the step direction d ν is equal to −v ν (the negative subgradient). We consider the step size most commonly used in practice (Held et al., 1974): where d ν 2 is the norm of the step direction, i.e., d ν 2 = J i=1 N n=1 K k=1 S s=1 (d s,ν ink ) 2 , and λ ν is a step size decreasing parameter satisfying 0 < λ ν ≤ 2. Fisher (1973) suggests to halve this value whenever z UB (α ν ) has failed to decrease in a given number θ of consecutive iterations.
Algorithm 4 presents the pseudocode of the subgradient method. Notice that unlike the ordinary gradient method, the subgradient method is not a descent method. This is why we keep track of the best upper and lower bounds found throughout the process. We set a time budget as stopping criterion. Set λ ν ← λ ν /2;

Computational experiments
The goal of the experiments carried out in this section is threefold. First, we evaluate the performance of the decomposition method against a general-purpose MILP solver for a given time budget and compare the different scenario grouping strategies against each other (Section 5.3). Second, we assess the trade-off between the solution quality and duality gap and the number of scenarios per group (Section 5.4). Third, we analyze the impact in the algorithm's performance of scaling up the problem size by increasing the dimensions not involved in the decomposition (Section 5.5 for an increase in the number of alternatives and Section 5.6 for an increase in the number of individuals). Before that, Section 5.1 characterizes the revenue maximization problem. This choice-based optimization problem yields a concrete MILP formulation that we use to run the experiments. Section 5.2 describes the case study and the calibration of parameters for the subgradient method.

Revenue maximization problem
The planner aims at finding the best pricing strategy in order to maximize its revenue by offering services to a market. Each service in the set of services C has a given associated capacity. The market is composed of N individuals, which are assumed to be heterogeneous and price elastic. In a revenue maximization context, we need to model competition so that individuals are not captive. To this end, we incorporate an opt-out option into the model to capture individuals leaving the market, either because they choose a competitor's service or because they do not choose anything at all. The opt-out option is denoted by i = 0 and is always available to all individuals. We assume that the price is the only planner's decision. We define p i ∈ [a i , b i ] as the price to be paid to access service i ∈ C. The expected revenue obtained from the services in C is calculated as Model 13 comprises the MILP formulation of the revenue maximization problem. Notice that the product of the price (continuous variable) and the choice (binary variable) can be linearized because bounds on the former are assumed. We introduce the variables η inr = p i w inr to capture the product of the two, with linearizing constraints (13o)-(13r). We denote byC = C ∪ {0} the set containing all services and the opt-out option. The utilities associated with the opt-out option for each individual and scenario are included in (13b), and since the opt-out option is always available, we add constraints (13h) to enforce the availability variables y inr to be equal to 1 when i = 0.

Experimental setting
We consider the case study on parking services in a park-and-ride context used in Pacheco Paneque et al. (2021). The choice model for parking choices (mixed logit model) is specified and estimated in Ibeas et al. (2014). They interview 197 individuals to model their preferences with respect to three parking services: paid on-street parking (PSP), paid parking in an underground car park (PUP) and free on-street parking (FSP). We assume that FSP represents the opt-out option because it does not provide any revenue to the parking manager. For the integration of the choice model in Model 13, we replace β in , ∀i ∈ C, n, with their estimates, and we compute d inr , ∀i ∈C, n, r, with the values of the other explanatory variables of the choice model in the data and the scenario ξ inr . We determine the capacity associated with PSP and PUP (the opt-out option, FSP, is assumed to have unlimited capacity) according to the number of individuals in the instance, so that it is appropriate for the sample under consideration but restrictive enough so that some users are forced to choose FSP because PSP and/or PUP become unavailable. For the sake of simplicity, we assume c PSP = c PUP , and denote it simply by c. We assume lower price bounds for PSP than PUP, i.e., p PSP ∈ [0.5, 0.65] and p PUP ∈ [0.7, 0.85].
The instances used in the upcoming sections are characterized from the available data by setting specific values for N, R and J = |C|. More precisely, for each configuration of these parameters, we generate various instances by randomly selecting N individuals from the whole dataset, together with the associated R scenarios for the J parking services. The configurations are labeled as NX_RY_JZ_cT, where X indicate the number of individuals, Y the number of scenarios, Z the number of services and T the capacity. After performing some tests on different settings for the subgradient method, we resolve to initialize the Lagrangian multipliers to 0, and we consider λ 0 = 0.5, θ = 5 and τ = 1.5. The code is implemented in C++ using ILOG Concert Technology to access CPLEX 12.8, and all the instances were run using 12 threads in a 3.33 GHz Intel Xeon X5680 server running a 64-bit Ubuntu 16.04.2.

Performance and scenario grouping strategies
In this experiment, we compare the results obtained with the decomposition method for a given time budget against the exact approach while evaluating the different grouping strategies. We consider two configurations: N50_R100_J3_c20 and N50_R200_J3_c20. As shown in Pacheco Paneque et al. (2021), very large values of R are not required to ensure stability of the obtained quantities (e.g., revenue, prices). Furthermore, we need to restrict the number of scenarios to be able to solve to optimality Model 13 with CPLEX. Nevertheless, the decomposition method can be applied for larger values of R. Note that in this case a larger time budget might be considered to ensure that the algorithm runs for a few iterations.
To increase statistical significance, we allow for 10 instances of each configuration (see Table 1 for computational details). Furthermore, we determine 10% and 25% of the average computational time (reported by CPLEX) as two time budget values to evaluate the performance of the algorithm at different stages of the search. Notice that the computational time of an iteration of the subgradient method is unknown before being executed. At every iteration, we compute an average computational time per iteration to avoid exceeding the time budget. The subgradient method is then terminated as soon as the expected computational time, i.e., the sum of the current computational time and the average time per iteration, is larger than the time budget. Given the size of the instances, we consider scenario groups that have approximately five scenarios each. We could consider a larger number of scenarios per group while still being able to solve the subproblems within the given time budget at the expense of performing a smaller number of iterations. In the case of random grouping, we define S such that the number of scenarios per group does not exceed five, i.e., S = ⌈R/5⌉. For the other strategies, we define the lower and upper bound on S as S = ⌈R/5⌉ − 2 and S = ⌈R/5⌉ + 2, respectively (see Section 4.1). For example, when R = 100, S ∈ {18, 19, 20, 21, 22}. Table 2 summarizes the computational results obtained for the four grouping strategies and the two time budget values. We report the average, lowest and highest values of the duality and optimality gaps over the 10 instances, which are calculated as follows: Furthermore, we include the results on the best feasible solution found by CPLEX for the given time budget. Note that in this case the duality gap is replaced by the gap reported by CPLEX, which is computed as the relative difference between the best integer solution and the solution associated with the best node of the branchand-bound tree. We observe that the average computational time for any grouping strategy is lower than the time budget because of the above-mentioned stopping criterion. The average number of scenario groups S is larger in SIM-D because more scenario groups are created in addition to the ones generated by SIM. Even though more subproblems are solved, each of them has in average less scenarios, which results in a lower average UB time per iteration. This has also an impact on the average LB time, as more price combinations are evaluated. In any case, the algorithm to generate feasible solutions is much faster compared to solving the Lagrangian subproblem. Notice that more iterations are performed for R = 200 than R = 100 within each time budget. This is due to the fact that the time budget values are obtained from the average computational time of the exact method, that does not increase linearly with respect to R (see Table 1). In the case of R = 200, we observe a decrease both in the average UB time and LB time per iteration from 10% to 25% time budget. These results indicate that the subproblems and the restricted MILP formulations are more rapidly solved in the long run. We also notice that the best LB is reached earlier than the best UB. This means that the method is able to find high-quality feasible solutions at a relatively early stage whereas the best UB gets refined throughout the iterations. Thus, as long as the duality gap is low, the subgradient method can be terminated after a small number of iterations without improvement of the best LB.
The optimality gaps are very low for any strategy (below 0.16% in all cases), and they decrease as the time budget increases. We notice that they are lower on average for R = 200 than R = 100 because of the larger time budget values. On the contrary, the optimality gaps with respect to the best feasible solutions provided by CPLEX are much higher and more dispersed. This is especially the case for R = 100 and a 10% time budget, where the optimality gap fluctuates between 2.35% and 8.78% with an average value equal to 5.77%. The duality gaps are larger and of the same order of magnitude across grouping strategies (they are below 2.31% in all cases). As expected, the average duality gaps decrease as the time budget increases by a relatively low margin. For instance, for N50_R200_J3_c20 and RAN, it decreases from 1.82% for a 10% time budget to 1.62% for a 25% time budget. This reveals the slow convergence behavior of the subgradient method.
Regarding the grouping strategies, we observe an impact on the obtained results but we do not identify a clear winner. This is statistically supported by the ANOVA test, that evaluates whether optimality and duality gap means are equal across strategies. This is indeed the case, and post hoc tests to evaluate these means for each pair of strategies show that there is no significant difference between them. In general, DIS (dissimilar) reports the largest optimality and duality gaps. As opposed to Crainic et al. (2014), the obtained results are not superior to RAN (random). They also show that SIM-D (similar without dissimilar scenarios) often leads to higher quality solutions at the expense of longer computational times. In our case, this strategy provides the smallest optimality gaps for R = 100 but not for R = 200. With respect to the duality gap, other strategies yield lower values. As analyzed in Section 5.4, this has to do with the fact that the subproblems have on average less scenarios, and despite the method is able to perform more iterations, it does not contribute to reducing the duality gap. RAN and SIM (similar) report reasonable duality gaps for both values of R. SIM reports a lower average optimality gap for the 10% time budget, whereas RAN reports lower values for the 25% time budget. Therefore, we cannot conclude that any of the refined grouping strategies outperforms random grouping. RAN is easier to implement and shows slightly faster solution times with relatively lower duality gaps.

Size of the scenario groups
To analyze the trade-off between the size of the scenario groups and the performance of the method, we consider various values of S and run the configuration N50_R200_J3_c20 (5 instances) for a 2-hour time budget. Table 3 includes the obtained results for groups with 1-5, 10, 15, 20 and 25 scenarios each. As expected, the larger the number of scenarios per group, the larger the average UB time per iteration, resulting in a lower average number of iterations within the time budget.
Concerning the optimality gap, we notice an increasing trend in the obtained values as S decreases. Indeed, as the number of scenarios per group increases, the number of subproblems to solve decreases, and thus less price combinations are evaluated. Nevertheless, the average optimality gaps remain below 0.06% in all cases. The decrease in S has a more noticeable impact on the duality gaps, which decrease as S decreases. Figure 1 shows for each instance the evolution of the best UB obtained with respect to the number of scenarios per group. Indeed, we observe a decrease in their values as the number of scenarios per group increases.
In the case of S = 200 (1 scenario per group), even though the method performs on average 101.2 iterations, it can only reach an average duality gap of 3.67%. For larger groups (i.e., 15, 20 and 25 scenarios per group), the algorithm runs for a very limited number of iterations, but the resulting duality gaps are below 1.02% in average. We observe that despite the lower number of iterations and combinations for the generation of feasible solutions, it is beneficial to define subproblems with a large number of scenarios per group as long as they can be efficiently solved within the considered time budget. Notice that we might need to consider groups of lower size in other settings that involve more individuals, more scenarios, a lower time budget, etc.

Impact of the number of alternatives
We now evaluate the performance of the method when the problem is scaled up by increasing the number of services J. To do so, we segment the prices associated with PSP and PUP by residency, which is one of the explanatory variables of the choice model. This means that different prices are proposed to the residents in the study area and to non-residents. It can be seen as extendingC from J = 3 to J = 5 services where residents have only access to PSP and PUP associated with the resident price (and the opt-out option FSP) and analogously for non-residents. Since non-residents are willing to pay larger prices than residents, we modify the price bounds as follows: p res PSP , p non-res PSP ∈ [0.5, 0.75] and p res PUP , p non-res PUP ∈ [0.75, 1.0]. Table 4 shows the results for three configurations (5 instances) for a 2-hour time budget. Due to the complexity induced by the increase in J, solely the MILP formulation for N50_R25_J5_c20 could be solved to optimality in a reasonable time (18.72 h on average). We therefore consider 3 scenarios per group in the decomposition method. The average UB time per iteration is much larger for J = 5. For instance, it increases from 7.8 min for N50_R100_J3_c20 to 34.4 min for N50_R100_J5_c20. The average LB time per iteration is also higher but remains below 0.3 s for all configurations. The optimality gaps are larger than the ones obtained for J = 3 in Section 5.3 but they remain below 0.5%. As expected, the duality gaps are also larger and they increase as R increases for the given time budget.

Impact of the number of individuals
In this experiment we test the method on configurations with larger values of N, i.e., N ∈ {50, 100, 150, 197}, with capacities c ∈ {20, 40, 60, 80}, respectively. Table 5 shows the results for these four configurations (5 instances) with R = 100 and J = 3 for a 2-hour time budget. We consider scenario groups of 2 scenarios each. The average UB time per iteration grows exponentially. In the case of N = 197, it even exceeds the 2-hour time budget (the first iteration is always performed). The average number of iterations follows the opposite trend, as it decreases in average from 98.2 iterations for N = 50 to a single iteration for N = 197. Interestingly, we observe that the duality gaps decrease as N increases. The grouping strategies are implemented as a preprocessing step of the subgradient method, which iteratively generates upper bounds (by solving the Lagrangian subproblem) and lower bounds (by generating feasible solutions) to the optimization problem.
To experimentally test the Lagrangian decomposition scheme, we characterize a revenue maximization problem for a case study on parking choices. We show that near-optimal solutions can be obtained for all scenario grouping strategies and various values of R in a much lower computational time in comparison with the exact method. Such solutions are usually generated at an early stage of the subgradient method, which allows to promptly terminate the algorithm if the best feasible solution has not improved after a certain number of consecutive iterations (provided that the duality gap is below an acceptable threshold). We cannot identify a scenario grouping strategy that outperforms the others, so we resolve to rely on the random partition for the performed experiments. As long as the Lagrangian subproblems are computationally manageable, a large number of scenarios per group is recommended, as it leads to smaller duality gaps for a given computational time. We also observe low duality gaps for configurations with a larger number of alternatives and individuals.
In conclusion, Lagrangian decomposition provides a relevant scheme to address the tractability of choice-based optimization problems. Additionally, the proposed method could be combined with other decomposition techniques according to the model requirements. For instance, if the capacity of the alternatives is included as a decision variable (that is formulated with a reduced set of binary variables), Benders decomposition could be as well explored. Furthermore, parallelization routines that allow to solve multiple subproblems simultaneously could be implemented within the subgradient method in order to reduce the total computational time. These approaches allow to further enhance and expand the applicability of the method to the application areas discussed in Section 1, such as facility location, revenue management and transportation-related problems.