Branch and Price for Submodular Bin Packing

The Submodular Bin Packing (SMBP) problem asks for packing unsplittable items into a minimal number of bins for which the capacity utilization function is submodular. SMBP is equivalent to chance-constrained and robust bin packing problems under various conditions. SMBP is a hard binary nonlinear optimization problem. In this paper, we propose a branch-and-price algorithm to solve this problem. The resulting pricing subproblems are submodular knapsack problems, and we propose a tailored exact branch-and-cut algorithm based on a piece-wise linear relaxation to solve them. To speed up column generation, we develop a hybrid pricing strategy to replace the exact pricing algorithm with a fast pricing heuristic. We test our algorithms on instances generated as suggested in the literature. The computational results show the efficiency of our branch-and-price algorithm and the proposed pricing techniques.


Introduction
Bin packing (BP) is an important combinatorial optimization problem with applications in various fields, including call centers, healthcare, container shipping, and cloud computing.
These applications are typically modeled as BP problems that aim to pack items into a minimum number of bins, with a capacity constraint on each bin. Submodular bin packing (SMBP) is a nonlinear variant of the classical BP. The SMBP differs from the classical BP in the expression of the capacity utilization functions. The capacity utilization function in the classical BP is linear with respect to the items, while the capacity utilization function in SMBP is a linear term plus a square root term, and this function is submodular with respect to the set of items (see Atamtürk and Narayanan (2008)).
In many practical applications of BPs, item sizes are not revealed before the allocation decision is made. Therefore, uncertainty must be considered in the modeling. We will show that the SMBP formulates BPs with uncertainty under various conditions. Recently, Cohen et al. (2019) study the resource allocation problem for cloud services. The authors model this problem as an SMBP problem and solve it using approximation algorithms. In this paper, we study exact algorithms to solve the SMBP via the Dantzig-Wolfe decomposition approach and exact algorithms to solve the pricing submodular knapsack problems.
Heterogeneous optimization approaches have been used in the literature to model BPs under uncertainty, including stochastic optimization, chance-constrained optimization, robust optimization, and nonlinear optimization models. The SMBP is a nonlinear optimization model with connections to the other three optimization models. In the stochastic optimization model, the objective function is an expected value function. The stochastic BP (SBP) models item sizes as random variables. If capacity constraint violations can be accounted for as a penalty cost, the optimization objective is to minimize the number of utilized bins plus the expected penalty cost (Denton et al. (2010)). In SBP, the two costs must be weighted in a balanced manner to form a single objective. The chance-constrained and robust optimization approaches model uncertainty in the problem constraints directly, so capacity constraints are not violated in these models. The chance constraint is a well-known tool for modeling constraints on random variables (Charnes and Cooper (1963)).
The BP with chance constraints (BPCC) (Shylo et al. (2013)) considers the BP problem, where the item sizes follow a multivariate distribution and the items in each bin must satisfy the capacity constraint with a certain probability. Robust optimization, in particular distributionally robust optimization, considers the worst case of chance constraints within a given family of distributions (see Ghaoui et al. (2003)). More generally than BPCC, distributionally robust BP (DRBP) allows item sizes to belong to a family of distributions with common properties. Capacity constraints are robustly satisfied with respect to the entire family of distributions (Zhang et al. (2020), Cohen et al. (2019)).
The sample average approximation method (SAA) is a common approach to solve stochastic, chance-constrained and robust optimization problems (Luedtke and Ahmed (2008), Bertsimas et al. (2018)). It approximates the problem as a two- or multi-stage Mixed-Integer Linear Programming (MILP) problem and computes approximate solutions that converge to an optimal solution in a probabilistic sense. The scalability and accuracy of this approach depend on the number of samples. Some BPCC and DRBP problems are actually equivalent to SMBPs under various conditions, so these problems can be solved using deterministic methods.
In Cohen et al. (2019) it is shown that BPCC is equivalent to SMBP when the item sizes follow independent Gaussian distributions. Furthermore, if the distributions of the item sizes have the same mean values and the same diagonal covariance matrix, then DRBP is equivalent to SMBP. If only the first two moments of the distribution are known, then the BPCC can be reformulated as an SMBP problem according to Zhang et al. (2018). The SMBP is also valuable from a practical point of view, as it provides an upper bound for general independent distributions over bounded intervals (note that the SMBP becomes a relaxation, see Cohen et al. (2019)).
The SMBP is a binary nonlinear optimization problem due to the nonlinear submodular capacity utilization function. Zhang et al. (2018) show that any SMBP can be reformulated as a Binary Second-Order Conic Programming (BSOCP) problem, and the reformulation can be strengthened by the extended polymatroid inequalities of Atamtürk and Narayanan (2008). The authors use a branch-and-cut algorithm to solve the SMBP. As with classical BPs, the branch-and-cut algorithm is challenging to scale to large SMBPs.
The Dantzig-Wolfe (DW) decomposition is a well-known approach to solving large classical BPs: a BP is reformulated into a set cover formulation based on enumerating all feasible packing patterns; then its continuous relaxation is solved using a column generation approach (Gilmore and Gomory (1961)). The branch-and-price algorithm integrates column generation with the branch-and-bound algorithm and is the state-of-the-art exact method for solving classical BPs (Wei et al. (2020b), Delorme et al. (2016)). To solve the SMBP efficiently, we use the DW decomposition and develop a branch-and-price algorithm. In our DW decomposition of the SMBP, we use the classical set cover reformulation for the classical BPs (Gilmore and Gomory (1961)), but with nonlinear pricing problems.
These are submodular knapsack problems with linear objective functions but submodular capacity utilization functions.
As with the classical BPs, solving the pricing problems involves the most computational effort in branch-and-price solvers, so many efforts have been made in the past to develop efficient algorithms (Sadykov and Vanderbeck (2013), Wei et al. (2020a)). The pricing problem is more difficult for SMBPs because there is still no pseudopolynomial algorithm (such as dynamic programming) to solve the submodular knapsack problems. Therefore, solving pricing problems is crucial to the performance of our branch-and-price algorithm. There are several papers in the literature on exact algorithms for variants of the classical knapsack problem (Cacchiani et al. (2022)), e.g., the quadratic knapsack (Caprara et al. (1999), Furini and Traversi (2019)), the multidimensional knapsack (Puchinger et al. (2010)), and the quadratic multi-knapsack (Bergman (2019), Olivier et al. (2021)). The quadratic knapsack has a nonlinear objective function, while the submodular knapsack has a nonlinear submodular function in the constraint. As far as we know, there is no tailored exact algorithmic framework and implementation for the submodular knapsack problem either. Moreover, the submodular knapsack problem is important in its own right, as it models the chance-constrained knapsack problem (Goyal and Ravi (2010)).
We propose a non-convex Mixed-Binary Quadratically Constrained Programming (MBQCP) formulation for the submodular knapsack problem. Based on this formulation, we construct the Piece-Wise Linear (PWL) relaxation and combine the relaxation with cutting planes to form an exact PWL relaxation-based branch-and-cut (PWL-B&C) algorithm.
PWL functions have been used to approximate or relax non-convex Mixed-Integer Nonlinear Programming (MINLP) problems (Geißler et al. (2012)). A PWL function is linear in each partition of its domain and can be modeled by a MILP formulation (Vielma et al. (2010)). In our experiments, the PWL-B&C algorithm is faster than the commercial solver CPLEX and it optimally solves most pricing problems where CPLEX still has a large dual gap. To further speed up the PWL-B&C algorithm, we also investigate adaptive PWL relaxation in our experiments.
We propose several strategies to accelerate the convergence of the branch-and-price algorithm, i.e., improve primal and dual bounds. Wei et al. (2020b) and Gleixner et al. (2020) incorporate the Farley bound (see Farley (1990), Vance et al. (1994)) into their algorithms to obtain an early valid dual bound before the termination of the column generation procedure. The formula for the Farley bound imposes a condition on whether an exact pricing algorithm can improve the current dual bound. If the condition is not satisfied, we do not need any exact pricing algorithm, but can use a fast pricing heuristic. Therefore, a hybrid pricing strategy is used to speed up column generation in our branch-and-price algorithms. We also adapt a column selection heuristic from Lübbecke and Puchert (2012).
This heuristic checks whether each new column can be combined with other generated columns to form a feasible solution.

There are few publicly available instances for the SMBP problem. Cohen et al. (2019) test their approximation algorithms on real instances from data centers that are not available to the public due to confidentiality constraints.
To compare the proposed algorithms, we generate instances according to their description.
Finally, by combining the proposed techniques, our branch-and-price algorithm solves more instances and closes more dual gaps than CPLEX.

Contribution
In summary, our contribution in this paper is threefold. First, we apply the DW decomposition for the SMBP and develop several techniques for the branch-and-price algorithm (including primal heuristics and the hybrid pricing strategy). Second, for the submodular knapsack problem, we propose the MBQCP formulation and PWL relaxation and compare them with the existing formulation and relaxation. We then develop an adapted PWL-B&C algorithm. Finally, we perform computational experiments on a large number of instances to evaluate the proposed algorithms. The source code and benchmark are released on our project website https://github.com/lidingxu/cbp.

Literature Review
This work relates to several streams of the literature: (i) solution methods for SBPs, BPCCs and DRBPs; (ii) applications of branch-and-price algorithms for MINLPs; (iii) primal heuristics specialized for column generation; (iv) existing approximations and polyhedral results for the submodular knapsack problem; (v) the outer approximation and the PWL approximation.
The surgery planning problem (also known as operating room planning problem) is a common application of the SBP in healthcare, where the surgery duration (item size) is assumed to be stochastic. A number of works (Denton et al. (2010), Batun et al. (2011)) model stochastic surgery planning as a stochastic two-stage mixed-integer programming problem: the objective is to minimize the fixed cost and the expected penalty for overtime.
In some works (Cardoen et al. (2010), Deng et al. (2019)), only the expected penalty is considered in the models. Shylo et al. (2013) appear to be the first to consider BPCC in surgery planning. Assuming that the operation duration follows a multivariate normal distribution, the authors reformulate the problem as a semidefinite program.
Decomposition methods are already used together with the SAA method to solve SBPs/BPCCs, focusing either on the two-/multi-stage structure of stochastic programs or on the BP problem structure. Denton et al. (2010) and Batun et al. (2011) use Benders decomposition to solve SBPs. Zhang et al. (2020) apply DW decomposition to scenario-based subproblems of the BPCC (obtained using the SAA method) and solve the subproblems using the branch-and-price algorithm. In addition, Zhang et al. (2020) consider the DRBP where the distributions of item sizes are ambiguous, i.e., the family of distributions is unknown or at best partially known. The authors approximate the problem by a MILP and solve it using the branch-and-price algorithm. As far as we know, the DW decomposition has so far been applied only to MILPs arising from approximations or relaxations of BPs with uncertainty, but not yet to an exact nonlinear programming model.
Recently, DW decomposition and the branch-and-price algorithm have been used to solve MINLPs (see Allman and Zhang (2021)), such as recursive circle packing (RCP) problems, binary quadratic problems (Ceselli et al. (2022)), and facility location with general nonlinear facility cost functions (Ni et al. (2021)). There may be several ways to divide a MINLP into master and subproblems, so a MINLP may admit different DW decompositions. Ceselli et al. (2022) study the strengths of different DW decompositions for binary quadratic problems. In most cases, after applying the DW decomposition to the compact MINLP formulation, the master problem is a MILP and the pricing problems are MINLPs. Allman and Zhang (2021) directly use a commercial MINLP solver, i.e., BARON (Tawarmalani and Sahinidis (2005)), to solve the pricing problems. In our experiments, we also try to solve the pricing problems using a commercial solver, i.e., CPLEX.
However, commercial solvers may not exploit the structure of MINLPs. Since column generation may require thousands of pricing iterations, Gleixner et al. (2020) show that any improvement in the pricing algorithm can speed up the convergence of column generation.
There are several ways to improve the performance of the branch-and-price algorithm, such as fast primal heuristics during column generation and fast pricing algorithms.
Primal heuristics improve the primal bounds of the branch-and-price algorithms. Joncour et al. (2010) propose a constructive approach to create feasible solutions from scratch (namely, column selection) using only knowledge of previously generated columns.
Column selection heuristics include greedy or relaxation-based approaches. The greedy heuristic is so fast that it can be invoked during column generation. Column selection heuristics are implemented in the generic branch-and-price solver GCG (Gamrath and Lübbecke (2010), Lübbecke and Puchert (2012)). Our implementation is similar to the greedy column selection approach, but we enforce the last column in the feasible solution.
The submodular knapsack problem has been studied in several respects; we refer to Goyal and Ravi (2010) for approximation algorithms and Atamtürk and Narayanan (2009) for polyhedral analysis. The submodular knapsack problem can be reformulated as a Binary Second-Order Conic Programming (BSOCP) problem (Atamtürk and Narayanan (2008)), which can be solved by general-purpose solvers. Most solvers, such as CPLEX (Bliek et al. (2014)) and SCIP (Berthold et al. (2012)), implement the LP outer approximation-based branch-and-cut (LP-B&C) algorithm (Coey et al. (2020)) to solve BSOCP or general Mixed-Integer Second-Order Conic Programming (MISOCP) problems. The LP outer approximation is sometimes referred to as the polyhedral outer approximation (Ben-Tal and Nemirovski), and perspective reformulations/cuts can be used to strengthen the resulting convex MINLP relaxation. In this study, the original formulation of the submodular knapsack problem is a convex MINLP whose nonlinear function involves all problem variables, which is difficult to approximate in high-dimensional problems. We reformulate this formulation into a non-convex MINLP.
This nonconvex MINLP has a concave quadratic constraint, where the only nonlinear term is a univariate quadratic function of a slack variable. We relax the quadratic function into a PWL function (Vielma et al. (2010)) and obtain a MILP relaxation. The resulting relaxation leads to an algorithm with better performance than CPLEX. Our approach is counter-intuitive because it is generally assumed that convex MINLP formulations perform better than non-convex MINLP formulations. In our case, we will show that the quadratic function can be approximated in a "dimension-free" way, since the nonlinearity is concentrated on a single variable.

Outline
This paper is organized as follows. In Section 2, we describe the SMBP and give its BSOCP and set cover formulations. In Section 3, we introduce the key components of our branch-and-price algorithm: the branching rule, column generation, dual bound computation, initial columns, and primal heuristic. In Section 4, focusing on solving the pricing problem, we introduce: the pricing heuristic, reformulations of the pricing problem, PWL relaxation, the exact pricing algorithm, and the hybrid pricing strategy. In Section 5, we show the computational results of the proposed algorithms for instances generated from the literature and analyze their performance. In Section 6, we end this paper with a conclusion and future research directions.

Problem Description and Formulations
In this section, we present existing formulations for the SMBP and propose a new formulation. The SMBP problem considers a finite set N of items and a finite set M of potential bins, each of which has identical capacity. The problem is to (i) determine a minimum number of bins to pack all items; (ii) allocate items to bins such that all submodular capacity constraints are satisfied.

Compact Formulations
We begin by describing two compact formulations for the SMBP. To facilitate the description, we first define the following notation:

Submodular capacity usage function
Given a ground set N, a set function f : 2^N → R is submodular if, for all A, B ⊆ N,

f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B).

Every subset of N can be indicated by a binary vector x ∈ {0, 1}^N. The submodular capacity usage function f in the SMBP is defined as follows (Cohen et al. (2019)):

f(x) := Σ_{i∈N} a_i x_i + σ √(Σ_{i∈N} b_i x_i),

where a_i, b_i ∈ R_+ (i ∈ N) and σ ∈ R_+. The function f becomes a linear function by setting σ = 0.
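To make the definition concrete, the following Python sketch (our illustration; all names are ours) evaluates f on item subsets and verifies the submodular inequality exhaustively on a small random instance.

```python
import itertools
import math
import random

def f(items, a, b, sigma):
    """Capacity usage f(A) = sum_{i in A} a_i + sigma * sqrt(sum_{i in A} b_i)."""
    return sum(a[i] for i in items) + sigma * math.sqrt(sum(b[i] for i in items))

random.seed(0)
n = 5
a = [random.uniform(0.5, 2.0) for _ in range(n)]
b = [random.uniform(0.5, 2.0) for _ in range(n)]
sigma = 2.0

subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]
# Submodularity: f(A) + f(B) >= f(A | B) + f(A & B) for all subsets A, B.
ok = all(f(A, a, b, sigma) + f(B, a, b, sigma)
         >= f(A | B, a, b, sigma) + f(A & B, a, b, sigma) - 1e-9
         for A in subsets for B in subsets)
print(ok)  # True
```

The check succeeds because the square-root term is a concave nondecreasing function of the modular sum of the b_i, which is exactly what makes f submodular.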
The following variables are used to model the SMBP:

Decision variables
• y_j ∈ {0, 1}: y_j = 1 if bin j is used, for j ∈ M;
• v_ij ∈ {0, 1}: v_ij = 1 if item i is packed into bin j, for i ∈ N and j ∈ M.

Using the above notation, the SMBP has the following compact and non-convex MINLP formulation:

min Σ_{j∈M} y_j (2a)
s.t. Σ_{i∈N} a_i v_ij + σ √(Σ_{i∈N} b_i v_ij) ≤ c y_j, for j ∈ M, (2b)
Σ_{j∈M} v_ij ≥ 1, for i ∈ N, (2c)
y_j ∈ {0, 1} for j ∈ M, v_ij ∈ {0, 1} for i ∈ N and j ∈ M. (2d)

Regarding the continuous relaxation of the above formulation, constraint (2b) is non-convex, so formulation (2) is a non-convex MINLP. Zhang et al. (2018) give a convex MINLP, more precisely, a BSOCP reformulation of (2) by replacing (2b) with an equivalent constraint:

Σ_{i∈N} a_i v_ij + σ √(Σ_{i∈N} b_i v²_ij) ≤ c y_j, for j ∈ M, (3)

whose continuous relaxation is representable by second-order cones and thus convex. For any feasible solution v_ij ∈ {0, 1} we have v²_ij = v_ij, so the constraints (3) and (2b) are equivalent.
Then, the compact and convex BSOCP formulation is

min Σ_{j∈M} y_j, s.t. (3), (2c), (2d). (4)

The BSOCP formulation can be solved with off-the-shelf solvers, such as CPLEX (Bonami and Tramontani (2015)) and SCIP (Gamrath et al. (2020)). When σ = 0, the BSOCP formulation becomes the conventional compact formulation for the linear BP.
Indeed, CPLEX and SCIP solve the BSOCP problem in a non-compact way, since they use the LP-B&C algorithm, which linearizes the BSOCP problem into a MILP model with numerous cutting planes. As in the linear BP case, we find in our experiments (Section 5) that the compact BSOCP formulation is not scalable for large models.

Set Cover Formulation
In this section, we propose a new set cover formulation for the SMBP. The formulation is derived in a similar way as the DW decomposition of the classical linear BP (Delorme et al. (2016)). This formulation can be solved efficiently by a branch-and-price algorithm.
A column p is defined by a binary vector (d_1p, d_2p, . . . , d_np), where d_ip = 1 if item i is contained in the column p. A column is called feasible if the combination of its items can fit into a bin, i.e., satisfies the submodular capacity constraint (2b). The set cover formulation is based on enumerating all feasible columns, the number of which can be exponential in the number of items. Next, we define the following notation:

Set notation
• P: the set of all feasible columns.

Decision variables
• λ_p ∈ {0, 1}: λ_p = 1 if column p is used by the solution, and 0 otherwise, for p ∈ P.
We obtain the following set cover formulation for the SMBP:

min Σ_{p∈P} λ_p (5a)
s.t. Σ_{p∈P} d_ip λ_p ≥ 1, for i ∈ N, (5b)
λ_p ∈ {0, 1}, for p ∈ P. (5c)

The set cover constraint (5b) specifies that each item i (i ∈ N) is contained in at least one bin. As far as we know, the set cover formulation for the SMBP has not yet been proposed and solved in the literature.
The two compact formulations (2) and (4) are MINLPs, but the set cover formulation (5) is a MILP. Moreover, the number of nonlinear constraints in the compact formulations is equal to the number of potential bins. The nonlinearity of the set cover formulation is de facto 'hidden' in the pricing subproblems (Section 3.2), and each pricing subproblem has only one nonlinear constraint.
The following proposition shows that the set cover formulation is stronger than the BSOCP formulation.

Proposition 1. The linear relaxation of the set cover formulation is tighter than the continuous SOCP relaxation of the BSOCP formulation.
The proof can be found in the Appendix A.

Branch and Price
It is challenging to solve the set cover formulation with an exponential number of binary variables. In this section, we present an exact branch-and-price algorithm to solve the set cover formulation for the SMBP. The branch-and-price algorithm integrates column generation with the branch-and-bound algorithm to efficiently solve the LP relaxation. In the following subsections, we describe the important steps of our branch-and-price algorithm: the branching rule, column generation, initial columns, dual bound computation, and primal heuristics.

Branching Rule
Our branch-and-price algorithm uses the Ryan/Foster branching rule (Foster and Ryan (1976)). The branching rule selects a pair of items i1, i2 ∈ N and creates two branches: in one branch the two items must be packed together, and in the other they must not be packed together.
We denote by
• S: the set of item pairs that are forced to be packed together; a column p respects S if, for each (i1, i2) ∈ S, d_{i1 p} = d_{i2 p};
• D: the set of item pairs that are not allowed to be packed together; a column p respects D if, for each (i1, i2) ∈ D, d_{i1 p} + d_{i2 p} ≤ 1.
Indeed, (S, D) exactly describes the branching decisions made at each node of the search tree. We denote by P_{S,D} the set of feasible columns respecting the branching constraints induced by (S, D), and refer to them as the (S, D)-feasible columns.
At each node of the search tree, the set cover problem (5) is restricted to the branching decision set (S, D), i.e., it reads as

min Σ_{p∈P_{S,D}} λ_p (6a)
s.t. Σ_{p∈P_{S,D}} d_ip λ_p ≥ 1, for i ∈ N, (6b)
λ_p ∈ {0, 1}, for p ∈ P_{S,D}. (6c)

The above problem (6) is called the master problem, and its LP relaxation is called the master LP problem.

Column Generation
We present a column generation method to solve the master LP problem.
The column generation procedure starts with a subset of the (S, D)-feasible columns, adds columns, and solves the restricted LP iteratively. Given a subset P′_{S,D} of P_{S,D}, the corresponding restricted LP problem, namely the Restricted Master LP (RMLP) problem, is

min Σ_{p∈P′_{S,D}} λ_p (7a)
s.t. Σ_{p∈P′_{S,D}} d_ip λ_p ≥ 1, for i ∈ N, (7b)
λ_p ≥ 0, for p ∈ P′_{S,D}. (7c)

After solving the RMLP, let π_i be the dual variable associated with the i-th constraint (7b). The reduced cost of a column p ∈ P_{S,D} is r_p := 1 − Σ_{i∈N} π_i d_ip. If there is a column p ∈ P_{S,D} \ P′_{S,D} whose reduced cost r_p is negative, then adding p to P′_{S,D} could reduce the objective value of the RMLP. Otherwise, the solution of the RMLP is also optimal for the master LP problem. The column with the most negative reduced cost is determined by solving a pricing problem.
Before the column generation procedure is applied to the current node, the items that must be packed together according to S are merged into single items by a preprocessing step. Let N′ be the new item set, a′ and b′ the merged parameters, π′ the merged dual values, and D′ the new conflict relation. Preprocessing leads to a smaller pricing problem, which can be formulated as the following submodular knapsack problem with conflicts:

max Σ_{i∈N′} π′_i x_i (8a)
s.t. Σ_{i∈N′} a′_i x_i + σ √(Σ_{i∈N′} b′_i x_i) ≤ c, (8b)
x_{i1} + x_{i2} ≤ 1, for (i1, i2) ∈ D′, (8c)
x_i ∈ {0, 1}, for i ∈ N′. (8d)

If the optimal value Σ_{i∈N′} π′_i x_i > 1, then the corresponding column has negative reduced cost and is added to the RMLP. Otherwise, the solution of the RMLP is an optimal solution of the master LP, and the current node is solved. The details of the pricing algorithms can be found in Section 4.
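For intuition, the pricing test can be prototyped by brute-force enumeration on tiny instances; the sketch below is our illustration with made-up data, not the paper's exact algorithm (those are presented in Section 4). A column prices out exactly when the optimal pricing value exceeds 1:

```python
import itertools
import math

def solve_pricing_bruteforce(pi, a, b, sigma, c, conflicts):
    """Exact pricing by enumeration (viable only for tiny item sets):
    maximize sum(pi_i x_i) over sets satisfying the submodular knapsack
    constraint and the pairwise conflicts."""
    n = len(pi)
    best_val, best_set = 0.0, frozenset()
    for r in range(n + 1):
        for combo in itertools.combinations(range(n), r):
            s = frozenset(combo)
            if any(u in s and v in s for (u, v) in conflicts):
                continue  # violates a branching conflict
            load = sum(a[i] for i in s) + sigma * math.sqrt(sum(b[i] for i in s))
            if load <= c and sum(pi[i] for i in s) > best_val:
                best_val, best_set = sum(pi[i] for i in s), s
    return best_val, best_set

pi = [0.6, 0.5, 0.5]
a, b, sigma, c = [2.0, 3.0, 1.0], [1.0, 4.0, 1.0], 1.0, 6.0
val, col = solve_pricing_bruteforce(pi, a, b, sigma, c, conflicts=[])
# The column prices out (reduced cost 1 - val < 0) exactly when val > 1.
print(val > 1, sorted(col))  # True [0, 2]
```

Here items 0 and 2 fit together (load 3 + √2 ≈ 4.41 ≤ 6) and carry total dual value 1.1 > 1, so the column would be added to the RMLP.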

Initial Columns
We initialize the branch-and-price algorithm with a set P′ of feasible columns. These feasible columns are computed by an approximation algorithm, which also provides an upper bound on the optimal number of bins and a warm start for the set cover formulation (5).
The approximation guarantee ensures that the number of bins used is at most a fixed ratio times the optimal number of bins.
The approximation algorithm, namely the greedy min-utilization algorithm, greedily allocates items to bins so that the capacity is used as little as possible.
The algorithm manages the following sets:
• L: a list of existing bins, which is initially empty;
• Γ: a set of unpacked items, which is initially N.
For each existing bin p ∈ L, its load is expressed by a binary array (d_1p, d_2p, . . . , d_np). If d_ip = 1, then item i is packed in p. Therefore, we also treat p as a subset of items of N.
The algorithm updates the loads of the existing bins and adds new bins one by one. At each iteration, the algorithm goes through the following steps:
1. for each existing bin p ∈ L, computes the sums of a and b over the packed items, i.e., A_p := Σ_{i∈N} d_ip a_i and B_p := Σ_{i∈N} d_ip b_i;
2. for each existing bin p ∈ L and each unpacked item i ∈ Γ, if bin p can accommodate item i, computes the incremental capacity usage γ_i(p) := (A_p + a_i + σ √(B_p + b_i)) − (A_p + σ √(B_p));
3. for each unpacked item i ∈ Γ, lets p_i := arg min_{p∈L} γ_i(p) be the bin with the minimum incremental capacity usage with respect to i;
4. among the unpacked items Γ, lets i⋆ := arg min_{i∈Γ} γ_i(p_i) be the item with the least incremental capacity usage;
5. adds item i⋆ to the corresponding bin p_{i⋆} by setting d_{i⋆ p_{i⋆}} = 1 and removes i⋆ from Γ; if Γ = ∅, exits;
6. if step 4 fails, i.e., the existing bins cannot accommodate any unpacked item, adds a new empty bin p′ to L, sets d_{ip′} = 0 for i ∈ N, and goes to step 1.
The greedy min-utilization algorithm can find a solution that uses at most 8/3 times the number of bins of an optimal solution.
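The steps above can be sketched in Python as follows. This is our reading of the description: the incremental usage is taken as γ_i(p) = (A_p + a_i + σ√(B_p + b_i)) − (A_p + σ√B_p), and we assume every item fits alone in an empty bin.

```python
import math

def greedy_min_utilization(a, b, sigma, c):
    """Greedy min-utilization: repeatedly pack the (item, bin) pair with the
    smallest incremental capacity usage; open a new bin when nothing fits.
    Assumes every item fits alone in an empty bin."""
    bins = []                       # each bin is a set of item indices
    unpacked = set(range(len(a)))
    while unpacked:
        best = None                 # (increment, item, bin index)
        for k, p in enumerate(bins):
            A = sum(a[i] for i in p)
            B = sum(b[i] for i in p)
            base = A + sigma * math.sqrt(B)
            for i in sorted(unpacked):
                new_load = A + a[i] + sigma * math.sqrt(B + b[i])
                if new_load <= c:
                    inc = new_load - base
                    if best is None or inc < best[0]:
                        best = (inc, i, k)
        if best is None:
            bins.append(set())      # no existing bin accommodates any item
        else:
            _, i, k = best
            bins[k].add(i)
            unpacked.discard(i)
    return bins

bins = greedy_min_utilization([2.0, 3.0, 1.0, 2.0], [1.0, 4.0, 1.0, 1.0], 1.0, 6.0)
print([sorted(p) for p in bins])  # [[0, 2], [3], [1]]
```

On this small made-up instance, the greedy choice packs items 0 and 2 together first, after which items 1 and 3 no longer fit together with them and each opens a bin.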

Dual Bound Computation
For an optimization problem, a dual bound certifies the optimality of a solution. In the branch-and-price setting, a local dual bound at each node of the search tree is a lower bound on the optimum of the master problem (6). The algorithm uses local dual bounds to fathom nodes and to guide node selection.
The optimum of the master LP problem is a local dual bound. However, to converge to this optimum, the column generation procedure usually needs to solve a large number of pricing problems. At each iteration of the column generation procedure, another local dual bound is available. This bound is referred to in the literature as the Farley bound. The following lemma illustrates how it can be computed.
Lemma 1 (Farley (1990), Vance et al. (1994)). Let v_MP be the optimum of the master LP, let v_RMLP be the optimum of the RMLP, let v_price be a dual bound for the pricing problem (8), and let v_F := v_RMLP / v_price be the Farley bound. Then v_F ≤ v_MP, and thus v_F is a local dual bound.
The computation of the Farley bound requires a dual bound on the pricing problem, obtained using an exact pricing algorithm. The branch-and-price algorithm maintains a local lower bound v_ld at each node of the search tree. After solving each pricing problem, the branch-and-price algorithm updates v_ld according to the rule v_ld ← max(v_ld, v_F).
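The update can be sketched as follows; the function name is ours, and rounding the bound up (exploiting integrality of the optimal bin count) is an implementation choice, not stated in the text.

```python
import math

def farley_update(v_ld, v_rmlp, v_price):
    """Update a node's local dual bound with the Farley bound
    v_F = v_RMLP / v_price, where v_price upper-bounds the pricing optimum.
    Rounding up exploits the integrality of the optimal bin count."""
    v_f = v_rmlp / v_price
    return max(v_ld, math.ceil(v_f - 1e-9))

print(farley_update(0, 4.8, 1.2))  # 4
```

For instance, an RMLP value of 4.8 with a pricing upper bound of 1.2 certifies that at least 4.8/1.2 = 4 bins are needed, long before column generation converges.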

Primal Heuristic
For some difficult SMBP instances, the branch-and-price algorithm may be too computationally expensive, since in the worst case an exponential number of columns could be generated. In this case, the root node RMLP would not converge to an optimum in a reasonable time.
On the other hand, each generated column could be part of a feasible primal solution. Therefore, a tailored heuristic is needed to help find primal solutions as soon as a new column is generated.
We propose a primal column selection heuristic similar to the greedy column selection heuristic in Joncour et al. (2010), Lübbecke and Puchert (2012). The heuristic is invoked as soon as a column is created.
Unlike the random rounding heuristic, this heuristic uses the set cover structure in (5).
We consider the case of the root node; the other cases are similar. Similar to the greedy min-utilization algorithm, the heuristic works on the following sets:
• L: a list of existing bins, which is initially empty;
• Γ: a set of unpacked items, which is initially N.
Let P′ be the set of generated columns. When a column p is generated, the heuristic goes through the following steps:
1. adds p to L and, for each i with d_ip = 1, removes i from Γ;
2. finds a generated column p′ ∈ P′ that packs a maximal number of items in Γ, adds p′ to L, and removes the newly packed items from Γ;
3. if all items are packed, i.e., Γ = ∅, exits and outputs the solution L, otherwise goes to the next step;
4. if some items are still unpacked and none of the generated columns contains them, exits; otherwise, goes to step 2.
Our column selection heuristic does not require LP information, only information about the generated columns. It also differs from the classical column selection in that the classical version does not force the newly generated column p into the solution.
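A minimal sketch of this column selection heuristic (our illustration; columns are represented as tuples of item indices, and the newly generated column is forced into the solution):

```python
def column_selection(new_col, pool, n):
    """Greedy column selection: force the newly generated column new_col into
    the solution, then repeatedly add the pooled column that covers the most
    still-unpacked items. Returns the chosen columns, or None on failure."""
    solution = [new_col]
    unpacked = set(range(n)) - set(new_col)
    while unpacked:
        best = max(pool, key=lambda col: len(unpacked & set(col)), default=None)
        if best is None or not unpacked & set(best):
            return None  # some unpacked item appears in no generated column
        solution.append(best)
        unpacked -= set(best)
    return solution

pool = [(0, 1), (2,), (1, 3)]
print(column_selection((0, 2), pool, n=4))  # [(0, 2), (1, 3)]
```

Note that, as a set cover heuristic, the returned bins may overlap on items; duplicates can be removed afterwards without increasing the bin count.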
The remainder of this paper is devoted to efficient methods for solving pricing subproblems, since pricing requires the most computational effort in column generation.

Solving the Pricing Problem
In this section we present solution methods for the pricing problem. The proposed algorithms can be implemented as a stand-alone solver for the submodular knapsack problem.
We first present a fast pricing heuristic. We then present two formulations of the submodular knapsack problem (with conflicts): a convex BSOCP formulation and a non-convex MBQCP formulation. The convex BSOCP formulation is solved in our experiments for a comparative study. The PWL method approximates (or, under certain conditions, relaxes) nonlinear functions by linear functions on subdomains of their domain. We derive a PWL relaxation of the MBQCP formulation and develop an exact PWL-relaxation-based branch-and-cut algorithm (PWL-B&C) for the pricing problem.
To speed up column generation, we also present a hybrid pricing strategy that can replace the exact pricing algorithm with a fast pricing heuristic.

Pricing Heuristic
We propose a fast heuristic, the fixing-greedy heuristic. This heuristic is used by the hybrid pricing strategy to speed up the column generation procedure.
The fixing-greedy heuristic is based on the best-fit greedy algorithm. The best-fit greedy algorithm adds one item per iteration, only if it does not conflict with the previously added items and the capacity is not exceeded. The heuristic keeps
• ∆: the set of items added to the bin, which is initially empty.
At each iteration, the best-fit greedy heuristic goes through the following steps:
1. computes the sums of a′ and b′ over the added items, i.e., A := Σ_{i∈∆} a′_i and B := Σ_{i∈∆} b′_i;
2. for each unadded item i that does not conflict with ∆, computes the incremental capacity usage γ_i := (A + a′_i + σ √(B + b′_i)) − (A + σ √B);
3. discards the items whose addition would exceed the capacity;
4. computes, for each remaining item, the profit-over-usage ratio r_i := π′_i / γ_i;
5. adds the unadded item with the maximum r_i into ∆.
The fixing-greedy heuristic forces, in turn, each item of N′ to be in the solution, runs the best-fit greedy algorithm, and outputs the best solution found.
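The two-level scheme can be sketched as follows. This is our reading of the truncated step list; we assume all a′_i, b′_i > 0 so that every increment γ_i is positive.

```python
import math

def best_fit_greedy(pi, a, b, sigma, c, conflicts, forced):
    """Best-fit greedy: starting from the forced item, repeatedly add the
    conflict-free item with the best profit-over-usage ratio pi_i / gamma_i.
    Assumes a_i, b_i > 0, so every increment gamma is positive."""
    chosen, A, B = {forced}, a[forced], b[forced]
    if A + sigma * math.sqrt(B) > c:
        return set(), 0.0
    while True:
        base = A + sigma * math.sqrt(B)
        best_i, best_ratio = None, -1.0
        for i in range(len(pi)):
            in_conflict = any(i in e and (set(e) - {i}) <= chosen for e in conflicts)
            if i in chosen or in_conflict:
                continue
            gamma = (A + a[i] + sigma * math.sqrt(B + b[i])) - base  # incremental usage
            if base + gamma <= c and pi[i] / gamma > best_ratio:
                best_i, best_ratio = i, pi[i] / gamma
        if best_i is None:
            return chosen, sum(pi[i] for i in chosen)
        chosen.add(best_i)
        A, B = A + a[best_i], B + b[best_i]

def fixing_greedy(pi, a, b, sigma, c, conflicts):
    """Fixing-greedy: run best-fit greedy once per forced item; keep the best."""
    return max((best_fit_greedy(pi, a, b, sigma, c, conflicts, i)
                for i in range(len(pi))), key=lambda sol: sol[1])

col, val = fixing_greedy([0.6, 0.5, 0.5], [2.0, 3.0, 1.0],
                         [1.0, 4.0, 1.0], 1.0, 6.0, conflicts=[])
print(sorted(col), round(val, 3))  # [0, 2] 1.1
```

Forcing each item in turn makes the heuristic robust to a bad first greedy pick at the cost of |N′| greedy runs.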

BSOCP Formulation
The Binary Second-Order Conic Programming (BSOCP) formulation for the pricing problem (8) is similar to the BSOCP formulation of the SMBP in (2). Its capacity constraint (9b) reads
$$\sum_{i \in N'} a'_i x_i + \sigma \sqrt{\sum_{i \in N'} b'_i x_i} \le c, \tag{9b}$$
which can be represented by second-order conic constraints. The BSOCP formulation is a convex MINLP formulation.
In this section, we analyze the polyhedral outer approximation of the BSOCP formulation and show that a finite number of cutting planes is sufficient to define an exact MILP reformulation of the BSOCP formulation.
To simplify the presentation, we use the following notation:
• the left-hand side of (9b): $f(x) := \sum_{i\in N'} a'_i x_i + \sigma\sqrt{\sum_{i\in N'} b'_i x_i}$;
• the binary set defined by (9b): $C := \{x \in \{0,1\}^{N'} : f(x) \le c\}$;
• the continuous relaxation of $C$: $\bar{C} := \{x \in [0,1]^{N'} : f(x) \le c\}$.
Since $f$ is convex, $\bar{C}$ is convex. We also note that the convex hull of $C$ is a polytope.
A set $O$ is said to be a polyhedral outer approximation of $C$ if $O$ is a polyhedron and $\bar{C} \subset O$.
A polyhedral outer approximation can be constructed from linearizations (gradient cuts) of $f$ at finitely many points. The following theorem indicates that a finite number of valid inequalities, together with the binary constraints, suffices to describe $C$.
Theorem 1. Given a point $\bar{x} \in \{0,1\}^{N'}$ with $f(\bar{x}) > c$, the following inequality is valid for $C$ and $\bar{C}$:
$$\sum_{i\in N'} a'_i x_i + \sigma \frac{\sum_{i\in N'} b'_i \bar{x}_i x_i}{\sqrt{\sum_{i\in N'} b'_i \bar{x}_i}} \le c. \tag{10}$$
Moreover, $\bar{x}$ violates (10).
Proof. Since the function $f$ is submodular and hence convex, for every $x \in \bar{C}$ we have
$$\frac{\sum_{i\in N'} b'_i \bar{x}_i x_i}{\sqrt{\sum_{i\in N'} b'_i \bar{x}_i}} \le \sqrt{\sum_{i\in N'} b'_i x_i},$$
so the left-hand side of (10) is at most $f(x) \le c$. Therefore, inequality (10) is valid for $\bar{C}$ and, a fortiori, for $C$.
The left-hand side of inequality (10) evaluated at $\bar{x}$ is $\sum_{i\in N'} a'_i \bar{x}_i + \sigma\sqrt{\sum_{i\in N'} b'_i \bar{x}_i} = f(\bar{x})$, where the last equality follows from the fact that $\bar{x}$ is binary (so $\bar{x}_i^2 = \bar{x}_i$). By hypothesis this value exceeds $c$, so $\bar{x}$ violates the inequality. $\square$
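As a sanity check, the cut generated at an infeasible binary point can be verified numerically on a toy instance. The cut form used below, $\sum_i a'_i x_i + \sigma \sum_i b'_i \bar{x}_i x_i / \sqrt{\sum_i b'_i \bar{x}_i} \le c$, is our reading of the garbled inequality (10), so treat this as an assumption-laden sketch rather than the paper's exact cut:

```python
import math, itertools

def f(x, a, b, sigma):
    # capacity usage f(x) = sum a_i x_i + sigma * sqrt(sum b_i x_i)
    return sum(ai * xi for ai, xi in zip(a, x)) + \
        sigma * math.sqrt(sum(bi * xi for bi, xi in zip(b, x)))

def cut_lhs(x, xbar, a, b, sigma):
    # left-hand side of the cut generated at an infeasible binary point xbar
    Bbar = sum(bi * xbi for bi, xbi in zip(b, xbar))
    lin = sum(ai * xi for ai, xi in zip(a, x))
    sub = sum(bi * xbi * xi for bi, xbi, xi in zip(b, xbar, x)) / math.sqrt(Bbar)
    return lin + sigma * sub

a, b, sigma, c = [3, 4, 5], [2, 1, 3], 2.0, 9.0
points = list(itertools.product([0, 1], repeat=3))
infeasible = [x for x in points if f(x, a, b, sigma) > c and sum(x) > 0]
for xbar in infeasible:
    # the cut is violated at its generating point ...
    assert cut_lhs(xbar, xbar, a, b, sigma) > c
    # ... and valid for every feasible binary point
    for x in points:
        if f(x, a, b, sigma) <= c:
            assert cut_lhs(x, xbar, a, b, sigma) <= c + 1e-9
```

Enumerating all $2^3$ binary points confirms both properties claimed by Theorem 1 on this instance.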

Let us consider a point $x^*$ in $\bar{C}$. Since $O$ is a polyhedral outer approximation of $C$, $x^*$ must be in $O$. Let us define the generating set of the cuts in Theorem 1 as
$$\mathcal{X} := \{\bar{x} \in \{0,1\}^{N'} : f(\bar{x}) > c\}, \tag{11}$$
and let $\Theta$ be the cut coefficient set generated by $\mathcal{X}$ via (10) (12). The BSOCP formulation is then equivalent to the MILP formulation (13) obtained by adding the cuts in $\Theta$ and the binary constraints. However, $\mathcal{X}$ (and hence $\Theta$) is unknown before the search space is explored, and its cardinality may be exponential. In practice, the cuts corresponding to $\Theta$ can only be separated lazily, i.e., a cut is added once a point $\bar{x} \in \mathcal{X}$ is found. This finite family of cuts cannot be used directly by an off-the-shelf solver, but it is a key component of our PWL-B&C algorithm in Section 4.5.
The following lemma quantifies the approximation error of the polyhedral outer approximation. Lemma 2 (Ben-Tal and Nemirovski (2001)). Let $\epsilon > 0$. Then there exists a method to construct a polyhedral outer approximation $O$ of $\bar{C}$ with an additional $O(1)\,|N'|\log(1/\epsilon)$ variables and constraints, such that the relative $\ell_\infty$ approximation error is at most $\epsilon$. Note that the approximation error of the polyhedral outer approximation depends on the number of variables.

MBQCP Formulation
We present a non-convex Mixed Binary Quadratically Constrained Programming (MBQCP) formulation for the submodular knapsack problem (with conflicts). Although we do not use this formulation to solve the pricing subproblems directly, it inspires the PWL relaxation and the PWL-B&C algorithm. We introduce a variable $w$ to represent the linear load $\sum_{i\in N'} a'_i x_i$; the capacity constraint then becomes $\sigma^2 \sum_{i\in N'} b'_i x_i \le (c - w)^2 =: q(w)$ with $w \le c$ (14c), and the MBQCP formulation is the resulting non-convex MINLP program (14). Although the program contains a concave quadratic constraint (14c), the nonlinearity is only a univariate quadratic function, as opposed to the $|N'|$-dimensional nonlinear SOC function $f$ in (9b).
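The equivalence between the SOC capacity constraint (9b) and the squared form (14c) can be checked by enumeration on a toy instance; this sketch assumes $w = \sum_i a'_i x_i$ and $q(w) = (c-w)^2$ with $w \le c$, as stated above:

```python
import math, itertools

def socp_feasible(x, a, b, sigma, c):
    # original constraint (9b): sum a_i x_i + sigma * sqrt(sum b_i x_i) <= c
    A = sum(ai * xi for ai, xi in zip(a, x))
    B = sum(bi * xi for bi, xi in zip(b, x))
    return A + sigma * math.sqrt(B) <= c + 1e-9

def mbqcp_feasible(x, a, b, sigma, c):
    # squared form (14c): sigma^2 * sum b_i x_i <= (c - w)^2, with w <= c
    w = sum(ai * xi for ai, xi in zip(a, x))
    B = sum(bi * xi for bi, xi in zip(b, x))
    return w <= c and sigma ** 2 * B <= (c - w) ** 2 + 1e-9

a, b, sigma, c = [3, 4, 5], [2, 1, 3], 2.0, 9.0
for x in itertools.product([0, 1], repeat=3):
    # both feasibility tests agree on every binary point
    assert socp_feasible(x, a, b, sigma, c) == mbqcp_feasible(x, a, b, sigma, c)
```

The side constraint $w \le c$ is what makes squaring valid: both sides of $\sigma\sqrt{\sum b'_i x_i} \le c - w$ are then nonnegative.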

PWL Relaxation
Let $B = \{w_1, \dots, w_h\} \subset [\underline{w}, \bar{w}]$. We call $B$ a breakpoint set in $[\underline{w}, \bar{w}]$, and we denote by $\bar{q}_B$ the PWL function interpolating $q$ at the breakpoints of $B$; note that we consider the two bounds $\underline{w}$ and $\bar{w}$ as breakpoints. Since $q$ is convex, $\bar{q}_B$ is an over-estimator of $q$. Assume that we are given the breakpoints $B$. Replacing $q(w)$ in (14c) by $\bar{q}_B(w)$, we obtain the PWL relaxation (15) of the MBQCP formulation (14).
Figure: graphs of the quadratic and its PWL over-estimator.
Modeling PWL functions. The graphs of PWL functions admit several MILP formulations, see Vielma et al. (2010); in this paper we consider the logarithmic model. Given a budget of $h$ breakpoints, the breakpoint selection problem aims to find a breakpoint set $B$ minimizing the $\ell_p$ error (16). Geißler et al. (2012) use a convex program to compute the $\ell_\infty$-approximation error for general nonconvex functions. Berjón et al. (2015) develop an error analysis that yields asymptotically tight bounds quantifying the $\ell_2$-approximation error.
Theorem 2. An optimal solution to the breakpoint selection problem under the $\ell_\infty$-approximation error is an equidistant partition of $[\underline{w}, \bar{w}]$.
The proof can be found in Appendix B. The approximation error decreases quadratically with $h$. The relative $\ell_\infty$-approximation error is defined analogously to that of the polyhedral outer approximation, and we have the following result on the relative approximation error of the PWL relaxation.
Corollary 1. Let $\epsilon > 0$. Then there exists an MILP formulation of the PWL function $\bar{q}_B$ induced by $B$ with $O(1)\log(1/\epsilon)$ binary variables and $O(1)/\sqrt{\epsilon}$ continuous variables and constraints, such that the relative $\ell_\infty$-approximation error is at most $\epsilon$.
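The over-estimation property and the error behavior can be illustrated numerically. The sketch below builds the chord interpolant of $q(w) = (c-w)^2$ on equidistant breakpoints (per Theorem 2) and checks that the measured $\ell_\infty$ error matches the analytic value $d^2/4$ for spacing $d$, which holds for a quadratic with unit leading coefficient; the helper name `pwl_overestimator` is ours:

```python
import numpy as np

def pwl_overestimator(q, breakpoints):
    """Piecewise-linear interpolant of q at the breakpoints; for convex q,
    the chords over-estimate q on every sub-interval."""
    ws = np.asarray(breakpoints, dtype=float)
    qs = q(ws)
    return lambda w: np.interp(w, ws, qs)

c, w_lo, w_hi, h = 10.0, 0.0, 10.0, 6
q = lambda w: (c - np.asarray(w, dtype=float)) ** 2
B = np.linspace(w_lo, w_hi, h)          # equidistant breakpoints (Theorem 2)
qbar = pwl_overestimator(q, B)

grid = np.linspace(w_lo, w_hi, 10001)
err = float(np.max(qbar(grid) - q(grid)))
d = (w_hi - w_lo) / (h - 1)
# for a unit-leading-coefficient quadratic, the l_inf error is d^2 / 4
assert abs(err - d * d / 4) < 1e-3
# over-estimation holds everywhere on the interval
assert np.all(qbar(grid) - q(grid) >= -1e-9)
```

Halving the spacing $d$ divides the error by four, which is the quadratic decrease in $h$ mentioned above.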
Next, we summarize the approximation errors of all the proposed relaxations with respect to their corresponding formulations. Note that we do not consider the integrality of the binary variables $x$. Comparing Lemma 2 and Corollary 1, the approximation error of the PWL relaxation (15) to the MBQCP formulation (14) is independent of the number of variables, while the approximation error of the polyhedral outer approximation to the BSOCP formulation (9) depends on this number.
For a large (possibly exponential) number of breakpoints, the PWL relaxation can also be transformed into an exact reformulation of the MBQCP formulation. Let $B^*$ contain every value that the load $w$ can attain at a binary solution. Then, the PWL relaxation (15) derived from $B^*$ is an exact reformulation of the MBQCP formulation (14).
Proof. It suffices to prove that given a binary point (x, w) satisfying the PWL relaxation, it is also feasible for the MBQCP formulation.
Since $(x, w)$ satisfies the constraint (15c) in the PWL relaxation, we have $\sigma^2 \sum_{i\in N'} b'_i x_i \le \bar{q}_{B^*}(w) = q(w)$. Hence, $(x, w)$ satisfies the corresponding constraint (14c) in the MBQCP formulation. $\square$

Exact PWL-B&C Algorithm
The approximation error of the PWL relaxation is independent of the dimension, but with a small number of breakpoints the relaxation is not exact. Instead of adding many breakpoints, the finite family of cuts induced by the set $\Theta$ in (12) suffices to make the PWL relaxation exact.
We propose a combined formulation (17) and a branch-and-cut algorithm based on the PWL relaxation (PWL-B&C) to solve it. We show in our experiments that this formulation, with the redundant constraints (17b) and (17c), can be solved much faster than the standard BSOCP formulation (9). In practice, only a few cuts of (17d) need to be separated before convergence.
Our algorithm consists of three main steps: tightening the bounds, constructing the PWL relaxation (breakpoints), and running the PWL-B&C algorithm. First, bound tightening is a preprocessing procedure that tightens the bounds on the breakpoints for all pricing problems. Then, the PWL relaxation (breakpoints) is constructed for all pricing problems; this is also a presolving procedure, and the construction depends on the number of items, the item sizes, and the capacity. Finally, based on the PWL relaxation, the PWL-B&C algorithm is adapted from the LP-B&C algorithm (Coey et al. (2020)).
Bound tightening. The bound tightening procedure is called before the branch-and-price algorithm to tighten the bounds $[\underline{w}, \bar{w}]$ of the breakpoints within $[0, c]$.
Considering a pricing problem at a node of the search tree, we observe that if $w = \sum_{i\in N'} a'_i x_i$ is small, then $q(w) = (c - w)^2$ is larger than $\sigma^2 \sum_{i\in N'} b'_i x_i$, so the capacity constraint (14c) is inactive. Thus, there is no need to over-estimate $q$ when $w$ is small. More precisely, there is a lower breakpoint $\underline{w} \in [0, c]$ such that the submodular capacity constraint (14c) is never violated by a binary solution with $w \in [0, \underline{w}]$. We can therefore start over-estimating $q$ from the maximum lower breakpoint, computed from a convex MBQCP problem. Similarly, there exists an upper breakpoint $\bar{w} \in [0, c]$ beyond which no binary solution is feasible; the minimum upper breakpoint can be computed from a BSOCP problem. We solve these two programs at the root node and obtain the bounds $[\underline{w}_r, \bar{w}_r]$ for the breakpoints. Since the feasible set of any other node is a subset of that of the root node, the corresponding programs at other nodes are more constrained. It follows that $\underline{w}_r \le \underline{w}$ and $\bar{w} \le \bar{w}_r$ for any other node, so we set $\underline{w} = \underline{w}_r$ and $\bar{w} = \bar{w}_r$ for all nodes.

Construction of breakpoints
To determine the number of breakpoints $h$, we run a greedy heuristic that tries to maximize the number of items in a bin. We take $h$ as the solution value given by the heuristic and place the $h$ breakpoints equidistantly in $[\underline{w}, \bar{w}]$. Note that for a fixed number of breakpoints, the equidistant partition gives the best approximation error according to Theorem 2. We also add a breakpoint at $w = 0$.
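The breakpoint construction described above can be sketched in a few lines; the function name and the choice to prepend the extra $w = 0$ breakpoint only when $\underline{w} > 0$ are our assumptions:

```python
def build_breakpoints(w_lo, w_hi, h):
    """Equidistant breakpoints in the tightened interval [w_lo, w_hi],
    plus the extra breakpoint at w = 0 (used when the bin is lightly loaded)."""
    assert h >= 2 and w_lo < w_hi
    step = (w_hi - w_lo) / (h - 1)
    pts = [w_lo + k * step for k in range(h)]
    if w_lo > 0.0:
        pts.insert(0, 0.0)   # keep the PWL function defined down to w = 0
    return pts

# h comes from the greedy bin-filling heuristic; values here are illustrative
bps = build_breakpoints(12.0, 60.0, 5)
```

With $[\underline{w}, \bar{w}] = [12, 60]$ and $h = 5$, this yields $\{0, 12, 24, 36, 48, 60\}$.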

PWL-B&C algorithm
The main steps of the PWL-B&C algorithm are described in Algorithm 1. Recall that the problem (8) is a maximization problem. Algorithm 1 maintains a set of active nodes $\mathcal{N}$ of the search tree, a pool of cuts $\mathcal{C}$, and an incumbent solution $x^*$ (whose value $\sum_{i\in N'} \pi'_i x^*_i$ is a primal bound). A node $(l, u, U)$ is characterized by the finite variable bound vectors $l$ and $u$ and the dual upper bound $U$ of the node. The upper bound $U$ is first inherited from the parent node and then updated via the LP relaxation. Note that the PWL function is a modeling concept: we use an MILP solver, namely CPLEX, that formulates the PWL function $\bar{q}_B$ as an MILP. We denote by $z$ the additional binary variables used to model $\bar{q}_B$ (see Section 4.4). The variables $z$ are constructed internally by CPLEX, and the PWL function is enforced when $z$ is binary.
We denote by $M_B(\mathcal{C}, l, u, U)$ the MILP relaxation restricted to the finite bounds $(l, u)$ on $(x, z)$ at a node of the search tree. It consists of the PWL relaxation (15), the cuts from $\mathcal{C}$, and other cuts added by the MILP solver.
The node set $\mathcal{N}$ initially contains the root node $(l^0, u^0)$, where $l^0, u^0 \in \mathbb{R}^I$ are the finite initial global bounds on the variables $(x, z)$. On Line 8 of Algorithm 1, the main loop removes a node $(l, u, U)$ from $\mathcal{N}$. Line 9 solves the LP relaxation of $M_B(\mathcal{C}, l, u, U)$ with the MILP solver. If the LP relaxation is infeasible, Line 11 immediately fathoms the node by infeasibility. The upper bound $U$ of the node means that any feasible solution to the combined formulation (17) satisfying the node's bounds on $(x, z)$ has an objective value of at most $U$; since the LP is a relaxation of (17), this also holds for any feasible solution satisfying the node's bounds on $x$ alone.
Line 14 fathoms the node by bound if U is not better than the incumbent value. Otherwise, the upper bound U of the node is set to the optimal value of LP on Line 16.
If $\hat{z}$ is not binary, then the PWL function is not enforced by the integrality of $z$, and the algorithm continues in Line 19. If $\hat{x}$ is binary and feasible, its objective value must be at least the upper bound $U$. Line 23 stores the new incumbent solution $\hat{x}$, and Line 24 fathoms the node, since $\hat{x}$ is an optimal binary solution with respect to the bounds $(l, u)$. If $\hat{x}$ is binary but infeasible, it generates a violated constraint of the form (17d). Line 26 adds this cut to the cut pool $\mathcal{C}$, Line 27 re-inserts the current node for re-optimization, and Line 28 discards $\hat{x}$ via the cut in the next optimization iteration. Finally, if $(\hat{x}, \hat{z})$ is fractional on Line 31, the algorithm branches using the fractionality information and $U$.

Hybrid Pricing Strategy
The pricing heuristic in Section 4.1 is fast, but it cannot provide the dual upper bound required by the Farley bound of Lemma 1. The exact pricing algorithm is slow but yields a dual upper bound for the pricing problem. The hybrid pricing strategy first calls the pricing heuristic and then decides whether calling the exact pricing algorithm can improve the local dual bound of the master problem.
In fact, for a heuristic solution, the exact algorithm is required only under a certain condition. The condition is given by the following proposition.
Proposition 3. Let v heur be the solution value of the pricing heuristic, let v RMLP be the optimum of RMLP (7), and let v ld be the current local dual bound for the master problem.
If $v_{\mathrm{RMLP}}/v_{\mathrm{heur}} \le v_{ld}$, the exact algorithm cannot yield a better local dual bound than $v_{ld}$.
Proof. Let $v_{\mathrm{popt}}$ be the optimum of the pricing problem (8); then $v_{\mathrm{heur}} \le v_{\mathrm{popt}}$, and it follows that $v_{\mathrm{RMLP}}/v_{\mathrm{popt}} \le v_{\mathrm{RMLP}}/v_{\mathrm{heur}} \le v_{ld}$. Moreover, $v_{\mathrm{popt}}$ is the smallest valid pricing dual bound $v_{\mathrm{price}}$, so $v_{\mathrm{RMLP}}/v_{\mathrm{popt}}$ is the greatest Farley bound according to Lemma 1. Therefore, even if the pricing problem is solved to optimality, we cannot obtain a better bound than $v_{ld}$. $\square$
If the condition $v_{\mathrm{RMLP}}/v_{\mathrm{heur}} \le v_{ld}$ holds, one can skip the exact pricing algorithm and use the solution from the fixing-greedy heuristic of Section 4.1. The hybrid pricing strategy is outlined in Algorithm 2.
Algorithm 2: Hybrid pricing strategy. 1 Input: a pricing problem (8) with objective coefficients $\pi'$, the optimum $v_{\mathrm{RMLP}}$ of RMLP (7), and the local dual bound $v_{ld}$ of the master problem; 2 Output: a generated column $x^*$ and the updated local dual bound $v_{ld}$; 3 call the pricing heuristic with the objective coefficients $\pi'$; ⊲ run the heuristic first 4 let $x, v_{\mathrm{heur}}$ be the heuristic solution and its value. In Line 3, the heuristic algorithm is called first. If $v_{\mathrm{RMLP}}/v_{\mathrm{heur}} \le v_{ld}$, exact pricing is not needed. If the heuristic solution $x$ has a negative reduced cost, the strategy outputs it in Line 6. Otherwise, the strategy calls the exact algorithm in Line 8.
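The decision logic of the hybrid strategy can be sketched as follows. This is an illustrative reading of Algorithm 2, not its exact pseudocode: we assume a set-cover master in which a pricing value above 1 means a negative reduced cost, and that the Farley bound is $v_{\mathrm{RMLP}}/v_{\mathrm{price}}$; `heuristic` and `exact` are hypothetical callables returning a (column, value) pair.

```python
def hybrid_pricing(v_rmlp, v_ld, heuristic, exact, eps=1e-9):
    """Call the heuristic first; fall back to the exact pricer only when the
    Farley bound v_rmlp / v_heur could still improve the local dual bound."""
    col, v_heur = heuristic()
    if v_heur > eps and v_rmlp / v_heur <= v_ld:
        # Proposition 3: exact pricing cannot improve the dual bound, skip it
        return col, v_ld
    if v_heur > 1.0 + eps:
        # heuristic column already has a negative reduced cost: output it
        return col, v_ld
    # otherwise pay for the exact pricer and update the Farley bound
    col, v_popt = exact()
    return col, max(v_ld, v_rmlp / v_popt)
```

The key saving is that the expensive `exact()` call is avoided whenever the heuristic either certifies the bound cannot improve or already yields an improving column.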

Computational Experiments
In this section, we present the computational experiments conducted to test the effectiveness of our branch-and-price algorithms for the SMBP. In particular, we test different configurations of the branch-and-price algorithm to evaluate the proposed techniques. The source code and benchmarks are publicly available on the project website https://github.com/lidingxu/cbp, where we also provide a bash file to reproduce the experiments on Linux systems.

Benchmarks
We produce benchmarks as described in Cohen et al. (2019). The authors test their approximation algorithms on benchmarks from real Google cloud data centers, which are not accessible due to confidentiality.
The authors generate SMBPs to model BPs with a chance constraint or BPs with a distributionally robust constraint over a family of distributions $\mathcal{D}$, where $\mu_i$ ($i \in N$) is the nominal size of item $i$. For different risk levels $\alpha$, they propose three data generation methods (cases) to construct the data $a, b, \sigma$ of the SMBP (4): the Gaussian case, the Hoeffding inequality case, and the distributionally robust approximation case.
Following them, we set the capacity of each bin to 72 (the number of cores of the servers) and the risk level $\alpha \in \{0.6, 0.7, 0.8, 0.9, 0.95, 0.99\}$.
We set the number of items (i.e., jobs) $|N| \in \{100, 400, 1000\}$ to obtain three benchmarks with different sizes: CloudSmall, CloudMedium, and CloudLarge. There are three generation methods and six risk levels, and for each combination we generate six instances with different random seeds. As a result, each benchmark contains $108 = 6 \times 6 \times 3$ instances. The interested reader can find the detailed generation method in Appendix C.

Experimental Setups
In this section we describe the setup of the experiments, including the development environment, the implementation of the algorithms, and the solution statistics.

Development environment The experiments are conducted on a computer with
Intel Core i7-6700K CPU @ 4.00 GHz and 16 GB main memory. We use SCIP 7.0.3 (Gamrath et al. (2020)) as a branch-and-price (B&P) framework to solve the set cover formulation (5). We use ILOG CPLEX 20.1 as: • an LP solver to solve the LP relaxation of RMLP (7); • a BSOCP solver to solve the BSOCP formulations of the SMBP (2) and the submodular knapsack problem with conflicts (9); • an MILP solver used by the PWL-B&C Algorithm 1. CPLEX's parameters are left at their defaults, except that we disable its parallelism.
Solver implementation. We implement five solvers for the SMBP according to the proposed techniques; four of them are branch-and-price solvers.
1. a solver that uses CPLEX to solve the compact BSOCP formulation (2) of the SMBP directly, abbreviated as CPLEX-BSOCP.
2. a branch-and-price solver that uses CPLEX to solve the BSOCP formulation (9) of the pricing problem, abbreviated as B&P-BSOCP.
3. a branch-and-price solver that uses the PWL-B&C algorithm to solve the combined formulation (17) of the pricing problem, abbreviated as B&P-PWL. The PWL-B&C algorithm calls CPLEX to formulate PWL functions and solve the resulting MILP, and cuts are added as lazy constraints.
4. a branch-and-price solver that uses the hybrid pricing strategy in Algorithm 2, abbreviated as B&P-Hybrid. The hybrid pricing strategy uses the PWL-B&C algorithm as an exact pricing subroutine.
5. a branch-and-price solver that uses the hybrid pricing strategy in Algorithm 2 and the column selection heuristic in Section 3.5, abbreviated as B&P-Hybrid*.
We use the approximation algorithm in Section 3.3 to find an initial feasible solution that serves as a warm start for all solvers. The time limit for each solver is 3600 CPU seconds.
If the column generation procedure at the root node does not finish after 3500 CPU seconds, it is halted, giving SCIP 100 CPU seconds to invoke its own primal heuristic.
For the pricing problems, we set the same time limit for the exact algorithms ($|N| \times 0.015$ CPU seconds) and the same tolerance for relative gaps (the default of CPLEX).
Performance metrics and statistical tests In order to evaluate the solver performance in different instances, we compare shifted geometric means (SGMs) (see Achterberg et al. (2008)) of performance metrics.
Given an SMBP problem instance, let $\underline{v}$ be a dual lower bound and $\bar{v}$ a primal upper bound found by a solver. The relative dual gap in percentage is defined from $\underline{v}$ and $\bar{v}$; a smaller relative dual gap indicates better performance.
Let $v_a$ be the value of the solution found by the greedy min-utilization algorithm, which is communicated to all solvers as a warm start. The closed primal bound is defined from $v_a$ and $v^*$, where $v^*$ is the largest dual bound found among all solvers. A larger closed primal bound means better performance.
We report the following performance metrics for each instance tested by each solver and compute the SGMs of the benchmarks: 1. t: the total running time in CPU seconds, with a shifted value set to 1; 2. δ d %: the relative dual gap in percentage, with a shifted value set to 1%; 3. δ p %: the closed primal bound in percentage, with a shifted value set to 1%; 4. #N: the number of nodes of the search tree, with a shifted value set to 1; 5. #C: the number of columns generated, with a shifted value set to 1; 6. E%: the percentage of columns generated by the exact pricing algorithm, with a shifted value set to 1%; 7. τ %: the relative dual gap in percentage of a pricing problem solved by an exact algorithm, with a shifted value set to 1%; 8. t p %: the ratio between pricing time and total solving time in percentage, with a shifted value set to 1%.
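The shifted geometric mean used to aggregate these metrics can be computed as follows; this is the standard SGM of Achterberg et al. (2008), sketched here for clarity:

```python
import math

def shifted_geometric_mean(values, shift=1.0):
    """SGM_s(v) = (prod_i (v_i + s))^(1/n) - s; the shift s damps the
    influence of very small values while keeping the mean multiplicative."""
    n = len(values)
    log_sum = sum(math.log(v + shift) for v in values)
    return math.exp(log_sum / n) - shift

times = [1.0, 10.0, 100.0]           # e.g. running times in CPU seconds
sgm = shifted_geometric_mean(times, shift=1.0)
```

On this example the SGM is about 12, far less dominated by the 100-second outlier than the arithmetic mean of 37, which is why SGMs are the customary aggregate for solver running times.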
We will discuss the computational results, which are divided into two parts. In the first part, we compare five solvers: the CPLEX-BSOCP, B&P-BSOCP, B&P-PWL, B&P-Hybrid, and B&P-Hybrid*. The second part is based on the observations and analysis of the first part, in which we test whether non-equidistant (adaptive) breakpoints can improve the performance of the B&P-Hybrid* solver.

Comparative Analysis of Main Results
The main computational results are summarized in Table 1, for detailed results we refer to Appendix D. For each benchmark, we report the SGM statistics of the performance metrics, the number of instances solved (#S) and the number of instances with improved primal bounds (#I).
Next, we analyze the main computational results by comparing the solvers.

Compact formulation vs. set cover formulation
We analyze the performance of the compact BSOCP formulation (2) and the set cover formulation (5). We focus on the performance statistics of the master problems.
For all benchmarks, the B&P-Hybrid is the best B&P solver in terms of dual gap, total time, and number of instances solved. Therefore, we compare the B&P-Hybrid with the CPLEX-BSOCP.
We first analyze the results for the small benchmark. The average total time and average dual gap of the CPLEX-BSOCP are 4.34 and 7.86 times those of the B&P-Hybrid, respectively. The CPLEX-BSOCP solves 18 instances, the worst performance among all solvers, while the B&P-Hybrid solves 65 instances. However, the closed primal bound of the CPLEX-BSOCP is 1.93 times larger than that of the B&P-Hybrid, and the CPLEX-BSOCP improves the initial approximate solutions for 13 more instances than the B&P-Hybrid.
Table 1: Aggregated statistics of the main computational results.

The CPLEX-BSOCP outperforms all other solvers in terms of finding good primal solutions because CPLEX has powerful internal heuristic algorithms.
In the medium and large benchmarks, none of the instances is solved. Note that 400 and 1000 items are very large even for classical linear BPs.
The average number of nodes and the average dual gap of the CPLEX-BSOCP are 0 and 100%, i.e., the CPLEX-BSOCP cannot even finish the root node procedure. Because CPLEX by default adds SOC cuts for the nonlinear constraints (2b) and reoptimizes the LP relaxation iteratively, the resulting LP relaxation becomes too large to solve efficiently. The B&P-Hybrid (or the B&P-Hybrid*) is still able to prove a dual gap for the approximate solution or to find better primal solutions.
The set cover formulation is more scalable than the compact formulation. CPLEX can solve "simple" pricing problems, but when "difficult" pricing problems occur after generating many columns, it cannot prove tight dual bounds. We find that for the B&P-BSOCP, the average dual gap of pricing problems decreases when the instance size increases. This is because for the large instance, only a small number of columns are generated before the time limit and the "hard" pricing problems do not arise.
In contrast, the B&P-PWL solves almost all pricing problems up to optimality (about 0.01%) before the time limit.
Exact pricing vs. hybrid pricing We compare the exact pricing strategy (B&P-PWL) with the hybrid pricing strategy (B&P-Hybrid).
For the small benchmark, the average total time of the B&P-PWL is 1.92 times that of the B&P-Hybrid, and the B&P-Hybrid solves 4 more instances. For the small, medium, and large benchmarks, the average dual gap of the B&P-PWL is 1.2, 1.46, and 1.26 times as large as that of the B&P-Hybrid, respectively. This is because most of the slow exact pricing calls are replaced by fast heuristic pricing calls in the B&P-Hybrid.
Therefore, more columns can be generated.
For the small, medium, and large benchmarks, the average number of columns generated by the B&P-Hybrid is 1.86, 2.04, and 2.02 times that of the B&P-PWL, respectively; the average percentage of exact columns generated by the B&P-Hybrid is 18.92%, 9.19%, and 4.747%, respectively. Note that the pricing dual gap of the B&P-Hybrid is slightly larger than that of the B&P-PWL because more "hard" columns are generated.
The results show that the hybrid pricing strategy is a scalable approach for large instances. The B&P-Hybrid also performs better in terms of the closed primal bound and the number of improved instances, because it generates more columns and the SCIP heuristics thus have more chances to improve the primal solutions. These results suggest that the PWL-B&C can also serve as an independent solver for submodular knapsack problems, that the hybrid pricing strategy speeds up column generation, and that the column selection heuristic can find better solutions than the approximate solutions.

Effects of column selection heuristics
Regarding the benchmarks, CloudSmall is a suitable testbed for comparing solvers, CloudMedium is suitable for testing the pricing algorithms, and CloudLarge is still too large to handle.

Non-equidistant Breakpoints
According to Theorem 2 in Section 4.4, the optimal breakpoints under the $\ell_\infty$ error form an equidistant partition of $[\underline{w}, \bar{w}]$. In this section, we investigate whether adaptive non-equidistant breakpoints can improve the B&P-Hybrid*. There are many possible non-equidistant breakpoint schemes; we propose a simple one below.
From the previous experiments, we make the following observations. First, when the PWL-B&C algorithm solves a pricing problem by generating many cuts, the problem tends to be a "hard" pricing problem; as a result, exact pricing is slow and numerically unstable, so that CPLEX sometimes issues the warning "Advanced basis is not built". This phenomenon usually occurs at the end of column generation.
Therefore, the number of cuts generated should be reduced. This can be achieved by adjusting the breakpoints, as the following example shows.
When a pricing problem is solved by the PWL-B&C algorithm, let $\mathcal{X}_0$ be the set of all points $\bar{x}$ in Line 23 of Algorithm 1 that generate cuts. We note that $\mathcal{X}_0$ is a subset of the generating set $\mathcal{X}$ in (11), and we call $\mathcal{X}_0$ the sub-generating set. We change the breakpoints of the PWL relaxation to $B_0 := \{w \in \mathbb{R} : w = \sum_{i\in N'} a'_i \bar{x}_i, \bar{x} \in \mathcal{X}_0\}$. Since the PWL relaxation is now exact at the points of $B_0$ (i.e., $\bar{q}_{B_0}(w) = q(w)$ for $w \in B_0$), when the PWL-B&C algorithm is re-executed, the submodular capacity constraint (9b) is correctly enforced at $x \in \mathcal{X}_0$, and the cuts generated by $\mathcal{X}_0$ are therefore not added in Line 23 of Algorithm 1.
Then we note that the hybrid pricing strategy performs many heuristic pricing iterations between two exact pricing iterations. For two such exact pricing iterations, let $\mathcal{X}_1$ and $\mathcal{X}_2$ be their sub-generating sets, and let $B_1 := \{w \in \mathbb{R} : w = \sum_{i\in N'} a'_i \bar{x}_i, \bar{x} \in \mathcal{X}_1\}$ and $B_2 := \{w \in \mathbb{R} : w = \sum_{i\in N'} a'_i \bar{x}_i, \bar{x} \in \mathcal{X}_2\}$. We note that the points of $B_1$ and $B_2$ can be very far apart. From this second observation, it is difficult to reuse information from previous exact iterations to predict the correct breakpoints of the current iteration.
Finally, we also note that the sub-generating set tends to converge to a midpoint during one run of the PWL-B&C algorithm.
We use the following non-equidistant breakpoint strategy: we compute the midpoint $w_c$ from a "warm-up" run of the PWL-B&C algorithm with equidistant breakpoints.
In the "warm-up" stage, we record the sub-generating set $\mathcal{X}'$ as an ordered list, assigning each element of $\mathcal{X}'$ an order according to the time at which it is added. Let $w_{-1} = \sum_{i\in N'} a'_i \bar{x}_i$, where $\bar{x}$ is the last element of $\mathcal{X}'$. Each time Line 23 of Algorithm 1 is executed, let $w = \sum_{i\in N'} a'_i \bar{x}_i$ for the new point $\bar{x}$. If $|w - w_{-1}|/(\bar{w} - \underline{w}) \le 0.1$ (the convergence criterion is satisfied), the "warm-up" PWL-B&C algorithm terminates and outputs $w_c := w$; otherwise, we add $\bar{x}$ to $\mathcal{X}'$ and continue the algorithm.
In the "restart" stage, we rerun the PWL-B&C algorithm with the following non-equidistant breakpoints; the number of new breakpoints equals the number of old breakpoints. Let $i_c := \lceil \frac{w_c - \underline{w}}{\bar{w} - \underline{w}} h \rceil$, $S_l := \sum_{1 \le j \le i_c - 1} j$, and $S_u := \sum_{1 \le j \le h - i_c} j$; we generate the non-equidistant breakpoints centered at $w_c$ from these quantities. The resulting set $B = \{w_i\}$ has its highest density of breakpoints around $w_c$.
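One possible realization of such centered breakpoints is sketched below. The exact generation formula is not recoverable from the text, so this is only an illustrative scheme consistent with the quantities $i_c$, $S_l$, $S_u$ above: gaps grow linearly with their distance from the center $w_c$, giving the highest breakpoint density around $w_c$.

```python
import math

def centered_breakpoints(w_lo, w_hi, h, w_c):
    """h breakpoints in [w_lo, w_hi] whose spacing grows linearly away from
    the center w_c (an illustrative assumption, not the paper's formula)."""
    i_c = max(1, min(h, math.ceil((w_c - w_lo) / (w_hi - w_lo) * h)))
    S_l = sum(range(1, i_c))             # total weight of the i_c - 1 left gaps
    S_u = sum(range(1, h - i_c + 1))     # total weight of the h - i_c right gaps
    pts = [w_c]
    acc = 0
    for j in range(1, i_c):              # left of the center: j-th gap weight j
        acc += j
        pts.insert(0, w_c - (w_c - w_lo) * acc / S_l)
    acc = 0
    for j in range(1, h - i_c + 1):      # right of the center, symmetrically
        acc += j
        pts.append(w_c + (w_hi - w_c) * acc / S_u)
    return pts

pts = centered_breakpoints(0.0, 10.0, 5, 4.0)
```

With $[\underline{w}, \bar{w}] = [0, 10]$, $h = 5$, $w_c = 4$, the gaps on the right are $1, 2, 3$: denser near $w_c$, coarser near the boundary.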
Next, we implement a solver, namely B&P-Hybrid**, which modifies B&P-Hybrid* by using the two-stage PWL-B&C algorithm for exact pricing. The two solvers are tested on the CloudMedium benchmark, since none of the instances in this benchmark can be solved by any solver and the time to solve RMLPs in Section 3.2 is not very long.
For a detailed comparison, the results in Table 2 show the average SGM statistics of instances with the same risk level and generation method. As for the notation of the instances, "G" denotes the instances of the Gaussian distribution, "H" denotes the instances of the Hoeffding inequality, and "D" denotes the instances of the distributionally robust case.
We first focus on the pricing statistics. The average number of columns of the B&P-Hybrid** is 1.08 times that of the B&P-Hybrid*; the average pricing dual gap and the average exact pricing time of the B&P-Hybrid* are 3 times and 1.04 times those of the B&P-Hybrid**, respectively. Although the two-stage PWL-B&C algorithm takes more time overall, the time to solve difficult pricing instances is reduced thanks to the refined non-equidistant breakpoints. The two-stage PWL-B&C algorithm solves all pricing problems to optimality (0.01%), except in the distributionally robust case with $\alpha = 0.99$.
Next, we focus on the master statistics. The average number of improved instances and the closed primal bound of the B&P-Hybrid** are 1.1 and 1.2 times larger than those of the B&P-Hybrid*, respectively. This is because more columns are generated and therefore the primal heuristic has more chances to find better solutions. The average dual gap of the B&P-Hybrid* is 1.02 times larger than that of the B&P-Hybrid**.
Table 2: Master and pricing problem statistics of the B&P-Hybrid* and B&P-Hybrid** for CloudMedium.
We conclude that the non-equidistant breakpoints lead to a more efficient solution of the pricing problems and a marginal improvement of the master problems.

Conclusion
In this paper, we propose a DW decomposition approach for solving SMBPs and develop exact branch-and-price algorithms. In our computational study, the branch-and-price algorithm is a promising method, since it can solve more instances than CPLEX.
We develop a PWL-B&C algorithm for solving pricing submodular knapsack problems.
The PWL-B&C algorithm is more efficient than CPLEX for the pricing submodular knapsack problems, and it can also be extended to multiple submodular knapsack problems. For general MINLP problems, if a nonlinear constraint can be reformulated into a linear part plus a univariate concave part, then the univariate concave part can be relaxed by the PWL relaxation.
Our hybrid pricing strategy is applicable to the column generation procedure, where the master problems are in set cover formulations, as long as there are fast pricing heuristics.
This pricing strategy is useful for large instances. As future work, we can apply this strategy to the DW decomposition of the capacitated vehicle routing problem, for which the pricing problem is difficult.
We compare equidistant and non-equidistant breakpoints for the PWL-B&C algorithm; given a suitable partition, the non-equidistant breakpoints can speed up the PWL-B&C algorithm. Future work could investigate a more accurate partition than the one created by the "warm-up" stage of the PWL-B&C algorithm.
ACKNOWLEDGMENT. The authors would like to thank Leo Liberti and Sandra Ulrich Ngueveu for discussions.
Appendix A: Proof of Proposition 1. Proof. Let $F_j$ be the feasible set of the $j$-th constraint in the BSOCP formulation, and let $\bar{F}_j$ be its continuous relaxation. The feasible set of the continuous relaxation of the BSOCP formulation is then $\bar{F} = \bigcap_{j\in M} \bar{F}_j$.
On the other hand, the binary points of $F_j$ are the zero vector and the points $(p, 1)$ ($p \in P$), so the convex hull $\mathrm{conv}(F_j)$ is a polytope. We note that $\bar{F}_j$ is also a convex relaxation of $F_j$; hence $F_j \subset \mathrm{conv}(F_j) \subset \bar{F}_j$.
The optimum of the continuous relaxation of the BSOCP formulation is $\min_{(v,y)\in \bar{F},\, v \text{ satisfies } (2c)} \sum_{j\in M} y_j$. An optimal solution of the LP relaxation of the set cover formulation satisfies $\sum_{p\in P} d_{ip} \lambda_p = 1$ ($i \in N$), and its optimal value is exactly $\min_{(v,y)\in \bigcap_{j\in M} \mathrm{conv}(F_j),\, v \text{ satisfies } (2c)} \sum_{j\in M} y_j$. Since $\bigcap_{j\in M} \mathrm{conv}(F_j) \subset \bar{F}$, the result follows. $\square$

Appendix B: Proof of Theorem 2
Proof. Sinceq B and q have the same value at w ∈ {w 1 , . . . , w h }, it follows that the ℓ ∞ -norm is the maximum value of ℓ ∞ -norms over individual sub intervals: Let w ∈ [w k−1 , w k ], then The maximum value is at w = w k−1 +w k 2 .
It follows that (16) is equivalent to minimizing the maximum per-subinterval error over the choice of breakpoints. Therefore, the optimal solution is an equidistant partition of [w̲, w̄], and the result follows.
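The optimization problem invoked in this last step can be written out as follows (a sketch: e denotes the per-subinterval maximum error as a function of the subinterval length, which is increasing by the midpoint argument above):

```latex
\min_{\underline{w} = w_0 \le w_1 \le \cdots \le w_h = \overline{w}}
\;\; \max_{1 \le k \le h}\; e\!\left(w_k - w_{k-1}\right).
```

Since e is increasing and the subinterval lengths sum to the fixed quantity w̄ − w̲, the maximum is minimized when all lengths are equal, i.e., w_k − w_{k−1} = (w̄ − w̲)/h for every k.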

Appendix C: Benchmark Generation
Next, we briefly review the method used to generate an instance. We call the distribution of µ_i the target distribution for item i, and we assume that every µ_i follows the same target distribution. This target distribution is not given explicitly in Cohen et al. (2019); only its quantiles are known, as listed in Table 3.
Given α and N, we generate an SMBP instance as follows: 1. sample µ_i (i ∈ N) according to Table 3; 2. sample a and b from µ and σ, using one of the following cases: • the Gaussian case; • the Hoeffding's inequality case; • the distributionally robust approximation case. We first illustrate the approach for sampling µ. We approximate the target distribution by a normalized histogram whose quantiles coincide with those of Table 3. The histogram consists of intervals that partition the entire range [0, 72], where each interval has as endpoints two consecutive quantiles of Table 3.
The histogram gives a discrete non-parametric estimation of the target distribution. To obtain a nominal item size µ_i (i ∈ N) sampled as if from a continuous distribution, we apply a two-stage sampling: the first stage samples an interval of the histogram, and the second stage samples a value within this interval from a truncated Gaussian. The truncated Gaussian is defined by its lower and upper bounds A̲ and Ā, its mean µ′, and its standard deviation σ′. To obtain these parameters, for each i ∈ N, we compute the mean µ′_i and the standard deviation σ′_i of the truncated Gaussian with lower bound A̲_i, upper bound Ā_i, and scale parameter s_i.
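A minimal sketch of this two-stage sampling (the interval endpoints and weights below are placeholders, not the quantiles of Table 3, and the truncated Gaussian uses an illustrative mean and scale rather than the fitted µ′_i and σ′_i):

```python
import random

# Hypothetical histogram: interval endpoints and the probability mass of
# each interval (placeholders for the quantile data of Table 3).
ENDPOINTS = [0.0, 9.0, 18.0, 36.0, 54.0, 72.0]
WEIGHTS = [0.2, 0.2, 0.2, 0.2, 0.2]

def sample_nominal_size(rng=random):
    """Two-stage sampling of a nominal item size mu_i: first draw an
    interval of the histogram, then draw a value inside it from a
    truncated Gaussian (centered at the interval midpoint here, with an
    illustrative scale)."""
    k = rng.choices(range(len(WEIGHTS)), weights=WEIGHTS)[0]
    lo, hi = ENDPOINTS[k], ENDPOINTS[k + 1]
    mean, scale = (lo + hi) / 2.0, (hi - lo) / 4.0
    while True:  # rejection sampling realizes the truncation
        x = rng.gauss(mean, scale)
        if lo <= x <= hi:
            return x
```

The histogram stage reproduces the quantile structure of the target distribution, while the truncated Gaussian stage smooths the samples within each interval.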
With the above parameters, we generate the data a, b, σ of the SMBP (4). There are three cases, which correspond to different assumptions on the uncertainty or probability distribution.
For the Gaussian case: 1. let σ = Φ⁻¹(α), where Φ is the cumulative distribution function of the Gaussian distribution; 2. for i ∈ N :
For the Hoeffding's inequality case: 1. let σ = −0.5 ln(1 − α); 2. for i ∈ N :
For the distributionally robust approximation case: 1. let σ = α/(1 − α); 2. for i ∈ N :
For all the above cases, if there exists i ∈ N such that a_i, b_i are too large to fit a bin (usually for large α, σ), then we rescale a_i, b_i to fit the bin.
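The risk coefficient σ of the three cases can be computed as below (risk_coefficient and the case labels are hypothetical names; the formulas follow the text above verbatim):

```python
import math
from statistics import NormalDist

def risk_coefficient(case, alpha):
    """Risk coefficient sigma of an SMBP instance at risk level alpha,
    for each of the three uncertainty models described above."""
    if case == "gaussian":
        # quantile function of the standard Gaussian distribution
        return NormalDist().inv_cdf(alpha)
    if case == "hoeffding":
        return -0.5 * math.log(1.0 - alpha)
    if case == "dro":  # distributionally robust approximation
        return alpha / (1.0 - alpha)
    raise ValueError(f"unknown case: {case}")
```

For a fixed α, the three cases yield different values of σ, so the same nominal sizes produce instances of different tightness.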

Appendix D: Detailed Results
Master problem statistics are summarized in Table 4, Table 5, and Table 6. Pricing problem statistics are summarized in Table 7, Table 8, and Table 9. As for the case notation, "G" denotes the instances of the Gaussian distribution case, "H" denotes the instances of the Hoeffding inequality case, and "D" denotes the instances of the distributionally robust case.
For each benchmark, we report the SGM statistics of performance metrics, the number of solved instances (#S), and the number of instances with improved primal bounds (#I).
We also divide the instances in each benchmark into small subsets and report SGM statistics for these subsets; within a subset, all instances share the same risk level and case.
Table 7 Pricing problem statistics of CloudSmall with 108 instances (|N | = 100)
Table 9 Pricing problem statistics of CloudLarge with 108 instances (|N | = 1000)