Relaxation heuristics for the set multicover problem with generalized upper bound constraints

We consider an extension of the set covering problem (SCP) introducing (i) multicover and (ii) generalized upper bound (GUB) constraints. For the conventional SCP, the pricing method has been introduced to reduce the size of instances, and several efficient heuristic algorithms based on such reduction techniques have been developed to solve large-scale instances. However, GUB constraints often make the pricing method less effective, because they often prevent solutions from containing highly evaluated variables together. To overcome this, we develop heuristic algorithms to reduce the size of instances, in which new evaluation schemes of variables are introduced that take account of GUB constraints. We also develop an efficient implementation of a 2-flip neighborhood local search algorithm that reduces the number of candidates in the neighborhood without sacrificing the solution quality. Computational comparison on benchmark instances with recent solvers shows that the proposed algorithm performs quite effectively for instances having large gaps between the lower and upper bounds.


Introduction
The set covering problem (SCP) is one of the representative combinatorial optimization problems. We are given a ground set of m elements i ∈ M = {1, . . . , m}, n subsets S_j ⊆ M (|S_j| ≥ 1) and their costs c_j (> 0) for j ∈ N = {1, . . . , n}. We say that X ⊆ N is a cover of M if ∪_{j∈X} S_j = M holds. The goal of SCP is to find a minimum cost cover X of M. The SCP is formulated as a 0-1 integer programming (0-1 IP) problem as follows:

minimize z(x) = ∑_{j∈N} c_j x_j
subject to ∑_{j∈N} a_ij x_j ≥ 1 (i ∈ M),
x_j ∈ {0, 1} (j ∈ N),

where a_ij = 1 if i ∈ S_j holds and a_ij = 0 otherwise, and x_j = 1 if j ∈ X and x_j = 0 otherwise. That is, a column a_j = (a_1j, . . . , a_mj)^⊤ of the matrix (a_ij) represents the corresponding subset S_j by S_j = {i ∈ M | a_ij = 1}. For notational convenience, for each i ∈ M, let N_i = {j ∈ N | a_ij = 1} be the index set of subsets S_j that contain the element i. The SCP is known to be NP-hard in the strong sense, and there is no polynomial time approximation scheme (PTAS) unless P = NP. However, worst-case performance analysis does not necessarily reflect experimental performance in practice. The continuous development of mathematical programming has much improved the performance of heuristic algorithms, accompanied by advances in computing machinery [2,3]. For example, Beasley [4] presented a number of greedy algorithms based on Lagrangian relaxation, called Lagrangian heuristics, and Caprara et al. [5] introduced pricing techniques into a Lagrangian heuristic algorithm to reduce the size of instances. Several efficient heuristic algorithms based on Lagrangian heuristics have been developed to solve very large-scale instances with up to 5000 constraints and 1,000,000 variables, with deviation within about 1% from the optimum in reasonable computing time [5,6,7,8].
The SCP has important real applications such as crew scheduling [5], vehicle routing [9], facility location [10,11], and logical analysis of data [12]. However, it is often difficult to formulate problems in real applications as SCP, because they often have additional side constraints in practice. Most practitioners accordingly formulate them as general mixed integer programming (MIP) problems and apply general purpose solvers, which are usually less efficient than solvers specially tailored to SCP.
In this paper, we consider an extension of SCP introducing (i) multicover and (ii) generalized upper bound (GUB) constraints, which arise in many real applications of SCP such as vehicle routing [13,14], crew scheduling [15], staff scheduling [16,17] and logical analysis of data [18]. The multicover constraint is a generalization of the covering constraint [19,20], in which each element i ∈ M must be covered at least b_i ∈ Z_+ (Z_+ is the set of nonnegative integers) times. The GUB constraint is defined as follows. We are given a partition {G_1, . . . , G_k} of N (G_h ∩ G_h′ = ∅ for all h ≠ h′, and ∪_{h=1}^{k} G_h = N). For each block G_h ⊆ N (h ∈ K = {1, . . . , k}), the number of selected subsets S_j from the block (i.e., j ∈ G_h) is constrained to be at most d_h (≤ |G_h|). We call the resulting problem the set multicover problem with GUB constraints (SMCP-GUB), which is formulated as a 0-1 IP problem as follows:

minimize z(x) = ∑_{j∈N} c_j x_j
subject to ∑_{j∈N} a_ij x_j ≥ b_i (i ∈ M),
∑_{j∈G_h} x_j ≤ d_h (h ∈ K),
x_j ∈ {0, 1} (j ∈ N).

This generalization of SCP substantially extends the variety of its applications. However, GUB constraints often make the pricing method less effective, because they often prevent solutions from containing highly evaluated variables together. To overcome this, we develop heuristic algorithms to reduce the size of instances, in which new evaluation schemes of variables are introduced that take account of GUB constraints. We also develop an efficient implementation of a 2-flip neighborhood local search algorithm that reduces the number of candidates in the neighborhood without sacrificing the solution quality. To guide the search to visit a wide variety of good solutions, we also introduce an evolutionary approach called the path relinking method [21], which generates new solutions by combining two or more solutions obtained so far.
The SMCP-GUB is NP-hard, and the (supposedly) simpler problem of judging the existence of a feasible solution is NP-complete, since the satisfiability (SAT) problem can be reduced to this decision problem. We accordingly allow the search to visit infeasible solutions violating multicover constraints and evaluate their quality by the following penalized objective function. Note that throughout the remainder of the paper, we do not consider solutions that violate the GUB constraints, and the search only visits solutions that satisfy the GUB constraints. Let w = (w_1, . . . , w_m) ∈ R_+^m (R_+ is the set of nonnegative real values) be a penalty weight vector. A solution x is evaluated by

ẑ(x, w) = ∑_{j∈N} c_j x_j + ∑_{i∈M} w_i max{b_i − ∑_{j∈N} a_ij x_j, 0}.

If the penalty weights w_i are sufficiently large (e.g., w_i > ∑_{j∈N} c_j holds for all i ∈ M), then we can conclude SMCP-GUB to be infeasible when an optimal solution x* under the penalized objective function ẑ(x, w) violates at least one multicover constraint. In our algorithm, the initial penalty weights w̄_i (i ∈ M) are set to w̄_i = ∑_{j∈N} c_j + 1 for all i ∈ M. Starting from the initial penalty weight vector w ← w̄, the penalty weight vector w is adaptively controlled to guide the search to visit better solutions. We now present the outline of the proposed algorithm for SMCP-GUB. The first set of initial solutions is generated by applying a randomized greedy algorithm several times. The algorithm then solves a Lagrangian dual problem to obtain a near optimal Lagrangian multiplier vector ũ by a subgradient method (Section 2), which is applied only once in the entire algorithm. Then, the algorithm applies the following procedures in this order: (i) heuristic algorithms to reduce the size of instances (Section 5), (ii) a 2-flip neighborhood local search algorithm (Section 3), (iii) an adaptive control of penalty weights (Section 4), and (iv) a path relinking method to generate initial solutions (Section 6). These procedures are applied iteratively until a given time limit runs out.
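As a concrete illustration, the penalized objective ẑ(x, w) can be computed as in the following Python sketch (the function and argument names are ours, not from the paper):

```python
def penalized_objective(c, rows, b, w, x):
    """Compute z_hat(x, w) = sum_j c_j x_j + sum_i w_i * max(b_i - coverage_i, 0).

    c[j]: column costs; rows[i]: set of columns j covering element i;
    b[i]: coverage requirement; w[i]: penalty weight; x[j]: 0/1 value.
    """
    cost = sum(c[j] for j in range(len(c)) if x[j] == 1)
    penalty = 0
    for i, cols in enumerate(rows):
        coverage = sum(x[j] for j in cols)
        penalty += w[i] * max(b[i] - coverage, 0)  # violation of multicover row i
    return cost + penalty
```

Each unit of uncovered demand of element i is charged its penalty weight w_i, so a solution satisfying all multicover constraints is evaluated by its plain cost.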

Lagrangian relaxation and subgradient method
For a given vector u = (u_1, . . . , u_m) ∈ R_+^m, called a Lagrangian multiplier vector, we consider the following Lagrangian relaxation problem LR(u) of SMCP-GUB:

z_LR(u) = min { ∑_{j∈N} c̄_j(u) x_j + ∑_{i∈M} b_i u_i | ∑_{j∈G_h} x_j ≤ d_h (h ∈ K), x_j ∈ {0, 1} (j ∈ N) }.

We refer to c̄_j(u) = c_j − ∑_{i∈M} a_ij u_i as the Lagrangian cost associated with column j ∈ N. For any u ∈ R_+^m, z_LR(u) gives a lower bound on the optimal value z(x*) of SMCP-GUB (when it is feasible, i.e., a feasible solution to SMCP-GUB exists).
The problem of finding a Lagrangian multiplier vector u that maximizes z_LR(u) is called the Lagrangian dual problem (LRD):

max { z_LR(u) | u ∈ R_+^m }.

For a given u ∈ R_+^m, we can easily compute an optimal solution x̄(u) = (x̄_1(u), . . . , x̄_n(u)) to LR(u) as follows. For each block G_h (h ∈ K), if the number of columns j ∈ G_h satisfying c̄_j(u) < 0 is d_h or less, then set x̄_j(u) ← 1 for the variables satisfying c̄_j(u) < 0 and x̄_j(u) ← 0 for the other variables; otherwise, set x̄_j(u) ← 1 for the variables with the d_h lowest Lagrangian costs c̄_j(u) and x̄_j(u) ← 0 for the other variables.
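The block-wise rule above can be sketched in Python as follows (a hedged illustration; all names are ours). For each block it selects the columns with negative Lagrangian cost, truncated to the d_h cheapest when there are too many:

```python
def solve_lr(c, cols, u, blocks, d):
    """Optimal solution x_bar(u) of LR(u): per block G_h, pick columns with
    negative Lagrangian cost cbar_j(u) = c_j - sum_{i in S_j} u_i,
    keeping at most d[h] of them (the lowest costs first).

    cols[j]: rows covered by column j (S_j); blocks[h]: list of columns in G_h.
    """
    cbar = [c[j] - sum(u[i] for i in cols[j]) for j in range(len(c))]
    x = [0] * len(c)
    for h, G in enumerate(blocks):
        neg = [j for j in G if cbar[j] < 0]
        if len(neg) > d[h]:
            neg = sorted(neg, key=lambda j: cbar[j])[:d[h]]  # the d_h lowest costs
        for j in neg:
            x[j] = 1
    return x, cbar
```

Since the blocks are disjoint, this greedy per-block choice is exact for LR(u): each block's contribution to the objective is minimized independently.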
The Lagrangian relaxation problem LR(u) has the integrality property; that is, an optimal solution to LR(u) is also optimal to its linear programming (LP) relaxation problem obtained by replacing x_j ∈ {0, 1} in (4) with 0 ≤ x_j ≤ 1 for all j ∈ N. In this case, any optimal solution u* to the dual of the LP relaxation problem of SMCP-GUB is also optimal to LRD, and the optimal value z_LP of the LP relaxation problem of SMCP-GUB is equal to z_LR(u*).
A common approach to computing a near optimal Lagrangian multiplier vector ũ is the subgradient method. It uses the subgradient vector g(u) = (g_1(u), . . . , g_m(u)) ∈ R^m, associated with a given u ∈ R_+^m, defined by

g_i(u) = b_i − ∑_{j∈N} a_ij x̄_j(u) (i ∈ M).

This method generates a sequence of nonnegative Lagrangian multiplier vectors u^(0), u^(1), . . . , where u^(0) is a given initial vector and u^(l+1) is updated from u^(l) by the following formula:

u_i^(l+1) ← max { u_i^(l) + λ (ẑ(x*, w̄) − z_LR(u^(l))) g_i(u^(l)) / ‖g(u^(l))‖², 0 } (i ∈ M),

where x* is the best solution obtained so far under the penalized objective function ẑ(x, w̄) with the initial penalty weight vector w̄, and λ > 0 is a parameter called the step size. When huge instances of SCP are solved, the computing time spent on the subgradient method becomes very large if a naive implementation is used. Caprara et al. [5] developed a variant of the pricing method for the subgradient method. They define a core problem consisting of a small subset of columns C ⊂ N (|C| ≪ |N|), chosen among those having low Lagrangian costs c̄_j(u^(l)) (j ∈ N). Their algorithm iteratively updates the core problem in a fashion similar to that used for solving large-scale LP problems [22]. In order to solve huge instances of SMCP-GUB, we also introduce a pricing method into the basic subgradient method (BSM) described, e.g., in [3].
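A single iteration of this update can be sketched as follows (hedged: the standard norm-scaled subgradient step is assumed, with z_ub standing for the upper bound ẑ(x*, w̄); names are ours):

```python
def subgradient_step(u, g, z_ub, z_lr, lam):
    """u_i <- max(u_i + lam * (z_ub - z_lr) * g_i / ||g||^2, 0) for all i.

    g[i] = b_i - sum_j a_ij xbar_j(u) is the subgradient component;
    lam > 0 is the step size parameter.
    """
    norm_sq = sum(gi * gi for gi in g)
    if norm_sq == 0:
        return list(u)  # zero subgradient: u already maximizes z_LR
    step = lam * (z_ub - z_lr) / norm_sq
    return [max(ui + step * gi, 0.0) for ui, gi in zip(u, g)]
```

The projection max{·, 0} keeps the multipliers nonnegative, as required for the relaxed ≥-constraints.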

2-flip neighborhood local search
The local search (LS) starts from an initial solution x and repeatedly replaces x with a better solution x′ in its neighborhood NB(x) until no better solution is found in NB(x). For a positive integer r, the r-flip neighborhood is defined by NB_r(x) = {x′ ∈ {0, 1}^n | d(x, x′) ≤ r}, where d(x, x′) = |{j ∈ N | x_j ≠ x′_j}| is the Hamming distance between x and x′. In other words, NB_r(x) is the set of solutions obtainable from x by flipping at most r variables. We first develop a 2-flip neighborhood local search algorithm (2-FNLS) as a basic component of the proposed algorithm. In order to improve efficiency, 2-FNLS searches NB_1(x) first, and then NB_2(x) \ NB_1(x).
We first describe the algorithm to search NB_1(x), called the 1-flip neighborhood search. Let ∆ẑ_j^↑(x, w) and ∆ẑ_j^↓(x, w) denote the increase of ẑ(x, w) caused by flipping x_j = 0 → 1 and x_j = 1 → 0, respectively; that is,

∆ẑ_j^↑(x, w) = c_j − ∑ { w_i | i ∈ S_j, ∑_{l∈N} a_il x_l < b_i },
∆ẑ_j^↓(x, w) = −c_j + ∑ { w_i | i ∈ S_j, ∑_{l∈N} a_il x_l ≤ b_i },

and let X(x) = {j ∈ N | x_j = 1}. The algorithm first searches for an improved solution obtainable by flipping x_j = 0 → 1, i.e., it searches for j ∈ N \ X(x) satisfying ∆ẑ_j^↑(x, w) < 0 among those keeping the GUB constraints satisfied. If an improved solution exists, it chooses j with the minimum ∆ẑ_j^↑(x, w); otherwise, it searches for an improved solution obtainable by flipping x_j = 1 → 0 by searching for j ∈ X(x) satisfying ∆ẑ_j^↓(x, w) < 0. We next describe the algorithm to search NB_2(x) \ NB_1(x), called the 2-flip neighborhood search. Yagiura et al. [8] developed a 3-flip neighborhood local search algorithm for SCP. They derived conditions that reduce the number of candidates in NB_2(x) \ NB_1(x) and NB_3(x) \ NB_2(x) without sacrificing the solution quality. However, those conditions are not directly applicable to the 2-flip neighborhood search for SMCP-GUB because of GUB constraints. We therefore derive the following three lemmas that reduce the number of candidates in NB_2(x) \ NB_1(x) by taking account of GUB constraints.
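Under these definitions, the two flip gains can be computed from a coverage array as in the following sketch (names are ours; the paper's constant-time auxiliary data structures are not reproduced here):

```python
def flip_gains(c, cols, b, w, x, coverage):
    """Gains of 1-flip moves on z_hat(x, w).

    coverage[i] = sum_j a_ij x_j must be kept consistent with x.
    up[j]  : increase of z_hat when flipping x_j = 0 -> 1;
    down[j]: increase of z_hat when flipping x_j = 1 -> 0.
    """
    up, down = {}, {}
    for j, Sj in enumerate(cols):
        if x[j] == 0:
            # pay c_j, save w_i for each row i in S_j that is still violated
            up[j] = c[j] - sum(w[i] for i in Sj if coverage[i] < b[i])
        else:
            # save c_j, pay w_i for each row i that becomes violated
            down[j] = -c[j] + sum(w[i] for i in Sj if coverage[i] <= b[i])
    return up, down
```

A move with a negative gain yields an improved solution in NB_1(x).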
Let ∆ẑ_{j_1,j_2}(x, w) denote the increase of ẑ(x, w) caused by flipping the values of x_{j_1} and x_{j_2} simultaneously.

Lemma 1. Suppose that a solution x is locally optimal with respect to NB_1(x). Then no solution obtained from x by flipping two variables in the same direction (i.e., x_{j_1} = x_{j_2} = 0 → 1 or x_{j_1} = x_{j_2} = 1 → 0) satisfies ∆ẑ_{j_1,j_2}(x, w) < 0.

Proof. See A.

Based on this lemma, we consider only the set of solutions obtainable by flipping x_{j_1} = 1 → 0 and x_{j_2} = 0 → 1 simultaneously. We assume that ∑_{j∈G_h} x_j < d_h holds for the block G_h ∋ j_2, or that j_1 and j_2 are in the same block G_h, because otherwise the move is infeasible. Let ∆ẑ_{j_1,j_2}(x, w) denote the increase of ẑ(x, w) in this case.

Lemma 2. Suppose that a solution x is locally optimal with respect to NB_1(x), x_{j_1} = 1 and x_{j_2} = 0. Then ∆ẑ_{j_1,j_2}(x, w) < 0 holds only if at least one of the following two conditions holds.

(i) Both j_1 and j_2 belong to the same block G_h satisfying ∑_{j∈G_h} x_j = d_h.
(ii) S_{j_1} ∩ S_{j_2} ∩ M_E(x) ≠ ∅, where M_E(x) = {i ∈ M | ∑_{j∈N} a_ij x_j = b_i}.

Proof. See B.
Lemma 3. Suppose that a solution x is locally optimal with respect to NB_1(x), and that for a block G_h satisfying ∑_{j∈G_h} x_j = d_h, the pair of indices j_1, j_2 ∈ G_h attains the minimum ∆ẑ_{j_1}^↓(x, w) among j ∈ G_h with x_j = 1 and the minimum ∆ẑ_{j_2}^↑(x, w) among j ∈ G_h with x_j = 0, respectively. If ∆ẑ_{j_1,j_2}(x, w) ≥ 0 holds, then no pair j′_1, j′_2 ∈ G_h with x_{j′_1} = 1 and x_{j′_2} = 0 satisfies ∆ẑ_{j′_1,j′_2}(x, w) < 0 under condition (i) alone. Note that the condition of Lemma 3 implies that condition (i) of Lemma 2 is satisfied. We can therefore conclude that, to find an improved solution satisfying condition (i), it suffices to check only one pair for each block G_h satisfying ∑_{j∈G_h} x_j = d_h, instead of checking all pairs (j_1, j_2) with j_1, j_2 ∈ G_h, x_{j_1} = 1 and x_{j_2} = 0 (provided that the algorithm also checks the solutions satisfying condition (ii) of Lemma 2).
The algorithm first searches for an improved solution x′ ∈ NB_2(x) \ NB_1(x) satisfying condition (i). For each block G_h (h ∈ K) satisfying ∑_{j∈G_h} x_j = d_h, it checks the solution obtained by flipping x_{j_1} = 1 → 0 and x_{j_2} = 0 → 1 for the j_1 and j_2 in G_h with the minimum ∆ẑ_{j_1}^↓(x, w) and ∆ẑ_{j_2}^↑(x, w), respectively. The algorithm then searches for an improved solution satisfying condition (ii): for each j_1 ∈ X(x), let NB_2^(j_1)(x) denote the set of solutions obtainable by flipping x_{j_1} = 1 → 0 and x_{j_2} = 0 → 1 for some j_2 with S_{j_1} ∩ S_{j_2} ∩ M_E(x) ≠ ∅. If an improved solution is found, it chooses a pair j_1 and j_2 with the minimum ∆ẑ_{j_1,j_2}(x, w) among those in NB_2^(j_1)(x). Algorithm 2-FNLS searches NB_1(x) first, and then NB_2(x) \ NB_1(x). The algorithm is formally described as follows.
Algorithm 2-FNLS(x, w)
Input: A solution x and a penalty weight vector w.
Output: A solution x.
Step 1: If there exists j ∈ N \ X(x) satisfying ∆ẑ_j^↑(x, w) < 0 among those keeping the GUB constraints satisfied, choose such a j with the minimum ∆ẑ_j^↑(x, w), set x_j ← 1 and return to Step 1.
Step 2: If there exists j ∈ X(x) satisfying ∆ẑ_j^↓(x, w) < 0, choose such a j with the minimum ∆ẑ_j^↓(x, w), set x_j ← 0 and return to Step 2.
Step 3: For each block G_h (h ∈ K) satisfying ∑_{j∈G_h} x_j = d_h, take the pair j_1, j_2 ∈ G_h with the minimum ∆ẑ_{j_1}^↓(x, w) and ∆ẑ_{j_2}^↑(x, w), respectively; if ∆ẑ_{j_1,j_2}(x, w) < 0 holds, set x_{j_1} ← 0 and x_{j_2} ← 1. If the current solution x has been updated at least once in Step 3, return to Step 3.
Step 4: For each j_1 ∈ X(x) in the ascending order of ∆ẑ_{j_1}^↓(x, w), if an improved solution exists in NB_2^(j_1)(x), choose j_2 with the minimum ∆ẑ_{j_1,j_2}(x, w) and set x_{j_1} ← 0 and x_{j_2} ← 1. If the current solution x has been updated at least once in Step 4, return to Step 1; otherwise output x and exit.
We note that 2-FNLS does not necessarily output a locally optimal solution with respect to NB 2 (x), because the solution x is not necessarily locally optimal with respect to NB 1 (x) in Steps 3 and 4. Though it is easy to keep the solution x locally optimal with respect to NB 1 (x) in Steps 3 and 4 by returning to Step 1 whenever an improved solution is obtained in Steps 2 or 3, we did not adopt this option because it consumes much computing time just to conclude that the current solution is locally optimal with respect to NB 1 (x) in most cases.
Let one-round be the computation needed to find an improved solution in the neighborhood or to conclude that the current solution is locally optimal, including the time to update relevant data structures and/or memory [23,24]. If implemented naively, 2-FNLS requires O(σ) and O(nσ) one-round time for NB_1(x) and NB_2(x) \ NB_1(x), respectively, where σ = ∑_{i∈M} ∑_{j∈N} a_ij. In order to improve computational efficiency, we keep auxiliary data in memory to compute each ∆ẑ_j^↑(x, w) and ∆ẑ_j^↓(x, w) quickly. We first consider the one-round time for NB_1(x). In Steps 1 and 2, the algorithm finds j ∈ N \ X(x) and j ∈ X(x) with the minimum ∆ẑ_j^↑(x, w) and ∆ẑ_j^↓(x, w), respectively, in O(n) time by using the auxiliary data, whose update requires O(τ) time. Thus, the one-round time is reduced to O(n + τ) for NB_1(x).
We next consider the one-round time for NB_2(x) \ NB_1(x). In Step 3, the algorithm first finds, for each block G_h, the pair j_1 and j_2 with the minimum ∆ẑ_{j_1}^↓(x, w) and ∆ẑ_{j_2}^↑(x, w), and then evaluates ∆ẑ_{j_1,j_2}(x, w) in O(ν) time, where ν = max_{j∈N} |S_j|. In Step 4, the algorithm first flips x_{j_1} = 1 → 0 and temporarily updates the auxiliary data, which allows it to evaluate each candidate j_2 in O(1) time during the temporary update. Because k ≤ n, ν ≤ m, τ ≤ σ, n′ ≤ n, m ≤ σ, and n ≤ σ always hold, these orders are not worse than those of the naive implementation, and they are much better if ν ≪ m, τ ≪ σ and n′ ≪ n hold, which is the case for many instances. We also note that the computation time for updating the auxiliary data has little effect on the total computation time of 2-FNLS because, in most cases, the number of solutions actually visited is much smaller than the number of evaluated neighbor solutions.
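The auxiliary data mentioned above can be kept consistent by an incremental update on each flip, along the following lines (a simplified sketch with our own names; the paper's actual data structures also cache the flip gains themselves):

```python
def apply_flip(j, x, cols, coverage):
    """Flip x_j and update coverage[i] = sum_l a_il x_l incrementally,
    touching only the |S_j| rows covered by column j (O(|S_j|) time)."""
    delta = -1 if x[j] == 1 else 1
    x[j] ^= 1  # toggle the 0/1 value of x_j
    for i in cols[j]:
        coverage[i] += delta
```

Recomputing the coverage array from scratch would cost O(σ) per move; the incremental version costs only O(ν) per move.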

Adaptive control of penalty weights
We observed that 2-FNLS alone tends to be attracted to locally optimal solutions of insufficient quality when the penalty weights w i are large. We accordingly incorporate a mechanism to adaptively control the values of w i (i ∈ M ) [8,25,26]; the algorithm iteratively applies 2-FNLS, updating the penalty weight vector w after each call to 2-FNLS. We call such a sequence of calls to 2-FNLS the weighting local search (WLS) according to [27,28].
Let x denote the solution at which the previous call to 2-FNLS stops. Algorithm WLS resumes 2-FNLS from x after updating the penalty weight vector w. Starting from an initial penalty weight vector w ← w̄, where we set w̄_i = ∑_{j∈N} c_j + 1 for all i ∈ M, the penalty weight vector w is updated as follows. Let x_best denote the best solution obtained in the current call to WLS with respect to the penalized objective function ẑ(x, w̄) with the initial penalty weight vector w̄. If ẑ(x, w) ≥ ẑ(x_best, w̄) holds, WLS uniformly decreases the penalty weights by w_i ← (1 − η) w_i for all i ∈ M, where the parameter η is chosen so that the new value of ∆ẑ_j^↓(x, w) becomes negative for 15% of the variables satisfying x_j = 1. Otherwise, WLS increases the penalty weights by the update rule (12), in which y_i(x) = max{b_i − ∑_{j∈N} a_ij x_j, 0} is the amount of violation of the i-th multicover constraint, and δ is a parameter that is set to 0.2 in our computational experiments. Algorithm WLS iteratively applies 2-FNLS, updating the penalty weight vector w after each call to 2-FNLS, until the best solution x_best with respect to ẑ(x, w̄) obtained in the current call to WLS has not improved in the last 50 iterations.
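The increasing branch of the update can be sketched as follows. The multiplicative, violation-proportional form below is an assumption on our part (the paper's exact rule (12) is not reproduced here); only the roles of δ and y_i(x) follow the text:

```python
def increase_weights(w, y, delta=0.2):
    """Increase penalty weights in proportion to the current violations.

    y[i] = max(b_i - sum_j a_ij x_j, 0) is the violation of row i;
    delta controls the strength of the increase (0.2 in the paper).
    Assumed form: w_i <- w_i * (1 + delta * y_i / max_l y_l).
    """
    ymax = max(y)
    if ymax == 0:
        return list(w)  # no violated rows: leave the weights unchanged
    return [wi * (1 + delta * yi / ymax) for wi, yi in zip(w, y)]
```

The effect is that the rows violated most in the current solution receive the largest weight increase, steering subsequent 2-FNLS calls toward covering them.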

Algorithm WLS(x)
Input: A solution x.
Output: A solution x̂ and the best solution x_best with respect to ẑ(x, w̄) obtained in the current call to WLS.
Step 1: Set x̂ ← x, x_best ← x and w ← w̄.
Step 2: Apply 2-FNLS(x̂, w) to obtain an improved solution x̂′, and then set x̂ ← x̂′. Let x′ be the best solution with respect to ẑ(x, w̄) obtained during the call to 2-FNLS(x̂, w).
Step 3: If ẑ(x′, w̄) < ẑ(x_best, w̄) holds, set x_best ← x′. If x_best has not improved in the last 50 iterations, output x̂ and x_best and halt.
Step 4: If ẑ(x̂, w) ≥ ẑ(x_best, w̄) holds, then uniformly decrease the penalty weights w_i for all i ∈ M by w_i ← (1 − η) w_i; otherwise, increase the penalty weights w_i for all i ∈ M by (12). Return to Step 2.

Heuristic algorithms to reduce the size of instances
For a near optimal Lagrangian multiplier vector u, the Lagrangian costs c̄_j(u) give reliable information on the overall utility of selecting columns j ∈ N for SCP. Based on this property, the Lagrangian costs c̄_j(u) are often utilized to solve huge instances of SCP. Similar to the pricing method for solving the Lagrangian dual problem, several heuristic algorithms successively solve a number of subproblems, also called core problems, consisting of a small subset of columns C ⊆ N (|C| ≪ |N|), chosen among those having low Lagrangian costs c̄_j(u) (j ∈ C) [5,6,7,8]. The Lagrangian costs c̄_j(u) are unfortunately unreliable for selecting columns j ∈ N for SMCP-GUB, because GUB constraints often prevent solutions from containing more than d_h variables x_j with the lowest Lagrangian costs c̄_j(u). To overcome this, we develop two evaluation schemes of columns j ∈ N for SMCP-GUB. Before updating the core problem C for every call to WLS, the algorithm heuristically fixes some variables x̂_j ← 1 to reflect the characteristics of the incumbent solution x* and the current solution x̂. Let u be a near optimal Lagrangian multiplier vector, and let V = {j ∈ N | x*_j = x̂_j = 1} be the index set from which variables to be fixed are chosen. Let c̄_max(u) = max_{j∈V} c̄_j(u) be the maximum value of the Lagrangian cost c̄_j(u) (j ∈ V). The algorithm randomly chooses a variable x_j (j ∈ V) with probability

prob_j(u) = (c̄_max(u) − c̄_j(u)) / ∑_{l∈V} (c̄_max(u) − c̄_l(u)),

and fixes x̂_j ← 1. We note that the uniform distribution is used if c̄_max(u) = c̄_j(u) holds for all j ∈ V. The algorithm iteratively chooses and fixes a variable x_j (j ∈ V) until ∑_{j∈N} a_ij x_j ≥ b_i holds for 20% of the multicover constraints i ∈ M. It then updates the Lagrangian multipliers u_i ← 0 for each i ∈ M satisfying ∑_{j∈F} a_ij ≥ b_i, and computes the Lagrangian costs c̄_j(u) for j ∈ N \ F, where F is the index set of the fixed variables. The variable fixing procedure is formally described as follows.
Algorithm FIX(x*, x̂, ũ)
Input: The incumbent solution x*, the current solution x̂ and a near optimal Lagrangian multiplier vector ũ.
Output: A set of fixed variables F ⊂ N and a Lagrangian multiplier vector u.
Step 1: Set F ← ∅, u ← ũ and V ← {j ∈ N | x*_j = x̂_j = 1}.
Step 2: If |{i ∈ M | ∑_{j∈F} a_ij ≥ b_i}| ≥ 0.2m holds, then set u_i ← 0 for each i ∈ M satisfying ∑_{j∈F} a_ij ≥ b_i, output F and u, and halt.
Step 3: Randomly choose a column j ∈ V with probability prob_j(u) defined by (13), and set F ← F ∪ {j}. Return to Step 2.
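The random choice in Step 3 can be sketched as follows (hedged: `choose_column` and its arguments are ours, and the weighting proportional to c̄_max(u) − c̄_j(u) is our assumption for prob_j(u) in (13), consistent with the uniform fallback described in the text):

```python
import random

def choose_column(V, cbar, rng=random):
    """Pick a column j in V to fix, favoring low Lagrangian costs.

    Assumed form of prob_j(u): proportional to cmax - cbar[j], where
    cmax = max_{l in V} cbar[l]; uniform when all costs in V are equal.
    """
    cmax = max(cbar[j] for j in V)
    weights = [cmax - cbar[j] for j in V]
    if sum(weights) == 0:
        return rng.choice(V)  # all Lagrangian costs equal: uniform choice
    return rng.choices(V, weights=weights, k=1)[0]
```

Columns with the lowest Lagrangian cost get the largest weight, while the column attaining c̄_max(u) is never chosen unless all costs coincide.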
Subsequent to the variable fixing procedure, the algorithm updates the instance to be considered accordingly. The first evaluation scheme modifies the Lagrangian costs c̄_j(u) to reduce the number of redundant columns j ∈ C resulting from GUB constraints. For each block G_h (h ∈ K), let θ_h be the value of the (d_h + 1)-st lowest Lagrangian cost c̄_j(u) among the columns in G_h if d_h < |G_h| holds, and θ_h ← 0 otherwise. We then define a score ρ_j for j ∈ G_h, called the normalized Lagrangian score, by ρ_j = c̄_j(u) − θ_h if θ_h < 0 holds, and ρ_j = c̄_j(u) otherwise.
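The normalized Lagrangian score can be computed per block as in the following sketch (function and argument names are ours):

```python
def normalized_scores(cbar, blocks, d):
    """rho_j = cbar_j - theta_h if theta_h < 0, else cbar_j, where theta_h is
    the (d_h + 1)-st lowest Lagrangian cost in block G_h (0 if d_h = |G_h|)."""
    rho = list(cbar)
    for h, G in enumerate(blocks):
        theta = sorted(cbar[j] for j in G)[d[h]] if d[h] < len(G) else 0
        if theta < 0:
            for j in G:
                rho[j] = cbar[j] - theta  # shift: at most d_h scores stay negative
    return rho
```

After the shift, at most d_h columns per block keep a negative score, which mirrors the number of columns from that block that can actually appear in a feasible solution.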
The second evaluation scheme modifies the Lagrangian costs c̄_j(u) by replacing the Lagrangian multiplier vector u with the adaptively controlled penalty weight vector w. We define another score φ_j for j ∈ N, called the pseudo-Lagrangian score, by φ_j = c̄_j(w). The intuitive meaning of this score is that we consider a column to be promising if it covers many constraints that were frequently violated in the recent search. The variable fixing procedure for the second evaluation scheme is described in a fashion similar to that of the first evaluation scheme, by replacing the Lagrangian multiplier vectors ũ and u with the penalty weight vectors w̄ and w, respectively.
Given a score vector ρ (resp., φ), a core problem is defined by a subset C ⊂ N consisting of (i) the columns j ∈ N_i with the b_i lowest scores ρ_j (resp., φ_j) for each i ∈ M, (ii) the columns j ∈ N with the 10n′ lowest scores ρ_j (resp., φ_j) (recall that we define n′ = ∑_{j∈N} x_j), and (iii) the columns j ∈ X(x*) ∪ X(x̂) for the incumbent solution x* and the current solution x̂. The core problem updating procedure is formally described as follows.
Algorithm CORE(ρ, x*, x̂)
Input: A score vector ρ, the incumbent solution x* and the current solution x̂.
Output: The core problem C ⊂ N.
Step 1: For each i ∈ M, let C_1(i) be the set of columns j ∈ N_i with the b_i lowest scores ρ_j among those in N_i. Then set C_1 ← ∪_{i∈M} C_1(i).
Step 2: Let C_2 be the set of columns j ∈ N with the 10n′ lowest scores ρ_j.
Step 3: Set C ← C_1 ∪ C_2 ∪ X(x*) ∪ X(x̂), output C and halt.
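The three rules for building the core problem can be sketched as follows (names are ours; `in_solutions` stands for X(x*) ∪ X(x̂)):

```python
def build_core(rows, scores, b, n_prime, in_solutions):
    """Core problem C: (i) the b_i best-scored columns covering each row i,
    (ii) the 10 * n' best-scored columns overall, and (iii) the columns of
    the incumbent and current solutions."""
    C = set()
    for i, Ni in enumerate(rows):
        C.update(sorted(Ni, key=lambda j: scores[j])[:b[i]])   # rule (i)
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    C.update(order[:10 * n_prime])                             # rule (ii)
    C.update(in_solutions)                                     # rule (iii)
    return C
```

Rule (i) guarantees that every multicover constraint can still be satisfied within the core, while rule (iii) keeps the search anchored at the best solutions found so far.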

Path relinking
The path relinking method [21] is an evolutionary approach to integrate intensification and diversification strategies. This approach generates new solutions by exploring trajectories that connect good solutions. It starts from one of the good solutions, called an initiating solution, and generates a path by iteratively moving to a solution in the neighborhood that leads toward the other solutions, called guiding solutions.
Because it is preferable to apply the path relinking method to solutions of high quality, we keep reference sets R_1 and R_2 of good solutions with respect to the penalized objective functions ẑ(x, w) and ẑ(x, w̄) with the current penalty weight vector w and the initial penalty weight vector w̄, respectively. Initially, R_1 and R_2 are prepared by repeatedly applying a randomized greedy algorithm, which is the same as Steps 1 and 2 of 2-FNLS(0, w̄) except that in Step 1 it randomly chooses j from the candidates with the five lowest ∆ẑ_j^↑(x, w̄) (recall that we define w̄ to be the initial penalty weight vector). Suppose that the last call to WLS stops at a solution x̂ and that x_best is the best solution with respect to ẑ(x, w̄) obtained during the last call to WLS. Then, the worst solution x̂_worst in R_1 (with respect to ẑ(x, w)) is replaced with the solution x̂ if ẑ(x̂, w) ≤ ẑ(x̂_worst, w) and x̂ ≠ x′ for all x′ ∈ R_1 hold. The worst solution x_worst in R_2 (with respect to ẑ(x, w̄)) is replaced with the solution x_best if ẑ(x_best, w̄) ≤ ẑ(x_worst, w̄) and x_best ≠ x′ for all x′ ∈ R_2 hold. The path relinking method first chooses two solutions x_init (initiating solution) and x_guide (guiding solution) randomly, one from R_1 and the other from R_2, where we assume that ẑ(x_init, w) ≤ ẑ(x_guide, w) and x_init ≠ x̂ hold. Let ξ = d(x_init, x_guide) be the Hamming distance between x_init and x_guide. It then generates a sequence x_init = x^(0), x^(1), . . . , x^(ξ) = x_guide of solutions as follows. Starting from x^(0) ← x_init, for l = 0, 1, . . . , ξ − 1, the solution x^(l+1) is defined to be a solution x′ with the best value of ẑ(x′, w) among those satisfying x′ ∈ NB_1(x^(l)) and d(x′, x_guide) < d(x^(l), x_guide). The algorithm chooses the first solution x^(l) (l = 0, 1, . . . , ξ − 1) satisfying ẑ(x^(l), w) ≤ ẑ(x^(l+1), w) as the next initial solution of WLS.
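One move along the relinking path can be sketched as follows (hedged: `evaluate` stands for ẑ(·, w), the names are ours, and the GUB-feasibility check on each candidate flip is omitted for brevity):

```python
def relink_step(x, x_guide, evaluate):
    """Move one Hamming step from x toward x_guide: among all single flips
    that reduce d(x, x_guide), return the one with the best objective value."""
    best, best_val = None, None
    for j in range(len(x)):
        if x[j] != x_guide[j]:
            y = list(x)
            y[j] = x_guide[j]  # flip coordinate j toward the guiding solution
            v = evaluate(y)
            if best_val is None or v < best_val:
                best, best_val = y, v
    return best
```

Applying this step repeatedly until x equals x_guide traces the sequence x^(0), x^(1), . . . , x^(ξ) described above.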
Given a pair of solutions x̂ and x_best and the current reference sets R_1 and R_2, the path relinking method outputs the next initial solution x of WLS and the updated reference sets R_1 and R_2. The path relinking method is formally described as follows.
Algorithm PRL(x̂, x_best, R_1, R_2)
Input: Solutions x̂ and x_best and reference sets R_1 and R_2.
Output: The next initial solution x of WLS and the updated reference sets R_1 and R_2.
Step 1: Let x̂_worst = arg max_{x′∈R_1} ẑ(x′, w) be the worst solution in R_1. If the solution x̂ satisfies ẑ(x̂, w) ≤ ẑ(x̂_worst, w) and x̂ ≠ x′ for all x′ ∈ R_1, then set R_1 ← (R_1 \ {x̂_worst}) ∪ {x̂}.
Step 2: Let x_worst = arg max_{x′∈R_2} ẑ(x′, w̄) be the worst solution in R_2. If the solution x_best satisfies ẑ(x_best, w̄) ≤ ẑ(x_worst, w̄) and x_best ≠ x′ for all x′ ∈ R_2, then set R_2 ← (R_2 \ {x_worst}) ∪ {x_best}.
Step 3: Randomly choose two solutions x_init and x_guide, one from R_1 and the other from R_2, where we assume that ẑ(x_init, w) ≤ ẑ(x_guide, w) and x_init ≠ x̂ hold. Set l ← 0 and x^(l) ← x_init.
Step 4: For l = 0, 1, . . ., let x^(l+1) be a solution x′ with the best value of ẑ(x′, w) among those satisfying x′ ∈ NB_1(x^(l)) and d(x′, x_guide) < d(x^(l), x_guide). Output the first x^(l) satisfying ẑ(x^(l), w) ≤ ẑ(x^(l+1), w) as the next initial solution x, together with R_1 and R_2, and halt.

Summary of the proposed algorithm
We summarize the outline of the proposed algorithm for SMCP-GUB in Figure 1. The first reference sets R_1 and R_2 of good solutions are generated by repeating the randomized greedy algorithm (Section 6). The first initial solution x̂ is set to x̂ ← arg min_{x∈R_1∪R_2} ẑ(x, w̄). When using the normalized Lagrangian score ρ_j, the algorithm obtains a near optimal Lagrangian multiplier vector ũ by the basic subgradient method BSM accompanied by a pricing method [3] (Section 2), which is applied only once in the entire algorithm. The algorithm then repeatedly applies the following procedures in this order until a given time limit runs out. The heuristic variable fixing algorithm FIX(x*, x̂, ũ) (Section 5) decides the index set F of variables to be fixed, and it updates the instance to be considered by fixing the variables x̂_j ← 1 (j ∈ F). The heuristic size reduction algorithm CORE(ρ, x*, x̂) (Section 5) constructs a core problem C ⊂ N and fixes the variables x̂_j ← 0 (j ∈ N \ C). The weighting local search algorithm WLS(x̂) explores good solutions x̂ and x_best with respect to ẑ(x, w) and ẑ(x, w̄), respectively, by repeating the 2-flip neighborhood local search algorithm 2-FNLS(x̂, w) (Section 3) while updating the penalty weight vector w adaptively (Section 4), where the initial penalty weights w̄_i (i ∈ M) are set to w̄_i = ∑_{j∈N} c_j + 1 for all i ∈ M. After updating the reference sets R_1 and R_2, the next initial solution x̂ is generated by the path relinking method PRL(x̂, x_best, R_1, R_2) (Section 6).

Computational results
We first prepared eight classes of random instances for SCP [29], among which classes G and H were taken from Beasley's OR Library [30] and classes I-N were newly generated in a similar manner. Each class has five instances; we denote the instances in class G as G.1, . . . , G.5, and the instances in classes H-N similarly. Another set of benchmark instances, called RAIL, arises from a crew pairing problem in an Italian railway company [5,7]. The summary of these instances is given in Table 1, where the density is defined by ∑_{i∈M} ∑_{j∈N} a_ij / mn.
For each random instance, we generated four types of SMCP-GUB instances with different values of the parameters d_h and |G_h| as shown in Table 2, where all blocks G_h (h ∈ K) have the same size |G_h| and upper bound d_h for each instance. Here, the right-hand sides b_i of the multicover constraints are random integers taken from the interval [1, 5].
We first compared the proposed algorithm with different evaluation schemes of variables: (i) the Lagrangian cost c̄_j(u), (ii) the normalized Lagrangian score ρ_j, and (iii) the pseudo-Lagrangian score φ_j. We also tested the proposed algorithm without the size reduction mechanism. Table 3 shows the average objective values of the proposed algorithm for each class of SMCP-GUB instances. The third column "z_LP" shows the optimal values of the LP relaxation, and the fourth column "w/o" shows the results of the proposed algorithm without the size reduction mechanism. The fifth, sixth, and seventh columns show the results of the proposed algorithm with the different evaluation schemes. The best upper bounds among the compared settings are highlighted in bold. We observe that the proposed algorithm with the Lagrangian cost c̄_j(u) performs much worse than the algorithm without the size reduction mechanism for types 1 and 2, which indicates that the Lagrangian cost c̄_j(u) does not evaluate the promising variables properly for these instances. The proposed algorithm with the normalized Lagrangian score ρ_j performs much better than the algorithm with the Lagrangian cost c̄_j(u) for types 1 and 2, while they show almost the same performance for types 3 and 4. This is because the normalized Lagrangian score ρ_j takes almost the same value as the Lagrangian cost c̄_j(u) for types 3 and 4. We also observed that the proposed algorithm with the pseudo-Lagrangian score φ_j performs better than the algorithm with the normalized Lagrangian score ρ_j for most of the tested instances. Table 4 shows the average objective values of the proposed algorithm for each class of SCP instances, in which we omit the results for the normalized Lagrangian score ρ_j because it takes exactly the same value as the Lagrangian cost c̄_j(u) for SCP instances.
We observe that the proposed algorithm with the Lagrangian cost c̄_j(u) and the pseudo-Lagrangian score φ_j performs better than the algorithm without the size reduction mechanism, and the algorithm with the Lagrangian cost c̄_j(u) performs best for the RAIL instances.
We next compared the proposed algorithm with recent solvers, where we tested the proposed algorithm with the pseudo-Lagrangian score φ_j. Tables 5 and 6 show the average objective values of the compared algorithms for each class of SMCP-GUB and SCP instances, respectively, where results marked with an asterisk "*" indicate that the obtained feasible solutions were proven to be optimal. We observe that the proposed algorithm performs better than CPLEX12.6, Gurobi5.6.2 and LocalSolver3.1 for all types of SMCP-GUB instances and for classes G-N of SCP instances, while it performs worse than CPLEX12.6 and Gurobi5.6.2 for class RAIL of SCP instances. We also observe that the 3-flip neighborhood local search algorithm [8] achieves the best upper bounds for almost all classes of SCP instances.
These observations indicate that the variable fixing and pricing techniques based on the LP and/or Lagrangian relaxation are greatly affected by the gaps between lower and upper bounds, and they may not work effectively for the instances having large gaps. For such instances, the proposed algorithm with the pseudo-Lagrangian score φ j succeeds in evaluating the promising variables properly.

Conclusion
In this paper, we considered an extension of the set covering problem (SCP) called the set multicover problem with generalized upper bound constraints (SMCP-GUB). We developed a 2-flip neighborhood local search algorithm that incorporates heuristic algorithms to reduce the size of instances, evaluating promising variables with GUB constraints taken into account, together with an adaptive control mechanism of penalty weights that guides the search to visit feasible and infeasible regions alternately. We also developed an efficient implementation of the 2-flip neighborhood search that reduces the number of candidates in the neighborhood without sacrificing the solution quality. To guide the search to visit a wide variety of good solutions, we also introduced an evolutionary approach called the path relinking method, which generates new solutions by combining two or more solutions obtained so far. From the computational comparison on benchmark instances with three recent solvers, we conclude that the proposed algorithm performs quite effectively for instances having large gaps between the lower and upper bounds.
We expect that the evaluation scheme of promising variables is also applicable to other combinatorial optimization problems, because the pseudo-Lagrangian score φ_j = c̄_j(w) (j ∈ N) can be defined whenever an adaptive control mechanism of penalty weights w is incorporated in a local search, and such an approach has been observed to be effective for many problems.

A Proof of Lemma 1

First, we consider the case with x_{j_1} = x_{j_2} = 1. Then we have

∆ẑ_{j_1,j_2}(x, w) = ∆ẑ_{j_1}^↓(x, w) + ∆ẑ_{j_2}^↓(x, w) + ∑_{i ∈ S_{j_1}∩S_{j_2}∩M_+(x)} w_i ≥ 0,

where M_+(x) = {i ∈ M | ∑_{j∈N} a_ij x_j = b_i + 1}. Next, we consider the case with x_{j_1} = x_{j_2} = 0. By the assumption of the lemma, for both j = j_1 and j_2, ∆ẑ_j^↑(x, w) ≥ 0 holds unless j belongs to a block G_h satisfying ∑_{l∈G_h} x_l = d_h. Then we have

∆ẑ_{j_1,j_2}(x, w) = ∆ẑ_{j_1}^↑(x, w) + ∆ẑ_{j_2}^↑(x, w) + ∑_{i ∈ S_{j_1}∩S_{j_2}∩M_−(x)} w_i ≥ 0,

where M_−(x) = {i ∈ M | ∑_{j∈N} a_ij x_j = b_i − 1}.

B Proof of Lemma 2
We assume that neither condition (i) nor (ii) is satisfied and that ∆ẑ_{j_1,j_2}(x, w) < 0 holds. Since (ii) is assumed to be unsatisfied, M_E(x) ∩ S_{j_1} ∩ S_{j_2} = ∅ holds, and hence we have ∆ẑ_{j_1,j_2}(x, w) = ∆ẑ_{j_1}^↓(x, w) + ∆ẑ_{j_2}^↑(x, w) < 0, which implies ∆ẑ_{j_1}^↓(x, w) < 0 or ∆ẑ_{j_2}^↑(x, w) < 0. If ∆ẑ_{j_1}^↓(x, w) < 0 holds, then the algorithm obtains an improved solution by flipping x_{j_1} = 1 → 0. If ∆ẑ_{j_2}^↑(x, w) < 0 holds, the algorithm obtains an improved solution by flipping x_{j_2} = 0 → 1, because j_1 and j_2 belong to the same block satisfying ∑_{j∈G_h} x_j < d_h or to different blocks, and in the latter case, j_2 belongs to a block G_h satisfying ∑_{j∈G_h} x_j < d_h by the assumption that we only consider solutions that satisfy the GUB constraints, including the solution obtained after flipping j_1 and j_2. Both cases contradict the assumption that x is locally optimal with respect to NB_1(x).