Exploiting variable associations to configure efficient local search algorithms in large-scale binary integer programs

We present a data mining approach for reducing the search space of local search algorithms in a class of binary integer programs including the set covering and partitioning problems. We construct a k-nearest neighbor graph by extracting variable associations from the instance to be solved, in order to identify promising pairs of flipping variables in the neighborhood search. We also develop a 4-flip neighborhood local search algorithm that flips four variables alternately along 4-paths or 4-cycles in the k-nearest neighbor graph. We incorporate an efficient incremental evaluation of solutions and an adaptive control of penalty weights into the 4-flip neighborhood local search algorithm. Computational comparison with the latest solvers shows that our algorithm performs effectively for large-scale set covering and partitioning problems.

Given a ground set of m elements i ∈ M = {1, . . . , m}, n subsets S j ⊆ M (|S j | ≥ 1), and their costs c j ∈ R for j ∈ N = {1, . . . , n}, we say that X ⊆ N is a cover of M if j∈X S j = M holds. We say that X ⊆ N is a partition of M if j∈X S j = M and S j 1 ∩ S j 2 = ∅ hold for all j 1 , j 2 ∈ X. The goals of SCP and SPP are to find a minimum cost cover and a partition X of M , respectively. In this paper, we consider the following class of binary integer programs (BIPs) including SCP and SPP: (1) * A preliminary version of this paper was presented in [39]. † Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka, 565-0871, Japan. umetani@ist.osaka-u.ac.jp where a ij ∈ {0, 1} and b i ∈ Z + (Z + is the set of non-negative integer values). We note that a ij = 1 if i ∈ S j holds and a ij = 0 otherwise, and x j = 1 if j ∈ X and x j = 0 otherwise. That is, a column vector a j = (a 1j , . . . , a mj ) T of the matrix (a ij ) represents the corresponding subset S j by S j = {i ∈ M G ∪ M L ∪ M E | a ij = 1}, and the vector x also represents the corresponding cover (or partition) X by X = {j ∈ N | x j = 1}. For notational convenience, for each i ∈ M G ∪ M L ∪ M E , let N i = {j ∈ N | a ij = 1} be the index set of subsets S j that contain the elements i and let s i (x) = j∈N a ij x j be the left-hand side of the ith constraint.
The SCP and SPP are known to be NP-hard in the strong sense, and no polynomial time approximation scheme (PTAS) exists unless P = NP. However, worst case performance analysis does not necessarily represent the experimental performance in practice. Continuous development of mathematical programming has much improved the performance of heuristic algorithms and this has been accompanied by advances in computing machinery. Many efficient exact and heuristic algorithms for large-scale SCP and SPP instances have been developed [3,5,7,10,13,15,16,17,19,28,37,42,44], many of which are based on variable fixing techniques that reduce the search space to be explored by using the optimal values obtained by linear programming (LP) and/or Lagrangian relaxation as lower bounds. However, many large-scale SCP and SPP instances still remain unsolved because there is little hope of closing the large gap between the lower and upper bounds of the optimal values. In particular, the equality constraints of SPP often make variable fixing techniques less effective because they often prevent solutions from containing highly evaluated variables together. In this paper, we consider an alternative approach for extracting useful features from the instance to be solved with the aim of reducing the search space of local search algorithms for large-scale SCP and SPP instances.
In the design of local search algorithms for large-scale combinatorial problems, as the instance size increases, improving the computational efficiency becomes more effective than using sophisticated search strategies. The quality of locally optimal solutions typically improves if a larger neighborhood is used. However, the computation time of searching the neighborhood also increases exponentially. To overcome this, extensive research has investigated ways to efficiently implement neighborhood search, which can be broadly classified into three types: (i) reducing the number of candidates in the neighborhood [33,35,43,44], (ii) evaluating solutions by incremental computation [29,43,40,41], and (iii) reducing the number of variables to be considered by using linear programming and/or Lagrangian relaxation [15,19,38,44].
To suggest an alternative, we develop a data mining approach for reducing the search space of local search algorithms. That is, we construct a k-nearest neighbor graph by extracting variable associations from the instance to be solved in order to identify promising pairs of swapping variables in the large neighborhood search. We also develop a 4-flip neighborhood local search algorithm that flips four variables alternately along 4-paths or 4-cycles in the k-nearest neighbor graph (Section 3). We incorporate an efficient incremental evaluation of solutions (Section 2) and an adaptive control of penalty weights (Section 4) into the 4-flip neighborhood local search algorithm.

2-flip neighborhood local search
Local search (LS) starts from an initial solution x and then iteratively replaces x with a better solution x in the neighborhood NB(x) until no better solution is found in NB(x). For some positive integer r, let the r-flip neighborhood NB r (x) be the set of solutions obtainable by flipping at most r variables in x. We first develop a 2-flip neighborhood local search (2-FNLS) algorithm as a basic component of our algorithm. In order to improve efficiency, the 2-FNLS first searches NB 1 (x), and then searches NB 2 (x) \ NB 1 (x) only if x is locally optimal with respect to NB 1 (x).
The BIP is NP-hard, and the (supposedly) simpler problem of judging the existence of a feasible solution is NP-complete, since the satisfiability (SAT) problem can be reduced to this problem. We accordingly consider the following formulation of a BIP that allows violations of the constraints and introduce over and under penalty functions with penalty weight vectors For a given x ∈ {0, 1} n , we can easily compute optimal Because the region searched by a single application of LS is limited, LS is usually applied many times. When a locally optimal solution is found, a standard strategy is to update the penalty weights and to resume LS from the obtained locally optimal solution. We accordingly evaluate solutions by using an alternative functionz(x), where the original penalty weight vectors w + , w − are replaced by w + , w − , and these are adaptively controlled during the search (see Section 4 for details).
We first describe how 2-FNLS is used to search NB 1 (x), which is called the 1-flip neighborhood. Let ∆z ↑ j (x) and ∆z ↓ j (x) denote the increases inz(x) due to flipping x j = 0 → 1 and x j = 1 → 0, respectively. 2-FNLS first searches for an improved solution by flipping x j = 0 → 1 for j ∈ N \ X. If an improved solution is found, it chooses j that has the minimum value of ∆z ↑ (x), otherwise it searches for an improved solution by flipping x j = 1 → 0 for j ∈ X.
We next describe how 2-FNLS is used to search NB 2 (x) \ NB 1 (x), which is called the 2-flip neighborhood. We derive conditions that reduce the number of candidates in NB 2 (x) \ NB 1 (x) without sacrificing the solution quality by expanding the results as shown in [44]. Let ∆z j 1 ,j 2 (x) denote the increase inz(x) due to simultaneously flipping the values of x j 1 and x j 2 . Lemma 1. If a solution x is locally optimal with respect to NB 1 (x), then ∆z j 1 , Based on this lemma, we consider only the set of solutions obtainable by simultaneously flipping x j 1 = 1 → 0 and x j 2 = 0 → 1. We now define whereS(x) = {i ∈ S j 1 ∩S j 2 | s i (x) = b i }. By Lemma 1, the 2-flip neighborhood can be restricted to the set of solutions satisfying x j 1 = x j 2 andS(x) = ∅. However, it might not be possible to search this set efficiently without first extracting it. We thus construct a neighbor list that stores promising pairs of variables x j 1 and x j 2 for efficiency (see Section 3 for details).
To increase the efficiency of 2-FNLS, we decompose the neighborhood NB 2 (x) into a number of sub-neighborhoods. Let NB (j 1 ) 2 (x) denote the subset of NB 2 (x) obtainable by flipping x j 1 = 1 → 0. 2-FNLS searches NB (j 1 ) 2 (x) for each j 1 ∈ X in ascending order of ∆z ↓ j 1 (x). If an improved solution is found, the pair j 1 and j 2 that has the minimum value of ∆z j 1 ,j 2 (x) among NB (j 1 ) 2 (x) is selected. 2-FNLS is formally described as follows.
Input: A solution x and penalty weight vectors w + and w − .
Output: A solution x.
Step 3: For each j 1 ∈ X in ascending order of ∆z ↓ 2 (x) that has the minimum value of ∆z j 1 ,j 2 (x) and set x j 1 ← 0 and x j 2 ← 1. If the current solution x has been updated at least once in Step 3, return to Step 1, otherwise output x and exit.
If implemented naively, 2-FNLS requires O(σ) time to compute the value of the evaluation functionz(x) for the current solution x, where σ = i∈M j∈N a ij denote the number of nonzero elements in the constraint matrix (a ij ). This computation is quite expensive if we evaluate the neighbor solutions of the current solution x independently. To overcome this, we first develop a standard incremental evaluation of ∆z ↑ j (x) and ∆z ↓ j (x) in O(|S j |) time by keeping the values of the left-hand side of constraints s i (x) (i ∈ M G ∪ M L ∪ M E ) in memory. We further develop an improved incremental evaluation of ∆z ↑ j (x) and ∆z ↓ j (x) in O(1) time by keeping additional auxiliary data in memory (see Appendix B for details). By using this, 2-FNLS is also able to evaluate ∆z j 1 ,j 2 (x) in O(|S j |) time by using (3).

Exploiting variable associations
It is known that the quality of locally optimal solutions improves if a larger neighborhood is used. However, the computation time to search the neighborhood NB r (x) also increases exponentially with r, since |NB r (x)| = O(n r ) generally holds. A large amount of computation time is thus needed in practice in order to scan all candidates in NB 2 (x) for large-scale instances with millions of variables. To overcome this, we develop a data mining approach that identifies promising pairs of flipping variables in NB 2 (x) by extracting variable associations from the instance to be solved using only a small amount of computation time.
Based on Lemma 1, the 2-flip neighborhood can be restricted to the set of solutions satisfying x j 1 = x j 2 andS(x) = ∅. We further observe from (3) that it is favorable to select pairs of flipping variables x j 1 and x j 2 which gives a larger size |S j 1 ∩ S j 2 | in order to obtain ∆z j 1 ,j 2 (x) < 0. Based on this observation, we keep a limited set of pairs of variables x j 1 and x j 2 for which |S j 1 ∩ S j 2 | is large in memory, and we call this the neighbor list ( Figure 1). We note that |S j 1 ∩ S j 2 | represents a certain kind of similarity between the subsets S j 1 and S j 2 (or column vectors a j 1 and a j 2 of the constraint matrix (a ij )) and we keep the k-nearest neighbors for each subset S j (j ∈ N ) in the neighbor list.
and α is a program parameter that we set to five. Let L [j 1 ] be the set of variables x j 2 stored in the j 1 th row of the neighbor list. We then reduce the number of candidates in NB 2 (x) by restricting the pairs of flipping variables x j 1 and x j 2 to pairs in the neighbor list j 1 ∈ X and We note that it is still expensive to construct the whole neighbor list for large-scale instances with millions of variables. To overcome this, we develop a lazy construction algorithm for the neighbor list. That is, 2-FNLS starts from an empty neighbor list and generates the j 1 th row of the neighbor list L [j 1 ] only when 2-FNLS searches NB (j 1 ) 2 (x) for the first time. We can treat the neighbor list as an adjacency-list representation of a directed graph, and represent associations between variables by a corresponding directed graph called the k-nearest neighbor graph (Figure 2). Using the k-nearest neighbor graph, we extend 2-FNLS to search a set of promising neighbor solutions in NB 4 (x). For each variable x j 1 (j 1 ∈ X), we keep j 2 ∈ (N \ X) ∩ L [j 1 ] that has the minimum value of ∆z j 1 ,j 2 (x) in memory as π(j 1 ). The extended 2-FNLS, called the 4-flip neighborhood search (4-FNLS) algorithm, then searches for an improved solution by flipping x j 1 = 1 → 0, x π(j 1 ) = 0 → 1, x j 3 = 1 → 0 and x π(j 3 ) = 0 → 1 for j 1 ∈ X and j 3 ∈ X ∩ L [π(j 1 )] satisfying j 1 = j 3 and π(j 1 ) = π(j 3 ), i.e., flipping the values of variables alternately along 4-paths or 4-cycles in the k-nearest neighbor graph. Let ∆z j 1 ,j 2 ,j 3 ,j 4 (x) denote the increase inz(x) due to simultaneously flipping x j 1 = 1 → 0, x j 2 = 0 → 1, x j 3 = 1 → 0 and x j 4 = 0 → 1. 4-FNLS computes ∆z j 1 ,j 2 ,j 3 ,j 4 (x) in O(|S j |) time by applying the standard incremental evaluation alternately. 4-FNLS is formally described by replacing the part of the 2-FNLS algorithm after Step 2 with the following: Step 3 : For each j 1 ∈ X in ascending order of ∆z ↓ 2 (x) that has the minimum value of ∆z j 1 ,j 2 (x) and set x j 1 ← 0 and x j 2 ← 1. If the current solution x has been updated at least once in Step 3 , return to Step 1.
Although a similar approach has been developed in local search algorithms for the Euclidean traveling salesman problem (TSP) in which a sorted list containing only the k-nearest neighbors is stored for each city by using a geometric data structure called the k-dimensional tree [26], it is not suitable for finding the k-nearest neighbors efficiently in high-dimensional spaces. We thus extend it for application to the high-dimensional column vectors a j ∈ {0, 1} m (j ∈ N ) of BIPs by using a lazy construction algorithm for the neighbor list.

Adaptive control of penalty weights
In our algorithm, solutions are evaluated by the alternative evaluation functionz(x) in which the fixed penalty weight vectors w + , w − in the original evaluation function z(x) has been replaced by w + , w − , and the values of It is often reported that a single application of LS tends to stop at a locally optimal solution of insufficient quality when large penalty weights are used. This is because it is often unavoidable to temporally increase the values of some violations y + i and y − i in order to reach an even better solution from a good solution through a sequence of neighborhood operations, and large penalty weights thus prevent LS from moving between such solutions. To overcome this, we incorporate an adaptive adjustment mechanism for determining appropriate values of penalty weights w + [32,44,38]. That is, LS is applied iteratively while updating the values of the penalty weights w + after each call to LS. We call this sequence of calls to LS the weighting local search (WLS) according to [34,36]. This strategy is also referred as the breakout algorithm [31] and the dynamic local search [25] in the literature.
Let x denote the solution at which the previous local search stops. We assume that the original penalty weights are sufficiently large. WLS resumes LS from x after updating the penalty weight vectors w + , w − . Starting from the original penalty weight vectors ( w + , w − ) ← (w + , w − ), the penalty weight vectors w + , w − are updated as follows. Let x * denote the best solution obtained so far with respect to the original evaluation function z(x). Ifz(x) ≥ z(x * ) holds, WLS uniformly decreases the penalty weights by ( w + , w − ) ← β( w + , w − ), where 0 < β < 1 is a program parameter that is adaptively computed so that the new value of ∆z ↓ j (x) becomes negative for 10% of variables x j (j ∈ X). Otherwise, WLS increases the penalty weights by WLS iteratively applies LS, updating the penalty weight vectors w + , w − after each call to LS until the time limit is reached. Note that we modify 4-FNLS to evaluate solutions with both z(x) and z(x), and update the best solution x * with respect to the original objective function z(x) whenever an improved solution is found. WLS is formally described as follows. Note that we set the initial solution to x = 0 in practice.

Algorithm WLS(x)
Input: An initial solution x.
Output: The best solution x * with respect to z(x).
Step 1: Step 2: Apply 4-FNLS(x, w + , w − ) to obtain an improved solutionx . Let x be the best solution with respect to the original evaluation function z(x) obtained during the call to 4-FNLS(x, w + , w − ). Setx ←x .
Step 3: If z(x ) < z(x * ) holds, then set x * ← x . If the time limit is reached, output x * and halt.

Computational results
We report computational results of our algorithm for the SCP instances from [8,38] and the SPP instances from [10,20,27]. Tables 1 and 2 summarize the information about the original and pre-solved instances. The first column shows the name of the group (or the instance), and the numbers in parentheses show the number of instances in the group. The second column "z LP " shows the optimal values of the LP relaxation problems. The third column "z best " shows the best upper bounds among all algorithms and the settings in this paper. The fourth and sixth columns "#cst." show the number of constraints, and the fifth and seventh columns "#vars." show the number of variables. Since several preprocessing techniques that often reduce the size of instances by removing redundant rows and columns are known [10], all algorithms are tested on the pre-solved instances. The instances marked " " are hard instances that cannot be solved optimally within at least 1 h by the latest mixed integer programming (MIP) solvers. We first compare our algorithm with three of the latest MIP solvers called CPLEX12.6, Gurobi5.6.3, and SCIP3.1 [1], and a local search solver called LocalSolver3.1 [9]. LocalSolver3.1 is not the latest version, but it performs better than the latest version (LocalSolver4.5) for SCP and SPP instances. We also compare our algorithm with a 3-flip local search algorithm developed by [44] (denoted by YKI) for SCP instances. All algorithms are tested on a MacBook  1-5 (5) 149   Pro laptop computer with a 2.7 GHz Intel Core i7 processor, and are run on a single thread with time limits as shown in Tables 1 and 2.  Tables 3 and 4 show the relative gap (%) z(x)−z best z(x) ×100 of the best feasible solutions achieved by the algorithms under the original (hard) constraints. The numbers in parentheses show the number of instances for which the algorithm obtained at least one feasible solution within the time limit.
We first observe that our algorithm achieves good upper bounds comparable to those of the 3-flip neighborhood local search [44] for many SCP instances. We note that the 3-flip neighborhood local search algorithm introduces a heuristic variable fixing technique based on Lagrangian relaxation that substantially reduces the number of variables to be considered to 1.05% from the original SCP instances on average. We next observe that our algorithm achieves best upper bounds in 17 out of 42 instances for all SPP instances, including 7 out of 11 instances for hard instances in particular. We note that local search algorithms and MIP solvers are quite different in character. Local search algorithms do not guarantee optimality because they typically search only a portion of the solution space. MIP solvers, however, examine every possible solution, at least implicitly, in order to guarantee optimality. Hence, it is inherently difficult to find optimal solutions by local search algorithms even in the case of small instances,   (6) 0.00% (6) 0.00% (6) 0.00% (6) 13.89% (1) 1.60% (6) us01-04 (4) 0.00% (4) 0.00% (4) 0.00% (3) 11.26% (2) 0.04% (4) t0415-0421 (7) 0.66% (7) 0.60% (7) 1.61% (6) -(0) 0.92% (6) t1716-1722 (7) 7.63% (7) 15.94% (7) 2.76% (7) 36.09% (1) 1.70% (7) v0415-0421 (7) 0.00% (7) 0.00% (7) 0.00% (7) 0.05% (7) 0.00% (7) v1616-1622 (7) 0.00% (7) 0.00% (7) 0.00% (7) 4.60% (7) 0.09% (7) (41) while MIP solvers find optimal solutions quickly for small instances and/or those having a small gap between the lower and upper bounds of optimal values. In view of this, our algorithm achieves sufficiently good upper bounds compared to the other algorithms on the benchmark instances, particularly for the hard SPP instances. Tables 5 and 6 show the completion rate of the neighbor list by rows, i.e., the proportion of generated rows to all rows in the neighbor list. We observe that our algorithm achieves good performance while generating only a small part of the neighbor list for the large-scale instances. Tables 7 and 8 show the relative gap of the best feasible solutions achieved by our algorithm for different settings. The second column "no-list" shows the results of our algorithm without the neighbor list, and the third column "no-inc" shows the results of our algorithm without the improved incremental evaluation (i.e., only applying the standard incremental evaluation). The fourth column "2-FNLS" shows the results of our algorithm without the 4-flip neighborhood search (i.e., only applying 2-FNLS). Tables 9 and 10    algorithm are set to one. From the results, we can see that the overall computational efficiency of our algorithm was improved 16.90-fold and 17.26-fold on average for SCP and SPP instances, respectively in comparison with the naive algorithm that has neither the neighbor list nor the improved incremental evaluation (i.e., only applying the standard incremental evaluation), and our algorithm attains good performance even when the size of the neighbor list is considerably small. We also observe that the 4-flip neighborhood search substantially improves the performance of our algorithm, even though there are fewer calls to 4-FNLS compared to 2-FNLS.