The Set Covering Problem Revisited: An Empirical Study of the Value of Dual Information

This paper investigates the role of dual information in the performance of heuristics designed for solving the set covering problem. After the linear programming relaxation of the problem is solved, the dual information is used in the two main approaches proposed here. (i) The size of the original problem is reduced, and the resulting model is solved with exact methods. We demonstrate the effectiveness of this approach on a rich set of benchmark instances compiled from the literature and conclude that set covering problems of various characteristics and sizes can reliably be solved to near optimality without resorting to custom solution methods. (ii) The dual information is embedded into an existing heuristic. This approach is demonstrated on a well-known local search heuristic that has been reported to obtain the most successful results on the set covering problem to date. Our results demonstrate that the use of dual information significantly improves the efficacy of the heuristic in terms of both solution time and accuracy.

1. Introduction. With the boost in computing technology and the striking advances in linear programming (LP) solvers, many large-scale combinatorial optimization problems can now be solved in a reasonable time. Although the performance of integer programming (IP) solvers is not comparable to that of LP solvers, many moderate-size hard IP problems in academic and industrial contexts are being solved with increasing success every passing day. Consider, for instance, the well-known state-of-the-art CPLEX solver, which has just become free for academic use. As of 2012, the latest version of its mixed integer linear programming solver is on average 50% faster than its earlier releases from the last ten years. Even more impressively, this latest version improved on the performance of the previous version by 15% within merely six months (IBM, 2012).
In light of these promising developments in exact methods, we revisit the set covering problem (SCP) and conduct an empirical study on a set of problems that have appeared in the literature over the last three decades. This well-known problem is defined below.
Definition 1.1 Given a collection S of sets over a finite universe U, a set cover J ⊆ S is a subcollection of these sets whose union is U. When each set in the collection has an associated cost, the set covering problem is to find a set cover J with minimum total cost.
Our motivation for selecting SCP is twofold. First, SCP is widely popular among researchers and practitioners, because a wide range of applications, from scheduling to routing and from manufacturing to telecommunications, can be cast (possibly with side constraints) as set covering problems. Second, this wide interest allows us to review a large body of work from the literature and to access many acknowledged and frequently studied sets of problems. To obtain a fairly representative problem set, we have strived to compile from the literature not only research problems but also some actual problems that arise in practice.
We should emphasize that the need for fast and efficient heuristic methods persists, especially for large-scale combinatorial problems, and SCP is no different in this respect. It is fair to say that unless P = NP, there will always be hard SCP instances that are intractable for exact methods. On the one hand, the competition between heuristic and exact methods has become fiercer. On the other hand, it is also known that exact and heuristic methods can complement each other. Building on this idea, we propose such a complementary approach that exploits the LP-IP relationship, in particular through the use of dual information. Based on our comprehensive empirical study, we infer that dual information is a significant source for designing new heuristics and exact methods with excellent empirical performance. While primal-dual heuristics for the SCP have been thoroughly investigated in the context of approximation as well as exact algorithms, we believe that their full empirical potential is yet to be realized.
We take a first step in this direction.
We make the following research contributions: (i) We demonstrate that the dual optimal variables obtained by solving the LP relaxation of the SCP carry important information about the optimal solution of the SCP. (ii) We show that this dual information can also be used to improve the performance of existing local search heuristics. (iii) We support our discussion with a comprehensive computational study on a large set of SCP instances that are widely used in the literature. Our results indicate that the proposed approach performs very well in terms of solution time and quality. Not only do we find the best known or optimal solutions for most of the problem instances, but for one particularly hard problem we even improve the best known solution in the literature.
This paper is organized as follows. Section 2 summarizes the SCP literature. In Section 3, we discuss the motivation for using dual information to solve the SCP efficiently. Section 4 starts with a thorough description of the compiled set of problems; the numerical results are then reported in two parts. First, we report our results with an integer programming approach based on solving a restricted problem formed by the columns with zero reduced costs in the optimal LP solution. Second, we incorporate the dual information into an existing local search heuristic and compare our results with those of the original heuristic. Section 5 concludes the paper and sets forth several future research directions.
2. Literature Review. SCP has long been known to be NP-hard in the strong sense (Garey and Johnson, 1979). Therefore, many heuristic and enumerative algorithms have been developed to solve the SCP effectively. Exact algorithms generally rely on the branch-and-bound method to obtain optimal solutions (Balas and Carrera, 1996; Beasley, 1987; Beasley and Jornsten, 1992; Fisher and Kedia, 1990). Beasley (1987) uses subgradient optimization and a heuristic algorithm to bound the problem. Beasley and Jornsten (1992) employ the same method but improve the solution quality through Gomory f-cuts and a better branching strategy. Fisher and Kedia (1990) use primal and dual heuristics for bounding. Similarly, Balas and Carrera (1996) use primal and dual heuristics together with a dynamic subgradient procedure and iteratively improve the bounds by variable fixing.
Since solving a large SCP with an exact method can take an excessively long time, a common compromise is to sacrifice optimality and use a heuristic to obtain fairly good solutions within an acceptable time.
Many researchers propose various heuristics and approximation algorithms and show that their empirical performance is quite good (Caprara et al., 2000; Gomes et al., 2006; Grossman and Wool, 1997). In the literature, there are several approaches to developing a heuristic algorithm. Among these are greedy algorithms, randomized search, heuristics based on linear programming and Lagrangian relaxations, and the closely related primal-dual methods. The simplest are the greedy algorithms, which can solve large-scale set covering problems in negligible time. However, their myopic nature may easily yield solutions far from optimal. Haouari and Chaouachi (2002), Feo and Resende (1989), and Vasko and Wilson (1984) introduce randomness and penalization into greedy algorithms to improve solution quality. Along this line, three local search heuristics appear in (Lan et al., 2007; Yagiura et al., 2006; Marchiori and Steenbeek, 1998). Finger et al. (2002) analyze benchmark instances by measuring the correlation between the cost of a solution and its closeness to the optimal solution. This study gives useful insights for understanding the problem structure and for developing problem-specific local search algorithms. Several meta-heuristics have also been proposed for the SCP.
Among these are simulated annealing (Brusco et al., 1999; Jacobs and Brusco, 1995), genetic algorithms (Aickelin, 2002; Beasley and Chu, 1996; Lorena and Lopes, 1997), tabu search (Caserta, 2007; Musliu, 2006; Kinney et al., 2004), ant colony optimization (Ren et al., 2010), and the electromagnetism meta-heuristic (Azimi et al., 2010). In a recent study, Muter et al. (2010) devise a generic framework that uses information from the LP relaxation to steer meta-heuristics toward diversification or intensification while searching for the optimum of set covering-type optimization problems. Muter et al. also consider the role of dual information in their numerical study on the vehicle routing problem with time windows. First, they use the dual information to alter the randomized selection mechanism in the meta-heuristic.
With this new mechanism, the meta-heuristic is encouraged to generate routes (sets) that are more likely to have negative reduced costs. Second, the dual information is used to reduce the size of the column pool by removing the columns with higher reduced costs. Muter et al. report that the dual information does not increase the effectiveness of their algorithms. In this study, however, we reach the opposite conclusion in a fundamentally different setting and implementation.
Similar to our work in this paper, several studies design heuristics based on the Lagrangian relaxation or the LP relaxation of SCP (Caprara et al., 1999; Ceria et al., 1998; Hochbaum, 1982). The resulting primal-dual approach has been commonly used for approximating NP-hard optimization problems that can be modeled as IP problems, such as the metric traveling salesman problem, the Steiner tree problem, the Steiner network problem, and the set covering problem (Vazirani, 2002). Bar-Yehuda and Even (1981) were the first to consider a generic primal-dual approach to approximate the set covering problem. The basis of the primal-dual approach is to find only a feasible (not necessarily optimal) solution to the dual of the LP relaxation of the IP formulation of SCP presented in the next section. Using this dual solution, an integral solution for the SCP is constructed. Although the worst-case performance of the primal-dual algorithm of Bar-Yehuda and Even is poor (Hall and Vohra, 1993), its empirical performance turns out to be much more promising. Therefore, several studies have sprung out of the primal-dual approach in the set covering literature (Bertsimas and Vohra, 1998; Melkonian, 2007; Williamson, 2002; Yelbay, 2010).
3. An Overview of Primal-Dual Methods. In this section, we discuss in depth our motivation for exploiting the relationship between the IP formulation of the set covering model and its LP relaxation.
In a nutshell, we gather dual information from the optimal solution of the LP relaxation, and then considerably reduce the problem size so that the resulting SCP can be solved by an IP solver with much less computational effort.
Before delving into the details of this approach, we first give the mathematical model of the SCP.
Using Definition 1.1, we obtain the integer programming model of the SCP as
$$\text{minimize} \quad \sum_{j \in S} c_j x_j \tag{1}$$
$$\text{subject to} \quad \sum_{j \in S} a_{ij} x_j \geq 1, \quad i \in U, \tag{2}$$
$$x_j \in \{0, 1\}, \quad j \in S. \tag{3}$$
Here $c_j > 0$ is the coverage cost of the jth set; $x_j$ is a binary variable that equals 1 if $j \in J$; and $a_{ij}$ is a binary parameter that equals 1 if item $i$ is covered by the jth set. Constraints (2) ensure that each item is covered by at least one set, and constraints (3) impose integrality on the variables. If the cost of coverage is the same for every set, that is, $c_j = c$ for all $j \in S$, the problem is called the unicost set covering problem. In the LP relaxation of the SCP, the integrality constraints (3) are replaced by simple bounds on the variables, so that the variables may take a continuum of values. The dual of the LP relaxation of SCP is then given by
$$\text{maximize} \quad \sum_{i \in U} y_i \tag{4}$$
$$\text{subject to} \quad \sum_{i \in U} a_{ij} y_i \leq c_j, \quad j \in S, \tag{5}$$
$$y_i \geq 0, \quad i \in U, \tag{6}$$
where the dual variables $y_i$, $i \in U$, correspond to the coverage constraints (2) in the LP relaxation of (1)-(3).
As mentioned previously, our motivation is to use the LP information to obtain an integer solution for the SCP. A straightforward approach is to solve the LP relaxation and then use the dual information to identify the columns with zero reduced costs, where the reduced cost of column $j$ with respect to a dual solution $y$ is $\bar{c}_j = c_j - \sum_{i \in U} a_{ij} y_i$. These columns can be considered promising, since they are likely to appear in an optimal IP solution. Along this line, for instance, Hochbaum (1982) solves the dual LP (4)-(6) and constructs a set cover composed of all primal variables with zero reduced cost. Such approaches fall into the general category of primal-dual methods, which find a feasible solution for the (primal) IP model (1)-(3) together with a feasible solution for the dual LP model (4)-(6).
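The Hochbaum-style construction can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes each column is given as the set of items it covers, and it takes the dual vector $y$ as given (typed in by hand for a hypothetical toy instance) instead of computing it with an LP solver. The function names and the tolerance `tol` are illustrative choices.

```python
from typing import List, Sequence, Set


def reduced_costs(costs: Sequence[float], columns: Sequence[Set[int]],
                  y: Sequence[float]) -> List[float]:
    """Reduced cost c_j - sum_{i in set j} y_i of every column j."""
    return [c - sum(y[i] for i in col) for c, col in zip(costs, columns)]


def zero_reduced_cost_cover(costs: Sequence[float], columns: Sequence[Set[int]],
                            y: Sequence[float], n_items: int,
                            tol: float = 1e-9) -> List[int]:
    """Hochbaum-style cover: keep every column whose reduced cost is zero.

    If y is an optimal dual solution of a feasible instance, every item lies
    in at least one such column (otherwise its dual value could be raised);
    for a merely feasible y this can fail, so coverage is checked explicitly.
    """
    rc = reduced_costs(costs, columns, y)
    if min(rc) < -tol:
        raise ValueError("y is not dual feasible")
    chosen = [j for j, r in enumerate(rc) if abs(r) <= tol]
    covered = set().union(*(columns[j] for j in chosen))
    if covered != set(range(n_items)):
        raise ValueError("zero reduced cost columns leave items uncovered")
    return chosen


# Hypothetical toy instance: items 0..3 and six candidate sets.
columns = [{0, 1}, {2, 3}, {1, 2}, {0, 1, 2, 3}, {0}, {3}]
costs = [2, 2, 3, 5, 1, 1]
y = [1, 1, 1, 1]  # optimal dual: its sum 4 equals the cost of sets 0 and 1
cover = zero_reduced_cost_cover(costs, columns, y, n_items=4)
print(cover, sum(costs[j] for j in cover))  # [0, 1, 4, 5] 6
```

On this toy instance the zero reduced cost columns form a feasible cover of cost 6, whereas the optimal cover (sets 0 and 1) costs 4; the redundant sets 4 and 5 are the kind of waste that solving an IP over the zero reduced cost columns removes.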
In fact, a dual optimal solution can be obtained easily, since solving the LP model to optimality is not a major concern given the current state of LP solvers. By elementary duality and the relation between the IP model and its LP relaxation, the objective function values of a feasible IP solution and of an optimal LP solution provide an upper and a lower bound, respectively, on the optimal value of the SCP. Therefore, the main drive behind primal-dual methods is to minimize the gap between the objective function value of a feasible IP solution and that of an optimal or feasible dual LP solution.
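The bounding argument can be spelled out in one line. For any dual feasible $y$ (constraints (5)-(6)) and any feasible IP solution $x$ (constraints (2)-(3)), let $\bar{c}_j = c_j - \sum_{i \in U} a_{ij} y_i \geq 0$ denote the reduced cost of set $j$. Then
$$\sum_{j \in S} c_j x_j - \sum_{i \in U} y_i = \sum_{j \in S} \bar{c}_j x_j + \sum_{i \in U} y_i \Big( \sum_{j \in S} a_{ij} x_j - 1 \Big) \;\geq\; 0,$$
since both sums on the right are nonnegative. Writing $z_{LP}$ and $z_{IP}$ for the optimal values of the LP relaxation and of (1)-(3) (notation introduced here only for this remark), this gives $\sum_{i \in U} y_i \leq z_{LP} \leq z_{IP} \leq \sum_{j \in S} c_j x_j$. The first term on the right also explains the appeal of zero reduced cost sets: every selected set with $\bar{c}_j > 0$ adds its reduced cost directly to the gap.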
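This important relationship between the IP formulation and its LP relaxation has prompted us to concentrate on the best possible result that can be obtained by a primal-dual heuristic that adds a set to the cover only if its reduced cost is zero with respect to a feasible solution of (4)-(6). This consideration boils down to finding an optimal solution of the following mixed integer linear programming (MILP) model:
$$\text{minimize} \quad \sum_{j \in S} c_j x_j - \sum_{i \in U} y_i \tag{7}$$
$$\text{subject to} \quad \sum_{i \in U} a_{ij} y_i \leq c_j, \quad j \in S, \tag{8}$$
$$\sum_{i \in U} a_{ij} y_i \geq c_j x_j, \quad j \in S, \tag{9}$$
$$\sum_{j \in S} a_{ij} x_j \geq 1, \quad i \in U, \tag{10}$$
$$y_i \geq 0, \quad i \in U, \tag{11}$$
$$x_j \in \{0, 1\}, \quad j \in S. \tag{12}$$
In this model, constraints (8) and (10) ensure that the dual and primal solutions are feasible, respectively, and constraints (9) prescribe that a primal variable has zero reduced cost when it is set to 1.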
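A candidate pair $(x, y)$ for this MILP can be checked directly against the conditions described above: dual feasibility, primal coverage, and zero reduced cost for every selected set. The Python sketch below is a hypothetical helper, not part of the study's code; it uses a hand-made toy instance and returns the primal cost and the dual sum, the two quantities compared in Figure 1, rather than committing to a particular objective.

```python
from typing import Optional, Sequence, Set, Tuple


def check_primal_dual_pair(costs: Sequence[float], columns: Sequence[Set[int]],
                           n_items: int, x: Sequence[int], y: Sequence[float],
                           tol: float = 1e-9) -> Optional[Tuple[float, float]]:
    """Return (primal cost, dual sum) if (x, y) meets all conditions, else None."""
    # x must be 0-1 and y nonnegative.
    if any(v not in (0, 1) for v in x) or any(v < -tol for v in y):
        return None
    for j, (c, col) in enumerate(zip(costs, columns)):
        rc = c - sum(y[i] for i in col)  # reduced cost of set j
        if rc < -tol:                    # dual infeasible
            return None
        if x[j] == 1 and rc > tol:       # a selected set must have zero reduced cost
            return None
    covered = set().union(*(col for col, v in zip(columns, x) if v == 1))
    if covered != set(range(n_items)):   # some item is left uncovered
        return None
    primal = sum(c for c, v in zip(costs, x) if v == 1)
    return primal, sum(y)


# Hypothetical toy instance: items 0..3 and six candidate sets.
columns = [{0, 1}, {2, 3}, {1, 2}, {0, 1, 2, 3}, {0}, {3}]
costs = [2, 2, 3, 5, 1, 1]
print(check_primal_dual_pair(costs, columns, 4, [1, 1, 0, 0, 0, 0], [1, 1, 1, 1]))  # (4, 4)
```

When the two returned values coincide, as here, weak duality certifies that both the cover and the dual solution are optimal.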
Although the MILP model (7)-(12) nicely encompasses the main idea behind most existing primal-dual approaches, it is important to note that solving it is much more challenging than solving the IP model (1)-(3). Nevertheless, one may still wish to solve the MILP for small-scale instances of SCP, since its optimal objective function value and the corresponding optimal solution may indicate whether further investigation of a primal-dual approach is worthwhile. To test this proposition and support the main motivation of the current study, we first solve the MILP models of a large set of SCP instances with known optimal solutions, including the problem classes (b), (c), and (a), with the exception of the groups scpnrg and scpnrh. The details of the problem instances are presented in Section 4. Figure 1 displays the distribution of the percentage gap between the sum of the dual variables $\sum_{i \in U} y_i$ in the optimal solution of the MILP model (7)-(12) and the optimal objective function value of the dual LP (4)-(6). Similarly, it depicts the distribution of the percentage gap between the cost of the primal integer solution $\sum_{j \in S} c_j x_j$ in the optimal solution of the MILP model and the optimal objective function value of (1)-(3). Figure 1 has a very important implication: the feasible IP solution obtained from the proposed MILP coincides with the optimal IP solution in almost all problem instances. Furthermore, the sum of the dual variables resulting from solving the proposed MILP is equal, or very close, to the optimal objective value of the LP relaxation. These results show that the dual optimal solution indeed carries valuable information that can help us select the optimal sets for the SCP. In the subsequent discussion, we concentrate on exploiting this dual information for solving a variety of SCP classes that arise in practice and in the literature.
4. Computational Study. In this section, we conduct a set of experiments to support our idea that the dual information may be used to develop a mathematical programming based heuristic as well as to improve the performance of local search heuristics. We first define the problem classes and the experimental setup. Then, we present the results of a heuristic that uses the dual information to extract the most promising columns and then solves the SCP optimally over those columns. Finally, we incorporate the dual information into a well-known local search heuristic and observe that its performance indeed improves significantly.
(Oliveira and Pardalos, 2005). In this problem setting, items are points to be covered in two-dimensional Euclidean space. Each point (a potential transmitter) is also associated with a set of concentric circles covering neighboring points. That is, each circle corresponds to a set in a set covering instance, and all points in the corresponding circle are covered when the set is selected. The cost of a set is typically modeled as a power function of the Euclidean distance between the center and the farthest point in the circle.
⋄ (c) Crew scheduling problems (16 instances): Fourteen of these are medium-scale real-world airline crew scheduling problems from American Airlines, and two are bus driver scheduling problems as described by Balas and Carrera (1996).
⋄ (d) Railway problems (7 instances): These are large-scale real-world railway crew scheduling instances from the Italian railways and are available in the OR-library (OR-lib, 2012).
⋄ (e) Hard cost and coverage correlated problems (30 instances): These are randomly generated instances based on the method given in (Rushmeier and Nemhauser, 2010). This method ensures that each row and each column has at least two nonzero entries, and the cost of a set is generated proportionally to the number of items it contains. This class is known as the hard cost and coverage correlated set covering problems.
⋄ (f) Unicost problems (21 instances): This class includes various types of combinatorial optimization problems modeled as unicost set covering problems. Unicost problems are generally considered more challenging than their non-unicost counterparts. The Steiner triple instances (labeled "STS") are regarded as the toughest problem set in this class. We refer to the OR-library (OR-lib, 2012) for a more detailed description of the instances in this class.
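The two structural properties stated for class (e) can be illustrated with a small generator. This Python sketch is an illustration under assumptions rather than the Rushmeier and Nemhauser procedure: the initial density, the random repair steps, and the hypothetical constant `cost_per_item` are choices made here; only the guarantees (at least two nonzeros per row and per column, and cost proportional to set size) follow the description above.

```python
import random
from typing import List, Set, Tuple


def correlated_instance(n_items: int, n_sets: int, density: float,
                        cost_per_item: float = 1.0,
                        seed: int = 0) -> Tuple[List[Set[int]], List[float]]:
    """Random cost-and-coverage correlated SCP instance (hypothetical sketch)."""
    if n_items < 2 or n_sets < 2:
        raise ValueError("need at least two items and two sets")
    rng = random.Random(seed)
    # Start from a random 0-1 matrix, stored column by column.
    columns = [{i for i in range(n_items) if rng.random() < density}
               for _ in range(n_sets)]
    # Repair columns: every set must contain at least two items.
    for col in columns:
        while len(col) < 2:
            col.add(rng.randrange(n_items))
    # Repair rows: every item must appear in at least two sets.
    # Adding items only grows columns, so the column property is kept.
    for i in range(n_items):
        holders = [j for j, col in enumerate(columns) if i in col]
        others = [j for j in range(n_sets) if j not in holders]
        rng.shuffle(others)
        while len(holders) < 2:
            j = others.pop()
            columns[j].add(i)
            holders.append(j)
    # Cost proportional to the number of items covered by the set.
    costs = [cost_per_item * len(col) for col in columns]
    return columns, costs


cols, costs = correlated_instance(n_items=20, n_sets=40, density=0.05, seed=1)
```

Computing the costs only after both repair passes keeps the cost and coverage correlation exact, because the repairs may enlarge some sets.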
It is fair to state that some of the problems included in our compilation are not as widely studied as those repeatedly used in the literature. This is in fact necessary, because most of the standard benchmark problems solved in many past studies should be considered relatively easy for current state-of-the-art solvers. For example, the groups of instances scp4, scp5, scp6, scpa, scpb, scpc, and scpd in problem set (a) from the highly cited OR-library can be solved to optimality in less than a second by a standard IP solver. Similarly, one group of instances in problem set (e), the group scpe in problem set (f), and all instances in problem sets (b) and (c) are relatively easy. Consequently, in the sequel we focus on and report numerical results for the remaining hard instances only.
The LP and IP solutions in this study were obtained with IBM ILOG CPLEX 12.1 running on a personal computer with an Intel Core i5 processor and 4 GB of RAM. For all problem instances, the limit on solution time is set to 7,200 seconds. Batch processing of the instances is carried out through simple C++ scripts.

4.2 The Role of Dual Information. We first work with a mathematical programming based heuristic that uses the dual information to extract the most promising columns and then solves the SCP to optimality over those columns. As discussed in Section 3, this approach is motivated by the MILP model (7)-(12). However, solving this model directly is too demanding, so we tackle it in two phases.
In the first phase, we solve the LP relaxation of (1)-(3) and identify the most promising columns. In the second phase, we obtain an integer feasible solution to SCP by solving (1)-(3) over these columns only.
We refer to this IP as the restricted IP or the restricted SCP. One option for constructing the restricted problem is to use the columns with zero reduced costs obtained in the first phase. Although this reduces the size of the problem, we may still end up solving a large restricted IP. This is indeed a valid concern, as we observed with some of the unicost problems, for which the size of the restricted IP is identical to that of the original problem. Therefore, as an alternative, we propose to solve the restricted IP only over the columns that are basic in the optimal solution of the LP relaxation. This approach has two main benefits: it reduces the size of the restricted IP considerably, and its exact size is known in advance.
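The zero reduced cost variant of the two-phase scheme can be sketched in a few lines of Python. The sketch makes two loud simplifications: the LP phase is not reproduced (the optimal dual vector is assumed to be given, as an LP solver such as CPLEX would return it), and exhaustive enumeration stands in for the IP solver, which is only viable for tiny instances. The basic-column variant is not shown, since it needs the LP basis. All names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Set, Tuple


def restricted_columns(costs: Sequence[float], columns: Sequence[Set[int]],
                       y: Sequence[float], tol: float = 1e-9) -> List[int]:
    """Phase 1 output: indices of the columns with zero reduced cost w.r.t. y."""
    return [j for j, (c, col) in enumerate(zip(costs, columns))
            if abs(c - sum(y[i] for i in col)) <= tol]


def solve_scp_bruteforce(costs: Sequence[float], columns: Sequence[Set[int]],
                         pool: Sequence[int], n_items: int
                         ) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Phase 2 stand-in: exact SCP restricted to `pool`, by enumeration."""
    universe = set(range(n_items))
    best = None
    for r in range(1, len(pool) + 1):
        for subset in combinations(pool, r):
            if set().union(*(columns[j] for j in subset)) == universe:
                cost = sum(costs[j] for j in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best


# Hypothetical toy instance: items 0..3 and six candidate sets.
columns = [{0, 1}, {2, 3}, {1, 2}, {0, 1, 2, 3}, {0}, {3}]
costs = [2, 2, 3, 5, 1, 1]
y = [1, 1, 1, 1]  # assumed optimal LP dual for this instance
pool = restricted_columns(costs, columns, y)
print(pool, solve_scp_bruteforce(costs, columns, pool, n_items=4))  # [0, 1, 4, 5] (4, (0, 1))
```

Here the restricted problem keeps four of the six columns and still contains the optimal cover of cost 4; on real instances the same idea simply hands the reduced column pool to an IP solver.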
Tables 1-4 in the appendix report the computational results as well as some summary statistics. The optimal or best known objective values from the literature are provided in Column 4. The results obtained from CPLEX by solving (1)-(3) are displayed in the next four columns. Observe that for the last two groups of instances, scpnrg and scpnrh, CPLEX does not even return a feasible solution when it hits the time limit of two hours. The data for the restricted SCP built from all sets with zero reduced costs in the optimal solution of the LP relaxation of SCP (Restricted SCP-ZeroRC) are presented in Columns 9-12. Results for the restricted IP solved over the basic columns in the optimal solution of the LP relaxation of SCP (Restricted SCP-BC) follow in Columns 13-16. In these tables, "OFV" stands for "objective function value", and $T_{LP}$ and $T_{IP}$ denote the computation time for solving the root relaxation and the total solution time, respectively. For Restricted SCP-ZeroRC and Restricted SCP-BC, the reported total times include the effort spent on solving the LP relaxation of SCP. Columns 9 and 13, labeled $|S'|$, give the number of sets in the restricted SCP. The gaps reported under Gap$_{IP}$ are calculated with respect to the best known results from the literature. Table 1 displays the results for the hard instances among the standard benchmark problems (a). These results indicate that the restricted SCP yields integer solutions within 1% of the best known solutions for most instances.
There are only 2 instances where a single-unit difference in the objective value results in a gap larger than 2%, owing to the small magnitude of the best known objective value. Moreover, for these problem instances, incorporating all columns with zero reduced costs into the restricted SCP brings no gain in solution quality. For the group scpnrh, Restricted SCP-BC even outperforms Restricted SCP-ZeroRC.
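Assuming the usual relative definition of the gap with respect to the best known objective value $z^*$ (an assumption; the tables state only that gaps are measured against the best known results), the effect of a one-unit excess is easy to quantify for a solution of value $z$:
$$\text{Gap}_{IP} = 100 \cdot \frac{z - z^*}{z^*}, \qquad z = z^* + 1 \;\Longrightarrow\; \text{Gap}_{IP} = \frac{100}{z^*}\,\%,$$
so a one-unit excess exceeds 2% exactly when $z^* < 50$. Under the same definition, a solution that improves on the best known value, as for rail2536, yields a negative gap.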
Results from a set of real-world large-scale railway crew scheduling instances are summarized in Table 2.
Remarkably, the percentage gaps are lower than those obtained for the standard benchmark instances.
Observe that none of these instances has been solved to optimality to date; only the best known feasible solutions are available in the literature. CPLEX struggles with these instances and terminates at the time limit without producing a feasible solution. The solutions obtained from the restricted SCP are competitive with the best known solutions and even improve on the best known solution for the instance rail2536. In addition, note that even the restricted SCP hits the time limit for three instances, and the corresponding solutions could potentially be improved by allowing more time. Table 3 depicts the results for the instances in class (e). For this problem class, the restricted SCP reduces the solution time from an average of 53.8 seconds to under 1 second, with an average gap of 1.33%. Table 4 reports the results for the unicost problems (f). The structure of the unicost problems differs from that of the other problems in our study: the number of columns is smaller than the number of rows.
The computational results show that, for these instances, solving the LP relaxation and selecting the columns with zero reduced cost does not decrease the problem size. Therefore, we do not observe any noteworthy benefit of using dual information for unicost problems, except in a few instances.
Figure 2 depicts the empirical cumulative distributions of the percentage gaps and the computation times, respectively, grouped by class. The percentage gaps are calculated with respect to the best known solutions in the literature, as in Tables 1-4 in the appendix. These computational results reveal that there may be only a slight advantage in including all columns with zero reduced costs in the restricted SCP. Therefore, in these figures we report only the results obtained by Restricted SCP-BC. In these plots, each point indicates the cumulative fraction of instances with percentage gaps (or computation times) less than the corresponding value on the x-axis. The negative value on the horizontal axis of Figure 2(a) is due to the instance rail2536, for which we improve the best known objective function value.
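The curves in these plots are empirical cumulative distribution functions. A minimal Python sketch of how such a curve can be evaluated is given below; it follows the standard convention of counting values at most $x$ (which differs from "less than" only at the observed values themselves), and the gap values are hypothetical, not taken from the tables.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(values: Sequence[float]) -> Callable[[float], float]:
    """Return F with F(x) = fraction of the values that are <= x."""
    data = sorted(values)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


# Hypothetical percentage gaps of five instances (one negative: an improvement).
F = ecdf([-0.5, 0.0, 0.0, 1.2, 3.4])
print(F(0.0))  # 0.6
```

Sorting once and using binary search keeps each evaluation logarithmic in the number of instances, which is convenient when a curve is drawn at many points on the x-axis.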
These figures clearly illustrate that extremely good results can be obtained when the IP formulation (1)-(3) is solved over the columns with zero reduced costs. For problem sets (a), (d), and (e), which have a non-unicost structure, the percentage gap does not exceed 2% except in a small fraction of the instances. The corresponding figure for the unicost instances in problem set (f) is higher. However, this different behavior is easily explained by the fact that, for these instances, the size of the restricted SCP is generally identical to that of the original problem; any reduction that does occur is small.
Consequently, we conclude that the proposed mathematical programming based heuristic is more effective for non-unicost SCP instances. The relatively long computation times observed in Figure 2(b) for some instances in problem classes (a), (d), and (f) can presumably be reduced at the expense of slightly larger percentage gaps.

4.3 Improving a Local Search Method with the Dual Information. In the previous section, we demonstrated that the dual information embedded in the LP relaxation of SCP is a significant tool for extracting the set of promising columns in an SCP problem. In this section, we rely on the same dual information to enhance the performance of an existing SCP heuristic. Ultimately, we would like to show that one may leverage the dual information both to design simple mathematical programming based heuristics (see Section 4.2) and to elicit better feasible solutions from local search algorithms.
To serve this purpose, we use the well-known randomized local search heuristic Meta-RaPS (Lan et al., 2007), which, to the best of our knowledge, has not been outperformed in the literature so far. Meta-RaPS consists of two phases that are applied repeatedly. In the construction phase, one set is randomly selected among the set of best candidates and added to the set cover. This process is repeated until all items are covered and a feasible solution to SCP is obtained. To determine the best candidates, Meta-RaPS uses one of the following four priority rules: $c_j/k_j^2$, $c_j/k_j$, $\sqrt{c_j}/k_j$, and $c_j/\sqrt{k_j}$, where $k_j$ is the number of currently uncovered items that could be covered by set $j$. Each construction phase is followed by a neighborhood search phase starting from the current solution. Here, some of the sets in the current feasible solution are removed from the cover, and feasibility is restored as in the construction phase. In this phase, however, the search for candidates to be inserted into the set cover is restricted to a set of "promising" columns identified during the course of the algorithm, in order to speed up the computations. The authors refer to this restricted pool of columns as the "core problem" and define it as the set of columns added to the candidate list during the construction phase. Lan et al. test different versions of Meta-RaPS on the standard benchmark instances (a). We implemented the best performing one, labeled "Meta-RaPS with randomized priority rules and core problem definition." The algorithm uses a set of parameters, such as the number of iterations performed in the construction and neighborhood search phases, the percentage of the sets in a feasible solution that are removed during the neighborhood search, and so on. Lan et al. present the values of all relevant parameters that they use to solve the standard benchmark instances (a). We employ the same parameter values in our numerical experiments.
Figure 2: Empirical cumulative distributions of the percentage gaps and computation times for the restricted SCP solved over the columns with zero reduced costs.
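The construction phase can be sketched in Python as follows. This is a simplified illustration, not Lan et al.'s implementation: drawing one priority rule at random per pass, the single hypothetical `restriction` threshold that defines the candidate list, and reading the fourth rule as $c_j/\sqrt{k_j}$ are all assumptions made here, and the neighborhood search phase is omitted. The optional `pool` argument shows how the same routine can be confined to a subset of columns, such as a core problem.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Set

# The four priority rules (smaller is better); c is the set's cost and k the
# number of currently uncovered items it would cover. The fourth rule is
# read here as c / sqrt(k) (an assumption).
PRIORITY_RULES: List[Callable[[float, int], float]] = [
    lambda c, k: c / k ** 2,
    lambda c, k: c / k,
    lambda c, k: math.sqrt(c) / k,
    lambda c, k: c / math.sqrt(k),
]


def randomized_construction(costs: Sequence[float], columns: Sequence[Set[int]],
                            n_items: int, rng: random.Random,
                            restriction: float = 0.1,
                            pool: Optional[Sequence[int]] = None) -> List[int]:
    """One construction pass: add random near-best sets until all items are covered.

    The candidate list holds every set whose priority is within a factor
    (1 + restriction) of the best priority in the current step.
    """
    usable = range(len(columns)) if pool is None else pool
    rule = rng.choice(PRIORITY_RULES)  # randomized priority rule for this pass
    uncovered = set(range(n_items))
    cover: List[int] = []
    while uncovered:
        scored = [(rule(costs[j], len(columns[j] & uncovered)), j)
                  for j in usable if columns[j] & uncovered]
        if not scored:
            raise ValueError("the sets in the pool cannot cover every item")
        best = min(p for p, _ in scored)
        candidates = [j for p, j in scored if p <= best * (1 + restriction)]
        chosen = rng.choice(candidates)
        cover.append(chosen)
        uncovered -= columns[chosen]
    return cover


# Hypothetical toy instance: items 0..3 and six candidate sets.
columns = [{0, 1}, {2, 3}, {1, 2}, {0, 1, 2, 3}, {0}, {3}]
costs = [2, 2, 3, 5, 1, 1]
cover = randomized_construction(costs, columns, 4, random.Random(42))
```

Because a selected set no longer covers any uncovered item, it can never be selected twice, and every pass ends with a feasible cover whenever the pool admits one.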
Our implementation of Meta-RaPS follows the original form described in Lan et al. (2007). However, to assess the value of dual information, we solve the problem only over the sets with zero reduced costs with respect to the optimal dual solution of the LP relaxation of SCP. Observe that this approach, referred to as Meta-RaPS-LP, is consistent with the definition of the restricted SCP in the previous section and allows us to test the value of dual information on common ground.
Table 5 in the appendix summarizes the statistics on the percentage gaps and the solution times obtained by both Meta-RaPS and Meta-RaPS-LP on the standard benchmark instances (a). Because of the randomness inherent in the algorithms, we run each of them 5 times on each instance and report statistics per instance. The computational results demonstrate that, except in a few instances, Meta-RaPS-LP performs on a par with Meta-RaPS in terms of solution quality with much less computational effort.
This observation implies that the set of columns identified by the dual information often includes at least one optimal or near-optimal solution. Meta-RaPS yields a lower minimum gap in 5 instances, while the corresponding figure for Meta-RaPS-LP is 4. However, Meta-RaPS-LP beats Meta-RaPS in 8 instances in terms of the average gap, whereas the respective number for Meta-RaPS is 4. Table 5 also summarizes the computation times of the heuristic with and without the dual information. Like Lan et al. (2007), we report the time at which the best solution is first encountered. As before, the reported times for Meta-RaPS-LP include the time to solve the LP relaxation of SCP. The results clearly show that significant computational savings can be achieved. Overall, the average solution time is reduced from 237.14 seconds to 70.94 seconds at the cost of a minor increase in the average gap from 0.93% to 1.17%. Moreover, this difference in solution quality is mainly attributable to the two instances scpnre2 and scpnrf5. For these instances, the set of columns with zero reduced costs in the optimal LP solution lacks some crucial columns. Comparing Table 1 with Table 5 shows that the gaps of Meta-RaPS-LP for scpnre2 and scpnrf5 are identical to those of the restricted SCP for these instances; that is, Meta-RaPS-LP could not do any better. If we exclude scpnre2 and scpnrf5, the average percentage gap of Meta-RaPS rises to 1.04%, while that of Meta-RaPS-LP falls to 0.69%.
We next test the effect of embedding the dual information on the solution quality and time for problem classes (d) and (e). In both of these problem sets, the instances are larger than those in the standard benchmark set (a), and problem set (d) in particular contains some very large set covering instances. Thus, we increase the number of iterations spent in the neighborhood search from its original value of 400 to 500 and set the maximum time limit to 7,200 and 3,600 seconds for problem sets (d) and (e), respectively. Since using the dual information has no significant benefit for the unicost instances (f), we omit them from our subsequent numerical results. Tables 6-7 in the appendix summarize the results for problem classes (d) and (e), respectively. Although the solution times attained by Meta-RaPS and Meta-RaPS-LP for the railway instances in Table 6 are close to each other, both the average and minimum percentage gaps achieved by Meta-RaPS-LP are always less than half of those by Meta-RaPS (the only exception is the average percentage gap for the instance rail516). Finally, Table 7 presents the results for the hard cost and coverage correlated problems (e). For this class of problems, we achieve large reductions in computational effort together with better solution quality.
As indicated by a † sign, Meta-RaPS-LP finds a better solution relative to Meta-RaPS for all of the instances.
Figure 3 illustrates the information presented in Tables 5-7. It depicts the empirical cumulative distributions of the percentage gaps and solution times obtained by Meta-RaPS with and without the dual information. Figure 3(a) shows that the dual information helps to decrease the percentage gaps in almost all instances (except for some of the standard benchmark problems). Figure 3(b) shows that, when the dual information is used, the empirical cumulative distribution of the computation times shifts to the left, as desired. However, the significant improvements in the percentage gaps for the railway instances are sometimes attained at the expense of increased solution times.
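Empirical cumulative distributions such as those in Figure 3, and statements of the form "the gap is below Y% in X% of the instances", can be computed directly from per-instance results. The following Python sketch assumes NumPy; the helper names ecdf and share_at_most and the sample times are hypothetical. A curve that "shifts to the left" corresponds to a larger value of F(t) at every threshold t.

```python
import numpy as np


def ecdf(values):
    """Points (x, F(x)) of the empirical CDF: F(x) is the share of values <= x."""
    x = np.sort(np.asarray(values, dtype=float))
    f = np.arange(1, x.size + 1) / x.size
    return x, f


def share_at_most(values, threshold):
    """F(threshold): fraction of instances whose value does not exceed threshold."""
    values = np.asarray(values, dtype=float)
    return np.count_nonzero(values <= threshold) / values.size


# Hypothetical solution times (seconds) for four instances, not data from the paper.
times_plain = [10, 40, 80, 300]
times_dual = [5, 20, 30, 100]
x, f = ecdf(times_dual)  # e.g. plot with matplotlib: plt.step(x, f, where="post")
for t in (25, 50, 100):
    print(t, share_at_most(times_plain, t), share_at_most(times_dual, t))
```

In this toy data the "dual" curve lies on or above the "plain" curve at every threshold, which is exactly what a leftward shift of the distribution of computation times means.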
Finally, in Figure 4 we present a detailed analysis of solution quality versus solution time for problem sets (d) and (e). As the algorithm progresses, we take snapshots of the percentage gaps at different times. Figure 4(a) illustrates that, for the railway instances, the percentage gap generally does not decrease considerably as the solution time increases. However, the benefit of using the dual information stands out clearly when we compare the percentage gaps. After 5,000 seconds, the percentage gaps are less than 25% in 70% of the instances when we apply Meta-RaPS without the dual information.
Within the same duration, the maximum percentage gap for the same percentage of the instances decreases to 14% when we embed the dual information. Note that the final horizontal piece on the curve for Meta-RaPS-LP after 2,000 seconds is explained by the fact that solving the LP relaxation of SCP takes more than 2,000 seconds for two of the seven railway instances (see Table 2). Figure 4(b) summarizes the results of a similar analysis for the hard cost and coverage correlated problems. After 10 seconds, Meta-RaPS achieves a percentage gap of slightly less than 13% for 80% of the instances. With the dual information, the percentage gaps fall below 4% for the same percentage of instances within the same 10 seconds.
5. Conclusions and Future Research Directions.
Our empirical study supports the claim that the dual optimal solution of the LP relaxation of SCP is an important instrument for tackling this celebrated problem. By using the dual information, significant reductions in problem size, as well as gains in solution quality and speed, can be achieved for large-scale instances that are otherwise out of reach for off-the-shelf solvers. As our results demonstrate, there is a trade-off between incorporating all columns with zero reduced costs in the restricted SCP and solving this IP over the basic columns only. The former clearly yields integer solutions of higher quality, which suggests that an algorithm may benefit from visiting alternate optimal solutions of the LP relaxation of SCP. It remains to be determined which of these multiple optimal solutions play a more significant role in improving the IP solution. This may be an interesting path to explore for simple primal-dual heuristics as well as for the more sophisticated local search methods proposed for SCP.
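The restricted SCP discussed above keeps only the columns with zero reduced cost with respect to an optimal dual solution of the LP relaxation, and then solves the IP over those columns. The following is a minimal sketch of that construction on a toy instance, assuming SciPy 1.9 or later (its HiGHS-based linprog and milp). It is not the implementation used in the experiments; the function name restricted_scp, the tolerance tol, and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, linprog, milp


def restricted_scp(A, c, tol=1e-7):
    """Solve the LP relaxation of SCP, keep the columns with zero reduced cost,
    and solve the SCP restricted to those columns as a binary IP.

    A: 0/1 matrix with one row per element and one column per set; c: costs >= 0.
    """
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    m = A.shape[0]

    # LP relaxation min c^T x s.t. A x >= 1, x >= 0, written as -A x <= -1.
    # With nonnegative costs the bound x <= 1 is redundant and can be dropped.
    lp = linprog(c, A_ub=-A, b_ub=-np.ones(m), bounds=(0, None), method="highs")
    if lp.status != 0:
        raise RuntimeError(lp.message)

    # HiGHS reports d(objective)/d(b_ub) for the "<=" rows; negating gives the
    # duals u >= 0 of the covering constraints A x >= 1.
    u = -lp.ineqlin.marginals
    reduced_costs = c - A.T @ u
    keep = np.flatnonzero(reduced_costs <= tol)  # numerically zero reduced cost

    # Restricted SCP: binary variables for the kept columns only.
    k = keep.size
    ip = milp(
        c[keep],
        constraints=LinearConstraint(A[:, keep], lb=np.ones(m), ub=np.full(m, np.inf)),
        integrality=np.ones(k),
        bounds=Bounds(np.zeros(k), np.ones(k)),
    )
    if not ip.success:
        raise RuntimeError(ip.message)
    chosen = keep[np.round(ip.x).astype(bool)]
    return lp.fun, ip.fun, keep, chosen


# Toy instance (hypothetical): 4 elements, 5 sets; rows = elements, columns = sets.
A = [[1, 0, 0, 1, 1],
     [1, 1, 0, 0, 1],
     [0, 1, 1, 0, 1],
     [0, 0, 1, 1, 1]]
c = [3, 2, 3, 2, 6]
lp_bound, ip_value, kept, chosen = restricted_scp(A, c)
print(lp_bound, ip_value, kept, chosen)
```

By complementary slackness, every column that is positive in an optimal LP solution has zero reduced cost, so the kept columns always cover all rows and the restricted IP is feasible. In the toy instance, the fifth set (cost 6, covering everything) has reduced cost 2 under any optimal dual and is discarded. Restricting to the basic columns instead would require access to the LP basis, which this sketch does not use.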
It is well known that in many practical applications, such as vehicle routing and scheduling, a large-scale SCP is solved within a branch-and-bound or branch-and-price setting. In such a setting, the approach proposed here may be used to solve the integer programming problem formed by the columns of the restricted master problem. This may yield a better incumbent solution, which could speed up the overall convergence of the exact algorithm.
We also observed that a large class of standard benchmark instances for SCP can be solved very efficiently by standard exact methods. This points to a clear need for new problem sets for benchmarking purposes. However, we emphasize that most of the unicost problem instances remain hard for off-the-shelf solvers.
Finally, we embedded the dual information into a well-known local search method and demonstrated that it improves both solution time and solution quality. This improvement is more pronounced for large-scale SCP instances. In future studies, more sophisticated algorithms that make use of the dual information as proposed in this work may be developed.

Figure 1: The distribution of the optimality gap of the sum of the duals from the MILP model ("LP gap") and the distribution of the optimality gap of the primal integer solution from the MILP model ("IP gap").

4.1 Problem Classes and Experimental Setup.
We describe below the problem sets and the testing environment used in our empirical study. Instances not available in the OR-library (OR-lib, 2012) can be downloaded from http://people.sabanciuniv.edu/sibirbil/scp/.
⋄ (a) Standard benchmark problems from the OR-library (65 instances): This class includes randomly generated non-unicost instances used widely in the literature (OR-lib, 2012).
⋄ (b) Euclidean-type cost and coverage correlated problems (320 instances): This problem class was first introduced by Yelbay (2010), motivated by the presence of a set covering structure in multicast routing in wireless ad hoc networks

Figure 2: The empirical cumulative distribution of the percentage gaps (⋆) and computation times for the restricted SCP solved over the columns with zero reduced costs. (⋆): The negative value on the horizontal axis of Figure 2(a) is due to the instance rail2536, for which we improve the best known objective function value.

Figure 3: The empirical cumulative distributions of the percentage gaps and computation times of Meta-RaPS and Meta-RaPS-LP for problem classes (a), (d), and (e).
Figure 4: The progress of the percentage gaps during the course of the algorithm. (a) Railway instances; (b) hard cost and coverage correlated instances.
Best known objective function value in the literature. ⋄: Terminated due to time limit.

Table 4: Performance statistics for the restricted SCP on unicost problems (f). Best known objective function value. ⋄: Terminated due to time limit. ⋆: Optimal objective function value.