Exploring Further Advantages in an Alternative Formulation for the Set Covering Problem

Carlos III University of Madrid, Electronic Technology Department, Leganés, Spain; University of Alcalá, Physics and Mathematics Department, Faculty of Sciences, Alcalá de Henares, Spain; Pontificia Universidad Católica de Valparaíso, Escuela de Ingeniería Informática, Valparaíso, Chile; University of Extremadura, School of Technology, Cáceres, Spain; Universidad Diego Portales, Escuela de Ingeniería Industrial, Santiago de Chile, Chile


Introduction
The set covering problem (SCP) is a classical problem shown to be NP-complete by Karp [1], and its optimization version is NP-hard. Although it is a traditional problem, SCP remains widely studied in the current scientific literature because it models problems in relevant areas, such as engineering, vehicle routing, the medical domain, and facility allocation (see, e.g., [2-6]).
Most contributions in the SCP field consider the traditional SCP formulation introduced by Chvatal [7], defined as follows. Let E be a set of m objects and let S be a collection of n subsets of E, where each subset has an associated nonnegative cost. Then, the purpose of SCP is to find a minimum-cost family of subsets X ⊆ S such that each element of E belongs to at least one subset of the family X.
This traditional formulation does not directly deal with two aspects: solution unsatisfiability and set redundancy. Solution unsatisfiability refers to the possibility of generating unfeasible solutions during the search. Set redundancy refers to the possibility of generating solutions that are not cost-optimized because they include redundant components (subsets). Because the formulation does not cover these two aspects, the solving method has to address them to ensure good performance.
In recent years, Bilal et al. [8] proposed an alternative SCP formulation. Its main contribution was that both redundant sets and unfeasible solutions are directly penalized in the fitness function. Therefore, the solving method does not have to control such aspects, in contrast to the traditional SCP formulation. Nevertheless, the contribution of the alternative formulation becomes questionable due to two main issues. First, Vasko et al. [9] demonstrated that the computational effort needed to remove redundant components from an SCP solution is almost negligible. Thus, including a redundancy removal operator in a solving method addressing the traditional formulation does not excessively increase the computational cost. Second, there are simple methods for transforming unfeasible solutions into feasible ones, such as the one proposed by Beasley and Chu [10]. As a consequence, the alternative formulation does not seem to be advantageous.
Bilal et al. [8] compared the alternative formulation to the traditional one. To this end, they solved the standard Beasley's OR library [11] with two algorithms: a simple descent heuristic (DH) addressing the alternative formulation and a standard greedy heuristic (GH) addressing the traditional one. DH outperformed GH, which supports the alternative formulation. However, Vasko et al. [9] later applied the same GH with the traditional formulation on the same instances, adding a simple redundancy removal operator, and obtained better results than the ones reported in Bilal et al. [8] for DH. Once again, the alternative formulation seems to have questionable merit. However, the comparison initiated in Bilal et al. [8] might have some limitations:
(i) The heuristic techniques considered might not be the most appropriate according to the current state of the art, in which metaheuristics, especially swarm intelligence algorithms (SIAs), provide the best results in general.
(ii) The authors independently compared the two formulations using different algorithms. This focus could be correct because the algorithm best suited for one formulation could be very different, even in type, from the algorithm best suited for the other. However, there is no study combining aspects from the two formulations that could provide some advantages for the same solving method.
(iii) The authors did not consider any statistical method for comparing both formulations. Instead, they only compared the average solution obtained.
On this basis, this paper investigates whether the concepts involved in the alternative formulation offer any advantage beyond the novel problem formulation itself. This idea leads us to propose two studies focused on solution quality, as a way to determine whether any concept in the alternative formulation could enhance a method using the traditional formulation. To this end, the authors select two different metaheuristics adequate for the studies, although other metaheuristics could have been selected without loss of generality. Merely demonstrating the utility of a concept from the alternative formulation is valuable in itself, because it means that future solving methods could include these novel concepts.
This research focus implies that the authors do not aim to obtain the best absolute results on Beasley's OR library, although the solutions obtained should be reasonable, as will be discussed later.
The first study focuses on identifying whether there is any concept in the alternative formulation that could be applied to guide the search process of a solving method addressing the traditional formulation. To this end, the authors generate two versions of the same SIA addressing the traditional formulation. In the first version, (i), the search process of the SIA is guided by concepts from the traditional formulation. In the second version, (ii), the search process of the SIA is guided by concepts from the alternative formulation. Further details of this first study are as follows:
(i) This study requires a solving method whose search process is closely linked to the optimization problem. The ant colony optimization (ACO) algorithm meets this requirement, being very sensitive to the heuristic information operator designed for the problem to solve. Thus, two heuristic information operators are considered: in (i), a usual operator based on the traditional formulation and, in (ii), a novel operator based on concepts from the alternative formulation.
(ii) The two ACO approaches in (i) and (ii) include the same usual operator for removing redundant sets. No operator for transforming unfeasible solutions into feasible ones is considered because ACO does not generate them.
(iii) The two ACO approaches in (i) and (ii) are applied to solve Beasley's OR library. The results obtained were analysed through a widely accepted statistical method. Both approaches were tuned through the automatic method iterated F-Race, avoiding the errors of manual tuning [12].
The second study focuses on identifying whether there is any concept in the alternative formulation that could be considered for designing the operators needed to transform unfeasible solutions into feasible ones while removing redundant columns. Note that this type of operator is widely applied in most SIAs solving SCP. To this end, the authors generate two versions of the same SIA addressing the traditional formulation. In the first version, (iii), the SIA considers the widely applied operator proposed in Beasley and Chu [10]. In the second version, (iv), the SIA considers a novel operator inspired by concepts from the alternative formulation. Further details of this second study are as follows:
(i) The second study requires a solving method that could generate unfeasible solutions. The artificial bee colony (ABC) is one of the many metaheuristics meeting this requirement. Thus, two different feasibility operators are considered in (iii) and (iv).
(ii) The two ABC approaches in (iii) and (iv) are applied to solve Beasley's OR library. The results obtained were analysed through a widely accepted statistical method. In this case, the parameter configuration is taken from the literature (see [13]) because the feasibility operator is considered as an external tool.
To summarize, the motivation of this research is to identify whether there is any concept from the alternative formulation that could be used to solve the traditional SCP. To the best of our knowledge, this is the first work performing this research. Figure 1 summarizes the main tasks performed in the two studies. The contributions to the field are as follows:
(i) A first study is proposed to identify concepts of interest in the alternative SCP formulation that could be applied to guide a search method addressing the traditional SCP formulation. The study shows that the gain concept in the alternative SCP formulation is useful for guiding the search, outperforming the results obtained by a usual heuristic information operator from the literature.
(ii) A second study is proposed to identify concepts of interest in the alternative SCP formulation that could be considered for designing feasibility operators. The study shows that the gain concept in the alternative SCP formulation is useful for designing this type of operator, outperforming a usual operator in the literature. This contribution is especially interesting because this type of operator is widely applied in the metaheuristic SCP field as a black-box method.
These two contributions are especially interesting for future works implementing techniques for solving the traditional SCP formulation. The first study shows that the gain concept in the alternative SCP formulation is useful for guiding the search. That means that this concept could be considered during the design of novel solving methods for SCP. The second study shows that the gain concept is useful for designing feasibility operators. That means that this concept could be considered to improve techniques already shown to be useful for solving the problem, as well as to propose novel feasibility operators. As stated before, this second line of future work is especially interesting because feasibility operators are widely applied in the literature as black-box techniques outside the solving method.
The rest of this paper is structured as follows. Section 2 discusses related work. In Section 3, a formal statement of both SCP formulations is provided. In Section 4, the main aspects of the ACO algorithm in the first study are discussed. Section 5 discusses the main aspects of the ABC algorithm in the second study. In Section 6, the experimental methodology followed is discussed and the solution quality results are analysed. Finally, Section 7 concludes and introduces future works. Table 1 includes a summary of the notation considered throughout this work.

Related Work
The literature on SCP is extensive. Some authors considered exact algorithms, such as branch-and-bound and branch-and-cut techniques [14-16] or linear programming [17-19]. More recently, Caprara et al. [20] compared several exact algorithms, concluding that the best exact technique was CPLEX.
It is well known that exact techniques require excessive computing resources on large problems. Therefore, much effort was focused on exploring heuristic and metaheuristic algorithms, which could find near-optimal (or even optimal) solutions for large problems in reasonable computing time.
Starting with heuristic methods, Chvatal [7] applied a classical GH. Although GHs are simple and fast to implement, they seldom produce good-quality solutions. Some researchers tried to improve GHs by adding randomness (see, e.g., [9, 21-24]). Highly sophisticated heuristics based on Lagrangian relaxation were also considered, yielding very good solutions (see, e.g., [20, 25-27]). This brief review shows that few heuristic proposals for SCP have appeared in recent years. Note that in other optimization problems, the proposal of heuristics is usual (e.g., [28]).
Analysing the previous contributions according to the results obtained, exact algorithms provided excellent results when solving small SCP problems. For larger SCP problems addressed by approximate techniques (heuristics and metaheuristics), we observe that heuristics do not provide results as good as those of the more sophisticated metaheuristics. Thus, the best results were usually obtained by SIAs. In this line, we should mention the valuable contributions of Naji-Azimi et al. [48,49] and Balaji and Revathi [57], who obtained optimal or near-optimal solutions for classical SCP benchmarks.
All the works listed before have in common that they considered the traditional SCP formulation. In contrast, the alternative formulation has received limited attention. As far as the authors know, there are only two works considering the alternative formulation. Bilal et al. [60] solved an SCP variant through an iterated tabu-search metaheuristic. Crawford et al. [61] compared the results obtained solving the traditional and alternative formulations through the ACO algorithm. The research presented in this paper was inspired by that very preliminary work (see Crawford et al. [61]). In that contribution, there is no study regarding the existence of concepts in the alternative formulation that could be considered for solving methods addressing the traditional formulation. In Lanza-Gutierrez et al. [56], the authors applied an SIA to solve SCP by a CSO algorithm, but with a completely different approach.
Figure 1: Summary of the main tasks performed in the two studies in this paper.
Table 1 (fragment): Summary of notation.
$abc_{eli}$: Percentage of columns to be removed during the generation of a solution in ABC.
$a\eta_j^t$: Heuristic factor of column $j \in R_{t-1}$ at step $t \geq 0$ for the alternative formulation, $a\eta_j^t > 0$.
$a_{i,j}$: Value in the cell $(i, j)$ of $A$. It equals 1 if the $j$-th column covers the $i$-th row and 0 otherwise, $i \in I$, $j \in J$.
$\operatorname{argmax}\{\cdot\}$: Point/points in which a function gets its maximum value/values.
$\operatorname{argmin}\{\cdot\}$: Point/points in which a function gets its minimum value/values.
$a\phi_j^t$: The sum of the gains of covering the noncovered rows which could be covered by column $j \in R_{t-1}$ at $t \geq 0$.
$\beta$: Relative importance of heuristic information, $\beta \geq 0$.
$C$: Set of costs, $C = \{c_1, c_2, \ldots, c_n\}$.
$c\eta_j^t$: Heuristic factor of column $j \in R_{t-1}$ at step $t \geq 0$ for the classical formulation, $c\eta_j^t > 0$.
$c_j$: Cost associated to the $j$-th column, $c_j \in \mathbb{R}^+$, $j \in J$.
$c_{\min}(e_i)$: Cost of the cheapest set among the sets covering $e_i$, $c_{\min}(e_i) \in C$, $i \in I$.
$c\phi_j^t$: Number of noncovered rows which could be covered by column $j \in R_{t-1}$ at step $t \geq 0$.
$\Delta_t$: Solution generated by an ant at step $t \geq 0$.

Set Covering Problem Statements
Let $I = \{1, 2, \ldots, m\}$ and $J = \{1, 2, \ldots, n\}$ be the row and column sets, respectively. Let $E = \{e_1, e_2, \ldots, e_m\}$ be a universe of $m$ elements and let $S = \{s_1, s_2, \ldots, s_n\}$ be a collection of $n$ subsets of $E$, such that $s_j \subseteq E$ and $\bigcup S = E$, with $j \in J$. Each subset $s_j$ has an associated nonnegative cost $c_j \in C$, where $C = \{c_1, c_2, \ldots, c_n\}$. The optimization problem is formally defined by assuming a binary matrix $A$ of $m$ rows and $n$ columns, where the rows are the elements of the universe and the columns are the subsets. Let $a_{ij}$ be the value in the cell $(i, j)$ of $A$ given by
$$a_{ij} = \begin{cases} 1, & \text{if } e_i \in s_j, \\ 0, & \text{otherwise}, \end{cases} \qquad (1)$$
for $i \in I$ and $j \in J$, where $e_i \in E$. Thus,
$$A = (a_{ij})_{i \in I,\ j \in J}. \qquad (2)$$
The objective of SCP is to find a subset of $S$ covering (containing) all the elements of $E$ at a minimal cost. A solution to SCP is usually expressed as a binary vector $X = \{x_1, x_2, \ldots, x_n\}$, where
$$x_j = \begin{cases} 1, & \text{if the set } s_j \text{ is part of the solution}, \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$
Then, the cost of the solution $X$ is
$$\sum_{j \in J} c_j x_j. \qquad (4)$$
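As a minimal sketch of the definitions above, the following Python snippet evaluates the cost of a binary solution vector and checks whether it covers every row of the matrix $A$. The function names and the toy instance are hypothetical and serve only as an illustration; they do not come from the original work.

```python
from typing import List


def solution_cost(costs: List[float], x: List[int]) -> float:
    """Cost of a solution: sum of c_j * x_j over all columns j."""
    return sum(c * xj for c, xj in zip(costs, x))


def is_feasible(a: List[List[int]], x: List[int]) -> bool:
    """True if every row i is covered by at least one selected column."""
    return all(any(aij and xj for aij, xj in zip(row, x)) for row in a)


# Toy instance (hypothetical): 3 rows (elements) and 4 columns (subsets).
A = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
COSTS = [2.0, 3.0, 4.0, 1.0]

x_feasible = [0, 0, 1, 1]    # the third and fourth columns cover all rows
x_unfeasible = [1, 1, 0, 0]  # the last row is left uncovered
```

Both example vectors have the same cost, which illustrates why the cost alone cannot distinguish feasible from unfeasible solutions under the traditional formulation.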
Table 1 (continued): Summary of notation.
$X$: An SCP solution expressed as a binary vector.
$\bar{z}$: Average cost obtained from a distribution, $\bar{z} \in \mathbb{R}^+$.
$z_{\max}$: Maximum cost obtained from a distribution, $z_{\max} \in \mathbb{R}^+$.
$z_{\min}$: Minimum cost obtained from a distribution, $z_{\min} \in \mathbb{R}^+$.
$z_{opt}$: Optimum solution of a given instance, $z_{opt} \in \mathbb{R}^+$.
$z_t$: Column selected at step $t \geq 0$, $z_t \in R_{t-1}$.
Next, we give a formal statement of the two SCP formulations.

Traditional Formulation.
The SCP fitness function is
$$f_1 = \sum_{j \in J} c_j x_j. \qquad (5)$$
Then, given $m$ elements and $n$ subsets, the objective is to find a collection of subsets to
$$\min f_1, \qquad (6)$$
subject to
$$\sum_{j \in J} a_{ij} x_j \geq 1, \quad \forall i \in I, \qquad (7)$$
$$x_j \in \{0, 1\}, \quad \forall j \in J. \qquad (8)$$
The constraint in equation (7) ensures that each row is covered by at least one column. If this constraint is not satisfied, the solution is considered unfeasible. The constraint in equation (8) only states the integrality required by the mathematical program. Hence, this equation does not need to be addressed as a constraint in heuristic approaches.

Alternative Formulation.
In this formulation, covering an element is identified with collecting a gain at a given cost. Let $c_{\min}(e_i) \in C$ be the cost of the cheapest set among the sets covering the element $e_i$, given by
$$c_{\min}(e_i) = c_{j^*}, \quad j^* \in \operatorname{argmin}_{j \in J,\ a_{ij} = 1} \{c_j\}, \qquad (9)$$
where $\operatorname{argmin}\{\cdot\}$ provides the point/points in which a function gets its minimum value/values. Then, the gain $g_i \in \mathbb{R}^+$ of covering an element $e_i$ is defined in equation (10) using a very small positive constant $\psi \in \mathbb{R}^+$. Based on this gain concept, the SCP fitness function $f_1'$ is defined in equations (11) and (12). Then, given $m$ elements and $n$ subsets, the objective is to find a collection of subsets to
$$\max f_1', \qquad (13)$$
subject to the constraint in equation (14) and
$$x_j, y_i \in \{0, 1\}, \quad \forall i \in I, \ \forall j \in J. \qquad (15)$$
The constraints in equations (14) and (15) are only needed for the consistency of the mathematical program. Under this formulation, there are no unfeasible solutions, unlike in the traditional formulation. Note that unfeasible solutions still exist for the problem. However, the alternative formulation penalizes such solutions instead of discarding them. Moreover, it also directly penalizes redundant sets, beyond the higher cost they already imply in the traditional formulation. Thus, the use of redundancy removal operators is not needed, in contrast to the traditional formulation, where it is highly recommended.

Ant Colony Optimization
The ACO algorithm is inspired by ant colony behaviours. The ACO process searches for the optimal path in a graph by means of an artificial ant colony. Ants work cooperatively and communicate through pheromone trails and problem-dependent heuristic information. Pheromone trails are a type of distributed information that is dynamically updated by the ants. Pheromones keep the experience gained during the search process while highlighting promising areas of the search space.
Let $\Delta_{t-1} \subseteq J$ be the solution generated by an ant at step $t - 1$. Reviewing the scientific literature [42,62], a usual heuristic information expression for a column $j \in R_{t-1}$ at step $t$, denoted $c\eta_j^t$, is built on $c\phi_j^t$, the number of noncovered rows in $\Delta_{t-1}$ that could be covered by column $j$ at step $t$. This value is
$$c\phi_j^t = \Big\| I_j \setminus \bigcup_{k \in \Delta_{t-1}} I_k \Big\|,$$
where $\|\cdot\|$ is the cardinal of a set and $I_j$ denotes the row set covered by column $j$, for $j \in J$ and $I_j \subseteq I$. In this work, we propose a heuristic information inspired by the gain concept from the alternative formulation introduced in Section 3.2, denoted $a\eta_j^t$, which is built on $a\phi_j^t$, the sum of the gains of covering the noncovered rows in $\Delta_{t-1}$ by column $j$ at step $t$. Thus,
$$a\phi_j^t = \sum_{i \in I_j \setminus \bigcup_{k \in \Delta_{t-1}} I_k} g_i,$$
where $g_i$ is given in equation (10).
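The two quantities described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the gains $g_i$ are passed in as plain numbers rather than computed from equation (10), and the final heuristic value is assumed to be the quantity divided by the column cost, a common choice in the ACO literature for SCP that this text does not confirm. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Set


def uncovered_rows(rows_of: Dict[int, Set[int]], partial: Set[int], n_rows: int) -> Set[int]:
    """Rows not yet covered by the columns of a partial solution."""
    covered: Set[int] = set()
    for j in partial:
        covered |= rows_of[j]
    return set(range(n_rows)) - covered


def c_phi(j: int, rows_of: Dict[int, Set[int]], uncovered: Set[int]) -> int:
    """Number of uncovered rows that column j would cover."""
    return len(rows_of[j] & uncovered)


def a_phi(j: int, rows_of: Dict[int, Set[int]], uncovered: Set[int], gains: List[float]) -> float:
    """Sum of the gains of the uncovered rows that column j would cover."""
    return sum(gains[i] for i in rows_of[j] & uncovered)


def heuristic(j: int, rows_of: Dict[int, Set[int]], uncovered: Set[int],
              costs: Dict[int, float], gains: Optional[List[float]] = None) -> float:
    """Heuristic value of column j; dividing by the cost is an ASSUMPTION."""
    if gains is None:
        phi = float(c_phi(j, rows_of, uncovered))  # traditional variant
    else:
        phi = a_phi(j, rows_of, uncovered, gains)  # gain-based variant
    return phi / costs[j]


# Toy data (hypothetical): 3 rows, 3 columns, column 2 already selected.
ROWS_OF = {0: {0, 1}, 1: {1, 2}, 2: {2}}
COSTS = {0: 2.0, 1: 2.0, 2: 1.0}
GAINS = [1.0, 3.0, 0.5]  # illustrative gains, not derived from equation (10)
UNCOVERED = uncovered_rows(ROWS_OF, {2}, 3)
```

The only difference between the two variants is whether each uncovered row counts as one unit or as its gain, which is what makes the comparison between both ACO approaches a controlled one.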
To simplify the notation, we define the heuristic information $\eta_j^t$ for a column $j$ at step $t$ based on whether we consider the traditional formulation or the alternative one. That is, $\eta_j^t = c\eta_j^t$ under the traditional formulation and $\eta_j^t = a\eta_j^t$ under the alternative one.
Algorithm 1 shows the procedure of a general ACO. Next, the main steps are detailed.
(1) Initialization: in the beginning, we propose to preprocess the SCP instances by using column domination and column inclusion [18]. Next, the algorithm parameters are initialized. Traditionally, ACO algorithms do not include an initialization step to generate the $ps_{aco}$ solutions in the population. Instead, pheromone trails are randomly assigned and then solutions are generated according to this random information. That means that the algorithm could need to run some iterations before having the right information about the quality of the solution components. At this point, we propose to include a greedy population initialization step in ACO based on Lu and Vasko [48]. This step corresponds to line 1 of Algorithm 1.
(2) Solution construction method: each ant starts with an empty solution, to which columns are added iteratively until all rows are covered. Consequently, all solutions generated by this strategy are feasible. Most ACO-based algorithms consider a similar state transition rule, preferring solution components with high pheromone and heuristic values (see, e.g., [42,62]). A possible way to generate solutions is the single row oriented method (SROM) proposed by Ren et al. [43]. In that work, it was demonstrated that SROM reduces the computational burden compared to other methods. Thus, SROM is used in this paper as the solution construction method. Additionally, we also consider the ant colony system (ACS) proposed by Dorigo and Gambardella [63] as an extension of the ACO algorithm. ACS includes a pseudo-random-proportional rule, providing a direct way to balance exploration and exploitation during the selection of the solution component. If $z_t \in R_{t-1}$ denotes the column selected at step $t$, then the ACS rule is
$$z_t = \begin{cases} \operatorname{argmax}_{j \in R_{t-1}} \{ (\tau_j)^{\alpha} (\eta_j^t)^{\beta} \}, & \text{if } q \leq q_0, \\ \zeta_t, & \text{otherwise}, \end{cases}$$
where $q$ is a random number uniformly distributed in $[0, 1]$, $q_0 \in [0, 1]$ is a parameter determining the relative importance of exploitation versus exploration, $\zeta_t \in R_{t-1}$ is the column provided by SROM at step $t$, and $\operatorname{argmax}\{\cdot\}$ provides the point/points in which a function gets its maximum value/values. Thus, if $q \leq q_0$, the rule returns the nonselected column having the highest value of $(\tau_j)^{\alpha} (\eta_j^t)^{\beta}$ at step $t$, where $\tau_j > 0$ denotes the pheromone trail of column $j$ and $\alpha \geq 0$ and $\beta \geq 0$ denote the relative importance of pheromone trails and heuristic information, respectively. This step corresponds to line 4 of Algorithm 1.
(3) Local search: it is well known that local search is effective in improving ACO performance. We consider the local search proposed by Ren et al. [43], where for each column in $\Delta_t$, the algorithm determines if the column should be removed or replaced by one or more columns while keeping solution feasibility. This step corresponds to line 5 of Algorithm 1.
(4) Update pheromone trails: pheromone trails are updated based on the max-min ant system (MMAS) approach proposed by Stützle and Hoos [64]. In this method, after each ant generates a full solution, all pheromone trails are decreased uniformly to simulate evaporation, forgetting part of the historical experience. Next, a small amount of pheromone is deposited on the columns corresponding to the best solution found. To this end, MMAS can consider either the best solution found in the current iteration or the best solution found from the beginning of the algorithm. We opted for the second option, as did Ren et al. [43]. Thus, the search can concentrate quickly around the best solution found. This strategy could result in bad performance if the algorithm is trapped in bad solution areas. However, this risk is reduced by the ACS strategy detailed in Step 2. Formally, pheromone trails are updated following
$$\tau_j \leftarrow \rho \tau_j + \omega_j,$$
where $\rho \in [0, 1)$ is the pheromone persistence and $\omega_j \geq 0$ is the amount of pheromone put on column $j$, as provided in Stützle and Hoos [64]. Additionally, they also proposed limiting the range of pheromone values through bounds that depend on $\epsilon$ and $L$, where $\epsilon \in (0, 1)$ denotes a ratio coefficient and $L \subseteq J$ is the best solution found from the beginning of the algorithm. This step corresponds to line 7 of Algorithm 1.
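The selection rule of Step 2 and the pheromone update of Step 4 can be sketched in Python as below. This is a hedged illustration, not the authors' implementation: the SROM choice is represented by a hypothetical `fallback` callable, and the pheromone bounds `tau_min` and `tau_max` are passed in as plain numbers because their exact expressions are not reproduced in this text.

```python
import random
from typing import Callable, Dict, Optional, Sequence


def acs_select(candidates: Sequence[int], tau: Dict[int, float], eta: Dict[int, float],
               alpha: float, beta: float, q0: float, fallback: Callable[[], int],
               rng: Optional[random.Random] = None) -> int:
    """Pseudo-random-proportional rule: exploit with probability q0."""
    rng = rng or random.Random()
    if rng.random() <= q0:
        # Exploitation: candidate with the highest tau^alpha * eta^beta.
        return max(candidates, key=lambda j: (tau[j] ** alpha) * (eta[j] ** beta))
    # Exploration: delegate to the construction method (SROM in the text).
    return fallback()


def update_pheromone(tau: Dict[int, float], rho: float, omega: Dict[int, float],
                     tau_min: float, tau_max: float) -> Dict[int, float]:
    """Evaporate with persistence rho, deposit omega_j, clamp to the bounds."""
    return {
        j: min(tau_max, max(tau_min, rho * t + omega.get(j, 0.0)))
        for j, t in tau.items()
    }


# Hypothetical values, for illustration only.
TAU = {0: 1.0, 1: 2.0}
ETA = {0: 3.0, 1: 1.0}
```

Setting `q0` close to 1 makes the ant almost always pick the greedy candidate, while smaller values hand more decisions to the construction method.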

Artificial Bee Colony
The ABC algorithm is inspired by honey bee behaviours, the search process being guided by three types of artificial bees: workers, onlookers, and scouts. The general procedure of ABC is shown in Algorithm 2. It starts by generating an initial population of $ps_{abc}$ solutions. For every row, a random column with covering possibilities is selected until all rows are covered. Next, along the iterations, the population is managed by n w − ps abc workers and 1 − n w onlookers, which are randomly recruited in each iteration. The behaviour of each bee is as follows:
(i) A worker takes a random solution from the population and generates a new solution by adding a random number of columns between 0 and $abc_{add}$ (as a percentage of the columns in the SCP instance). This step is followed by the elimination of a random number of columns between 0 and $abc_{eli}$ (as a percentage of the columns in the SCP instance). The fitness value of the individual generated by the worker is then obtained. If the fitness value of the new individual is better than that of the previous individual assigned to the worker, the new individual replaces the previous one and the counter of trials for improving the current solution is set to zero. Otherwise, the counter is increased. If the counter reaches the limit threshold, the worker is transformed into a scout bee.

Mathematical Problems in Engineering
(ii) An onlooker generates a new solution following a procedure similar to that of workers, but selecting the solution with a probability proportional to its quality, instead of randomly. The concept of the limit threshold is not used for onlookers.
(iii) A scout discards its current solution and generates a new one by following the same strategy as for generating the initial population. As expected, the counter of trials is initialized to zero.
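The worker behaviour described in the list above can be sketched in Python as follows. This is only an illustration under assumptions: the percentage is converted to a column count by truncation, a lower cost is taken as a better fitness, and the function names are hypothetical rather than taken from the original implementation.

```python
import random
from typing import Set, Tuple


def worker_move(solution: Set[int], n_columns: int, abc_add: float,
                abc_eli: float, rng: random.Random) -> Set[int]:
    """Add, then remove, a random number of columns (limits in percent)."""
    max_add = int(n_columns * abc_add / 100.0)  # truncation is an assumption
    max_eli = int(n_columns * abc_eli / 100.0)
    new_solution = set(solution)
    outside = [j for j in range(n_columns) if j not in new_solution]
    k_add = min(rng.randint(0, max_add), len(outside))
    new_solution.update(rng.sample(outside, k_add))
    k_eli = min(rng.randint(0, max_eli), len(new_solution))
    new_solution.difference_update(rng.sample(sorted(new_solution), k_eli))
    return new_solution  # may be unfeasible after the random elimination


def update_trials(new_cost: float, old_cost: float, trials: int,
                  limit: int) -> Tuple[bool, int, bool]:
    """Return (accepted, trials, becomes_scout) after evaluating a move."""
    if new_cost < old_cost:
        return True, 0, False  # improvement: replace and reset the counter
    trials += 1
    return False, trials, trials >= limit
```

Because columns are removed at random, the result of `worker_move` is not guaranteed to cover every row, which is why a feasibility operator is needed afterwards.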
As both workers and onlookers can generate unfeasible solutions because of the random elimination of columns, it is mandatory to manage this issue. Crawford et al. [13] proposed to use the usual heuristic by Beasley and Chu [10] for transforming unfeasible solutions into feasible ones and then reducing the cost of the solution in a later step. This heuristic is shown in Algorithm 3. Here, the first stage in lines 3-7 transforms an unfeasible solution into a feasible one. The second stage in lines 8-12 removes redundant columns. In this algorithm, note that $\Delta$, $\alpha_i$, and $\xi_i$ are the set of columns in a solution, the set of columns that cover row $i \in I$, and the number of columns in $\Delta$ that cover row $i$, respectively.
Focusing on the first stage, the steps required to make a solution feasible include identifying the uncovered rows and adding columns to the solution so that all rows are covered. In the proposal of Beasley and Chu [10], the search for the missing columns is guided by an expression over $V$, the noncovered rows in the solution, i.e., the ratio between the cost of a column and the number of noncovered rows that could be covered by that column.
As an alternative strategy, this paper proposes to guide the search based on the concepts from the alternative SCP formulation, i.e., the ratio between the cost of a column and the sum of the gains of covering the noncovered rows by that column.
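The two-stage operator and its two guiding ratios can be sketched in Python as follows. This is an illustrative sketch and not a transcription of Algorithm 3: visiting uncovered rows in index order and scanning columns for redundancy in decreasing cost order are assumptions, the gains are supplied as plain positive numbers, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def repair(solution: Set[int], costs: Dict[int, float], rows_of: Dict[int, Set[int]],
           cols_of: Dict[int, List[int]], n_rows: int,
           gains: Optional[List[float]] = None) -> Set[int]:
    """Make a solution feasible, then drop redundant columns."""
    sol = set(solution)
    cover = [0] * n_rows  # number of selected columns covering each row
    for j in sol:
        for i in rows_of[j]:
            cover[i] += 1

    def ratio(j: int) -> float:
        # Cost divided by the uncovered rows (or their gains) that j covers.
        newly = [r for r in rows_of[j] if cover[r] == 0]
        weight = len(newly) if gains is None else sum(gains[r] for r in newly)
        return costs[j] / weight

    # Stage 1: cover each uncovered row with the column of lowest ratio.
    for i in range(n_rows):
        if cover[i] == 0:
            best = min(cols_of[i], key=ratio)
            sol.add(best)
            for r in rows_of[best]:
                cover[r] += 1

    # Stage 2: remove columns whose rows are all covered at least twice.
    for j in sorted(sol, key=lambda c: costs[c], reverse=True):
        if all(cover[r] >= 2 for r in rows_of[j]):
            sol.remove(j)
            for r in rows_of[j]:
                cover[r] -= 1
    return sol


# Toy instance (hypothetical): 3 rows and 4 columns.
ROWS_OF = {0: {0, 1}, 1: {1, 2}, 2: {0, 1, 2}, 3: {2}}
COLS_OF = {0: [0, 2], 1: [0, 1, 2], 2: [1, 2, 3]}
COSTS = {0: 2.0, 1: 2.0, 2: 5.0, 3: 1.0}
```

Passing `gains=None` gives the count-based ratio, while passing a list of gains switches to the gain-based ratio without changing anything else in the operator.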

Experimentation
This section discusses the experimental methodology and analyses the results obtained in the first and second studies.

Experimental Methodology.
We apply the two approaches in each study, for the ACO and ABC algorithms, to solve Beasley's OR library. This dataset is widely used to report empirical results in the current literature (see, e.g., [9,40,48]).
This library includes 65 non-unicost instances generated randomly, as detailed in Table 2. For further details about the random generation of these instances, see [18,65]. For each instance in the library, the number of rows, the number of columns, and the cost of each column are provided. Additionally, for each row, the number of columns that cover it and the list of those columns are also provided. For a complexity study of the search space in this benchmark, we refer readers to the work by Finger et al. [66], which considered the fitness-distance correlation landscape metric to this end. In Table 2, "Density (%)" contains the percentage of 1s in the A matrix in equation (2). "Optimal solution" shows two possible values, known and unknown, according to whether the instances have a solution proven to be optimal or whether optimality could not be checked because of problem complexity. Thus, we only know the best historical solutions found for the sets nrg and nrh.
We combine two stop criteria for performing the experimentation: reaching a given number of fitness evaluations or obtaining the optimal solution. If at least one condition holds, the algorithm ends. For ACO, we assume 10,000 fitness evaluations as a stop condition. As we will discuss later, this value is enough for performing the experimentation. For ABC, we consider 500 iterations based on Crawford et al. [13].
Before running the experimentation, we should configure both algorithms. In the case of ABC, we can assume the parameters provided in Crawford et al. [13] for the two ABC approaches considered here because (i) the authors also solved SCP and (ii) the approach based on the alternative formulation only modifies the heuristic operator for solution feasibility, while the operators guiding the search are not modified. In the case of ACO, we should configure the two ACO approaches considered here because (i) we do not have any set of parameters from previous works for the ACO approach used and (ii) the approach based on the alternative formulation modifies how the search is performed in comparison to the traditional one, so the two approaches should be configured independently.
Thus, for the second study (ABC), we consider $ps_{abc} = 200$, $n_w = 100$, $limit = 50$, $abc_{add} = 0.5\%$, and $abc_{eli} = 1.2\%$ from Crawford et al. [13]. For the first study (ACO), we obtain the parameters of the two ACO approaches using F-Race. This method configures a metaheuristic starting with a set of candidate values for each parameter. Then, it discards poorly performing configurations as soon as statistically sufficient evidence is gathered against them, focusing on the most promising ones.
Concretely, we consider the iterated F-Race implementation for R software by López-Ibáñez et al. [67]. Following the authors' recommendations, we divided the benchmark into three groups according to the problem size (m × n) to get a consistent configuration. Thus, Group A includes instance sets 4, 5, and 6; Group B includes instance sets a, b, c, and d; and, finally, Group C includes instance sets nre, nrf, nrg, and nrh. Table 3 shows the candidate values for each parameter based on previous works [43] and the configurations obtained for each group and ACO approach. Note that d-ACO denotes ACO with the traditional heuristic information expression and n-ACO denotes ACO with the alternative heuristic information expression.
Once both ACO approaches are configured, 30 independent runs are performed for each instance and algorithm. Next, we analyse whether there are significant differences between the behaviour of the two algorithms regarding solution quality and execution time for each instance. To this end, the authors consider the Wilcoxon-Mann-Whitney test [68] to validate several hypotheses. The implementation of this test is the one provided in the performance assessment tool described in Knowles et al. [69] and available in Fonseca et al. [70].
As a solution quality metric, we consider the relative percentage deviation (RPD), which evaluates how far the solution found by the metaheuristic is from the optimal solution known in the literature. The lower the RPD value, the better the solution obtained. The optimal solutions of the instances can be regarded as the solutions provided by the corresponding exact techniques. Thus, indirectly, this metric evaluates the performance of the technique in comparison with a generic exact technique solving the same problem instance. Three RPD metrics are included: the average RPD, $rpd \in [0, 1]$; the minimum RPD, $rpd_{\min} \in [0, 1]$; and the maximum RPD, $rpd_{\max} \in [0, 1]$. They are calculated from $\bar{z}$, $z_{\min}$, and $z_{\max}$, which denote the average, minimum, and maximum solution cost from a distribution of 30 samples solving an instance, respectively, and from $z_{opt}$, the optimum solution cost of the instance. Note that the cost of the best solution found during one run is given by equation (4). Thus, although both SCP fitness formulations are different, we can compare the results obtained without loss of generality.
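A minimal Python sketch of the three RPD metrics is given below. It assumes that the deviation is expressed as a fraction of the optimum, which agrees with the stated range $[0, 1]$ but is not confirmed by an explicit expression in this text; the function names and sample costs are hypothetical.

```python
from typing import List, Tuple


def rpd(z: float, z_opt: float) -> float:
    """Relative deviation of a cost z from the optimum z_opt (ASSUMED form)."""
    return (z - z_opt) / z_opt


def rpd_metrics(costs: List[float], z_opt: float) -> Tuple[float, float, float]:
    """Average, minimum, and maximum RPD over the costs of several runs."""
    z_avg = sum(costs) / len(costs)
    return rpd(z_avg, z_opt), rpd(min(costs), z_opt), rpd(max(costs), z_opt)


# Hypothetical costs from three runs on an instance whose optimum is 100.
SAMPLE_COSTS = [100.0, 110.0, 120.0]
avg_rpd, min_rpd, max_rpd = rpd_metrics(SAMPLE_COSTS, 100.0)
```

A minimum RPD of zero means that at least one run reached the known optimum.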
Regarding the computing platform used for the experimentation, the authors used two computing nodes in a computing cluster. Each node has two 2.33 GHz Intel Xeon E5410 processors with four cores each and 16 GB of 1600 MHz DDR3 RAM, running a Linux operating system. All executions were performed on a single core without parallelism because the goal of this paper is not to explore parallelism. The reason for considering such unconventional infrastructure is the possibility of performing many independent executions, as needed by the statistical tests required to validate the proposal. That means that a conventional computer could be used for a single execution or a reduced set of them. To prevent operating system tasks from affecting the total computing time measured during the experimentation, one core in each computing node was kept idle. Additionally, the authors checked that the RAM in the computing node was sufficient to avoid memory swapping. As expected, the computing power of the processor definitely affects the time required to find the solution to the problem, since most of the operations are related to CPU computing and accessing the main memory (RAM). Note that the same computing nodes are used for all the experiments in this work so as not to bias the conclusions reached regarding computing time.
Regarding programming languages, the ACO algorithm was fully implemented in Java for Java Development Kit (JDK) 1.7. The ABC algorithm was fully implemented in C. The scripts for managing the executions and collecting the results were implemented in bash. Note that the usage of two different programming languages for implementing ABC and ACO does not affect the conclusions reached regarding computing time. This is because ABC and ACO computing times are not compared in this work.
Tables 4 and 5 show, for each instance and case study, the RPD metrics ($rpd$, $rpd_{min}$, $rpd_{max}$), the average execution time needed to reach the stop condition (time(s)), and the average number of fitness evaluations needed to reach the best solution found during the exploration (evals). In both tables, lower rpd and time(s) values are given in bold for each instance. In Table 5, d-ABC denotes ABC with the default heuristic feasibility operator and n-ABC denotes ABC with the heuristic feasibility operator based on the alternative SCP formulation.

Analysis of the Experimental Results.
Analysing both tables regarding RPD metrics, we observe that (i) n-ACO seems to outperform or match d-ACO in most instances and (ii) n-ABC seems to outperform or match d-ABC in most instances. Focusing on computing times, we observe a similar behaviour, where n-ACO appears to need a shorter time than d-ACO, except for b, c, and nrh instances, and n-ABC appears to need a shorter time than d-ABC in general. Focusing on evaluations, the evals field is related to the number of iterations reached as follows. In ACO, $ps_{aco}$ evaluations are performed for each iteration. In ABC, a number of evaluations varying between $ps_{abc}$ and two times $ps_{abc}$ is performed for each iteration. Thus, for ABC, the maximum number of evaluations will be a value in the range [100,000, 200,000]. For ACO, the maximum number of evaluations will be 10,000 as defined before. Analysing the evals field, we find that the number of evaluations needed is far from the stop condition defined, and thus the stop condition is adequate in both studies. Table 6 shows the average RPD metrics for each instance group, where the "ipv" field denotes the percentage of improvement obtained by considering the alternative approach of the algorithm instead of the default version. Analysing this table, it is observed that (i) n-ACO provides better RPD values than d-ACO for all the groups and (ii) n-ABC also provides better RPD values than its default version. The RPD metrics obtained are in line with other works from the literature, with RPD values lower than 1.0%. In this regard, Table 7 shows the values of some recent successful approaches solving the problem. However, we should remark that the purpose of this work is not to outperform other techniques solving the standard SCP benchmark.
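The evaluation budgets quoted above follow directly from the per-iteration counts. As a sketch, let $I_{aco}$ and $I_{abc}$ be the maximum number of iterations of each algorithm and $E_{aco}$ and $E_{abc}$ the resulting maximum number of evaluations; these four symbols are introduced here only for illustration and do not appear in the original text.

```latex
\begin{align*}
E_{aco} &= I_{aco} \cdot ps_{aco} = 10{,}000, \\
I_{abc} \cdot ps_{abc} \;\le\; E_{abc} &\le 2 \cdot I_{abc} \cdot ps_{abc},
\qquad \text{with } I_{abc} \cdot ps_{abc} = 100{,}000 .
\end{align*}
```

Under this reading, the ABC budget varies between runs only because the number of evaluations per iteration varies, while the ACO budget is fixed.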
At this point, it seems that the alternative approach of the algorithms provides better performance in both cases. However, we do not know whether the differences observed are significant. To this end, the statistical methodology described by Lanza-Gutierrez et al. [56] was applied. First, we removed all possible outliers. Then, we analysed the normality of the data, finding that a normal distribution cannot be assumed in any case. Consequently, the median should be used as the average value for calculating $z$ in equation (25).
Next, we study whether there are significant differences in the solution quality of the algorithms. Starting with the first study, we consider the Wilcoxon-Mann-Whitney test with hypotheses $H_0: rpd_a \geq rpd_b$ and $H_1: rpd_a < rpd_b$, where $rpd_a$ and $rpd_b$ are the average RPD of algorithms a and b for a given instance, respectively. The p values obtained for each instance and ACO approach are shown in Table 8 under the title RPD analysis, where p values lower than the significance level α = 0.05 are given in bold, i.e., the confidence level is 0.95. Note that, of the two possible unilateral tests, the one performed is the one that matches the descriptive analysis, the other test being marked with a dash in the table. Also note that, in case of equality between the average RPD values, both unilateral tests are performed. For the second study, we consider the Wilcoxon-Mann-Whitney test with similar hypotheses as before, $H_0: rpd_c \geq rpd_d$ and $H_1: rpd_c < rpd_d$. The p values obtained are also shown in Table 9 with the same notation as in Table 8.
[Table 7, partial rows (authors, approach, RPD value): Lu and Vasko [48], TLBO20, 0.06; Lu and Vasko [48], TLBO10, 0.09; Naji-Azimi et al. [49], EM-like, 0.20; Lu and Vasko [48], TLBO, 0.28.]
Similarly, for the execution time in the first study, we consider the Wilcoxon-Mann-Whitney test with hypotheses $H_0: time(s)_a \geq time(s)_b$ and $H_1: time(s)_a < time(s)_b$, where $time(s)_a$ and $time(s)_b$ are the average execution time of algorithms a and b for a given instance, respectively.
The p values obtained for each instance and algorithm are shown in Table 8 under the title Execution time analysis, where p values are given in bold with the same criterion as before. For the second study, we consider the Wilcoxon-Mann-Whitney test with similar hypotheses as before. The p values obtained are also shown in Table 9.
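The unilateral tests described above can be sketched as follows. This is not the assessment tool cited in the text; it is a minimal, assumed implementation of a one-sided Wilcoxon-Mann-Whitney test using the normal approximation with a continuity correction and without a tie correction of the variance. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist


def mann_whitney_less(x, y):
    """One-sided p value for H1: values in x tend to be smaller than values in y."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which x exceeds y; ties count one half.
    u = sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    # Continuity correction of 0.5; a small U gives a small p value.
    z = (u - mean_u + 0.5) / sd_u
    return NormalDist().cdf(z)


ALPHA = 0.05  # significance level used in the text

# Hypothetical per-run RPD samples for two algorithms on one instance.
rpd_a = [0.001, 0.002, 0.003, 0.004, 0.005]
rpd_b = [0.006, 0.007, 0.008, 0.009, 0.010]

p_less = mann_whitney_less(rpd_a, rpd_b)     # tests H1: rpd_a < rpd_b
p_greater = mann_whitney_less(rpd_b, rpd_a)  # tests the opposite direction
reject_h0 = p_less < ALPHA
```

For samples of 30 runs the normal approximation is usually considered reasonable, but an exact or tie-corrected implementation may give slightly different p values.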
Based on the previous statistical analysis, Table 10 shows the percentage of cases where an algorithm provides significantly better performance than the other for each study in terms of RPD and execution time. Focusing on the first study and RPD values, we verify that n-ACO provides better behaviour than d-ACO in 52.31% of cases. Moreover, it is important to remark that d-ACO never provides better results than n-ACO, meaning that n-ACO clearly outperforms d-ACO. For execution time, n-ACO needs lower execution times than d-ACO in 55.38% of cases and d-ACO needs lower execution times than n-ACO in 23.08% of cases. This could mean that the alternative heuristic information needs higher execution times under certain conditions. However, most cases in which d-ACO needs lower execution times correspond to instances whose optimal solution is not reached, so that the 10,000 evaluations are performed by each algorithm, e.g., for instances a_1, a_2, a_3, c_1, c_2, c_3, c_5, nrh_1, nrh_2, nrh_3, and nrh_4. In such unfavourable cases, the differences observed are not of concern, as shown in Figure 2(a). On the other hand, most cases in which n-ACO needs lower execution times correspond to instances whose optimal solution is reached. In such cases, a greater difference is observed favouring n-ACO. This is because n-ACO reaches the stop condition before d-ACO, e.g., for instance sets 4, 5, 6, d, nre, and nrg. Focusing on the second study, we verify that n-ABC provides better behaviour than d-ABC in 63.64% of cases, whereas d-ABC outperforms n-ABC in 7.27% of cases. This unfavourable situation occurs in small instances, where the search space is reduced. For execution time, n-ABC needs lower execution times than d-ABC in 69.23% of cases and d-ABC needs lower execution times than n-ABC in 4.62% of cases.
As before, this unfavourable situation mainly occurs when n-ABC does not reach the optimal solution, so that the additional computation of the gain concept in the alternative SCP formulation is penalized. This behaviour is shown in Figure 2.
The previous analysis is completed with the landscape study in Tables 11 and 12 for the solutions obtained solving the instances through ACO and ABC, respectively. The metrics in these tables quantify solution quality (QMetric ∈ [0, 1]), the rate of success (SRate ∈ [0, 1]), and the speed of reaching a solution (SSpeed ∈ [0, 1]). QMetric follows an exponential formulation, which allows distinguishing between the performance of two algorithms that obtained solutions close to the optimum fitness. SRate is defined as the number of successful runs, in which the algorithm reaches the optimum fitness, divided by the total number of runs. SSpeed quantifies the number of evaluations taken to reach the optimum fitness. For the three metrics, the value 1 indicates the highest quality. More details about the three metrics, as well as their formulation, are given in [71]. Analysing Tables 11 and 12, we observe that, in general, both n-ACO and n-ABC provide a higher or equal QMetric than the d-ACO and d-ABC approaches. For SSpeed, we observe that both n-ACO and n-ABC need a lower or equal number of evaluations than d-ACO and d-ABC to reach the optimum fitness. For SRate, we observe that both n-ACO and n-ABC reach the optimum fitness a greater or equal number of times than d-ACO and d-ABC. That means that n-ACO and n-ABC obtained better quality solutions, are better in convergence, and provide a more robust performance than the default approaches.
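Of the three landscape metrics, only SRate is fully defined in the text, so it is the only one sketched here. The function name `success_rate` and the sample costs are hypothetical; QMetric and SSpeed are omitted because their formulation is given in [71] rather than in this section.

```python
def success_rate(best_costs, z_opt):
    """Fraction of runs whose best solution cost equals the optimum cost."""
    if not best_costs:
        raise ValueError("at least one run is required")
    # A run is successful when it reaches the optimum fitness.
    successes = sum(1 for cost in best_costs if cost == z_opt)
    return successes / len(best_costs)


# Hypothetical best costs of four runs on an instance whose optimum cost is 512.
srate = success_rate([512, 512, 514, 512], z_opt=512)
```

A value of 1 means every run reached the optimum, which matches the convention in the text that 1 indicates the highest quality.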
Up to this point, we know that the concepts included in the alternative formulation positively affect the search process in the first study, where the concepts from the alternative formulation are considered for guiding purposes. A mapping of the solutions visited by n-ACO and d-ACO could help to show effectively how the two approaches explore the search space. To this end, we consider the mapping method (MaM) proposed by Autuori et al. [72], where a mapping function converts a multidimensional solution space into a one-dimensional space through two steps, the first of which applies a binary conversion to the solution. This representation is used for identifying the different zones explored by the algorithms. Note that the number of zones corresponds to the number of elements in the binary encoding (the number of columns). To this end, all the binary representations are added, resulting in a frequency diagram that shows how often the algorithm includes a column in a solution. Table 13 shows three metrics analysing the frequency diagrams previously generated for each ACO approach and an instance from each group 4, 5, 6, a, b, c, d, nre, nrf, nrg, and nrh. Note that the frequency diagrams were generated using all the solutions built in the 30 runs for each algorithm. The metrics used were also proposed by Autuori et al. [72] and are (i) the number of unexplored zones ("uz"), (ii) the number of explored zones ("ez"), and (iii) the number of large explored zones ("lez"). The metrics are interpreted as follows. Convergence is considered high if the number of lez is small. Diversity is considered good if the number of ez is large. Analysing Table 13, we observe that ez is usually higher for n-ACO than for d-ACO, meaning that n-ACO improves diversity during the search compared to the traditional approach. This is especially relevant for large instances, as occurs for nrg_1 and nrh_1, where the ez metric is significantly large.
Focusing on lez, we observe that the differences between both approaches are not as pronounced as for ez. However, such differences could mean that d-ACO has a lack of convergence during the search, and future authors should manage this fact.
From these two studies solving the traditional SCP, we verify that (i) the concepts from the alternative formulation are useful in guiding the search process of a metaheuristic and (ii) the concepts from the alternative formulation are useful in updating a usual heuristic feasibility operator from the literature. The improvement in both studies was observed in terms of the solution quality, the rate of success, and the speed of reaching a solution. The conclusion in (ii) is especially interesting because this type of operator is generically applied in metaheuristics that generate unfeasible solutions, and the proposal could be directly incorporated into many solving methods.
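The frequency diagram and the zone counts used in the mapping study can be sketched as follows. This is an assumed reading of the description in this section, not the procedure of Autuori et al. [72]: a zone counts as explored when its column appears in at least one solution, and as a large explored zone when its frequency reaches `large_threshold`, a hypothetical parameter introduced only for this example.

```python
def frequency_diagram(solutions):
    """Add binary solution vectors column by column."""
    n_cols = len(solutions[0])
    freq = [0] * n_cols
    for solution in solutions:
        if len(solution) != n_cols:
            raise ValueError("all solutions must have the same length")
        for j, bit in enumerate(solution):
            freq[j] += bit
    return freq


def zone_metrics(freq, large_threshold):
    """Return (uz, ez, lez) under the assumptions stated in the lead-in."""
    uz = sum(1 for f in freq if f == 0)  # columns never selected
    ez = len(freq) - uz                  # columns selected at least once
    lez = sum(1 for f in freq if f >= large_threshold)
    return uz, ez, lez


# Hypothetical binary solutions over four columns (1 = column in the cover).
solutions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
freq = frequency_diagram(solutions)
uz, ez, lez = zone_metrics(freq, large_threshold=2)
```

With this reading, a larger ez suggests higher diversity and a smaller lez suggests stronger convergence, in line with the interpretation given in the text.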

Conclusions and Future Scope
Traditionally, SCP is formulated without addressing two issues: solution unsatisfiability and set redundancy, meaning that the solving method has to implement mechanisms to control such aspects. In recent years, an alternative SCP formulation was proposed, whose main contribution was that both issues were directly addressed by including penalties in the fitness function.
Reviewing the current scientific literature, we check that the alternative SCP formulation has received limited attention. Hence, we question whether there is any advantage of using this formulation beyond addressing set redundancy and feasibility aspects.
This idea led us to propose two studies based on a metaheuristic approach. The aim is to identify whether there is any concept in the alternative formulation that could be considered for enhancing a solving method using the traditional formulation. The first study considers an ACO algorithm in two contexts: (i) solving SCP addressing the traditional formulation and (ii) solving SCP addressing the traditional formulation but using concepts from the alternative one for guiding the search. The second study considers an ABC algorithm in two contexts: (i) solving SCP addressing the traditional formulation and (ii) solving SCP addressing the traditional formulation but including concepts from the alternative one for updating a usual heuristic feasibility operator from the literature.
As a result of the first study, the authors conclude that it is possible to use the gain concept from the alternative SCP formulation to successfully guide the ACO search addressing the traditional SCP formulation. The benefits of the novel guide are shown in terms of solution quality, convergence, execution time, and diversity. From the second study, the authors conclude that it is possible to use the gain concept from the alternative SCP formulation to update a feasibility operator from the literature. The benefits of the novel feasibility operator are shown in terms of solution quality, execution time, and convergence. The first conclusion is interesting for designing novel guide strategies for solving the traditional SCP formulation. The second conclusion is especially interesting because feasibility operators are widely used in metaheuristic approaches solving the SCP, given that unfeasible solutions are commonly generated. This type of operator is integrated into the solving method as a black-box method. That means that it is straightforward to interchange one method for another. This implies that the feasibility operator based on the alternative SCP formulation presented here could be integrated with little effort into already published works from the literature, as well as into future works, to evaluate each specific use case.
As future lines of research, it would be interesting to include additional metaheuristics in this study, as well as a larger dataset with bigger problems. Additionally, it could be interesting to extend this work by taking into account the performance of the solving methods and the search space complexity of the instances based on the landscape metrics.

Data Availability
The results shown in this paper were obtained by solving some freely available datasets in the literature. They can be found at http://people.brunel.ac.uk/~mastjjb/jeb/orlib/scpinfo.html.