Automatic design of hybrid stochastic local search algorithms for permutation flowshop problems with additional constraints
Operations Research Perspectives

Automatic design of stochastic local search algorithms has been shown to be very effective in generating algorithms for the permutation flowshop problem for the most studied objectives including makespan, flowtime and total tardiness. The automatic design system uses a configuration tool to combine algorithmic components following a set of rules defined as a context-free grammar. In this paper we use the same system to tackle two of the most studied additional constraints for these objectives: sequence dependent setup times and the no-idle constraint. Additional components have been added to adapt the system to the new problems while keeping intact the grammar structure and the experimental setup. The experiments show that the generated algorithms outperform the state of the art in each case.


Introduction
Automatic algorithm design (AAD) has been shown to be able to produce state-of-the-art algorithms for the permutation flowshop problem [43]. The method proposed by Pagnozzi and Stützle [43] uses an automatic configuration tool to assemble algorithmic components following rules defined as a context-free grammar. The algorithmic components were implemented in EMILI, a flexible framework that allows the generation of stochastic local search (SLS) algorithms. In particular, the EMILI framework allows the definition of both highly specialized problem-specific components and general problem-agnostic components. In order to use an automatic configuration tool, such as irace, to design an SLS algorithm, the grammar is converted to a set of parameters. In this paper, AAD is used to tackle the permutation flowshop problem with the sequence dependent setup times and the no-idle constraints.
The permutation flowshop problem (PFSP) is a very well known problem [14]. It has been shown to be NP-hard for different objectives with the exception of the two machine case for the makespan objective [23]. The problem models a flowshop where a set of jobs have to be processed on a group of machines. Several additional constraints have been proposed in the literature to take into account different scenarios. Often, machines have to be set up before being able to process a job. For instance, a machine may need to be cleaned or calibrated before processing another job. The setup time may depend not only on the job that has to be processed, but also on the job that was processed immediately before. The permutation flowshop problem with sequence dependent setup times, PFSP sdst, has been introduced to model this scenario. This problem has been shown to be NP-hard, considering the makespan objective, even when there is only one machine [18]. Considering the complexity hierarchies for scheduling problems, this result can be extended to the total completion time and total tardiness objectives [46]. Several SLS algorithms such as iterated local search [65], iterated greedy [51] and population-based algorithms [30,50,58] have been proposed to solve the permutation flowshop problem with this constraint.
No-idle permutation flowshop (PFSP ni) is another such variant. PFSP ni models a scenario where machines cannot have idle times. This constraint is necessary in some contexts such as the steel industry, ceramic production or the photolithography methods used in the production of integrated circuits [45]. PFSP ni with the makespan objective is also an NP-hard problem [1]. Following the same reasoning about complexity hierarchies for scheduling problems [46], the no-idle permutation flowshop problem can be assumed to be NP-hard also when considering the total completion time and total tardiness objectives. Among the SLS algorithms proposed for this problem there are iterated greedy [40], memetic algorithms [55] and variable neighborhood search [61].
In this paper we extend the work carried out on automatic algorithm design for the permutation flowshop problem by considering the permutation flowshop problem with the sequence dependent setup times and the no-idle constraints. For each constraint we consider the minimization of the makespan, the total completion time and the total tardiness. For each objective and constraint, the algorithms generated by our AAD system are compared with the best performing algorithm from the literature. The results show that the generated algorithms outperform the state of the art.
The paper is structured as follows. Section 2 defines PFSP, PFSP ni and PFSP sdst. Section 3 describes how the automatic design works and how the system was updated for these problems. The experimental setup is reported in Section 4. We report the experimental results for the sequence dependent setup times problem and the no-idle problem, respectively, in Section 5 and Section 6. Finally, the conclusions are in Section 7.

Permutation flowshop with additional constraints
In its standard formulation, the PFSP models a flowshop in which a set of n jobs {J_1, …, J_n} have to be processed, one at a time and in the same order, on a set of m machines {M_1, …, M_m}. The jobs are released at time 0 and are executed in order with no preemption allowed. A solution is represented by a permutation π = (π(1), …, π(j), …, π(n)) that specifies the processing order of the jobs. The processing time needed by job j on machine i is denoted p_{i,j}. The completion time of job π(j) on machine i is given by Eq. (1), where C_{i,π(j−1)} is the completion time of the last job processed on machine i and C_{i−1,π(j)} is the completion time of job π(j) on the previous machine, that is

C_{i,π(j)} = max(C_{i,π(j−1)}, C_{i−1,π(j)}) + p_{i,π(j)}.   (1)

In PFSP sdst, each machine has to undergo a setup operation before being able to process the following job. A matrix S of dimension n × n × m is defined, where S_{i,l,k} is the setup time that machine i needs when passing from job l to job k. The setup time, which depends on the job that has to be processed and on the last job processed, has to be considered when computing the completion times for each job. Consequently, Eq. (1) is modified as follows:

C_{i,π(j)} = max(C_{i,π(j−1)} + S_{i,π(j−1),π(j)}, C_{i−1,π(j)}) + p_{i,π(j)}.   (2)

In PFSP ni, machines cannot have idle times, that is, as soon as a job is completed, the next one must be ready to start processing. The completion time is calculated as in Eq. (3),

C_{i,π(j)} = C_{i,π(j−1)} + a_{i,π(j−1),π(j)} + p_{i,π(j)},   (3)

where a_{i,π(j−1),π(j)} is calculated as in Eq. (4) and is the minimum delay ensuring that there is no idle time on machine i between the end of the previous job π(j − 1) and the start of job π(j). The most common objective considered for the flowshop problem is minimizing the makespan, C max, that is, the time needed to complete all jobs: C max = C_{m,π(n)}. Another common objective is minimizing the sum of the completion times, which reduces the occupation time of the machines.
The sum of completion times is defined as

TCT = Σ_{j=1..n} C_{m,π(j)}.

This objective is also known as total completion time and is equivalent to the minimization of the flowtime when the release times of all jobs are equal to zero. Finally, we also consider the minimization of the total tardiness. Assigning a due date d_j to each job J_j, the tardiness of the job in position j is defined as max(0, C_{m,π(j)} − d_{π(j)}) and the total tardiness is

TT = Σ_{j=1..n} max(0, C_{m,π(j)} − d_{π(j)}).
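As an illustration, the recursions and objectives above can be written as a short Python sketch; the function names and plain-list representation are ours, not the framework's, and completion times are fully re-evaluated rather than computed with the speedups used in the paper.

```python
# p[i][j] is the processing time of job j on machine i; pi is the permutation.
# If a setup matrix S is given, S[i][l][k] is the setup on machine i when
# passing from job l to job k, added as in Eq. (2).

def completion_times(p, pi, S=None):
    """Forward recursion of Eq. (1) (or Eq. (2) when S is given)."""
    m, n = len(p), len(pi)
    C = [[0] * n for _ in range(m)]
    for j, job in enumerate(pi):
        for i in range(m):
            prev_machine = C[i - 1][j] if i > 0 else 0
            prev_job = C[i][j - 1] if j > 0 else 0
            if S is not None and j > 0:
                prev_job += S[i][pi[j - 1]][job]
            C[i][j] = max(prev_machine, prev_job) + p[i][job]
    return C

def makespan(C):
    return C[-1][-1]                       # C_max = C_{m,pi(n)}

def total_completion_time(C):
    return sum(C[-1])                      # sum over the last machine's row

def total_tardiness(C, d, pi):
    # d[job] is the due date of that job
    return sum(max(0, C[-1][j] - d[job]) for j, job in enumerate(pi))
```

For example, with two machines and two jobs, adding unit setup times strictly increases the makespan computed by the same recursion.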
Summarizing, in this paper we consider the makespan, sum of completion times and total tardiness objectives for both the sequence dependent setup times constraint and the no-idle constraint. An analysis of the state of the art for these problems is given in Sections 5 and 6.

Automatic algorithm design
Historically, implementing an SLS algorithm for some problem has always been a manual engineering process. A designer would implement an SLS algorithm of their choice, usually the one they know best, as well as one or more alternative behaviors for each aspect of the algorithm [20]. Moreover, SLS algorithms often have many parameters that need to be set in order to better adapt the algorithm to the problem it has to solve. The designer would choose among the alternative behaviors and set the parameters of the algorithm in a manual trial-and-error process or, more recently, using automatic algorithm configuration (AAC). Given an application scenario, AAC tools apply different techniques in order to find the best parameter setting, known as a configuration, for a target application [21,22,34]. Automatic algorithm design stems from the work carried out on automatic algorithm configuration and is based on combining AAC tools with configurable algorithmic frameworks [59]. A configurable algorithmic framework implements one or more SLS algorithms as templates in such a way that for every design choice of the algorithm one can choose among different alternatives. The framework exposes all these choices as parameters so that an AAC tool can be used to find the best configuration, that is, a new algorithm adapted to solve a specific problem. This idea has been considered both in a top-down and a bottom-up approach. The top-down approach focuses on one algorithm template and expresses all the design choices as parameters, e.g. implementing a simulated annealing algorithm where all the components are parameters. Notable examples of this approach can be found in SAT solvers [27], frameworks for ant colony optimization algorithms [32] and for multi-objective evolutionary algorithms [2][3][4].
Recently, this approach has been applied to generate iterated local search algorithms for the standard PFSP [6,7], the unconstrained quadratic problem [11] and the test-assignment problem [12], and to configure simulated annealing algorithms [15].
In the bottom-up approach instead, the template structure is not fixed, that is, different types of SLS algorithms can be instantiated. Moreover, some degree of hybridization between different algorithms is allowed, enabling the system to generate new combinations. Such flexibility requires the use of an algorithmic framework to define the components and handle their integration into an algorithm. Furthermore, since the template is not fixed, defining the set of parameters that the parameter tuner has to optimize is not a trivial task. For this reason, context-free grammars have been used to specify how the algorithms should be composed [8,36,42]. Grammars have the advantage of limiting all the possible combinations of components to only those that generate a valid algorithm.
The bottom-up approach has been applied to several problems such as the PFSP with the weighted tardiness objective, the unconstrained binary quadratic problem and the traveling salesman problem with time windows [35]. The proposed system uses the ParadisEO framework [10] as the algorithmic framework and irace as the parameter tuner; the grammar is converted to a finite set of parameters following the approach proposed by Mascia et al. [37]. In a more recent publication [43], a system based on the same principles, but with a new algorithmic framework, the EMILI framework, has been used to generate new state-of-the-art algorithms for the PFSP with the makespan, sum of completion times and total tardiness objectives. This system, with some additions to the framework, is the one used in this study.
A line of research related to automatic algorithm design is that on hyperheuristics [9]. These methods propose techniques to generate heuristics that can be seen as specific components in automatic algorithm design [59]. Hyperheuristics based on genetic programming [29], where evolutionary algorithms are used to generate computer programs, have been used to generate heuristics and heuristic components for problems such as SAT [16,17], scheduling problems [5,19], bin packing [33,57] and the traveling salesman problem [25,26]. Grammars have also been used together with genetic programming in methods called grammatical evolution [8,42]. Grammatical evolution has been applied to generate local search heuristics and ant colony optimization algorithms [8,54,63].

Grammar based AAD with the EMILI framework
An idea of how the grammar works can be given with a small example. Let us consider a rule deriving an iterated local search (ILS) algorithm, as shown in Eq. (5):

<ILS> ::= ils(<LocalSearch>, <Termination>, <Perturbation>, <Acceptance>)   (5)

The rule states that to instantiate an ILS algorithm one first needs to instantiate the components <LocalSearch>, <Termination>, <Perturbation> and <Acceptance>. For example, an ILS algorithm for the PFSP could be described by

ils(ls_pfsp, time(20), random_move(exchange, 2), improve)   (6)

The algorithm described in Eq. (6) is an ILS that uses ls pfsp as local search, is executed for 20 s, performs two random steps in the exchange neighborhood as perturbation and accepts only improving solutions. Fig. 1 shows a snapshot of the grammar used for this study. The different components available for each component type are listed in Section 3.2.
When converting the grammar to parameters, a distinction has to be made between simple and complex rules. Simple rules can be directly translated to parameters. For instance, a rule that lists all possible alternatives for a neighborhood can be directly translated into a categorical parameter. Complex rules, that is, recursive rules or groups of rules that can form a loop, need to be explicitly expanded. This means that the rule, or the group of rules in the case of loops, generates a new set of parameters each time it is expanded. Consequently, a limit needs to be set on the total number of expansions. A more detailed explanation of how the grammar is defined and converted into parameters can be found in [37]. In this work we fix the maximum number of expansions to three.
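The expansion scheme can be sketched with a toy converter; the names and the dictionary-based grammar encoding below are hypothetical illustrations, not EMILI's or irace's actual formats. Each expansion of a rule creates a fresh categorical parameter, and recursive alternatives are dropped once the expansion limit is reached so that the parameter space stays finite.

```python
# grammar: nonterminal -> list of alternatives; each alternative is a list
# whose first element is a label and whose remaining (or nested) elements
# that appear as grammar keys are nonterminals to expand further.

def grammar_to_params(grammar, symbol, max_expansions=3, prefix="p"):
    params = {}

    def expand(sym, name, depth):
        alts = grammar[sym]
        if depth >= max_expansions:
            # expansion limit reached: keep only non-recursive alternatives
            alts = [a for a in alts if not any(s in grammar for s in a)]
        params[name] = [a[0] for a in alts]          # one categorical parameter
        for idx, alt in enumerate(alts):
            children = [s for s in alt if s in grammar]
            for k, child in enumerate(children):
                # conditional sub-parameters, one per nonterminal occurrence
                expand(child, f"{name}.{idx}.{k}", depth + 1)

    expand(symbol, prefix, 0)
    return params
```

With a recursive <LocalSearch> rule and the limit of three expansions used in this work, the recursion bottoms out after three nested parameters, each conditional on its parent taking the recursive alternative.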
The EMILI framework is based on a generalized version of a hybrid SLS algorithm. Hybrid SLS algorithms are defined in [20] as those that manipulate at each search step a single solution combining two or more search types. The framework also supports "simple" SLS algorithms, identified in [20] as SLS algorithms that use only one type of search step. An outline of an iterated local search algorithm, which represents a hybrid SLS template as implemented in the framework, is shown in Algorithm 1.
The algorithm works as follows. An initial candidate solution is generated using a heuristic (Line 2). Then an SLS is applied to the candidate solution (Line 3). The algorithm executes the main loop until the termination criterion is met (Line 4). In the main loop, the candidate solution is perturbed (Line 5), the SLS is applied to the perturbed solution (Line 6), and an acceptance criterion is used to decide whether to keep the current candidate solution or to accept the perturbed solution (Line 7).
As explained in [37], this structure can describe many different SLS algorithms. For instance, simulated annealing can be instantiated by choosing an initial solution component that returns a random candidate solution, a perturbation that generates random neighboring solutions of the candidate solution, an acceptance criterion based on the Metropolis condition and not applying any SLS to the perturbed solution.
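The template of Algorithm 1 can be sketched in a few lines of generic Python; the component names are placeholders of our own, not EMILI's API, and each component is passed in as a plain function.

```python
def iterated_local_search(init, local_search, perturb, accept, terminated, f):
    """Outline of Algorithm 1: build an initial solution, improve it, then
    loop perturbation / local search / acceptance until termination,
    tracking the best solution seen under objective f."""
    s = local_search(init())                 # Lines 2-3
    best = s
    while not terminated():                  # Line 4
        s_new = local_search(perturb(s))     # Lines 5-6
        s = accept(s, s_new)                 # Line 7
        if f(s) < f(best):
            best = s
    return best
```

The simulated annealing instantiation discussed above corresponds to an `init` returning a random solution, an identity `local_search`, a `perturb` returning a random neighbor and an `accept` implementing the Metropolis condition.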
The EMILI framework classifies algorithmic components into problem-dependent and problem-independent components. The first are components that have to access and modify the data structures representing the problem and the solution. Typically, initial solution heuristics, neighborhoods and perturbations belong to this category. Problem-independent components, instead, only need to compare solutions or access information about the search process (e.g. number of iterations). These components can be defined once and then used with any problem.

Fig. 1. Context-free grammar that contains the rules used to build algorithm templates for this study. Note that the rules ILS and LocalSearch together define a recursion that can be exploited to generate hybrids combining various algorithms.
In the next section, we will give a brief description of the algorithmic components used for this study.

Algorithmic components
All the components used in this work are listed in Table 1. Most of these components were implemented for the previous work on PFSP as general components and, therefore, can be used in this work without any change. The components added to the framework for this work are reported in the table in bold. In the following, we give a description of the components implemented for this study and a brief presentation of the components already present in the framework. A more detailed description of these components can be found in our previous publication [43].

Neighborhood
A neighborhood of a solution consists of all the solutions that can be generated by applying a modification rule. The framework provides several base neighborhood definitions for the PFSP: exchange, insert, transpose, binsert, finsert and twinsert. The first two are based on exchanging the positions of two jobs (exchange) and on removing one job and inserting it in another position (insert). The others, with the exception of twinsert, represent subsets of the first two. The transpose neighborhood exchanges only adjacent jobs. In binsert a removed job can be inserted only before its original position, while in finsert the insertion point has to be after the original position. Instead, twinsert considers all the permutations that can be created by removing and inserting groups of two adjacent jobs; the two jobs are reinserted in the same order in which they were removed. Additionally, Taillard's technique to speed up the exploration of the insert neighborhood has been adapted to PFSP ni and PFSP sdst.
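The two base moves can be sketched as plain Python generators over a permutation; this is our own minimal illustration, not the framework's implementation, and it enumerates neighbors without any of the speedups mentioned above.

```python
def exchange_neighbors(pi):
    """All permutations obtained by swapping the jobs in two positions."""
    for a in range(len(pi)):
        for b in range(a + 1, len(pi)):
            nbr = list(pi)
            nbr[a], nbr[b] = nbr[b], nbr[a]
            yield nbr

def insert_neighbors(pi):
    """All permutations obtained by removing the job in position a and
    reinserting it in another position b (b == a would yield pi itself)."""
    for a in range(len(pi)):
        rest = pi[:a] + pi[a + 1:]
        for b in range(len(pi)):
            if b != a:
                yield rest[:b] + [pi[a]] + rest[b:]
```

Note that the insert enumeration can produce the same permutation twice (moving a job one step forward equals moving its neighbor one step back), which is why implementations typically deduplicate or skip those positions.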

Construction heuristics
SLS algorithms use construction heuristics to generate the initial solution from which they start to explore the solution space. In our previous study several construction heuristics for the PFSP were implemented in the EMILI framework as general components and, therefore, were also used in this study. Typically, construction heuristics build a solution by adding solution components in a step-by-step process until a complete solution is constructed. A construction heuristic can be defined by the way the solution component is selected, the selection rule, and by how it is added to the partial solution, the construction rule. Solution components can be added in two ways: by appending the solution component at the end of the partial solution or by inserting it in the position that gives the best solution value. Heuristics that use insertion as construction rule are also called insertion heuristics. In some cases, a local search may be applied to the partial solution.

The NEH heuristic [41] is rated as one of the most effective heuristics for the PFSP. This insertion heuristic selects the jobs in descending order of the sum of their processing times. NEH has been so influential that several improvements and variations have been proposed. The ones implemented in EMILI are NEH tb [13], NEH edd [28], FRB5 [47] and NEH rs. NEH tb introduces a different rule to break ties when a job can be inserted in more than one position resulting in the same objective function value. NEH edd selects the jobs in order of due date and is meaningful only for the total tardiness objective. FRB5 executes a local search on the partial solution after each step of the heuristic. Finally, the NEH rs heuristic modifies the initial step of NEH by choosing the first job of the permutation at random.
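A compact sketch of the NEH scheme described above, in our own plain Python: jobs are ordered by non-increasing total processing time and inserted one by one in the best position. The makespan is fully re-evaluated at every trial position here, whereas the implementations used in the paper rely on Taillard's speedup.

```python
# p[i][j] is the processing time of job j on machine i.

def makespan(p, pi):
    """Rolling-array makespan evaluation of a (partial) permutation."""
    m = len(p)
    C = [0] * m                      # C[i]: completion of last job on machine i
    for job in pi:
        C[0] += p[0][job]
        for i in range(1, m):
            C[i] = max(C[i], C[i - 1]) + p[i][job]
    return C[-1]

def neh(p):
    n = len(p[0])
    # selection rule: descending sum of processing times
    order = sorted(range(n), key=lambda j: -sum(p[i][j] for i in range(len(p))))
    pi = [order[0]]
    for job in order[1:]:
        # construction rule: try every insertion position, keep the best
        pi = min((pi[:k] + [job] + pi[k:] for k in range(len(pi) + 1)),
                 key=lambda seq: makespan(p, seq))
    return pi
```

Swapping the selection rule (e.g. ordering by due dates) yields the NEH edd variant, which is how the family of variants above differs from the base heuristic.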
Considering the other heuristics implemented, the LR heuristic [31] selects jobs using an index function that considers the idle times and an approximation of the sum of completion times. At each step the index function is evaluated and the job with the smallest value is appended to the end of the partial solution. NLR is an insertion heuristic that uses the index function of the LR heuristic to select the next job. The RZ heuristic first appends to the partial solution the jobs that minimize a function based on the weighted sum of processing times [48]; the generated solution is then improved by means of a local search. NRZ and NRZ 2 are insertion heuristics based on the RZ heuristic. NRZ uses the solution generated by RZ before the local search as the selection rule of an insertion heuristic; the generated solution is still improved with a local search, as in RZ. In NRZ 2 the local search is not applied, generating a typically worse solution in very little time. SLACK is a construction heuristic for the total tardiness objective that appends jobs to the partial solution by selecting at each step the job with the minimal tardiness.
For this work, the NAG insertion heuristic [40], proposed for PFSP ni TCT, was implemented in EMILI. This heuristic uses the index function of the LR heuristic and, after each insertion, another index function is used to select y jobs to remove and reinsert in the partial solution. A number x of initial sequences is generated by choosing for each sequence a different first job to append to the partial solution; y and x are parameters of the heuristic.

Table 3. Average RPD results of EMBO, MRSILS, IG rs and IR stms. If the result of one of the algorithms is in bold face it means that it is statistically significantly better than the others according to the Wilcoxon signed-rank test with a 95% confidence using the Bonferroni correction to take into account multiple comparisons.

1: Output: the best solution found π*
2: π := NEH()
3: π := IR sttct 2(π)
4: π* := π
5: while time is not over do
6:   π′ := IG(π)
7:   π′ := IR sttct 2(π′)
8:   if f(π′) < f(π*) then
9:     π* := π′

Iterative improvement
Iterative improvement algorithms are local search algorithms that explore the solution space in an iterative fashion by moving from one solution to an improving neighboring solution. This process typically stops when no improving neighbor can be found or, in some cases, after a certain number of steps. These algorithms take as input the starting solution, the neighborhood relation and the rule for choosing which candidate neighbor is selected for the next iteration, known as the pivoting rule. The algorithm can also consider multiple neighborhood relations, as in variable neighborhood descent (VND).
The algorithms already implemented in the EMILI framework and used in this study include the most widely used pivoting rules, FirstImprovement and BestImprovement, as well as the iRZ local search and VND. In FirstImprovement, the exploration of the neighborhood of the current solution is stopped as soon as an improving solution is found. In BestImprovement, instead, the whole neighborhood is explored and the best neighbor is returned. The iRZ algorithm iterates the local search phase defined in the RZ heuristic until it cannot find any improving solution. The VND explores, in order, a set of neighborhoods, passing from one neighborhood to the next when no improvement is found. Each time an improving solution is found the algorithm starts again from the first neighborhood; it stops when it has scanned all the neighborhoods without finding an improvement. Additionally, two other algorithms have been implemented for this study: STH [58], implemented for PFSP sdst MS, and als, which has been implemented as a problem-independent component. STH selects a block of jobs of size in [1,3] and evaluates b insertion points, choosing the best; the process is iterated b times, where b is a parameter of the algorithm. The als local search takes as parameters two iterative improvement algorithms (l 1 and l 2) and, at each iteration, applies them alternately.
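The two pivoting rules can be sketched generically in a few lines; this is our own minimal Python, with the neighborhood passed in as a function from a solution to an iterable of neighbors, not the framework's implementation.

```python
def first_improvement(s, f, neighbors):
    """Move to the first improving neighbor found; restart the scan from the
    new solution; stop at a local optimum."""
    improved = True
    while improved:
        improved = False
        for nbr in neighbors(s):
            if f(nbr) < f(s):
                s, improved = nbr, True
                break
    return s

def best_improvement(s, f, neighbors):
    """Scan the whole neighborhood and move to the best neighbor; stop when
    no neighbor improves on the current solution."""
    while True:
        best = min(neighbors(s), key=f, default=s)
        if f(best) >= f(s):
            return s
        s = best
```

Both rules reach a local optimum of the same neighborhood; they differ in how many evaluations each step costs and in which local optimum they land on.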

Perturbation
The perturbation in an SLS algorithm has the role of letting the search process escape local minima by changing the current solution in a way that cannot be undone by the local search. The simplest way of implementing a perturbation is taking a random neighbor of the current solution: random_move perturbs the current solution by executing num random steps in the neighborhood N. vr_move extends the concept of random_move by allowing multiple neighborhoods N to be specified; the number of random steps to execute per neighborhood is given by the parameter num and the neighborhood is changed to the next one after it iterations.
A widely used perturbation scheme for the PFSP is the iterated greedy (IG) perturbation. This scheme is composed of a destruction phase and a construction phase. In the destruction phase a number d of jobs are removed from the solution. In the construction phase, the jobs are inserted in the partial solution, one by one, in the position that minimizes the objective function value. This perturbation is implemented in IG, IG ni and IG st, where the last two use Taillard's acceleration for, respectively, PFSP ni MS and PFSP sdst MS to find the best insertion point in the construction phase. In IG io the jobs to be reinserted are considered in descending order of the sum of processing times. In IG lsps, a local search is used to further improve the partial solution after each reinsertion. The MRSILSp perturbation keeps a pool containing the best size solutions found; when a solution enters a full pool, the worst solution of the pool is discarded. While the pool is not yet full, the current solution is perturbed by executing t random steps in the transpose neighborhood and returned; once the pool is full, a random solution is selected from the pool and perturbed using the IG perturbation.
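The destruction-construction scheme can be sketched as follows, in our own generic Python: `evaluate` stands for any objective defined on a (partial) sequence, and every insertion position is re-evaluated in full, whereas the IG st and IG ni variants use Taillard-style accelerations instead.

```python
import random

def ig_perturbation(pi, d, evaluate):
    """Iterated greedy step: remove d randomly chosen jobs (destruction),
    then reinsert each one in its best position (construction)."""
    pi = list(pi)
    removed = []
    for _ in range(d):
        removed.append(pi.pop(random.randrange(len(pi))))
    for job in removed:
        pi = min((pi[:k] + [job] + pi[k:] for k in range(len(pi) + 1)),
                 key=evaluate)
    return pi
```

The IG io and IG lsps variants only change, respectively, the order in which `removed` is traversed and whether a local search is applied after each reinsertion.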

Termination condition
Termination conditions are components that stop an SLS algorithm when a certain condition is verified. Several termination conditions have been considered. local minima stops the execution when there is no more improvement. maxsteps instead stops the execution when maxi iterations have been completed. maxstepsorlocmin combines the first two: the algorithm is stopped either when there is no more improvement or after maxi iterations. non_imp_it stops the algorithm if no improvement is achieved in maxi iterations. Finally, nstepsorlocmin works in the same way as maxstepsorlocmin, but the maxi parameter is always set to the number of jobs.
Acceptance criterion

Commonly, in simulated annealing the temperature T is updated during the algorithm execution: the temperature starts at a value T s and then decreases, according to a schedule, until it reaches a final value T e. The schedule we use updates the temperature every it iterations following the rule T_{n+1} = α·T_n − β, where α and β are real values between 0 and 1. The criteria sa and psa update the temperature using this rule, with psa always setting α equal to 1. Instead, ft, rsacc and karacc do not update the temperature. In the rsacc [52] acceptance criterion, the temperature T rs is linked to the average processing time of the instance to be solved and is calculated as

T rs = T_p · (Σ_{i=1..m} Σ_{j=1..n} p_{i,j}) / (n · m · 10),

where T p is a parameter. The karacc [24] acceptance criterion adapts the rsacc criterion to the total tardiness by calculating the temperature T kar from LB Cmax, the lower bound for the makespan calculated using the method defined by Taillard [60]. Additionally, an SLS algorithm can be set to always accept the perturbed solution regardless of its solution quality.

Algorithm 7. IR sttt.
1: Output: the best solution found π*
2: π := NEH()
3: π := IR sttt 2(π)
4: π* := π
5: while time is not over do
6:   π′ := random_move(π, binsert)
7:   π′ := IR sttt 2(π′)
8:   π := rsacc(π, π′)
9:   if f(π′) < f(π*) then
10:    π* := π′
11:  end if
12: end while
13: Return π*
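A sketch of the temperature mechanisms described above, in our own plain Python: the constant temperature follows the rsacc formula of [52], the acceptance test is the usual Metropolis condition, and the cooling step implements the schedule T_{n+1} = α·T_n − β; the parameter values in the usage example are illustrative, not tuned settings.

```python
import math
import random

def rs_temperature(p, t_p):
    """Constant rsacc temperature, linked to the average processing time:
    T = t_p * sum of all p[i][j] / (n * m * 10)."""
    m, n = len(p), len(p[0])
    return t_p * sum(sum(row) for row in p) / (n * m * 10.0)

def metropolis_accept(f_cur, f_new, T):
    """Accept improving moves always; worsening moves with probability
    exp((f_cur - f_new) / T)."""
    if f_new <= f_cur:
        return True
    return random.random() < math.exp((f_cur - f_new) / T)

def cooled(T, alpha, beta):
    """One step of the schedule T_{n+1} = alpha * T_n - beta."""
    return alpha * T - beta
```

The psa criterion corresponds to calling `cooled` with `alpha = 1`, so only the subtractive term β lowers the temperature.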

Adding VNS to the EMILI framework
The VNS algorithm can be considered a special case of the ILS algorithm. The similarity is evident when comparing the outlines of the two algorithms, shown in Algorithm 2 for VNS and in Algorithm 1 for ILS: both use a heuristic to generate an initial solution and a local search for intensification. A VNS algorithm is characterized by the shake and the neighborhood change components, as shown in Algorithm 2. The shake works as a perturbation, applying random changes to the current solution according to one neighborhood; it keeps a set of neighborhoods and selects the one to apply according to the parameter k. The neighborhood change component acts as an acceptance criterion and controls the parameter k: when the current solution is accepted, k is set to zero, otherwise it is incremented by one.
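The shake and neighborhood-change mechanism can be sketched generically as follows; this is our own minimal Python, not the EMILI component, with each shake represented as a function returning a random neighbor in one neighborhood.

```python
def vns(s, f, shakes, local_search, budget):
    """Basic VNS loop: shake in neighborhood k, apply local search, then
    reset k on acceptance or advance to the next neighborhood otherwise."""
    best, k = local_search(s), 0
    for _ in range(budget):
        candidate = local_search(shakes[k](best))
        if f(candidate) < f(best):
            best, k = candidate, 0          # accepted: restart from shake 0
        else:
            k = (k + 1) % len(shakes)       # rejected: change neighborhood
    return best
```

Read against Algorithm 1, `shakes[k]` plays the role of the perturbation and the k-update plays the role of the acceptance criterion, which is exactly the special-case relationship described above.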

Experimental settings
In this section we report the setup used for the automatic design and for the comparisons with the current state-of-the-art algorithms. In order to apply the automated design approach presented in this paper, the grammar presented in Section 3 needs to be adapted to each objective and PFSP variant. The resulting six grammars maintain the same general structure shown in Fig. 1, but have different variant-specific and objective-specific components. For example, the speedup for the insert neighborhood for PFSP sdst MS is not present in the grammars for PFSP sdst TCT and PFSP sdst T, and the same applies for no-idle. Considering these differences, the number of parameters to tune was 627 for the three objectives of PFSP ni, 535 for PFSP sdst MS and 507 for PFSP sdst TCT and PFSP sdst T. The configuration space generated from the grammar needs to be explored by an automatic configuration tool. For this task we chose irace, a publicly available AAC tool [34]. For each objective and PFSP variant, irace was run twice with a budget of 10^5 experiments per run, for a total of 2·10^5 experiments. The best configurations at the end of the first run were given as initial configurations for the second run. The training set used for the automatic configuration is the same one used in [43]. This set is composed of 40 randomly generated instances following the procedure described in [39]. The instances are divided into groups of five with number of jobs n ∈ {50, 60, 70, 80, 90, 100} and 20 machines, plus five instances of size 250 × 30 and five of size 250 × 50.
The automatically generated algorithms are compared with the current state of the art using the most commonly used benchmark instances for each PFSP variant and objective. The state-of-the-art algorithms have been implemented to the best of our ability following the respective papers, using for the experiments the parameter settings reported by the authors. Regarding PFSP sdst, the experiments for the three objectives were made using the benchmark presented in Ruiz and Maroto [49]. This benchmark is composed of four sets of instances. Each set is based on the original 120 instances of Taillard's benchmark [60], comprising 12 groups of 10 instances with n ∈ {20, 50, 100, 200, 500} jobs and m ∈ {5, 10, 20} machines. The four sets of instances, called SDST10, SDST50, SDST100 and SDST125, have the setup times sampled uniformly in the ranges [1,9], [1,49], [1,99] and [1,124], respectively.
In the case of PFSP ni, for makespan and sum of completion times we used the benchmark presented in Ruiz et al. [53], which is composed of 250 instances in groups of 5 with number of jobs n ∈ {50, 100, 150, 200, 250, 300, 350, 400, 450, 500} and machines m ∈ {10, 20, 30, 40, 50}. The instances generated for this benchmark do not consider due dates. Hence, for PFSP ni T we used the benchmark presented in Vallada et al. [64] that was proposed for the standard PFSP. This benchmark is composed of 540 instances divided into groups of 45 with number of jobs n ∈ {50, 150, 250, 350} and machines m ∈ {10, 30, 50}. In all cases, with the only exception of PFSP ni T, the performance has been evaluated by computing the relative percentage deviation (RPD), calculated as follows:

RPD = 100 · (R_a − R*) / R*,

where R_a is the solution value reported by algorithm a and R* is the best known solution value. In the case of PFSP ni T, since the benchmark set has instances where the total tardiness of the best solution is equal to zero, the relative deviation index (RDI) was used. The RDI is calculated as

RDI = 100 · (R_a − R*) / (R_w − R*),

where R_w is the worst solution value generated considering all the tested algorithms.
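The two measures can be written directly; the zero-division guard in the RDI is our own addition for the degenerate case in which all algorithms find the same value.

```python
def rpd(r_a, r_best):
    """Relative percentage deviation from the best known solution value."""
    return 100.0 * (r_a - r_best) / r_best

def rdi(r_a, r_best, r_worst):
    """Relative deviation index: position of r_a between the best and the
    worst value observed over all tested algorithms, as a percentage."""
    if r_worst == r_best:
        return 0.0
    return 100.0 * (r_a - r_best) / (r_worst - r_best)
```

The RDI stays well defined when the best known total tardiness is zero, which is exactly why it replaces the RPD for PFSP ni T.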
The tuning was executed on a Xeon 5410 CPU at 2.33 GHz, while the experiments were conducted on an Opteron 6410 CPU running at 2.1 GHz. All machines run CentOS 6.2. All the algorithms in the comparisons have been implemented in the EMILI framework and each execution was single threaded. All the parameter settings, as well as the best solutions found for each benchmark, are reported in the supplementary pages [44]. In the following, we report the algorithms generated by our AAD system, as well as the results of the comparison with state-of-the-art algorithms for each of the constraints and objectives tackled. Finally, all the algorithms tested are executed with a maximum running time calculated as T_max = n · (m/2) · t ms, where n is the number of jobs, m is the number of machines and t is a parameter. In our tests we used the values {60, 120, 240} for the parameter t.

Table 6
Average RPD results of IG rs and IR sttt . If an algorithm is statistically significantly better according to the Wilcoxon signed-rank test with a 95% confidence, the result is shown in bold face.
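For concreteness, the time-budget formula T_max = n · (m/2) · t ms can be evaluated as follows; the helper is a trivial sketch of our own, shown only to make the units explicit.

```python
def max_runtime_ms(n, m, t):
    """Time budget T_max = n * (m / 2) * t in milliseconds, for an
    instance with n jobs and m machines and budget parameter t."""
    return n * (m / 2) * t

# e.g. a 100-job, 10-machine instance with t = 60 gets 30 s
budget = max_runtime_ms(100, 10, 60)
```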

Makespan
Many different metaheuristics, such as GRASP and genetic and memetic algorithms [49], had been proposed for this problem before the introduction of the IG rs algorithm [51]. The IG rs algorithm is a very simple and powerful metaheuristic based on the NEH heuristic, a FirstImprovement local search that explores the iRZ neighborhood, the IG perturbation and a fixed-temperature Metropolis-like acceptance criterion. This algorithm, similarly to when it was proposed for the minimization of the makespan in the standard PFSP [43], remained the best performing algorithm for quite some time before new algorithms were proposed.
In 2014, MRSILS st [65] was shown to outperform IG rs . MRSILS st is an ILS algorithm that uses the NEH heuristic to generate the initial solution, an insertion-based local search, a strictly-improving acceptance criterion and the MRSILSp perturbation presented in Section 3. More recently, a migrating birds optimization algorithm, EMBO, was proposed as the new state of the art [58]. The EMBO algorithm divides the population into a leader solution and two groups of followers. At each iteration, the leader solution is updated by applying the STH algorithm. Afterwards, k solutions are selected from the swap and insert neighborhoods of the leader solution. Each follower is considered in turn, going from the closest to the leader to the furthest. The solution produced by applying the STH algorithm is compared with the k − x best neighbors from the previous follower solution and x neighbors selected from the swap and insert neighborhoods of the current follower. Additionally, a tabu list is used to improve the neighbor selection. Each solution is characterized by an age variable that is incremented at each iteration in which the solution is not updated. If the age variable reaches max age, the solution is replaced by a random one.
The automatically generated algorithm, IR stms , is shown in Algorithm 3 with its parameters in Table 2. This algorithm is a rather simple IG algorithm. It uses the NRZ heuristic to generate the initial solution and a FirstImprovement local search exploring the insert neighborhood. The perturbation is the same as that of IG rs , but with a stronger destruction phase, while the acceptance criterion is closer to a classical SA acceptance.
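The overall IG scheme shared by IG rs and IR stms can be sketched as below. This is a minimal illustration, assuming a generic objective f, a problem-specific destroy_reconstruct perturbation (remove d jobs, greedily reinsert them) and a local_search callable, with the fixed-temperature Metropolis-like acceptance described above; it is not the actual EMILI implementation.

```python
import math
import random
import time

def iterated_greedy(pi0, f, local_search, destroy_reconstruct,
                    temperature, budget_s):
    """Generic IG loop with a fixed-temperature Metropolis-like
    acceptance criterion (illustrative sketch)."""
    incumbent = local_search(pi0)
    best = incumbent
    end = time.time() + budget_s
    while time.time() < end:
        candidate = local_search(destroy_reconstruct(incumbent))
        delta = f(candidate) - f(incumbent)
        # accept improving moves always, worsening ones with
        # probability exp(-delta / T)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            incumbent = candidate
        if f(incumbent) < f(best):
            best = incumbent
    return best
```

Plugging in the NRZ initial solution, the insertion-neighborhood FirstImprovement search and a stronger destruction phase recovers the structure of IR stms described above.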

IR stms is statistically significantly better for almost all the instance sizes, with IG rs being the second best, followed by MRSILS and EMBO. Considering the smallest running time, IG rs shows better results than IR stms on instances with 200 or more jobs. Although the difference between the two algorithms gets smaller, this result does not change for instances with 500 jobs even when considering longer running times. In this case, IR stms may be less efficient due to the lack of instances of this size in the training set. Another interesting finding about the performance of IG rs is that, differently from the results presented in the papers introducing EMBO and MRSILS st [58,65], IG rs is always able to outperform MRSILS and EMBO. One possible explanation is that this result is due to our implementation. This can be excluded in the case of EMBO, where a comparison with the best solutions found by the original implementation, as reported by the authors, shows that our implementation obtains better results. A similar comparison cannot be done for MRSILS, but we are confident that the algorithm was implemented as described by the authors. Since we could not find in the papers any reference to the use of Taillard's acceleration, another possible explanation is that EMBO and MRSILS were compared with an IG rs implementation that did not use this acceleration, which would greatly reduce its performance.
Finally, the performance of IG rs in our experiments and the similarity of IR stms to this algorithm further confirm that the IG algorithm is quite effective at solving the PFSP with the makespan objective, even when the sequence dependent setup times constraint is taken into account.

Total completion time
Although the makespan objective for PFSP sdst has been extensively studied, the total completion time and total tardiness objectives for PFSP sdst have not received the same attention from the research community. Despite our best efforts, we were unable to find algorithms proposed for PFSP sdst TCT and PFSP sdst T . The algorithms generated for these problems are therefore compared with the IG rs algorithm presented in [51] for PFSP sdst , adapted to the total completion time and total tardiness objectives. This algorithm has shown generally good performance when tackling the PFSP in general.
The algorithm generated for total completion time, IR sttct , is quite different from IR stms , as it is composed of three nested ILS. The algorithm outline is shown in Algorithms 4, 5 and 6, while the parameters are listed in Table 5. The innermost ILS, IR sttct3 , uses a VND as local search and stops either when it cannot improve anymore or after n iterations; additionally, it uses the IG lsps perturbation while always accepting the perturbed solution. The second-level ILS instead uses the simple IG perturbation with a Metropolis-like acceptance criterion. Finally, the outer-level ILS also uses the IG perturbation, but with a lower value for the d parameter. Considering the acceptance criteria of the different layers, such a complicated structure may be explained as a way to vary the strength and type of perturbation during the execution. The comparison with IG rs is shown in Table 4 and Fig. 3. Overall, IR sttct outperforms IG rs with results that are always statistically significant.
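The nesting can be pictured as follows. This is a minimal sketch, under our own simplifying assumptions (iteration-count stopping, generic perturbations, an improving-or-equal acceptance), of how one ILS layer can serve as the local search of the layer above it; it is not the actual IR sttct component set.

```python
import random

def ils(initial, f, local_search, perturb, accept, steps):
    """One ILS layer; nesting is obtained by passing a whole inner
    layer as this layer's `local_search`."""
    incumbent = local_search(initial)
    best = incumbent
    for _ in range(steps):
        candidate = local_search(perturb(incumbent))
        if accept(f(incumbent), f(candidate)):
            incumbent = candidate
        if f(incumbent) < f(best):
            best = incumbent
    return best

def two_level_ils(initial, f, inner_ls, perturb_inner, perturb_outer):
    """Compose two ILS layers: the outer layer's "local search" is a
    full run of the inner ILS (illustrative sketch)."""
    accept = lambda f_inc, f_cand: f_cand <= f_inc
    inner = lambda s: ils(s, f, inner_ls, perturb_inner, accept, steps=50)
    return ils(initial, f, inner, perturb_outer, accept, steps=20)
```

A third layer is obtained the same way, by wrapping two_level_ils as the local search of yet another ils call, each level with its own perturbation strength and acceptance temperature.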

Total tardiness
The algorithm generated for this problem, IR sttt , shown in Algorithms 7 and 8, is a two-level ILS that uses the NEH heuristic to generate the initial solution. The most interesting feature of this algorithm is that the inner level alternately employs two local searches: one explores the finsert neighborhood and the other the twinsert neighborhood. The inner ILS also uses the IG perturbation as well as the rsacc acceptance criterion, which makes it very similar to IG rs . The outer level uses one random step in the binsert neighborhood as a perturbation and, similarly to the inner ILS, the rsacc acceptance. The parameter settings of IR sttt are shown in Table 7. Comparing the temperatures of the acceptance criteria, it seems that the two ILS have two well defined roles: the inner ILS has a higher probability of accepting non-improving solutions, while the outer level, with a low temperature, is more focused on intensification. The results of the comparison with IG rs are shown in Table 6 and in Fig. 4. Similarly to the results obtained for PFSP sdst TCT , IR sttt clearly outperforms IG rs and is always statistically significantly better.
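The alternation between the two local searches can be implemented with a simple round-robin wrapper; this helper is our own illustrative construction, not the grammar component's actual name.

```python
import itertools

def make_alternating_ls(searches):
    """Return a local-search callable that cycles through the given
    searches, applying the next one on each invocation."""
    cycle = itertools.cycle(searches)
    return lambda solution: next(cycle)(solution)
```

Passing the finsert- and twinsert-based searches to this wrapper would make the inner ILS alternate between the two neighborhoods on successive iterations.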

Makespan
Among the different algorithms proposed for this problem, GVNS [61] was the state of the art for a long time. Recently, a memetic algorithm, MANEH [55], has been shown to outperform GVNS. Since MANEH uses a modified version of GVNS to improve the best individual of the population, both algorithms are used in the comparison with IR nims , the generated algorithm.
The GVNS algorithm [61] is a variable neighborhood search in which the neighborhood structures used in the VND are an IG algorithm and an ILS. Both SLS algorithms use the iRZ algorithm as local search. The initial solution is generated using the NEH heuristic, and the algorithm uses a random move in the insert and in the exchange neighborhoods. Finally, a solution is accepted only if it improves on the current solution.
MANEH is a memetic algorithm in which the population is initialized using the NEHRS heuristic [55]. At each iteration, the parents are selected using tournament selection. The random sample crossover (RSC) is used to generate two new solutions that are first mutated using a random insert move and then improved using the iRZ local search. If this results in a better solution, the child replaces the worst individual in the population. After a number of steps equal to a quarter of the population size, the best individual is improved using a modified version of the GVNS algorithm, SAGVNS. In SAGVNS, the strictly-improving acceptance criterion of GVNS is replaced by a fixed-temperature Metropolis acceptance criterion similar to the one used in IG rs .

Algorithms 9 and 10 show the outline of IR nims , the automatically generated algorithm for PFSP ni MS . The parameter settings are shown in Table 9. Contrary to IR stms , this algorithm is composed of two ILS. As is often the case for the makespan objective, the local search used by the innermost ILS is a FirstImprovement local search exploring the accelerated insert neighborhood. Regarding the perturbation, the innermost ILS uses the IG perturbation, while the outermost uses IG lsps with the same local search as the innermost ILS. The comparison of IR nims with MANEH and GVNS is shown in Fig. 5, while Table 8 groups the results by instance size. IR nims clearly stands out as the best algorithm, outperforming both MANEH and GVNS, with GVNS being better than MANEH. Overall, IR nims is statistically significantly better than MANEH and GVNS. However, when the results are grouped by instance size, GVNS shows better results for t = 60 on instances of size {300 × 20}, {400 × 30}, {500 × 40} and {500 × 50}. These differences are not statistically significant and, for longer running times, IR nims has better performance.

Total completion time
The current state-of-the-art algorithm for PFSP ni TCT is VigDE [62], which combines a differential evolution (DE) algorithm with an IG algorithm. Each individual of the population in the DE algorithm has two chromosomes: one represents the solution, and the other represents two parameters of the IG algorithm, that is, the probability p of applying the local search and d, the number of jobs destroyed in the destruction-construction perturbation. The algorithm uses the NEH rs heuristic to generate the initial population. At each iteration, all individuals undergo a mutation and each mutated individual is refined using the IG algorithm with probability p. The mutated individual replaces its parent if it has a better objective function value. In the experiments, we used the version of VigDE presented in [40], where the algorithm has been improved by using the NAG heuristic to generate the first individual of the population.
The results of the comparison between VigDE and IR nitct are shown in Fig. 6 and Table 10. IR nitct clearly outperforms VigDE on all instance sizes and the results are always statistically significant.

Total tardiness
The current state-of-the-art algorithm for PFSP ni T is DTLM [56], a discrete teaching-learning algorithm. This population-based algorithm divides the population into three groups. The best solution is considered the teacher, while the best λ × PS individuals are elite learners, where PS is the population size and λ is a parameter of the algorithm. At each iteration, each individual undergoes two updating phases in which the produced solution replaces the individual if it is better. In the first phase, a new solution is built by applying the PMX crossover to each solution with a consensus permutation built from the elite learners. In the second phase, the three groups are updated differently: an ILS algorithm is applied to the first elite learner, while the other elite learners undergo a crossover with the first. A path-relinking procedure is applied to each non-elite individual, in which all the solutions on the path from the individual to a random elite individual are considered and the best one is selected. If no new solution is produced, a destruction-construction perturbation is applied.
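One plausible reading of the path-relinking step on permutations is sketched below: walk from the starting solution toward the guiding (elite) solution by moving, at each position, the job that the guide places there, and keep the best intermediate permutation. The choice of insert moves is our own assumption for illustration and not necessarily DTLM's exact procedure.

```python
def path_relinking(start, guide, f):
    """Walk from `start` toward `guide`: at each position, insert the
    job that `guide` places there, evaluating every intermediate
    permutation and returning the best one found (sketch)."""
    current = start[:]
    best = current[:]
    for pos, job in enumerate(guide):
        if current[pos] != job:
            current.remove(job)
            current.insert(pos, job)
            if f(current) < f(best):
                best = current[:]
    return best
```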
The algorithm automatically generated for the total tardiness objective, IR nitt , is shown in Algorithms 12 and 13, with the parameters shown in Table 13. The algorithm is again a two-level ILS that uses the SLACK heuristic to generate the initial solution. The innermost ILS uses a FirstImprovement local search that explores the exchange neighborhood. The perturbation consists of random steps in the binsert and finsert neighborhoods, while the acceptance criterion is based on a fixed-temperature Metropolis condition. The external ILS instead uses an IG perturbation and an acceptance criterion that accepts only improving solutions. The results of the experiments are shown in Table 12 and in Fig. 7. IR nitt greatly outperforms DTLM.
Algorithm 13. IR nitt2
1: Input: current solution π
2: Output: the best solution found π*
3: π := FirstImprovement(π, local minima, exchange)
4: π* := π
5: while maxsteps() do
6:    π′ := vr_move(π, binsert, finsert)
7:    π′ := FirstImprovement(π′, local minima, exchange)
8:    π := karacc(π, π′)
9:    if f(π) < f(π*) then
10:      π* := π
11:   end if
12: end while
13: Return π*

These results, together with those obtained for the standard PFSP [43], give further confirmation to AAD and to this method in particular. Looking at the generated algorithms, as already observed for the standard PFSP, it is interesting to note that a two-level structure is preferred in the majority of cases, being present in IR sttt , IR sttct , IR nims and IR nitt . This result is in accordance with our previous study on the standard PFSP, where two out of three algorithms possessed the same structure. Interestingly, the results of DTLM on PFSP ni T , of MANEH on PFSP ni MS and of EMBO on PFSP sdst MS suggest that population-based algorithms are not well suited to tackle the PFSP. Further confirmation of this trend can be found by considering the state-of-the-art algorithms for the standard PFSP [43,52].
Several directions can be taken to further progress this research. First, it would be of great interest to investigate whether the level of structural complexity found in generated algorithms, like IR sttct , is really needed. A second direction is to use the system to generate algorithms for other problems, as well as to further expand EMILI to better support other types of SLS methods, such as population-based algorithms. Finally, the components implemented in the EMILI framework are the result of the advanced knowledge present in the literature on the specific problems tackled. A third direction to investigate would therefore be to add to this system the ability to automatically generate new components. In particular, construction heuristics may be generated following an AAD process where AAC tools are used to generate new heuristics by combining different components.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments
The project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme.

Fig. 7. Average RDI and 95% confidence intervals of DTLM and IR nitt for t = 60 (left), t = 120 (center) and t = 240 (right).

Table 12
Average RDI results of DTLM and IR nitt . If an algorithm is statistically significantly better according to the Wilcoxon signed-rank test with a 95% confidence, the result is shown in bold face.