A mathematical formulation and an NSGA-II algorithm for minimizing the makespan and energy cost under time-of-use electricity price in an unrelated parallel machine scheduling

In many countries, there is an energy pricing policy that varies according to the time-of-use. In this context, it is financially advantageous for the industries to plan their production considering this policy. This article introduces a new bi-objective unrelated parallel machine scheduling problem with sequence-dependent setup times, in which the objectives are to minimize the makespan and the total energy cost. We propose a mixed-integer linear programming formulation based on the weighted sum method to obtain the Pareto front. We also developed an NSGA-II method to address large instances of the problem since the formulation cannot solve it in an acceptable computational time for decision-making. The results showed that the proposed NSGA-II is able to find a good approximation for the Pareto front when compared with the weighted sum method in small instances. Besides, in large instances, NSGA-II outperforms, with 95% confidence level, the MOGA and NSGA-I multi-objective techniques concerning the hypervolume and hierarchical cluster counting metrics. Thus, the proposed algorithm finds non-dominated solutions with good convergence, diversity, uniformity, and amplitude.


INTRODUCTION
The industrial sector is one of the largest consumers of energy in the world. According to EIA (2016), this sector consumes around 54% of the total energy distributed globally.
Among the various forms of energy used by the manufacturing industry, electricity has been one of the most consumed. In China, for example, this sector consumes about 50% of the electricity produced in the country (Liu et al., 2014).
In recent years, electricity prices have continuously increased for manufacturing companies in industrialized countries (Willeke, Ullmann & Nyhuis, 2016). In Norway, the model integrated into a genetic algorithm for a problem similar to that described previously. They tested their method in instances of up to 200 jobs. The results indicated a 4.20% reduction in total energy consumption compared to the traditional genetic algorithm.
Other studies address the scheduling problem and consider a second objective beyond minimizing energy consumption. Cota et al. (2018) proposed a mathematical model and applied a mathematical heuristic called multi-objective smart pool search for the UPMSP-SDS. The objective functions are to minimize the makespan and the total energy consumption. In the experiments, they used a set of instances with up to fifteen jobs, and five machines randomly generated. They adopted hypervolume and set coverage metrics to compare the proposed algorithm with the e-constraint exact method. They showed that the objectives are conflicting and that energy consumption strongly influences the solution's quality. Cota et al. (2019) introduced the MO-ALNS and MO-ALNS/D algorithms to handle instances of up to 250 jobs and 30 machines of the same problem described previously. The MO-ALNS algorithm is a multi-objective version of the Adaptive Large Neighborhood Search-ALNS (Ropke & Pisinger, 2006), and MO-ALNS/D combines the multi-objective MOEA/D (Zhang & Li, 2007) with ALNS. The MO-ALNS/D algorithm was able to find better results than MO-ALNS in most instances in the hypervolume, set coverage, and Hierarchical Cluster Counting (HCC) (Guimaraes, Wanner & Takahashi, 2009) metrics. (Wu & Che, 2019) proposed a memetic differential evolution (MDE) algorithm for the UPMSP in which the objectives are also to minimize the makespan and the total energy consumption. The computational results showed that the proposed approach provides a significant improvement over the basic DE. Also, the MDE outperforms the SPEA-II and NSGA-II algorithms. (Liang et al., 2015) presented the Ant Colony Optimization algorithm with the Apparent Tardiness Cost (ACO-ATC) rule for the UPMSP seeking to minimize the total tardiness and the energy consumption. In this problem, machines need to wait until jobs are ready. However, it is necessary to decide whether the machine remains on or off during the wait. Turning off the machine to wait for the job to be ready saves energy. On the other hand, keep on the machine while waiting for the job saves time because it eliminates the need to prepare the machine again. They compared the ACO-ATC results with the classic ACO and a GRASP-based algorithm (Feo & Resende, 1995). The proposed algorithm was better than the other approaches in most of the tested instances.
There are studies that only address the minimization of the total energy cost. Ding et al. (2016) presented two approaches to UPMSP: the first introduces a time-interval-based Mixed Integer Linear Programming (MILP) formulation. The second consists of a reformulation of the problem using the Dantzig-Wolfe decomposition and a column generation heuristic. According to the results, the MILP formulation overcame the column generation method in terms of solution quality and execution time when electricity prices stay stable for a relatively long period. On the other hand, the column generation method performed better when the electricity price frequently changed (i.e., every half hour). Cheng, Chu & Zhou (2018) improved the formulation by Ding et al. (2016) and performed computational experiments with 120 randomly generated instances to compare the two formulations. The results showed that the new formulation achieves better results concerning the solution quality and execution time. Saberi-Aliabad, Reisi-Nafchi & Moslehi (2020) proposed the fix-and-relax heuristic algorithm in two stages for this same problem. In the first stage, jobs are assigned to the machines, and the second one solves a scheduling problem on simple machines. They tested its method in 20 instances randomly generated following the same parameter values as other previous studies. They compared the proposed method with the algorithms of Che, Zhang & Wu (2017) and Cheng, Chu & Zhou (2018). The results showed that the fix-and-relax algorithm overcame the others.
Finally, we present studies that address the scheduling problem considering minimizing the energy cost combined with another objective. Zeng, Che & Wu (2018) dealt with the bi-objective uniform parallel machine scheduling to minimize the total energy cost and the number of machines. They proposed a new mathematical model and a heuristic algorithm for it. The computational results showed that the heuristic method generates high-quality solutions in a reasonable time limit for instances of up to 5,000 jobs. Cheng, Wu & Chu (2019) presented a mathematical formulation and a genetic algorithm for the UPMSP. They considered the objective of minimizing the weighted sum of makespan and total electricity cost. The results presented by their formulation overcome that of the genetic algorithm in terms of solution quality. Kurniawan et al. (2017) proposed a genetic algorithm with a delay mechanism for the UPMSP to minimize the weighted sum of makespan and total energy cost. The proposed algorithm handled instances of up to 30 jobs and 15 machines. The results showed that the proposed method provided better solutions than the classical genetic algorithm.
Although there are studies correlated to ours, of our knowledge, there is no work addressing the unrelated parallel machine scheduling problem with sequence-dependent setup times to minimize the makespan and the total energy cost. Table 1 summarizes the characteristics of scheduling problems treated by our work compared to literature references.

PROBLEM STATEMENT
To define the UPMSP-SDS, we characterize the problem in this section and introduce a MILP formulation to solve it.
The following are the characteristics of the problem addressed in this work: • There are a set N ¼ f1; . . . ; ng of jobs, a set M ¼ f1; . . . ; mg of machines, and a set L ¼ f1; . . . ; og of different operation modes, such that each operation mode l 2 L is associated with a multiplication factor of speed v l and a multiplication factor of power l ; • The machines are unrelated parallel. In other words, the processing time of job j 2 N can be different on each machine i 2 M; • There is a planning horizon that consists of a set of H ¼ f0; . . . ; jHjg of time instants, and we must perform all jobs within this horizon; • All jobs are available to be processed at the beginning of the planning horizon h ¼ 0; • Each job j 2 N must be allocated to exactly one machine i 2 M; • There is a processing time p ij to process a job j 2 N on a machine i 2 M; • There is a sequence-dependent setup time S ijk to execute a job k 2 N after another job j 2 N on a machine i 2 M; • Each machine i 2 M has a power p i at normal operating speed; • The operation mode l 2 L of each job determines the multiplication factor of power ( l ). It also determines the multiplication factor of speed (v l ), which, in turn, is related to the execution time of each job; • There is a set D of days on the planning horizon H; • Each day is discretized into sizeD time intervals. For example, for discretizing a day in minutes, sizeD = 1,440; for the discretization of one day in hours, sizeD = 24; • To each day t 2 H, we have a peak hour, which starts at the time startp t 2 H and ends at the time endp t 2 H; • ET off and ET on represent the energy tariff ($/KWh) in off-peak hours and on-peak hours, respectively. Table 2 presents the decision and auxiliary variables that are needed to model the problem.
Thus, we can define the problem through Eqs.
(1)-(12). (1) min TEC (2) Subject to: The objectives of the problem are to minimize, simultaneously, the makespan and the total energy cost, defined by Eqs. (1) and (2), respectively. The set of constraints (3) ensures that every job j 2 J is allocated on a machine has a single operation mode, and ends its execution inside the planning horizon. Constraints (4) define that if the job k is assigned to the machine i immediately after the job j, then the start time of the job k must be greater than the sum of the end time of the job j and the setup time between them. It is important to highlight that for the previous model to be valid, the setup and processing times must satisfy the triangular inequality, as defined by Rosa & Souza (2009). The set of constraints (5) determines a lower bound for the makespan. Constraints (6) and (7) define a lower bound for the partial energy cost in the on-peak hours (PEC on ) and the total energy cost for off-peak hours (PEC off ), respectively. It is important to note that a job can be partially performed in the on-peak hours and partially in the off-peak hours and that the total energy cost is directly related to the energy price and the job execution time. Constraint (8) ensures a lower bound for the total energy cost. Constraints (9)-(12) define the domain of the decision and auxiliary variables of the problem.
The calculation of the energy cost of a job j depends on its execution time during the onpeak and off-peak time. Thus, there are six possible cases: Case 1: The job j starts and ends before the on-peak hours; Case 2: The job j starts before the on-peak hours and ends in the on-peak hours; Case 3: The job j starts and ends in the on-peak hours; Case 4: The job j starts during the on-peak hours and ends after the on-peak hours; Case 5: The job j starts and ends after the on-peak hours; Case 6: The job j starts before the on-peak hours and ends after the on-peak hours.
To illustrate cases 1 to 5, let Fig. 1. It shows the execution of five jobs N ¼ f2; 4; 1; 5; 3g in the scheduling of a single machine i ¼ 1 in a single operation mode l ¼ 1 on day t ¼ 1 of the planning horizon. Let the start of the on-peak hours (startp 1 ) equal to 18; the end of the on-peak hours (endp 1 ) equal to 21; the multiplication factor of power ( l ) equal to 1; the energy consumption of machine at normal operation (p 1 ) equal to 100; the energy tariff in the on-peak hours (ET on ) equal to 0:10$=KWh and in the off-peak hours (ET off ) equal to 0:05$=KWh; the multiplication factor of speed v l equal to 1. In this example, we consider discretization in hours. This figure shows that jobs 4, 1, and 5 are performed in the on-peak hours, partially or totally, and jobs 2 and 3, in turn, in the off-peak hours.
For this example, Eqs. (6) and (7) are reduced to Eqs. (13) and (14) below: 1 Â 100 Â 0:10 Â 24 24 |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl ffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl ffl}  Table 3 illustrates the contribution of each job to the total energy cost, according to the example in Fig. 1. The column "# Job" represents the job, the column "Case" shows the contemplated case, and the columns "Contr. on-peak" and "Contr. off-peak" show the contributions of the job to the energy cost of each job in the on-peak and off-peak hours, respectively.
The total energy cost found to the schedule shown in Fig. 1 is 105.  To illustrate case 6, consider Fig. 2. It shows the execution of three jobs N ¼ f2; 1; 3g on a single machine i ¼ 1 in operating mode l ¼ 1 during day t ¼ 1 of the planning horizon. Let also the start of the on-peak hours (startp 1 ) equal to 18; the end of the on-peak hours (endp 1 ) equal to 21; the multiplication factor of power ( l ) equal to 1; the energy consumption of machines at normal operation (p 1 ) equal to 100; the energy tariff in the on-peak hours (ET on ) equal to 0.10 $=KWh and in the off-peak hours (ET off ) equal to 0:05 $=KWh; and the multiplication factor of speed equal to 1. Such as in the previous example, we consider discretization in hours. This figure shows that job 1 is performed in the on-peak hours and jobs 2 and 3, in turn, in the off-peak hours.
The contribution of the job 1 to the energy cost in the on-peak hours is 30, and the contribution to the cost in the off-peak hours is 50.
Thus, calculating similarly to the previous example, we conclude that the total energy cost for the schedule shown in Fig. 2 is 155.

WEIGHTED SUM METHOD
We used the weighted sum method (Marler & Arora, 2004) to solve the multi-objective optimization problem addressed using a mathematical programming solver. This method converts the multi-objective problem into a single objective problem using the weighted sum of the objectives.
For this, consider Eq. (15): where: • a: real number in range [0, 1]; • jHj: represents the cardinality of the set H; • Cost max : is the estimate for the maximum energy cost used to normalize the total energy cost. It is calculated using a heuristic, as shown in "Initial Population"; The problem constraints are those defined by Eqs.
(3)-(12). Algorithm 1 describes all the steps of the weighted sum method implemented. Algorithm 1 receives as input: the set D with the values for a and the time limit. In line, we initialize the non-dominated set (NDS) as empty. Then, we execute the loop defined between lines 2-7 for each value a. In line 3, we obtain the result from the execution of the model. Then, we get the Makespan and TEC values resulting from the model execution. Then, in line 6, we add the solution obtained to the NDS. Finally, in line 8, the method returns the generated non-dominated set. In the weighted sum method, the decision-maker must define a weight for each objective function. The value of this weight reflects the relative importance of each objective in the overall solution. We adopted several combinations of weights to find the most significant number of optimal Pareto solutions to the problem addressed.
We used the following parameters for Algorithm 1: • The set D ¼ f0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1g with the possible values for a; • The time limit for each execution of the mathematical model, defined as time_limit ¼ 800 Â n Â lnðmÞ sec for each a value, where m is the number of machines and n is the number of jobs; • sizeD ¼ 144: to discretize the day at intervals of 10 min each.

PROPOSED NSGA-II
The problem addressed belongs to the NP-hard class because it is a generalization of the identical parallel machine scheduling problem, which is NP-hard (Garey & Johnson, 1979).
For this reason, we used the NSGA-II multi-objective algorithm (Deb et al., 2002) to address the problem since there are in literature many reports of successful use of this algorithm (Deb, UdayaBhaskaraRao & Karthik, 2007;Liu et al., 2014;Wang et al., 2017;Babazadeh et al., 2018). This algorithm is an alternative to the exact method described in the previous section to find an approximation of the Pareto-optimal front in large instances in an adequate computational time for decision-making. Algorithm 2 describes how the implemented NSGA-II works. Algorithm 2 receives the following input parameters: the population size (size pop ), the probability of mutation (prob mut ), and the stopping_criterion. In line 2, we created an initial population P 0 . Then, in the main loop (lines 4-25), we combine the parent P t and offspring population Q t to generate a new population R t (line 5). In line 6, we apply the fast non-dominated sorting method to divide the population R t into non-dominated sets, called fronts, F 1 ; F 2 ; . . . ; F k . A front F i dominates another F j , if and only if, i < j and R t ¼ F 1 [ F 2 . . . F k . In lines 9-13, we select the best frontiers of F to include in the population P t+1 . We repeat this procedure as long as it is possible to include a new frontier in P t+1 without exceeding the population size. Then we check the size of the population Algorithm 1 Weighted sum method. input: D ¼ fa 1 ; a 2 ; Á Á Á ; a n g, time_limit obtained. If it is not exactly size pop , then we order the next frontier i of F that has not yet been included in P t+1 , according to the crowding distance, and we select the size pop -|P t+1 | first to fill all spaces of the population P t+1 . In lines 22 and 23, we apply the crossover and mutation operators in P t+1 to generate the population Q t+1 . The following subsections describe how an initial population is generated and the crossover and mutation operators, respectively.

Initial population
The initial population of the NSGA-II contains size pop individuals. Two of them are constructed through a greedy strategy, one of which considers only the objective of Algorithm 2 NSGA-II.
input :size pop , prob mut , stopping_criterion 1 P 0 )Generate initial population of size pop individuals ; minimizing the makespan. In this case, we always choose the operation mode related to the highest speed factor. The other individual considers only the total energy cost. In this case, we choose the operation mode related to the lowest consumption factor. The other individuals (size pop -2) of the initial population are randomly generated. Algorithm 3 describes the greedy strategy for generating an individual to the initial population.
Algorithm 3 starts with an empty initial individual (line 1). The loop between lines 2 and 7 allocates each job n on the machines. Therefore, first, we randomly select a job j, which has not yet been allocated (line 3). Then, we identified in the best machine i best of the individual s and the best position pos to insert the job j (line 5). In this case, we consider only one of the objectives of the problem: minimize the makespan or the total energy cost. Then, we allocate job j in position pos of machine i best in individual s (line 6). At the end of this procedure, we return a valid individual s (line 8).

Crossover
We used the binary tournament selection method to choose each pair of individuals for the crossover operator. We run two tournaments with two individuals each and select the winner of each tournament for the crossover. In our approach, the dominant individual wins the tournament. If both individuals are non-dominated, then we randomly choose an objective and use it to define the winner of the tournament. Figure 3 illustrates the crossover between two individuals. After selecting two individuals named parent 1 and parent 2, respectively, we applied the crossover operator to generate new individuals. We adopted the One Point Order Crossover operator from Vallada & Ruiz (2011) adapted to the parallel machine problem. We describe its operation below: 1. We define, at random, the crossover points of each machine, as shown in Fig. 3A; 2. We generate two offspring. The first receives the genes to the left of the crossover point defined on each machine of parent 1. The second receives the genes to the right, as shown in Fig. 3B; 3. We mark in parent 2 the genes present in each offspring, as shown in Fig. 3C; 4. We add the unmarked genes of parent 2 to offspring 1 and 2. We add these genes in the position that results in the lowest value for the objective function, whereas this problem has two objective functions, so we randomly select one at each crossover. In the end, we will have two new individuals, as shown in Fig. 3D.
We repeat this procedure until to generate size pop new individuals.

Mutation
We implemented three mutation operators (Swap, Insert, and Swap of operation mode), described below. These operators maintain the population's genetic diversity and reduce the chances that the algorithm getting stuck at a local optimum.

Swap
The swap operator works by randomly choosing a job j 1 , initially allocated in position a on machine i 1 and another job j 2 allocated in position b of machine i 2 . Then, we allocate job j 1 in position b of machine i 2 , and we allocate job j 2 in position a of machine i 1 . Figure 4 illustrates the swap between two jobs j 1 and j 2 . They are initially allocated on machines i 1 and i 2 , respectively. After swapping, we allocate job j 2 on machine i 1 and job j 1 on machine i 2 .

Insertion
The insertion operator consists of randomly choosing a job j 1 allocated at position a of machine i 1 and randomly choosing position b of another machine i 2 . Then job j 1 is removed from machine i 1 and inserted into position b on the machine i 2 . Figure 5 illustrates this move. The left side shows the scheduling before, and the right side shows it after the insertion move.

Mode change
In the operation mode change operator, we randomly select a job and change its operation mode at random. Figure 6 illustrates the application of this operator to the scheduling of offspring 1 of Fig. 3D, which involves 12 jobs. As can be seen, job 8, which is in the sixth position of machine 2, has operation mode 3. After the application of this operator, the job changes to operation mode 1.
The NSGA-II algorithm implemented performs a mutation with a probability equal to prob mut .

COMPUTATIONAL EXPERIMENTS
We coded the NSGA-II algorithm in the C++ language and implemented the mathematical model with the Gurobi 7.0.2 API ( Gurobi Optimization, 2020). We performed the tests on a microcomputer with the following configurations: Intel (R) Core (TM) i7-4510U processor with a frequency of 2 GHz, 16 GB of RAM, and 64-bits Ubuntu 19.10 operating system. Furthermore, we compared the performance of the NSGA-II algorithm with two basic multi-objective algorithms: MOGA of Murata & Ishibuchi (1995) and NSGA-I of Srinivas & Deb (1994). These algorithms use the same NSGA-II crossover and mutation operators described in "Crossover" and "Mutation" and the same stopping criterion.
This section is organized as follows. "Instances Generation" and "Metric description" describe the instances and the metrics used to assess the quality of the set of nondominated solutions generated by the algorithms. "Tuning of Algorithms' Parameters" shows the parameter calibration of the algorithms. "Results" reports the results.

Instances generation
Since, as far as we know, there is no set of instances in the literature for the problem addressed, we adapted two instance sets from the literature that deal with similar problems. The first one, called set1, is a subset of the small instances of Cota et al. (2018) satisfying the triangular inequality, in which we add information about the energy price on-peak and off-peak hours. The second set, named set2, is also a subset of the large instances of Cota et al. (2018), in which we included instances of 750 jobs. Table 4 shows the characteristics of these sets of instances, which are are available in Rego, Cota & Souza (2021).

Metric description
The quality of the set of non-dominated solutions found by a method can be analyzed under three aspects: convergence, extension, and distribution. Convergence refers to the proximity of this set to the Pareto-optimal front or to the reference set. In turn, the extension assesses the breadth of the region covered by this set of non-dominated solutions. Finally, the distribution refers to the uniformity of the spacing between the solutions within the set. The hypervolume metric is sensitive for convergence and extension and the HCC metric, in turn, is sensitive for distribution and extension.

Hypervolume
The hypervolume or S metric is a measure of quality often used to compare results from multi-objective algorithms and it was proposed by Zitzler & Thiele (1998). This metric has the ability to provide a combined estimate of convergence and diversity of a set of solutions (Deb, 2014). The hypervolume of a non-dominated set measures the area covered or dominated by this set's points, limited by a Reference Point (RP). In maximization problems, it is common to use the point (0; 0), while in minimization problems, an upper bound, also known as the Nadir point, is used to limit this area. In Fig. 7, the shaded area defines the hypervolume of the set of non-dominated solutions A for a problem with two objective functions, in which the point ðmax x ; max y Þ defines the upper limit. We denote by HVðAÞ the hypervolume of a set of non-dominated solutions A relative to a reference point (Deb, 2014). Guimaraes, Wanner & Takahashi (2009) to evaluate the quality of non-dominated sets that were obtained by  (2016) multi-objective optimization algorithms. It is based on hierarchical clustering techniques, such as the Sphere Counting (SC) (Wanner et al., 2006) metric. According to Guimaraes, Wanner & Takahashi (2009), the extent and uniformity of a non-dominated set is directly proportional to the HCC value calculated for it. We calculate the HCC for a set of points A as follows:

Hierarchical cluster counting (HCC) is a metric proposed by
1. Initially, we create a grouping for each point in the set, and consider that each group created is a sphere of radius equal to zero; 2. Then, we calculate the minimum distances of fusion, which is a new assumed value for the radius of the spheres capable of decreasing the number of clusters; 3. We group the points into the same cluster; 4. We repeat steps 2 and 3 until all the points belong to the same grouping; 5. We obtain the HCC value by adding, in each iteration, the product between the distances of fusion and the amount of grouping formed.
Consider Fig. 8, which illustrates the steps to calculate the HCC for a six-point nondominated set. Figure 8A shows the first cluster in which each point is in a different sphere with radius zero. Figure 8B shows the points grouped into five spheres, each with radius r 1 .  Figure 8C shows the points grouped into four spheres, each with radius r 2 . Figure 8D shows, in the Cartesian plane, the relationship between the number of clusters and the the radius of each cluster. The gray region area represents the value of the HCC metric for the set shown in Fig. 8.

Tuning of algorithms' parameters
The parameter values used in the NSGA-II, MOGA, and NSGA-I algorithms can affect its performance. Therefore, we use the Irace package (López-Ibáñez et al., 2016) to find the best values for these parameters. Irace is a software encoded in the R language that automatically performs an iterative procedure to find the most appropriate optimization algorithm settings. Table 5 shows the test scenarios used. In the first column, we present the description of each NSGA-II parameter; in the second column, the set of values tested for each parameter, and in the third column, the best value returned by Irace.

Results
In this section, we presented the results of two experiments used to evaluate the NSGA-II algorithm's performance. First, we compare the NSGA-II results with those of the weighted sum method in instances with up to 10 jobs and two machines. Then, we compared the performance of the NSGA-II algorithm with that of the MOGA and NSGA-I algorithms in larger instances, with up to 750 jobs and 20 machines. In both cases, we executed the algorithms 30 times in each instance.
We used the Relative Percentage Deviation (RPD HV i ) to evaluate the HV metric for each method Alg and instance i. It is calculated by Eq. (16): where HV RS i is the hypervolume value of the reference set in 30 executions of the algorithm Alg in the instance i. v can assume three values: min, max and avg, representing, respectively, the smallest, the largest, and the average of the hypervolume in 30 executions of the algorithm in the instance i.

NSGA-II Â Gurobi
In this section, we reported the results of the NSGA-II algorithm and the exact method in the set of instances set1. Table 6 shows the reference set data for these instances. In this table, the first two columns show the instance identifier and name, respectively. The next two columns show the number of jobs and machines, respectively. The fifth column presents the hypervolume of this reference set. Finally, the last column presents the reference point (C max ; TEC) used to calculate the hypervolume of each instance.
Tables 7 and 8 show the method results concerning the RPD HV and the HCC metrics. In these tables, the first column identifies the instance. The next three columns show the minimum, maximum and average values of the RPD HV and HCC, respectively, concerning the NSGA-II method. The fifth column shows the standard deviation of the results. The seventh column shows the upper bound (UB) returned by the exact method  concerning the RPD HV and HCC metrics. Finally, the sixth and eighth columns show the times, in seconds, of the NSGA-II and the exact method, respectively. We can see in Table 6 that in the set of instances set1, the RPD HV of the NSGA-II algorithm is lower in all comparisons (min, max e avg) compared to the exact method. We can also verify that the standard deviation of the NSGA-II algorithm in instances with up to 8 jobs (ID 1, 2, 3) is equal to zero. In other words, in these instances, all NSGA-II executions obtained the same set non-dominated. We also can note that the execution time of the NSGA-II is much less than that of the exact algorithm.
Concerning Table 8, we noted that the NSGA-II algorithm has a higher HCC value than the exact method in all comparisons. Thus, we can conclude that the non-dominated set of the NSGA-II method has better diversity and uniformity. Figure 9 presents the non-dominated sets obtained by an NSGA-II execution and the other using the exact method in two randomly selected instances. The first instance has six jobs and two machines, and the second has 10 jobs and two machines. In this figure, the blue dots represent the solutions of the NSGA-II, and the red dots represent the solutions of the exact method. The x axis represents the makespan, and the y axis represents the total energy cost.
We can notice in Fig. 9A that the NSGA-II non-dominated set contains all the solutions found by the exact method, plus two additional solutions. In this example, the two methods have the same amplitude, and the NSGA-II was able to find a set of solutions with higher cardinality. On the other hand, Fig. 9B shows that the non-dominated set contains six of the eight solutions found by the exact method and eight other solutions. In this example, the exact method showed better amplitude than the NSGA-II, but this obtained higher cardinality than the exact method.
Considering these results, we observed that the NSGA-II finds good quality solutions and requires less computational time than the exact method.

NSGA-II in large instances compared with other literature algorithms
Here, we presented the results of the NSGA-II, MOGA, and NSGA-I algorithms in the set of instances set2. Table 9 shows the reference set data for the instances of set2. Its organization follows the same description as the previous section's tables. Tables 10 and 11 report the RPD HV and HCC values, respectively, of the algorithms in the set of instances set2. As can be seen, NSGA-II achieved the best average results regarding hypervolume in all instances. On the other hand, it won in 2/3 of the instances concerning the HCC. These results indicate that the NSGA-II algorithm outperforms MOGA and NSGA-I algorithms concerning these metrics. Figures 10A and 10B illustrate the Pareto front obtained from each algorithm in two different instances. The first instance has 50 jobs and 20 machines, and the second has 750 jobs and 10 machines. As can be seen, the NSGA-II produced sets of non-dominated solutions with good convergence, diversity, uniformity, and amplitude when compared with others algorithms. Figures 11 and 12 show the boxplot of the RPD and HCC results, respectively. To verify if the differences between the results presented by the algorithms are statistically significant, we performed the hypothesis tests below.
In the first test, l 1 , l 2 , and l 3 represent the average RPD HV for NSGA-II, MOGA, and NSGA-I, respectively. In the second test, l 1 , l 2 , and l 3 represent the average HCC for the algorithms in the same sequence.  Before performing the hypothesis tests, we need to choose the test type, parametric or non-parametric. Generally, parametric tests are more powerful; however, to use them, it is necessary to satisfy three assumptions: 1. Normality: every sample must originate from a population with normal distribution, 2. Independence: the samples must be independent of each other, 3. Homoscedasticity: there must be equality of variances across samples.
We applied the Shapiro-Wilk normality test to the samples with the RPD HV and HCC values from each algorithm and showed its results in Table 12.
With confidence level of 95% (a ¼ 0:05), we can say that the results presented in Table 12 not present evidence that the results of the algorithms come from a population with normal distribution.
Thus, we applied the paired Wilcoxon signed-rank non-parametric test (Wilcoxon, 1945). Table 13 reports the results of this test obtained by the NSGA-II, MOGA, and NSGA-I algorithms for the samples of the RPD HV and HCC values.
According to Table 13, there are significant statistical difference between each pair of algorithms. Thus, these tests confirm the results in Tables 10 and 11, indicating that NSGA-II outperforms both MOGA and NSGA-I.

CONCLUSIONS
This paper addressed the unrelated parallel machine scheduling problem with sequencedependent setup times for minimizing the total energy cost and the makespan.
To solve it, we developed a mixed-integer linear programming formulation and applied the weighted sum method to generate sets of non-dominated solutions to the problem. Considering that this formulation could not solve larger instances of the problem, we adapted the NSGA-II algorithm to deal with them.
To test the two solution methods, we adapted instances of the literature to contemplate all the problem's characteristics. We divided these instances into two groups. The first  group consists of small instances with up to 10 jobs and 2 machines, while the second group contains large instances, with up to 750 jobs and 20 machines. We evaluated the methods concerning the hypervolume and HCC metrics. Initially, we used part of the set of instances to tuning the parameter values of the NSGA-II algorithm. To this end, we used the Irace package.
We validated the NSGA-II algorithm in small instances, comparing its results with those produced by the exact method. The NSGA-II algorithm showed good convergence and diversity. Besides, it spent much shorter CPU time than that required by the exact method.
In large instances, the results showed that the NSGA-II outperforms, with 95% confidence level, MOGA and NSGA-I algorithms concerning the hypervolume and HCC metrics. Thus, the proposed algorithm finds non-dominated solutions with good convergence, diversity, uniformity, and amplitude.
As future work, we suggest testing other crossover and mutation operators for the NSGA-II. Besides, we intend to implement other multi-objective algorithms, such as Strength Pareto Evolutionary Algorithm 2 (SPEA2), Niched Pareto Genetic Algorithm (NPGA), Pareto Envelope-based Selection Algorithm II (PESA-II), Multi-objective Variable Neighborhood Search (MOVNS), and Multi-objective Evolutionary Algorithm Based on Decomposition (MOEA/D).