MINIMIZING TOTAL TARDINESS IN A TWO-MACHINE FLOWSHOP WITH UNCERTAIN AND BOUNDED PROCESSING TIMES

The two-machine flowshop scheduling problem with the performance measure of total tardiness is addressed. This performance measure is essential since meeting deadlines is a crucial part of scheduling and a major concern for some manufacturing systems. The processing times on both machines are uncertain variables that lie within given lower and upper bounds. This reflects the uncertainty that is an integral part of some manufacturing settings, where processing times cannot be predicted in advance. To the best of the author’s knowledge, this problem is addressed for the first time in this paper. A dominance relation is established and nineteen algorithms are proposed. These algorithms are extensively evaluated on randomly generated data for different numbers of jobs and four different distributions, representing both symmetric and non-symmetric distributions. Computational experiments show that the presented algorithms perform extremely well when compared with a random solution. In particular, the best of the nineteen algorithms reduces the error of the random solution by 99.99% and the error of the worst of the nineteen algorithms by 99.96%. The results are confirmed by a test of hypothesis and this algorithm is recommended.


Introduction
Performance measures in the scheduling literature may be categorized into two groups: those based on job completion times, such as makespan and mean completion time, and those based on job due dates, such as maximum lateness and total tardiness. The former are used more widely, while the latter are used when a penalty exists for completing jobs past their due dates. The performance measure of total tardiness is used when the objective is to minimize the sum of the gaps between job completion times and their due dates, counting only the jobs that are completed after their due dates. The scheduling cost gets higher as the sum of the gaps increases, Allahverdi and Aydilek [7]. Allahverdi and Aydilek [7] state that the total tardiness cost may also include the loss of goodwill, penalty cost in contracts, and damaged reputation. Allahverdi et al. [9] point out that meeting customer due dates is a main concern for some manufacturing systems, while Ding et al. [17] point out that customer satisfaction is attained by fulfilling job due dates. Furthermore, performance measures such as total tardiness are used to evaluate a manager's performance in automobile manufacturing and semiconductor industries, Seo et al. [27], or petroleum refineries, Arabameri and Salmasi [11].
Thus, considering the performance measure of total tardiness has many advantages. It should be noted that such problems cannot be answered by other performance measures, such as maximum lateness, which is only concerned with the maximum gap between the due dates and completion times rather than the total of the nonnegative gaps.
We now introduce some notation. Let $t_{j,k}$ denote the processing time of job $j$ on machine $k$ and let $LB_{j,k}$ and $UB_{j,k}$ denote the lower and upper bounds of $t_{j,k}$, respectively. Let $n$ denote the number of jobs. Finally, let $d_j$ denote the due date of job $j$. Various scheduling environments have been studied for the case when processing times are uncertain within some given upper and lower bounds. Allahverdi and Sotskov [8] investigated the case of $F2|LB_{j,k} \le t_{j,k} \le UB_{j,k}|C_{\max}$, where they present some dominance relations for the problem. Allahverdi et al. [10] provide heuristics for the problem of $F2|LB_{j,k} \le t_{j,k} \le UB_{j,k}, \text{no-wait}|L_{\max}$, while Aydilek et al. [14] provide more effective heuristics for the same problem. Allahverdi and Allahverdi [4] propose twelve algorithms based on Johnson's algorithm for the case of four machines with the objective of minimizing the makespan, $F4|LB_{j,k} \le t_{j,k} \le UB_{j,k}|C_{\max}$. Allahverdi and Allahverdi [5], on the other hand, investigate the case of four machines with the objective of minimizing the total completion time, $F4|LB_{j,k} \le t_{j,k} \le UB_{j,k}|\sum C_j$. Scheduling with uncertain processing times, within lower and upper bounds, has also been considered for other shop scheduling environments, e.g., Sotskov and Egorova [28], Sotskov et al. [29,30], Arık [12], Abtahi and Sahraeian [1], Horng and Lin [21], and Xu et al. [36]. Allahverdi [3] surveys the scheduling problems with uncertain job descriptors within lower and upper bounds.

Problem description
In this paper, we study the stochastic problem of the two-machine flowshop with uncertain processing times (within upper and lower bounds) with the performance measure of total tardiness. That is, we consider the problem $F2|LB_{j,k} \le t_{j,k} \le UB_{j,k}|\sum T_j$. A dominance relation is provided and a total of 19 algorithms are proposed and compared with each other and with a random sequence of jobs. The dominance relation is established in Section 3 and the algorithms are presented in Section 4. Computational experiments are given in Section 5, while concluding remarks are made in Section 6. Before we get into the addressed problem, we discuss some related papers in Section 2.
Table 1 gives the total tardiness for each instance.
As seen from Table 1, the best sequence for the first instance is (1, 2, 3), for the second instance is (2, 1, 3), and for the third one is (2, 3, 1). Hence, even for such a simple problem with only 3 jobs, no single sequence is optimal for every instance. Furthermore, these are just three of the instances considered in this example; there are infinitely many possible instances, and the realized instance is not known when the sequence has to be chosen. This illustrates the difficulty of this scheduling problem, especially in real world scenarios where the number of jobs is much larger than 3.
In this paper, we attempt to solve this problem through algorithms and theoretical means. In particular, along with a dominance relation, we provide 19 algorithms to help solve this problem and compare these algorithms through statistical analysis to determine the most effective one.
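The dependence of the best sequence on the realized processing times can be checked by brute force for small cases. The following is a minimal sketch, not the data of Table 1: the due dates and the two realizations are hypothetical, jobs are indexed from 0, and the helper names `total_tardiness` and `best_sequences` are introduced only for illustration.

```python
from itertools import permutations

def total_tardiness(seq, t1, t2, due):
    """Total tardiness of a job sequence in a two-machine flowshop."""
    c1 = c2 = 0  # running completion times on machines 1 and 2
    total = 0
    for j in seq:
        c1 += t1[j]                # machine 1 processes jobs back to back
        c2 = max(c1, c2) + t2[j]   # machine 2 waits for the job and for itself
        total += max(0, c2 - due[j])
    return total

def best_sequences(t1, t2, due):
    """Return (minimum total tardiness, list of all sequences attaining it)."""
    scored = [(total_tardiness(s, t1, t2, due), s)
              for s in permutations(range(len(due)))]
    best = min(tt for tt, _ in scored)
    return best, [s for tt, s in scored if tt == best]

# Hypothetical data (assumption, not taken from the paper).
due = [6, 8, 10]
instance_a = ([2, 3, 4], [3, 2, 1])  # (machine-1 times, machine-2 times)
instance_b = ([4, 1, 2], [1, 5, 3])

for t1, t2 in (instance_a, instance_b):
    print(best_sequences(t1, t2, due))
```

With these two realizations the unique best sequences differ, (0, 1, 2) for the first and (1, 0, 2) for the second, even though the due dates are identical. Enumeration is only feasible for a handful of jobs, which is why heuristics are needed for realistic sizes.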

Bibliographic review
Papers addressing the problem of total tardiness include Framinan and Leisten [19], who propose a greedy algorithm, and Vallada and Ruiz [32], who propose a genetic algorithm for the flowshop scheduling problem with the objective of minimizing total tardiness. Moreover, Vallada et al. [33] present an evaluation and review of heuristics that exist in the literature for flowshop scheduling problems with the same objective function. For the two-machine case, Bożejko et al. [15] propose new elimination properties for accelerating the local search algorithm. Furthermore, Saber and Ranjbar [25] propose a mixed-integer programming model for the permutation flowshop to minimize both total tardiness and total carbon emissions. They also present a multiobjective decomposition-based heuristic algorithm for the problem. Rakrouki et al. [24] propose five different algorithms, including a genetic algorithm, to minimize total tardiness for the two-machine flowshop problem with unavailability periods of the machines. de Athayde Prata et al. [16] address the flowshop scheduling problem where the machines can operate at different speeds with the objective of minimizing total tardiness. They propose two heuristic algorithms which are based on a mixed-integer linear programming formulation, while Wang et al. [35] present a heuristic for the distributed flowshop scheduling problem with sequence-dependent setup times to minimize total tardiness. The papers mentioned above, and in fact the vast majority of the scheduling literature, consider processing times to be deterministic, Keshavarz and Salmasi [22] and Seidgar et al. [26]. Though this may be the case in some manufacturing settings, it is not so in others, Wang and Choi [34] and Gonzalez-Neira et al. [20]. Uncertainty is an integral part of manufacturing and production, as a multitude of factors may cause processing times to change. These factors include the condition of available tools, machine operator fatigue, and disruption in the manufacturing environment. Other factors include the absence of past data used to predict processing times, the lack of experience, or untested processing tools, Tayanithi et al. [31]. Therefore, it is essential to consider cases in which processing times are uncertain and only their upper and lower bounds are known in advance.

Dominance relations
In this section, a dominance relation is established for the two-machine flowshop scheduling problem. Let $t_{j,k}$ represent the processing time of job $j$ on machine $k$ ($k = 1, 2$). Let $t_{[p,k]}$ be the processing time of the job in position $p$ on machine $k$ for a given sequence. Likewise, let $d_j$ and $d_{[p]}$ represent the due date of job $j$ and the due date of the job in position $p$, respectively. Furthermore, let $T_{[p]}$ symbolize the tardiness of the job in position $p$. Processing times are uncertain and satisfy the inequality $LB_{j,k} \le t_{j,k} \le UB_{j,k}$, where $LB_{j,k}$ and $UB_{j,k}$ denote lower and upper bounds on the processing time $t_{j,k}$, respectively. Let

$$ST_{[p,k]} = \sum_{r=1}^{p} t_{[r,k]}, \qquad ST_{[0,k]} = 0,$$

and

$$\delta_{[p]} = ST_{[p,1]} - ST_{[p-1,2]}.$$

Thus, $ST_{[p,k]}$ is the sum of all processing times on machine $k$ starting from position 1 up to position $p$, while $\delta_{[p]}$ is the difference between the sum of processing times on machines 1 and 2 starting from position 1 up to position $p$ for machine 1 and $p - 1$ for machine 2. The maximum of $\delta_{[1]}, \ldots, \delta_{[p]}$ is denoted by $\Delta_{[p]}$. Then, the tardiness of the job in position $p$ [2] can be computed as

$$T_{[p]} = \max\{0,\ ST_{[p,2]} + \Delta_{[p]} - d_{[p]}\}.$$

Hence, total tardiness (TT) is computed as

$$TT = \sum_{p=1}^{n} T_{[p]}.$$

Consider two job sequences $\sigma_1$ and $\sigma_2$ where the sequence $\sigma_1$ has job $i$ in an arbitrary position $a$ and job $j$ in an arbitrary position $b$, where $a < b$. The sequence $\sigma_2$ is derived from $\sigma_1$ by interchanging the jobs in positions $a$ and $b$ (i.e., job $j$ is in position $a$ and job $i$ in position $b$). The sequences $\sigma_1$ and $\sigma_2$ are written as $\sigma_1 = \{\pi_1, i, \pi_2, j, \pi_3\}$ and $\sigma_2 = \{\pi_1, j, \pi_2, i, \pi_3\}$, where $\pi_1$, $\pi_2$, and $\pi_3$ are subsequences denoting the jobs in positions $1, \ldots, a - 1$, those in $a + 1, \ldots, b - 1$, and those in $b + 1, \ldots, n$, respectively.
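The quantities described above (the cumulative sums ST, their differences, and the running maximum of those differences) can be cross-checked against the usual completion-time recurrence of a two-machine flowshop. The sketch below assumes that the completion time of the job in position p on machine 2 equals the cumulative machine-2 time plus the running maximum; the function names and the sample data are hypothetical.

```python
from itertools import accumulate

def tardiness_closed_form(t1, t2, due):
    """Tardiness per position; inputs are listed in sequence order."""
    st1 = list(accumulate(t1))  # cumulative machine-1 times, ST_[p,1]
    st2 = list(accumulate(t2))  # cumulative machine-2 times, ST_[p,2]
    result, running_max = [], float("-inf")
    for p in range(len(due)):
        prev_st2 = st2[p - 1] if p > 0 else 0          # ST_[p-1,2], zero at position 1
        running_max = max(running_max, st1[p] - prev_st2)
        completion = st2[p] + running_max              # completion on machine 2
        result.append(max(0, completion - due[p]))
    return result

def tardiness_recurrence(t1, t2, due):
    """Same quantity from the standard recurrence c2 = max(c1, c2) + t2."""
    c1 = c2 = 0
    result = []
    for a, b, d in zip(t1, t2, due):
        c1 += a
        c2 = max(c1, c2) + b
        result.append(max(0, c2 - d))
    return result

# Hypothetical realization, already listed in sequence order.
t1, t2, due = [4, 1, 2], [1, 5, 3], [6, 8, 10]
print(tardiness_closed_form(t1, t2, due), sum(tardiness_closed_form(t1, t2, due)))
```

Both functions give the same per-position tardiness on this data, and summing the list yields the total tardiness TT of the sequence.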
Theorem 2. If the following three conditions are true, Then TT( 2 ) ≤ TT( 1 ).Therefore, job  should precede job  in order to minimize total tardiness.
Proof.Note that TT = ∑︀  =0  [𝑝] .As in the Theorem 1, we look at different cases separately.We will later combine them to get the result.
For each , we have Case 4.  = .
For each , we have Finally, we can combine all these cases to obtain, by assumption c.
Notice that  denotes the number of jobs in  2 .Even if  = 0, i.e., jobs  and  are adjacent, this inequality holds by condition  in the statement of Theorem 2. Of course, the inequality holds if  > 1.

Algorithms
The problem of minimizing total tardiness on a single machine with deterministic job descriptors is known to be NP-hard, Du et al. [18]. It follows that this problem is also NP-hard for the case of two machines with stochastic job descriptors. Thus, we aim to develop algorithms to minimize total tardiness for the problem in question.
To minimize the total tardiness, we compare the sequences obtained from 19 different algorithms, along with a random sequence. The algorithms are described in Table 2. The sequence of jobs associated with each algorithm is obtained by arranging the jobs in nondecreasing order of the values in the associated list in Table 2.
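Each algorithm therefore reduces to sorting the jobs by a per-job value. A minimal sketch of that step follows, using EDD (whose value for a job is simply its due date) as the only concrete rule; the helper name `sequence_from_index` and the due dates are hypothetical, and the other rules would differ only in the list passed in.

```python
def sequence_from_index(index_values):
    """Return job indices in nondecreasing order of their index value.

    Python's sort is stable, so tied jobs keep their original relative order.
    """
    return sorted(range(len(index_values)), key=lambda j: index_values[j])

# Hypothetical due dates for four jobs (jobs are indexed from 0).
due = [30, 12, 25, 12]
edd_sequence = sequence_from_index(due)  # EDD: the index value of job j is d_j
print(edd_sequence)  # [1, 3, 2, 0]
```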

Steps for the algorithm
These are the steps used in the algorithm. Let $n$ be the number of jobs. For each $n$, do the following:
(1) Create an empty list called alg_list with 20 empty sublists (one for each algorithm).
(2) Create an empty list called er_alg_list with 20 empty sublists (one for each algorithm).
(3) Perform these steps once for each replication:
    (a) Generate upper and lower bounds for processing times. For each job $j$, the upper bound $UB_{j,k}$ was generated from values between 1 and 100 and the lower bound $LB_{j,k}$ was generated from $[UB_{j,k} - w, UB_{j,k}]$ (where $w$ was uniformly generated from the interval [0, 50]).
    (b) Randomly pick processing times for machines 1 and 2 lying between the lower and upper bounds of the processing times.
    (c) Based on the upper and lower bounds of the processing times, generate due dates (a list of four due dates, each corresponding to a combination of $T$ and $R$, explained below).
    (d) For each due date combination (there are a total of four of them), do the following:
        (i) Generate a sequence based on each algorithm.
        (ii) Create a list seq_list which contains the sequence obtained from EDD as its first element, followed by the sequences induced by the algorithms, followed by a random sequence.
        (iii) Create another list tt_list which stores the total tardiness for each sequence in seq_list.
        (iv) Append the results to alg_list (so the total tardiness obtained from the 1st algorithm is stored in the first sublist of alg_list, the total tardiness associated with the 2nd algorithm is stored in the 2nd sublist, and so on).
        (v) Compute the errors of the algorithms as (TT_H − TT_min)/(TT_max − TT_min), where TT_H is the total tardiness of the algorithm, TT_min is the minimum total tardiness (that of the best algorithm), and TT_max is the greatest total tardiness (that of the worst algorithm).
        (vi) Similarly, append the errors obtained to the list er_alg_list.
(4) Take the average of the errors obtained from all replications for each algorithm.

Computational experiments
Computational experiments were carried out using the algorithms described above. To test the effectiveness of the algorithms for a wide range of cases, the number of jobs was varied from 100 to 1000 with an increment of 100. That is, the test was performed for $n = 100, 200, 300, \ldots, 1000$.
Given a value of $n$, upper and lower bounds for job processing times were generated for each replication. The upper bounds $UB_{j,k}$ were uniformly generated from the range [1, 100]. Then the lower bounds $LB_{j,k}$ were generated from $[UB_{j,k} - w, UB_{j,k}]$, where $w$ was uniformly generated from the interval [0, 50]. Negative lower bounds were replaced by 1.
Once the upper bounds $UB_{j,k}$ and lower bounds $LB_{j,k}$ of processing times were generated, an instance $t_{j,k}$ was generated between $LB_{j,k}$ and $UB_{j,k}$. Even though processing times are assumed to be uncertain, instances between $LB_{j,k}$ and $UB_{j,k}$ had to be generated to carry out computational experiments. It would not be wise to generate the instances using only the uniform distribution, as instances of processing times may follow a different distribution. Accordingly, four different distributions were considered: uniform, positive linear, negative linear, and normal. Since the uniform and normal distributions are symmetric, while the positive and negative linear distributions are skewed, the considered distributions represent a wide range of distributions, Aydilek et al. [13].
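The bound generation and the four distributions can be sketched as follows. Several details are assumptions rather than statements from the paper: values are drawn as real numbers, the width of the bound interval is drawn separately for every job and machine, the linear distributions are sampled by inverting a triangular-shaped CDF, and the normal distribution is centered at the midpoint with a standard deviation of one sixth of the interval and truncated by rejection.

```python
import math
import random

def generate_bounds(n, rng):
    """Return n jobs, each a list of (lower, upper) pairs for machines 1 and 2."""
    jobs = []
    for _ in range(n):
        pairs = []
        for _machine in (1, 2):
            upper = rng.uniform(1, 100)
            width = rng.uniform(0, 50)          # assumption: drawn per job and machine
            lower = rng.uniform(upper - width, upper)
            if lower < 0:                       # negative lower bounds are replaced by 1
                lower = 1.0
            pairs.append((min(lower, upper), upper))
        jobs.append(pairs)
    return jobs

def sample_time(lower, upper, dist, rng):
    """Draw one processing time in [lower, upper] from the named distribution."""
    width = upper - lower
    if width == 0:
        return lower
    u = rng.random()
    if dist == "uniform":
        x = lower + width * u
    elif dist == "positive_linear":             # density rises linearly toward upper
        x = lower + width * math.sqrt(u)
    elif dist == "negative_linear":             # density falls linearly toward upper
        x = upper - width * math.sqrt(u)
    elif dist == "normal":                      # assumption: mean at midpoint, sd = width / 6
        while True:
            x = rng.gauss((lower + upper) / 2, width / 6)
            if lower <= x <= upper:
                break
    else:
        raise ValueError(f"unknown distribution: {dist}")
    return min(max(x, lower), upper)            # guard against floating-point overshoot

rng = random.Random(0)
bounds = generate_bounds(3, rng)
instance = [[sample_time(lo, up, "positive_linear", rng) for lo, up in job] for job in bounds]
print(instance)
```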
Finally, due dates were generated based on the lower and upper bounds of the processing times in the following way. Johnson's algorithm was applied to the lower bounds $LB_{j,1}$ and $LB_{j,2}$ to obtain a sequence, which was used to calculate the makespan $C_{\max}$. Denote the result by $C_{\max}(LB)$. Similarly, another makespan, denoted by $C_{\max}(UB)$, was obtained by applying Johnson's algorithm to the upper bounds $UB_{j,1}$ and $UB_{j,2}$. Then $\bar{C}_{\max}$ was computed as the average of $C_{\max}(LB)$ and $C_{\max}(UB)$. Finally, due dates were generated uniformly from the range $[\bar{C}_{\max} \times (1 - T - R/2),\ \bar{C}_{\max} \times (1 - T + R/2)]$, where $T$ is the tardiness factor and $R$ is the relative range of the due dates, see Kim [23] and Allahverdi and Aydilek [6]. Such a method for generating due dates is standard in the scheduling literature. The values for $T$ and $R$ are usually taken from the range 0 to 1. For the computations in this paper, various $T$ and $R$ values were tested to see which produced reasonable due dates. In particular, tests were performed to select two values $T_1$ and $T_2$ for $T$ and two values $R_1$ and $R_2$ for $R$. The values of $T_1$, $T_2$, $R_1$, $R_2$ were selected from the list (0.1, 0.3, 0.5, 0.7, 0.9). Every combination of the $T_1$, $T_2$, $R_1$, $R_2$ values (i.e., ($T = T_1$, $R = R_1$), ($T = T_1$, $R = R_2$), ($T = T_2$, $R = R_1$), ($T = T_2$, $R = R_2$)) was checked, so that a total of 4 × 3 × 4 × 3 = 144 cases were analyzed. Finally, $T = 0.5, 0.7$ and $R = 0.3, 0.5$ were found to generate reasonable due dates.
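The due date procedure can be sketched as below. Two points are assumptions: the makespan for each set of bounds is evaluated with those same bounds as processing times, and the range used is the common form [C(1 − T − R/2), C(1 − T + R/2)] with C the averaged makespan. Function names and the sample bounds are hypothetical.

```python
import random

def johnson_sequence(a, b):
    """Johnson's rule for two machines: jobs with a_j <= b_j first in
    increasing a_j, then the remaining jobs in decreasing b_j."""
    jobs = range(len(a))
    first = sorted((j for j in jobs if a[j] <= b[j]), key=lambda j: a[j])
    last = sorted((j for j in jobs if a[j] > b[j]), key=lambda j: b[j], reverse=True)
    return first + last

def makespan(seq, a, b):
    """Completion time of the last job on machine 2."""
    c1 = c2 = 0
    for j in seq:
        c1 += a[j]
        c2 = max(c1, c2) + b[j]
    return c2

def generate_due_dates(lower, upper, T, R, rng):
    """lower and upper hold one (machine-1, machine-2) pair per job."""
    spans = []
    for bounds in (lower, upper):
        a = [m1 for m1, _ in bounds]
        b = [m2 for _, m2 in bounds]
        spans.append(makespan(johnson_sequence(a, b), a, b))
    c_avg = sum(spans) / 2                       # average of the two makespans
    low, high = c_avg * (1 - T - R / 2), c_avg * (1 - T + R / 2)
    return [rng.uniform(low, high) for _ in lower]

# Hypothetical bounds for three jobs.
lower = [(2, 1), (1, 4), (3, 3)]
upper = [(6, 5), (4, 9), (8, 7)]
print(generate_due_dates(lower, upper, T=0.5, R=0.3, rng=random.Random(0)))
```

A larger T shifts the range toward zero and makes more jobs tardy, while a larger R widens the range around the same center.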
Furthermore, during this process, it was discovered that algorithm 4 (alg 4) performed the best for different values of  and .Therefore, once the  and  values were determined, attempts were made to improve algorithm 4 even further.Initially, the formula used for algorithm 4 was   +(  ,1 +  ,2 ).The computations were carried out for the formulas   +(  ,1 +  ,2 ), where  = 2, 3, 5, 7, 10, and  = 10 was found to be the most effective.Accordingly, the formula associated with algorithm 4 was modified as   + 10(  ,1 +  ,2 ).
The error of a given algorithm $H$ was calculated as

$$\text{Error}(H) = 100 \times \frac{TT(H) - \text{Min}(TT)}{\text{Max}(TT) - \text{Min}(TT)},$$

where $TT(H)$ is the total tardiness of the algorithm, $\text{Min}(TT)$ is the total tardiness of the best performing algorithm, and $\text{Max}(TT)$ is that of the worst performing algorithm. This value for the error was previously used by Allahverdi and Aydilek [6]. Notice that by definition the worst algorithm gets an error of 100 while the best algorithm gets an error of zero.
For each value of $n$ (10 different values are considered), a total of 50 replications were carried out so that the output is the average error of those 50 replications. Finally, the four different distributions were considered to randomly generate an instance based on the upper and lower bounds for the processing times. Therefore, a total of 10 × 50 × 4 = 2000 cases were considered to test the algorithms.
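A minimal sketch of this normalization, scaled so that the best algorithm scores 0 and the worst scores 100 as stated above. The function name and the sample values are hypothetical, and returning zeros when all algorithms tie is an assumption made to avoid division by zero.

```python
def relative_errors(total_tardiness):
    """Map each algorithm's total tardiness to an error between 0 and 100."""
    best, worst = min(total_tardiness), max(total_tardiness)
    if worst == best:                 # assumption: all algorithms tie, no error to assign
        return [0.0] * len(total_tardiness)
    return [100.0 * (tt - best) / (worst - best) for tt in total_tardiness]

# Hypothetical total tardiness values of three sequences on one instance.
print(relative_errors([10, 30, 20]))  # [0.0, 100.0, 50.0]
```

Because the error is relative to the best and worst sequences of the same instance, it measures the ranking among the compared algorithms and not the distance to an optimal schedule.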

Algorithm comparisons
The errors of the algorithms for the four combinations of $T$ and $R$ are given in Tables 3-6 for the case of the uniform distribution. The corresponding tables for the other three distributions (normal, positive linear, negative linear) are omitted to save space; only the overall average errors, over the $n$, $T$, and $R$ values, are given in Table 7. However, summaries of these tables are given in Figures 5-16.
Figures 1-4 summarize the errors given in Tables 3-6, respectively. As seen in Figure 1, the difference between the errors of the algorithms (alg 1-alg 19) and that of the random sequence (alg 20) is very large. In fact, the gap keeps increasing with increasing $n$ values. Therefore, the random sequence was omitted in the rest of the figures (Figs. 2-16) to present a clearer comparison between the given algorithms. Furthermore, it can clearly be seen in Figure 1 that alg 2, alg 3, alg 5, alg 6, alg 7, alg 8, alg 9, and alg 10 produce very similar outcomes. The same is true for alg 11-alg 19. This holds for all the figures, so one representative was chosen from each of these two groups (alg 9 and alg 19). The remaining figures show only these two representatives along with alg 1, which is EDD, and alg 4 (the best performing algorithm).
As mentioned previously and as seen in the figures, algorithm 4 performs the best among all the considered algorithms.Furthermore, it appears that alg 2 to alg 10 perform better than alg 1, which is based on EDD (earliest due date), while alg 11 through alg 19 perform worse.
The figures indicate that the 19 algorithms perform better for the $T$ and $R$ combinations $T = 0.5$, $R = 0.3$ and $T = 0.5$, $R = 0.5$ compared to $T = 0.7$, $R = 0.3$ and $T = 0.7$, $R = 0.5$. Nonetheless, the performance of the algorithms increases with increasing $n$ values for all of the considered combinations.

Statistical analysis
Statistical analysis of the algorithm comparisons has been conducted through confidence intervals and a test of hypothesis. The 95% and 99% confidence intervals of the algorithm errors are given in Table 8. Notice that the confidence intervals of the errors of alg 13 and alg 20 do not overlap, indicating that alg 13 performs much better than alg 20. Similarly, the intervals of alg 4 and alg 20 do not overlap, and neither do those of alg 4 and alg 13 or those of alg 4 and alg 9. Hence, alg 4 is statistically the best performing algorithm.
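An interval comparison of this kind can be sketched as follows. The paper does not state how the intervals in Table 8 were computed, so the normal approximation used here is an assumption, and the error samples are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(errors, level):
    """Two-sided interval for the mean error (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for level = 0.95
    half_width = z * stdev(errors) / sqrt(len(errors))
    center = mean(errors)
    return center - half_width, center + half_width

def intervals_overlap(first, second):
    """True when two closed intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]

# Hypothetical error samples of two algorithms over several replications.
errors_a = [0.02, 0.05, 0.03, 0.04, 0.03]
errors_b = [0.90, 1.10, 1.00, 0.95, 1.05]
ci_a = confidence_interval(errors_a, 0.99)
ci_b = confidence_interval(errors_b, 0.99)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```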
Tests of hypotheses have been conducted through a two-sample test. Null ($H_0$) and alternative ($H_1$) hypotheses were tested, where $\mu(\text{alg } i)$ denotes the overall average error of the algorithm alg $i$. The null hypothesis was rejected at a significance level of $\alpha = 0.01$.
Finally, the performance of alg 4 and alg 9 is statistically compared by using a two-sample test, since alg 9 is compared to alg 4 in all the figures. As seen in the figures, the performance of alg 9 is the closest to that of alg 4 among the considered algorithms. The following hypotheses are tested: $H_0$: $\mu(\text{alg } 4) = \mu(\text{alg } 9)$ against $H_1$: $\mu(\text{alg } 4) < \mu(\text{alg } 9)$. The null hypothesis is rejected at a significance level of $\alpha = 0.01$ and alg 4 is recommended.
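A one-sided comparison of two mean errors can be sketched as below. The exact test statistic used in the paper is not recoverable from the text, so this sketch assumes an unequal-variance statistic with a normal approximation for the p-value; the function name and both samples are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def one_sided_two_sample_test(sample_a, sample_b):
    """Test H0: mean(a) = mean(b) against H1: mean(a) < mean(b).

    Returns (statistic, p_value) using a normal approximation (assumption).
    """
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    statistic = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = NormalDist().cdf(statistic)   # small when a is clearly below b
    return statistic, p_value

# Hypothetical average errors of two algorithms over several settings.
errors_first = [0.10, 0.20, 0.15, 0.12, 0.18]
errors_second = [5.0, 6.0, 5.5, 5.2, 5.8]
statistic, p_value = one_sided_two_sample_test(errors_first, errors_second)
print(statistic, p_value, p_value < 0.01)
```

The null hypothesis is rejected at the 0.01 level when the returned p-value falls below 0.01.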

Conclusion
This paper studies the case of a two-machine flowshop where the processing times are uncertain, within some upper and lower bounds. The objective is to find the sequence that minimizes the total tardiness ($F2|LB_{j,k} \le t_{j,k} \le UB_{j,k}|\sum T_j$). A dominance relation is established and nineteen algorithms (alg 1-alg 19) are proposed. The algorithms are compared with each other and with a random solution, denoted as alg 20. The proposed algorithms were tested by using four different distributions for generating processing times: uniform, positive linear, negative linear, and normal distributions, representing both symmetric and skewed distributions.
Computational experiments indicate that the worst algorithm, among the proposed nineteen algorithms, reduces the error of a random solution by more than 72%. They also indicate that the best algorithm (alg 4) reduces the error of the worst algorithm (among the proposed nineteen algorithms) by 99.96%. The results are statistically confirmed. Therefore, alg 4 is recommended for the considered problem.
In this paper, setup times were assumed to be included within processing times. However, this assumption may not be valid for certain scheduling environments. Therefore, for future research, it would be interesting to consider the problem of minimizing total tardiness for a two-machine flowshop with uncertain setup times. Furthermore, it would be interesting to consider the problem of a three-machine flowshop with uncertain processing times with the objective of minimizing total tardiness.

Figure 2. Errors of the four algorithms for $T = 0.5$ and $R = 0.5$ - Uniform Distribution.

Figure 3. Errors of the four algorithms for $T = 0.7$ and $R = 0.3$ - Uniform Distribution.

Figure 4. Errors of the four algorithms for $T = 0.7$ and $R = 0.5$ - Uniform Distribution.

Figure 5. Errors of the four algorithms for $T = 0.5$ and $R = 0.3$ - Positive Linear Distribution.

Figure 6. Errors of the four algorithms for $T = 0.5$ and $R = 0.5$ - Positive Linear Distribution.

Figure 7. Errors of the four algorithms for $T = 0.7$ and $R = 0.3$ - Positive Linear Distribution.

Figure 8. Errors of the four algorithms for $T = 0.7$ and $R = 0.5$ - Positive Linear Distribution.

Figure 9. Errors of the four algorithms for $T = 0.5$ and $R = 0.3$ - Negative Linear Distribution.

Figure 10. Errors of the four algorithms for $T = 0.5$ and $R = 0.5$ - Negative Linear Distribution.

Figure 11. Errors of the four algorithms for $T = 0.7$ and $R = 0.3$ - Negative Linear Distribution.

Figure 12. Errors of the four algorithms for $T = 0.7$ and $R = 0.5$ - Negative Linear Distribution.

Figure 13. Errors of the four algorithms for $T = 0.5$ and $R = 0.3$ - Normal Distribution.

Figure 14. Errors of the four algorithms for $T = 0.5$ and $R = 0.5$ - Normal Distribution.

Figure 15. Errors of the four algorithms for $T = 0.7$ and $R = 0.3$ - Normal Distribution.

Figure 16. Errors of the four algorithms for $T = 0.7$ and $R = 0.5$ - Normal Distribution.

Table 2. Description of each algorithm.

Table 7. Errors of the algorithms for all the distributions over $n$, $T$, and $R$.

Table 8. 95% and 99% confidence intervals of the errors of the algorithms.