An improved algorithm to minimize the total completion time in a two-machine no-wait flowshop with uncertain setup times

Since the scheduling literature involves a wide range of uncertainties, it is crucial to take these into account when optimizing a performance measure; otherwise, performance may be severely degraded. In this paper, an algorithm is proposed to minimize the total completion time (TCT) of a two-machine no-wait flowshop with uncertain setup times that lie within known lower and upper bounds. The results are compared to the best existing algorithm in the scheduling literature: the programming language Python is used to generate random samples from various distributions, and the TCT of the proposed algorithm is compared to that of the best existing one. The results reveal that the proposed algorithm significantly outperforms the best one given in the literature for all considered distributions. Specifically, the average percentage improvement of the proposed algorithm over the best existing one is over 90%. A test of hypothesis is conducted to further confirm the results.


Introduction
A two-machine flowshop is a manufacturing model with two machines and a set of jobs, each of which has two operations: the first operation is performed on the first machine and the second on the second machine. Certain manufacturing settings require that these operations move from the first machine to the next with no idle time in between. This might be necessary, for instance, when heat is involved and waiting would cause certain materials to cool down, thereby degrading performance, Baker and Trietsch (2009). Such a flowshop, in which no idle time is permitted, is called a no-wait flowshop and is used extensively in many industries, including the chemical, plastic, and pharmaceutical industries. Certain scheduling problems such as aircraft landing, patient scheduling, and bakery production require no-wait flowshops as well, Allahverdi (2016), Hall and Sriskandarajah (1996). Research on no-wait flowshops is growing, addressing many problems with different performance measures; such papers include Ying and Lin (2018) and Li et al. (2018), addressing the makespan and total flow time, respectively. Since uncertainty is common in scheduling problems (Soroush (1999), Soroush (2007)), it is crucial to take it into account while optimizing a performance measure. In some manufacturing settings, for instance, certain job attributes (e.g. processing times, setup times, due dates) are unpredictable. Many scheduling papers address such problems: Seo et al. (2005) address the case of minimizing the expected number of tardy jobs given normally distributed processing times, Cunningham and Dutta (1973) and Ku and Niu (1986) consider the case where job processing times are exponentially distributed, and Kalczynski and Kamburowski (2006) consider the problem where job processing times follow a Weibull distribution.
The time required to prepare a machine for a particular operation is known as the setup time of that operation; in particular, sj,k denotes the setup time of job j on machine k. To minimize the total completion time (TCT) of a scheduling problem, it is crucial to consider setup times along with processing times. This is especially true in settings where setup times are long enough to make a difference in the total completion time, as neglecting them in such cases considerably degrades performance; Kopanos et al. (2009) discuss such cases at length. In fact, setup times should be considered separately from processing times in order to eliminate waste, increase productivity, improve resource utilization, and meet deadlines, Allahverdi (2015). Nonetheless, only 10% of the scheduling literature addresses setup times despite their common presence in manufacturing settings (Allahverdi (2015) and Kopanos et al. (2009)). In some manufacturing environments, setup times are not deterministic but rather unpredictable and prone to change. Hence, it is important not only to consider setup times in a solution but also to account for their unpredictability, which may stem from a wide range of causes such as equipment breakdowns, inadequate crew skills, and shortages of necessary tools, Kim and Bobrowski (1997). Papers considering uncertain setup times include Allahverdi (2005), Allahverdi (2006a), Allahverdi (2006b), and Allahverdi et al. (2003). The first establishes a dominance relation for a two-machine flowshop with respect to makespan and total completion time. The rest consider the same problem with respect to Cmax (makespan), total completion time, and Lmax (maximum lateness), respectively. Allahverdi and Allahverdi (2020) address the scheduling problem of minimizing total completion time with uncertain setup times where only the lower and upper bounds are known.
That paper establishes an algorithm to minimize the total completion time of such a problem. In this paper, we propose a new algorithm that significantly outperforms the one given in Allahverdi and Allahverdi (2020). The two algorithms are compared under four different distributions: uniform, positive linear, negative linear, and normal. Furthermore, a test of hypothesis is conducted to confirm the effectiveness of the new algorithm. The remainder of the paper is organized as follows: Section 1 explains the proposed algorithm and how it is applied. Section 2 describes the test, implemented in the programming language Python, that compares the proposed algorithm to the best existing one in the literature. Section 3 discusses and analyzes the results obtained from the test described in Section 2. Section 4 conducts a test of hypothesis to determine the effectiveness of the proposed algorithm over the existing algorithm in the literature. Section 5 constructs a 95% confidence interval for the percentage improvements of the proposed algorithm over the existing one. Section 6 summarizes and concludes the results obtained in the paper.

Notation
The following notation is used throughout this paper.

sj,k : Setup time of job j on machine k
tj,k : Processing time of job j on machine k
Usj,k : Upper bound on the setup time of job j on machine k
Lsj,k : Lower bound on the setup time of job j on machine k
n : Number of jobs
∆ : The range below the upper bound of a setup time from which the lower bound is selected. In particular, if Usj,k is randomly generated from the range (1, 100), then the lower bound Lsj,k is generated from the range (Usj,k − ∆, Usj,k), provided that Usj,k − ∆ is greater than or equal to 1. Otherwise, Lsj,k is generated from the range (1, Usj,k).
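The generation scheme for the bounds can be illustrated with the following Python sketch (the (1, 100) range follows the description above; the helper function and its name are our own):

```python
import random

def generate_setup_bounds(delta, low=1, high=100):
    """Draw an upper bound Us from (low, high), then a lower bound Ls
    from (Us - delta, Us), clipped below at low, as described above."""
    us = random.uniform(low, high)
    ls = random.uniform(max(low, us - delta), us)
    return ls, us
```

By construction, every generated pair satisfies low <= Ls <= Us <= high and Us − Ls <= ∆.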

An improved algorithm
Minimizing the TCT of a two-machine no-wait flowshop with uncertain setup times is known to be NP-hard. Since there is an optimal solution for the single-machine case, we transform the two-machine problem into a single-machine one. Given processing times tj,k and lower and upper setup-time bounds Lsj,k and Usj,k for k = 1, 2, we define the processing times of the single-machine problem as

t1j = tj,1 + tj,2 + 0.5(Lsj,1 + Usj,1) + 0.25(Lsj,2 + Usj,2), for j = 1, ··· , n.
We then apply the shortest processing time (SPT) rule to order the jobs by the induced processing times t1.
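As a minimal Python sketch of the proposed transformation and SPT ordering (the data layout and function name are our own), jobs can be represented as index lists:

```python
def proposed_order(t, Ls, Us):
    """Order jobs by the proposed single-machine surrogate.

    t, Ls, Us are lists of (machine-1, machine-2) pairs per job:
    processing times, lower setup bounds, and upper setup bounds.
    Returns the job indices sorted by SPT on the induced times t1.
    """
    n = len(t)
    t1 = [
        t[j][0] + t[j][1]
        + 0.5 * (Ls[j][0] + Us[j][0])
        + 0.25 * (Ls[j][1] + Us[j][1])
        for j in range(n)
    ]
    return sorted(range(n), key=lambda j: t1[j])
```

For example, with t = [(5, 5), (1, 1)] and all setup bounds zero, the induced times are [10, 2], so SPT schedules job 1 before job 0.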

Testing the improved algorithm
The algorithm is compared with the best algorithm in Allahverdi and Allahverdi (2020) for the n values 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 and the ∆ values 20, 25, 30, 35, 40, 45, where n denotes the number of jobs and ∆ determines how the lower bound of a setup time is generated from its upper bound, as explained in the notation section. For each n and ∆ combination, r = 100 replications are conducted, and the average and standard deviation over those replications are taken. In particular, the following steps are taken to compare the two algorithms.
1. Select the number of jobs n, the values of ∆, and the number of replications r.

2. For each n and ∆ combination, do the following:

(a) For each replication R = 1, ··· , r, do the following:

i. Randomly generate values for the upper and lower bounds of the setup times, Usj,k and Lsj,k, for j = 1, ··· , n and k = 1, 2.

ii. Randomly generate processing times tj,k for j = 1, ··· , n and k = 1, 2.

iii. Transform the two-machine problem into a single-machine problem by giving different weights to the processing times and the lower and upper bounds of the setup times. Denote the processing times of the new single machine by t1.

iv. Apply the SPT rule on t1 and denote the obtained sequence by st1. The TCT of st1 is compared to that of st2, the sequence obtained from the algorithm given in Allahverdi and Allahverdi (2020).

v. Generate setup times within the lower and upper bounds (a number of different distributions are considered while generating the setup times).

vi. Compute the TCT of st1 and st2 given the setup and processing times.

vii. Compute the error for the TCT of each sequence.

(b) Take the average and standard deviation of the errors obtained from all the replications.
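Step v leaves the choice of distribution open. The sketch below is our own illustration of drawing one setup time within given bounds; the linear distributions are approximated with the standard library's triangular density and the normal draw is clipped, so the exact generators used in the experiments may differ:

```python
import random

def draw_setup(ls, us, dist="uniform"):
    """Draw one setup time in [ls, us] for step v (illustrative only)."""
    if dist == "uniform":
        return random.uniform(ls, us)
    if dist == "positive-linear":   # density increasing toward us
        return random.triangular(ls, us, us)
    if dist == "negative-linear":   # density decreasing from ls
        return random.triangular(ls, us, ls)
    if dist == "normal":            # clipped normal on the midpoint
        mu, sigma = (ls + us) / 2, (us - ls) / 6
        return min(us, max(ls, random.gauss(mu, sigma)))
    raise ValueError(dist)
```

Each draw stays within the generated bounds regardless of the distribution, so both algorithms are evaluated on the same realized setup times.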

Computational results
For each combination of n and ∆, 100 replications are performed for any given distribution, so a total of 10 × 6 × 4 × 100 = 24,000 different cases are considered. The results are listed in Tables 1-4 for the uniform, positive linear, negative linear, and normal distributions, respectively. The first two columns in each table are the considered n and ∆ values. The third column is the average of the errors obtained in the replications using the proposed algorithm. The fourth column is the average of the errors using the algorithm from Allahverdi and Allahverdi (2020). The fifth column is the standard deviation of the errors using the proposed algorithm. Similarly, the sixth column is the standard deviation of the errors using the algorithm from Allahverdi and Allahverdi (2020). Finally, the last column is the percentage improvement of the proposed algorithm over the existing one, computed as 100(x − y)/x, where x is the value in column 4 and y is the value in column 3.
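The last column's computation can be stated compactly (the helper name is ours; x and y are the column-4 and column-3 averages described above):

```python
def pct_improvement(old_err, new_err):
    """Percentage improvement of the proposed algorithm (new_err, y)
    over the existing one (old_err, x): 100 * (x - y) / x."""
    return 100.0 * (old_err - new_err) / old_err
```

For instance, an average error of 10 for the existing algorithm against 1 for the proposed one yields an improvement of 90%.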

Fig. 1. Percentage Improvement vs n
As seen in the tables, the average and median percentage improvements are essentially the same across the different distributions, differing by at most 1%, as Table 5 shows. This confirms the effectiveness of the proposed algorithm over the existing one and indicates that it does not depend on a particular distribution but works for all four. Furthermore, the percentage improvement tends to grow with n, which is advantageous, since it suggests that the algorithm will likely remain effective, perhaps even more so, for larger values of n.

Hypothesis testing
A hypothesis test for a difference of means is conducted to determine the degree of improvement obtained by the new algorithm. We want to check whether the average total completion time of the proposed algorithm is indeed lower than that of the existing one. From now on, the best existing algorithm in the literature is denoted old-algorithm and the proposed algorithm is denoted new-algorithm. Let µ0 be the population mean of the TCT of old-algorithm and µ1 be that of new-algorithm. We define the null and alternative hypotheses as

H0 : µ0 = µ1,    Ha : µ0 > µ1.

Hence, if µ1 is considerably less than µ0, we reject the null hypothesis that new-algorithm gives total completion times similar to those of old-algorithm; otherwise, we fail to reject it. The level of significance is taken to be α = 0.01. Given a certain distribution, 100 replications are performed for every combination of n and ∆, so the sample size is large enough to use the Z-test. Since α = 0.01 and P(Z ≤ 2.33) ≈ 0.99, we reject the null hypothesis if the Z-score is greater than 2.33 and fail to reject it otherwise. The calculated Z-scores are listed in Table 6 for each n and ∆ combination for all considered distributions. As seen in the table, all Z-scores are much greater than 2.33, clearly rejecting the null hypothesis. Furthermore, the Z-scores increase as n increases, which suggests that this result holds for larger values of n as well.
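Under these hypotheses, the Z statistic for each n and ∆ cell can be computed as in the following sketch (function names are ours; the sample means, standard deviations, and sizes are assumed to come from the replications):

```python
import math

def z_score(mean_old, mean_new, sd_old, sd_new, n_old, n_new):
    """Two-sample Z statistic for H0: mu0 = mu1 vs Ha: mu0 > mu1."""
    se = math.sqrt(sd_old**2 / n_old + sd_new**2 / n_new)
    return (mean_old - mean_new) / se

def reject_h0(z, critical=2.33):
    """Reject H0 at alpha = 0.01 when z exceeds the critical value."""
    return z > critical
```

A large positive Z indicates that old-algorithm's mean TCT exceeds new-algorithm's by many standard errors, which is exactly the pattern reported in Table 6.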

Confidence interval
The following table lists the 95% confidence intervals with respect to the new algorithm for the four distributions: uniform, positive linear, negative linear, and normal. It is evident that the confidence intervals are narrow, which is advantageous, as it indicates the precision of the estimated means.
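A large-sample 95% interval of the kind tabulated can be computed as follows (a minimal sketch using the normal critical value 1.96; the function name is ours):

```python
import math

def ci_95(mean, sd, n):
    """95% confidence interval for a sample mean, using the normal
    critical value 1.96 (valid under the large-sample assumption)."""
    half = 1.96 * sd / math.sqrt(n)
    return mean - half, mean + half
```

With the 100 replications per cell used here, the half-width shrinks with the square root of the sample size, which is why the reported intervals are narrow.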