Transitive Reduction Approach to Large-Scale Parallel Machine Rescheduling Problem With Controllable Processing Times, Precedence Constraints and Random Machine Breakdown

This paper studies a novel parallel machine rescheduling problem with controllable processing times under machine breakdown and precedence constraints. The problem is strongly $\mathcal{NP}$-hard, and we model it as a mixed-integer program (MIP). The primal problem is decomposed into a discrete subproblem and a continuous one. We solve the discrete subproblem with a dispatching rule and analyze the continuous one in terms of mathematical programming; the pros and cons revealed by this analysis lead us to employ a commercial solver. The proposed method is efficient and can start from a nonzero initial state. We introduce transitive reduction to cull redundant constraints from the directed acyclic graph (DAG) representation of the precedence constraints, extending the efficiency of the dispatching rule to the solver. The method performs predictive scheduling and, in the reactive session, picks up the partial solution left by the machine breakdown. As a result, the complete method solves the rescheduling problem efficiently and scales to large instances. Finally, the computational results show that transitive reduction significantly reduces the time and RAM consumed by the solver, allowing the scheduler to solve large instances with computational economy and efficiency.


I. INTRODUCTION
In traditional machine scheduling problems, the processing times are fixed. In problems with controllable processing times, by contrast, one can shorten the processing times by allocating extra production resources such as electricity, manpower, or money. It is therefore sensible to take full advantage of a limited resource to improve performance with computational efficiency and economy.
The associate editor coordinating the review of this manuscript and approving it for publication was Feiqi Deng .
Apart from some special forms [1] and the discrete ones [2], [3], [4], the resource consumption function describes the relationship between processing time and resource either linearly or nonlinearly, $cpt \in \{cpt_{lin}, cpt_{conv}\}$ [5], [6], [7], [8], [9], [10], [11]. In the linear case, the processing time decreases linearly in the amount of resource allocated to the job:
$$p_{job_i}(u_{job_i}) = \bar{p}_{job_i} - \alpha_{job_i} u_{job_i}, \quad 0 \le \underline{p}_{job_i} \le p_{job_i}(u_{job_i}) \le \bar{p}_{job_i}, \quad i = 1, 2, \ldots, n,$$
while in the convexly nonlinear case, which reflects the law of diminishing marginal returns (productivity increases with the amount of resource at a decreasing rate), it is expressed as
$$p_{job_i}(u_{job_i}) = \left(\frac{\omega_{job_i}}{u_{job_i}}\right)^{k}, \quad 0 < \underline{p}_{job_i} \le p_{job_i}(u_{job_i}) \le \bar{p}_{job_i}, \quad i = 1, 2, \ldots, n, \quad k > 0.$$
Unexpected disruptions of many types may occur. Such uncertainties can cause disturbances and render the existing schedule infeasible. Rescheduling then responds to the disruption by adjusting the previously planned schedule. With controllable processing times, rescheduling replans the processing times in addition to the assignment and sequencing. Given a machine breakdown, the scheduler has to compress the processing times of the disrupted and unscheduled jobs once again.
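As a concrete illustration of the two resource consumption functions, the following Python sketch (parameter values are made up for illustration) evaluates the compressed processing time for a given resource amount:

```python
def p_linear(p_bar, alpha, u):
    """Linear case: processing time decreases linearly in the resource u."""
    return p_bar - alpha * u

def p_convex(omega, u, k):
    """Convexly nonlinear case: p = (omega / u)**k with u > 0, k > 0.
    The marginal reduction shrinks as u grows (diminishing returns)."""
    return (omega / u) ** k
```

With $\bar{p} = 10$ and $\alpha = 2$, one unit of resource compresses the job to 8; in the convex case with $\omega = 8$, $k = 1$, raising $u$ from 1 to 2 saves 4 time units, while raising it from 2 to 3 saves only about 1.33, illustrating the diminishing returns.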
Surveyed in detail in [12], rescheduling methods fall into three types: completely reactive scheduling, robust proactive scheduling, and predictive-reactive scheduling. Completely reactive scheduling generates no firm schedule in advance and makes decisions in real time. Robust proactive scheduling generates predictive schedules that take the unexpected events into account. The predictive-reactive method generates a firm schedule beforehand and recovers from uncertain disruptions. Common disruption types are changes in release dates, arrival of new orders, order cancellations, changes in order priority, processing delays, and machine breakdowns.
Machine breakdown disrupts the existing schedule in various situations, and researchers have proposed different methods to repair the disrupted partial schedule for different objectives. Reference [13] studied a scheduling problem under machine breakdown in a continuous process industry. Reference [14] dealt with a flexible job shop scheduling problem under random machine breakdown. Reference [15] used a hybrid genetic algorithm to solve a robust and stable job shop scheduling problem. Reference [16] studied a production rescheduling problem for machine breakdown at a job shop. Reference [17] tackled a multi-objective flexible job shop scheduling problem under random machine breakdown with an evolutionary algorithm. Reference [18] considered proactive scheduling in response to stochastic machine breakdown in a deteriorating production environment.
Precedence constraints make the Lagrangian-based methods of scheduling theory [27], [28], [29] unavailable, because an explicit expression of the objective solely in terms of the processing times becomes far-fetched once gaps or idle times appear in a parallel machine schedule, as in Fig. 1. Against the backdrop of bilevel programming [30], [31], [32], this leads to suboptimality in the discrete and continuous subproblems at both levels after decomposition. The optimal structure of a partial solution to one subproblem may not be part of the optimal structure of the complete solution to the primal problem [11]. Thus, a previously attained partial solution to the lower-level subproblem may not be retained in the higher-level subproblem, so it is unwise to spend heavy computational power on reaching near-optimality at the lower level. For instance, for $Pm\,|\,tree, cpt_{lin}, d\,|\,\sum u$, [33] searched for an explicit expression of the critical path by enumeration with a labeling construction process.
For the discrete subproblem, the dispatching rule assigns the jobs in polynomial time [34]. The continuous counterpart has two cases, which we first consider in terms of mathematical programming. In the linear case, the subproblem is a linear program (LP). LP is well known to be solvable in time polynomial in the total length of the input [35]. However, whether LP admits a strongly polynomial algorithm remains an open question [36], though [37], [38], [39] have recently made some progress. Dantzig's simplex method is probably still the most commonly used method for solving large LPs in practice, but with Dantzig's original pivot rule it may require an exponential number of iterations in the worst case [40], as surveyed by [41]. Viewed as a linear network flow problem [42], the continuous subproblem can also be solved by the network simplex method [43]. In the convexly nonlinear case, the resource allocation subproblem minimizes a strictly convex cost under linear constraints [44], [45], [46]. According to the survey [47], the dual ascent method is the common approach to the continuous nonlinear resource allocation problem, and it converges at least linearly [48]. The nonconvex continuous version of the problem is $\mathcal{NP}$-hard [49], even in the quadratic case, and so are many simplified versions of the nonseparable problem even when convex. Reference [50] gives a more detailed complexity analysis of different versions of the network optimization problem.
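The dispatching rule of [34] is not reproduced here, but a generic precedence-respecting list-scheduling sketch in Python conveys why the discrete subproblem is cheap: each job is placed once, on the earliest-free machine, so the work is polynomial in the numbers of jobs and machines. (The function names and the FIFO tie-breaking below are our assumptions, not the rule of [34].)

```python
import heapq

def dispatch(jobs, preds, proc, m):
    """Greedy list scheduling on m parallel machines: repeatedly start a
    precedence-ready job on the earliest-free machine.
    preds[j] lists the immediate predecessors of job j; proc[j] is its
    processing time. Returns a list of (job, machine, start, finish)."""
    indeg = {j: len(preds[j]) for j in jobs}
    succ = {j: [] for j in jobs}
    for j in jobs:
        for p in preds[j]:
            succ[p].append(j)
    ready = [j for j in jobs if indeg[j] == 0]     # FIFO queue of ready jobs
    machines = [(0.0, k) for k in range(m)]        # (free time, machine id)
    heapq.heapify(machines)
    finish = {}
    schedule = []
    while ready:
        j = ready.pop(0)
        t_free, k = heapq.heappop(machines)
        # a job starts no earlier than its machine frees up and
        # no earlier than all its predecessors finish
        start = max(t_free, max((finish[p] for p in preds[j]), default=0.0))
        finish[j] = start + proc[j]
        schedule.append((j, k, start, finish[j]))
        heapq.heappush(machines, (finish[j], k))
        for s in succ[j]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return schedule
```

For a two-job chain on two machines (3-unit job followed by a 2-unit successor), the successor cannot start before time 3, so the makespan is 5 even though a second machine is idle.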
In industrial practice, commercial solvers make the continuous subproblem easy to solve, and they guarantee efficiency and solution quality under standard metrics for both the linear and nonlinear cases. Thus, the dual-simplex method and the interior point method, packaged in the Matlab linprog and fmincon solvers respectively, are ready to solve the continuous subproblem.
The complexity of the problem model seriously affects the performance of the solver. Reference [51] proposed a more efficient MILP model of an energy-aware flexible job shop scheduling problem for CPLEX. Redundancy exists in the given precedence constraints of scheduling problems, and [52], [53] removed it. Reference [54] removed the redundancy in the precedence constraints. Reference [55] used transitive reduction to simplify the DAG for $Pm\,|\,prec, p_j = 1\,|\,C_{max}$. The precedence-constrained knapsack problem went through transitive reduction to solve large cases before being sent to an MIP solver [56].
In the three-field notation [57], this paper studies a novel problem, $Pm\,|\,prec, cpt, d, Mchbrk\,|\,\sum u$, denoted P1, extended from the already strongly $\mathcal{NP}$-hard $Pm\,|\,tree, cpt_{lin}, d\,|\,\sum u$ [33]. It extends the tree-formed precedence constraints to the general form, and the linear case alone to both the linear and nonlinear ones. The proposed method reaches beyond the coverage of the Lagrangian method for a general class of problems in industrial practice. Thanks to the low complexity of the dispatching rule and the efficiency and near-optimality of the solver with transitive reduction, we can solve this problem within reasonable time and limited RAM for large cases.
We organize the remainder of the paper as follows: Section III presents the formal description of the primal problem and its decomposition; Section IV addresses the simulation of the machine breakdown; Section V presents transitive reduction, the properties that result from applying it, and the proposed algorithm; in Section VI, a small-sized case of 40 jobs and 4 machines takes a tour through the rescheduling algorithm to elaborate its complex structure; in Section VII, we run computational experiments to measure the temporal performance improvement brought by transitive reduction, even on very large instances; at last, Section VIII gives a brief conclusion and an outlook on future research.

Throughout the paper, the following notation is used:

$B_{job_i}$: beginning time of $job_i$
$C_{job_i}$: completion time of $job_i$
$C_{max}$: makespan, the maximum finishing time
$C^{cpr}_{max}$: makespan of the fully compressed jobs with the assignment and sequencing after the first iteration
$C^{rlx}_{max}$: makespan of the fully relaxed jobs with the assignment and sequencing after the first iteration
$\gamma$: total resource consumption, $\gamma = \sum_{i=1}^{n} u_i$
$\gamma_a$: $\gamma$ in the $a$-th iteration
$\gamma_b$: best $\gamma$ so far
$\epsilon$: value for the stopping criterion of the minimum improvement of $\gamma$
$t_{brk}$: machine breakdown time instant
$t_{rpr}$: machine reparation time duration
$S_{ini}$: initial state for the scheduling algorithm, $S_{ini} = (\Pi, P, t_{brk}, t_{rpr}, V_{sch}, V_{unsch})$
$S^{sch}_{ini}$: initial state of the scheduled jobs for the reactive session, $S^{sch}_{ini} = (\Pi_{sch}, P_{sch}, t_{brk}, t_{rpr}, V_{sch}, V_{unsch})$
$S^{0}_{ini}$: zero initial state for the predictive session, $S^{0}_{ini} = (0, P, 0, 0, 0, V)$
$\Pi$: assignment and sequencing of the jobs on the parallel machines, i.e., the discrete partial solution, $\Pi = (\pi_{M_1}, \pi_{M_2}, \ldots, \pi_{M_m})$
$\Pi_{sch}$: $\Pi$ of the scheduled jobs
$P$: processing times of the jobs, i.e., the continuous partial solution, $P = (p_1, p_2, \ldots, p_n)$
$P_{sch}$: $P$ of the scheduled jobs
$Mchbrk$ ($= 1$): state for whether a breakdown occurs

II. NOTATION
There is a set of $n$ jobs, $V = \{job_1, job_2, \ldots, job_n\}$, with processing time $p_{job_i}$ for each $job_i \in V$ and resource $u_{job_i}$ allocated to $job_i$ to reduce $p_{job_i}$. The precedence constraints imposed on $V$ by the technical requirement are represented by a DAG, $G^{t}_{prec} = (V, A^{t})$, where $(job_i, job_j) \in A^{t}$ means that $job_j$ must wait until the completion of $job_i$. The problem is to schedule $V$ on a set of $m$ parallel machines, $M = \{M_1, M_2, \ldots, M_m\}$, so as to minimize the total resource consumption, $\sum_{i=1}^{n} u_{job_i}$, allocated to $V$ before a given common due-date, $d$, without preemption: for the linear case: for the convexly nonlinear case: In Problem P1, Objective (1) describes the sum of resource consumption; Constraint (2) decides the machine and the position in which to place $job_i$; Constraint (3) guarantees that each individual job can be processed only once; Constraint (4) assigns each job to only one machine; Constraint (5) ensures that no jobs are left unassigned; Constraint (6) demands that each job cannot be completed earlier than the time required to process it on its assigned machine; Constraint (7) ensures that processing of a consecutive operation cannot start before the completion of the operation processed previously on the same machine; that is, no overlap between the jobs' processing times is allowed.
Each machine can process only one job at a time. Constraint (8) requires that $job_j$ start later than the time $job_i$ finishes if $job_j$ is preceded by $job_i$ in $G^{t}_{prec}$; Constraint (9) makes sure the makespan is at least every job's completion time; Constraint (10) ensures that all jobs complete their processing before the due-date, $d$; Constraint (11) bounds the processing time of every job within a given positive closed interval; Constraint (12) guarantees that the listed variables are nonnegative; Constraint (13) gives the linear resource consumption function for each individual job; Constraint (14) gives the convexly nonlinear one.
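Based on the verbal descriptions above, the timing-related constraints can be sketched in LaTeX as follows (the exact variable naming and linearization of the MIP model are our assumptions):

```latex
\begin{align}
C_{job_j} - p_{job_j} &\ge C_{job_i} && \forall (job_i, job_j) \in A^{t}, \tag{8}\\
C_{\max} &\ge C_{job_i} && \forall\, job_i \in V, \tag{9}\\
C_{\max} &\le d, \tag{10}\\
\underline{p}_{job_i} \le p_{job_i} &\le \bar{p}_{job_i} && \forall\, job_i \in V. \tag{11}
\end{align}
```

Constraint (8) reads: the start time of $job_j$ (its completion time minus its processing time) is no earlier than the completion time of its predecessor $job_i$.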

B. DECOMPOSITION OF PRIMAL PROBLEM
We decompose the primal problem P1 into two subproblems: P1-1 for the assignment and sequencing of the jobs, and P1-2 for the resource allocation. Given the processing time of each job, $P$, the discrete subproblem is formulated as: Given the assignment and sequencing, $\Pi$, the continuous subproblem is formulated as: for the linear case: for the convexly nonlinear case: The dispatching rule then solves P1-1, while P1-2 is solved by the dual-simplex method in the linear case and the interior point method in the nonlinear case, but only after the transitive reduction step is done.
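The decomposition suggests a simple coordination scheme: alternate the two subproblems and stop when the total resource consumption $\gamma$ improves by less than $\epsilon$. A minimal Python sketch follows; the function names and the single-loop structure are our simplification of the full algorithm presented in Section V:

```python
def coordinate(dispatch_fn, allocate_fn, p_init, eps, max_iter=50):
    """Alternate the discrete (P1-1) and continuous (P1-2) subproblems until
    the improvement in total resource consumption gamma drops below eps."""
    p, gamma_best = p_init, float("inf")
    pi = None
    for _ in range(max_iter):
        pi = dispatch_fn(p)            # P1-1: assignment and sequencing
        p, gamma = allocate_fn(pi)     # P1-2: resource allocation
        if gamma_best - gamma < eps:   # stopping criterion on gamma
            gamma_best = min(gamma_best, gamma)
            break
        gamma_best = gamma
    return pi, p, gamma_best
```

With stub subproblem solvers returning the gamma sequence 10, 9, 8.95 and $\epsilon = 0.1$, the loop stops at the third iteration, since the last improvement (0.05) falls below the threshold.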

IV. MACHINE BREAKDOWN
First, let us detail what happens to the parallel machines when a breakdown occurs. Once a machine breaks down at time $t_{brk}$, reactive scheduling picks up the partial schedule and finishes it after the machine reparation of duration $t_{rpr}$. Together, $t_{brk}$ and $t_{rpr}$ describe the unavailability caused by a breakdown, and both follow the exponential distribution. Two parameters, MTBF (mean time between failures) and MTTR (mean time to repair), characterize the generation of the random breakdown instant and reparation duration respectively from the exponential distribution. $Ag = MTTR/(MTTR + MTBF)$ denotes the breakdown level of the parallel machines, i.e., the percentage of time a machine is down. The simulation of machine breakdown follows the steps detailed in [58].
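The sampling step described above can be sketched in Python with the standard library's exponential sampler; the exact procedure of [58] is not reproduced here, and the parameter values in the usage note are illustrative:

```python
import random

def simulate_breakdown(mtbf, mttr, seed=None):
    """Draw one breakdown instant and one repair duration from exponential
    distributions with means MTBF and MTTR respectively
    (expovariate takes the rate, i.e., 1/mean)."""
    rng = random.Random(seed)
    t_brk = rng.expovariate(1.0 / mtbf)
    t_rpr = rng.expovariate(1.0 / mttr)
    return t_brk, t_rpr

def breakdown_level(mtbf, mttr):
    """Ag = MTTR / (MTTR + MTBF): the fraction of time a machine is down."""
    return mttr / (mttr + mtbf)
```

For example, MTBF = 90 and MTTR = 10 give a breakdown level $Ag = 0.1$, i.e., the machine is down 10% of the time on average.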
Next, we describe how the scheduler reacts to the breakdown when dealing with the jobs. The machine may break down while several jobs are still processing, leaving them unfinished. By the breakdown time, the jobs fall into three groups: the scheduled, $V_{sch}$, the disrupted, $V_{ds}$, and the unscheduled, $V_{unsch}$. The scheduled jobs have started and finished their processing; the disrupted ones have started but not yet finished; and the unscheduled ones have not started at all. In this paper, the disrupted jobs are rescheduled together with the unscheduled jobs once the broken machine resumes processing. A DAG, $G^{DAG}_{prec} = (V, A)$, describes the precedence constraints: the vertex set $V$ represents the jobs, and $A$ the immediate precedence constraints between pairs of jobs. The graph is acyclic to avoid the absurdity of a job having to be processed both before and after another job, which would be, formally, a cycle in the graph. A precedence constraint $(job_i, job_j) \in A$ is redundant if there exists an alternative path from $job_i$ to $job_j$ other than the direct arc $(job_i, job_j)$.
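The three-way split of the jobs at the breakdown instant can be sketched as follows; representing the schedule as a mapping from jobs to (start, finish) pairs is our assumption:

```python
def classify_jobs(schedule, t_brk):
    """Split jobs into scheduled (finished before t_brk), disrupted
    (started but unfinished at t_brk), and unscheduled (not yet started).
    schedule: {job: (start, finish)}."""
    v_sch, v_ds, v_unsch = [], [], []
    for job, (start, finish) in schedule.items():
        if finish <= t_brk:
            v_sch.append(job)      # completed before the breakdown
        elif start < t_brk:
            v_ds.append(job)       # interrupted mid-processing
        else:
            v_unsch.append(job)    # not yet started
    return v_sch, v_ds, v_unsch
```

The disrupted and unscheduled groups are then handed to the reactive session together, as described above.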

V. TRANSITIVE REDUCTION ON DAG REPRESENTATION AND PROPOSED ALGORITHM
In a candidate solution, one job immediately precedes another either by the assignment and sequencing or by the technical requirement. The immediate precedence relationships induced by the assignment and sequencing, $x^{M_j,k}_{job_i}$, may overlap with those of the technical requirement, $G^{t}_{prec}$. Thus, it is necessary to remove the redundancy caused by this overlap.

B. TRANSITIVE REDUCTION
Transitive reduction reduces the number of edges of a directed graph to as few as possible while preserving the same reachability [54], [59]. The transitive reduction of a finite DAG is unique and is a subgraph of the given graph. Moreover, for a finite DAG, the minimum equivalent graph coincides with the transitive reduction and can be constructed in polynomial time [60]. Therefore, the reduced $G^{r}_{prec}$ is a subgraph of $G^{c}_{prec}$. The transreduction function in Matlab does the job.
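Outside Matlab, the same reduction can be computed with a straightforward reachability check: an arc (u, v) of a DAG is redundant exactly when v remains reachable from u after the direct arc is ignored. A self-contained Python sketch (not the transreduction implementation, just the same idea):

```python
def transitive_reduction(nodes, edges):
    """Keep only the arcs (u, v) of a DAG for which no alternative
    path from u to v exists; for a finite DAG this is the unique
    transitive reduction."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)

    def reachable(src, dst, skip_edge):
        # Depth-first search from src to dst, ignoring the direct arc.
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            for nxt in adj[n]:
                if (n, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return [(u, v) for u, v in edges if not reachable(u, v, (u, v))]
```

For the arcs (1, 2), (2, 3), (1, 3), the arc (1, 3) is removed because node 3 is still reachable from node 1 via node 2, while the other two arcs are kept.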
Fig. 2 better illustrates the transitive reduction of the redundancy. The proof can be done trivially. Claim 2 sheds some light on the reduction and tells what happens in the single machine problem and the constraint-free parallel machine problem. In the single machine case, $m = 1$, or in the constraint-free case, the complex precedence constraints reduce to just the sequencing and assignment. Thus, Claim 2 explains that the precedence constraints of the technical requirement become meaningless for the resource allocation.

C. PROPOSED ALGORITHM
Here we present the proposed algorithm and formulate a flowchart of it in Fig. 3.

VI. DEMO FOR PROPOSED ALGORITHM A. LINEAR REPRESENTATION
The nested and entangled loops make it difficult for the reader to decipher the structure of the algorithm. To unravel this complication, besides the flowchart and the LaTeX algorithm environment, we linearize the presentation of the proposed algorithm as follows:

B. FIGURE EXPLANATION
To better illustrate the scheduling process, we generate an instance of 40 jobs and 4 machines, send it to the proposed method, and let it go through the steps described by the flowchart in Fig. 3. During the scheduling process, we take snapshots of the schedule and the precedence constraints at key steps. These snapshots show how the operation at each key step updates the schedule, demonstrating the mechanism of the proposed algorithm: $1 \to 2 \to 3 \to loop1$, as shown in the equation at the bottom of the next page. For brevity, we omit the repeated steps in the algorithmic loops. The following key steps are shown to tell the difference each one has made; we mark the selected key steps with an asterisk.
Steps 1-3 are omitted;
Step 4 of $loop1_a$: the empty schedule in Fig. 4a, and the precedence constraints from the technical requirement only in Fig. 4b;
Step 5 of $loop1_a$: the initial schedule with the processing times at their lower bounds in Fig. 4c, and the combined precedence constraints in Fig. 4d;
Steps 6-7 of $loop1_a$: the changes in the updated processing times in Fig. 4e, and, after the transitive reduction, the reduced combined precedence constraints in red against the unreduced ones in both blue and red in Fig. 4f;
Step 8 of $loop1_a$ to Step 4 of $loop1_b$ are omitted;
Step 5 of $loop1_b$: the updated schedule with the processing times from $loop1_b$ in Fig. 4g, and the updated combined precedence constraints in Fig. 4h;
Step 9 of $loop2_b$ to Step 9 of $loop1_c$ are omitted;
Step 10(y): the end of the predictive session, with the schedule in Fig. 4i and the precedence constraints in Fig. 4j;
Step 11 of $loop2_a$ in loop3: the nonempty partial schedule as the initial state after the machine breakdown in Fig. 4k, and the precedence constraints from the technical requirement only in Fig. 4l;
Step 5 of $loop2_a$ in loop3: the initial schedule with the processing times at their lower bounds in Fig. 4m, and the combined precedence constraints in Fig. 4n;
Steps 6-7 of $loop2_a$ in loop3: the changes in the updated processing times in Fig. 4o, and, after the transitive reduction, the reduced combined precedence constraints in red against the unreduced ones in both blue and red in Fig. 4p;
Step 8 of $loop2_a$ in loop3 to Step 10(n) are omitted;
Step 12 in loop3: the end of the reactive session and of all the scheduling sessions, with the final schedule in Fig. 4q and the final precedence constraints in Fig. 4r.

A. DATA GENERATION AND PARAMETER SETTINGS
Since, to the best of our knowledge, there is no benchmark for this problem, we generate the dataset as shown in Table 1. Groups of three sizes of test instances, small ($n^{l}_{S}$ and $n^{nl}_{S}$), medium ($n^{l}_{M}$ and $n^{nl}_{M}$), and large ($n^{l}_{L}$ and $n^{nl}_{L}$), are generated for the linear and nonlinear cases respectively. The number of machines is set by $m = n \times r_m$, with values chosen to avoid trivial cases, such as an effectively single machine problem when the ratio is too large or too small. $\bar{P} \ge \underline{P}$ is guaranteed. $C^{rlx}_{max}$ sets a basis that guarantees the due-date is not too small to allow a feasible solution, and $r_d\,(C^{cpr}_{max} - C^{rlx}_{max})$ advances from that basis so that the due-date is not so large as to make relaxation meaningless; the due-date is thus set with a random positive $r_d$. The out-degree of each job in the precedence DAG is bounded as $d_{out,t}(job_i) \le r_{UB} \times n$. At last, $k$, $\omega$ and $\alpha$ are randomly generated. With these intentionally chosen parameters, the machine breaks down around the middle of the predictive schedule, so that more jobs have to be rescheduled.
The rescheduling that ensues after a disruption roughly doubles the time consumption, so rescheduling after a number of machine breakdowns would take tremendous time. In the large test cases, which already take hours or days, such time consumption adds little to the demonstration of the reduction brought by transitive reduction under multiple breakdowns; a similarly cumbersome task led [15] to aggregate all breakdowns into one to evaluate schedule stability. In our test, we select the value of MTBF so that the breakdown takes place around the middle of the predictive schedule, and MTTR is not so small as to be trivial. The breakdown machine is randomly selected among the machines that are busy in the predictive schedule at the breakdown moment. For time-saving reasons, each machine breaks down only once.

B. PERFORMANCE METRICS
The following improvement ratio measures the temporal improvement obtained with transitive reduction:
$$ir\_t = \frac{time\_ntr - time\_tr}{time\_ntr} \times 100\%.$$
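The metric above amounts to a one-line computation, sketched here in Python:

```python
def ir_t(time_ntr, time_tr):
    """Percentage of solver time saved by transitive reduction:
    (time without reduction - time with reduction) / time without * 100."""
    return (time_ntr - time_tr) / time_ntr * 100.0
```

A run whose solver time drops from 10 s to 4 s yields an improvement ratio of 60%.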

C. COMPUTATIONAL RESULTS AND ANALYSIS
The continuous subproblem instances were solved with the dual-simplex method in the linear case and the interior point method in the nonlinear case, packaged in the linprog and fmincon solvers respectively. All continuous subproblem instances were solved to the global optimum. Reference [61] reported that the CPLEX solver was unable to solve instances with more than 200 jobs due to RAM capacity limitations; thanks to transitive reduction, however, the reduced RAM consumption here allows for large instances. Table 2 shows the computational results for the linear case for the small, medium and large instances. The columns obj_ntr and time_ntr show the objective value and time consumption without transitive reduction, while obj_tr and time_tr show those with it. When the algorithm terminated, the objective values of the schedule were almost the same with or without transitive reduction; only the time consumption differed. For instances with more than 4000 jobs, the RAM consumption exceeded the capacity of our computer, so those results are not available. Column ir_t shows that the improvement is indeed significant for all sizes of test instances.

2) NONLINEAR CASE
Table 3 similarly presents the computational results for the nonlinear case of the scheduling problem; again we see a significant reduction in time consumption.

D. COMPARISON WITH TABU SEARCH
We have proposed a novel method for a novel problem in this paper; as far as we know, there is no identical case in the existing literature for comparison. The closest version we can find is the problem of [33], which employed the tabu search and the network flow method for the continuous and discrete subproblems respectively, for the linear case only, and obtained the results in Table 4.
As the selected comparison below shows, we outperformed these results by a large margin with a less powerful CPU and a less efficient implementation language, on far larger and far more complex cases that include random machine breakdown. Thus, we are convinced that this large performance margin would reappear in a would-be comparison on the rescheduling problem of this paper.
The comparison is as follows: 1. Platform: (C++, i7-2600 @3. In short, given a similar amount of time, the proposed method can process far more jobs than the existing results; and given a similar number of jobs, it finishes in far less time.

VIII. CONCLUSION AND FUTURE RESEARCH
This paper studied a novel parallel machine rescheduling problem subject to precedence constraints, with controllable processing times, under random machine breakdown. The proposed method handled two complications of the rescheduling problem: the nonzero initial state left by the breakdown, and the redundant precedence constraints, which we removed with transitive reduction. The computational experiments showed significant reductions in time and RAM consumption, allowing the scheduler to solve small, medium and even large instances with economy and efficiency.
For future research, we are interested in problems with nonconvex objectives, such as JIT, under these settings. What role do the precedence constraints play there? How does the suboptimality of the two subproblems at both levels affect performance? We need to learn more about the algorithmic design and evaluation.