Abstract
In this paper, we discuss a dynamic unrelated parallel machine scheduling problem with sequence-dependent setup times and machine–job qualification constraints. To apply the Q-Learning algorithm, we convert the scheduling problem into a reinforcement learning problem by constructing a semi-Markov decision process (SMDP), including the definition of the state representation, the actions and the reward function. We use five heuristics, WSPT, WMDD, WCOVERT, RATCS and LFJ-WCOVERT, as actions and prove the equivalence of the reward function and the scheduling objective: minimisation of mean weighted tardiness. We carry out computational experiments to examine the performance of the Q-Learning algorithm and the heuristics. The experimental results show that the Q-Learning algorithm consistently outperforms all five heuristics. Averaged over all test problems, the Q-Learning algorithm improves on WSPT, WMDD, WCOVERT, RATCS and LFJ-WCOVERT by 61.38%, 60.82%, 56.23%, 57.48% and 66.22%, respectively.
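The general idea of the abstract — tabular Q-learning in which each action is a dispatching heuristic applied at a scheduling decision epoch, with reward tied to weighted tardiness — can be sketched in miniature. The following is an illustrative sketch only, not the paper's SMDP: it uses a toy single-machine instance, only two of the five heuristics (WSPT and WMDD), and an assumed coarse state discretisation and reward of negative incremental weighted tardiness.

```python
import random

def wspt(jobs, t):
    # Weighted Shortest Processing Time: pick the job maximising w/p.
    return max(jobs, key=lambda j: j["w"] / j["p"])

def wmdd(jobs, t):
    # Weighted Modified Due Date: pick the job minimising max(p, d - t)/w.
    return min(jobs, key=lambda j: max(j["p"], j["d"] - t) / j["w"])

HEURISTICS = [wspt, wmdd]  # actions = dispatching rules

def make_jobs(rng, n=8):
    # Random toy instance: processing time p, weight w, due date d.
    return [{"p": rng.randint(1, 9), "w": rng.randint(1, 5),
             "d": rng.randint(5, 40)} for _ in range(n)]

def state(jobs, t):
    # Assumed coarse state: jobs remaining, and whether any job is already late.
    return (len(jobs), any(t > j["d"] for j in jobs))

def episode(Q, rng, alpha=0.1, gamma=1.0, eps=0.1, learn=True):
    jobs, t, total_wt = make_jobs(rng), 0, 0.0
    while jobs:
        s = state(jobs, t)
        if learn and rng.random() < eps:          # epsilon-greedy exploration
            a = rng.randrange(len(HEURISTICS))
        else:
            a = max(range(len(HEURISTICS)), key=lambda i: Q.get((s, i), 0.0))
        job = HEURISTICS[a](jobs, t)              # apply the chosen heuristic
        jobs.remove(job)
        t += job["p"]
        tardiness = job["w"] * max(0, t - job["d"])
        total_wt += tardiness
        if learn:                                 # one-step Q-learning update
            s2 = state(jobs, t)
            best_next = max((Q.get((s2, i), 0.0)
                             for i in range(len(HEURISTICS))),
                            default=0.0) if jobs else 0.0
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (-tardiness + gamma * best_next - old)
    return total_wt

rng = random.Random(0)
Q = {}
for _ in range(2000):                             # training episodes
    episode(Q, rng)
avg = sum(episode(Q, rng, learn=False) for _ in range(200)) / 200
print(f"mean weighted tardiness (greedy policy): {avg:.1f}")
```

Because the reward is the negative of the weighted tardiness incurred at each decision, maximising cumulative reward coincides with minimising total weighted tardiness, which mirrors the reward–objective equivalence the abstract claims for the full SMDP.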
Cite this article
Zhang, Z., Zheng, L. & Weng, M.X. Dynamic parallel machine scheduling with mean weighted tardiness objective by Q-Learning. Int J Adv Manuf Technol 34, 968–980 (2007). https://doi.org/10.1007/s00170-006-0662-8