Abstract
In this paper, we discuss a dynamic unrelated parallel machine scheduling problem with sequence-dependent setup times and machine–job qualification constraints. To apply the Q-Learning algorithm, we convert the scheduling problem into a reinforcement learning problem by constructing a semi-Markov decision process (SMDP), including the definition of the state representation, the actions and the reward function. We use five heuristics, WSPT, WMDD, WCOVERT, RATCS and LFJ-WCOVERT, as actions and prove the equivalence of the reward function and the scheduling objective: minimisation of mean weighted tardiness. We carry out computational experiments to examine the performance of the Q-Learning algorithm and the heuristics. The experimental results show that the Q-Learning algorithm consistently outperforms all five heuristics. Averaged over all test problems, the Q-Learning algorithm improves on WSPT, WMDD, WCOVERT, RATCS and LFJ-WCOVERT by 61.38%, 60.82%, 56.23%, 57.48% and 66.22%, respectively.
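The general idea of the abstract — tabular Q-learning in which each action is a dispatching heuristic applied at a scheduling decision epoch, with reward tied to weighted tardiness — can be sketched in miniature. The following is an illustrative sketch only, not the paper's SMDP: it uses a toy single-machine instance, only two of the five heuristics (WSPT and WMDD), and an assumed coarse state discretisation and reward of negative incremental weighted tardiness.

```python
import random

def wspt(jobs, t):
    # Weighted Shortest Processing Time: pick the job maximising w/p.
    return max(jobs, key=lambda j: j["w"] / j["p"])

def wmdd(jobs, t):
    # Weighted Modified Due Date: pick the job minimising max(p, d - t)/w.
    return min(jobs, key=lambda j: max(j["p"], j["d"] - t) / j["w"])

HEURISTICS = [wspt, wmdd]  # actions = dispatching rules

def make_jobs(rng, n=8):
    # Random toy instance: processing time p, weight w, due date d.
    return [{"p": rng.randint(1, 9), "w": rng.randint(1, 5),
             "d": rng.randint(5, 40)} for _ in range(n)]

def state(jobs, t):
    # Assumed coarse state: jobs remaining, and whether any job is already late.
    return (len(jobs), any(t > j["d"] for j in jobs))

def episode(Q, rng, alpha=0.1, gamma=1.0, eps=0.1, learn=True):
    jobs, t, total_wt = make_jobs(rng), 0, 0.0
    while jobs:
        s = state(jobs, t)
        if learn and rng.random() < eps:          # epsilon-greedy exploration
            a = rng.randrange(len(HEURISTICS))
        else:
            a = max(range(len(HEURISTICS)), key=lambda i: Q.get((s, i), 0.0))
        job = HEURISTICS[a](jobs, t)              # apply the chosen heuristic
        jobs.remove(job)
        t += job["p"]
        tardiness = job["w"] * max(0, t - job["d"])
        total_wt += tardiness
        if learn:                                 # one-step Q-learning update
            s2 = state(jobs, t)
            best_next = max((Q.get((s2, i), 0.0)
                             for i in range(len(HEURISTICS))),
                            default=0.0) if jobs else 0.0
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (-tardiness + gamma * best_next - old)
    return total_wt

rng = random.Random(0)
Q = {}
for _ in range(2000):                             # training episodes
    episode(Q, rng)
avg = sum(episode(Q, rng, learn=False) for _ in range(200)) / 200
print(f"mean weighted tardiness (greedy policy): {avg:.1f}")
```

Because the reward is the negative of the weighted tardiness incurred at each decision, maximising cumulative reward coincides with minimising total weighted tardiness, which mirrors the reward–objective equivalence the abstract claims for the full SMDP.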
Cite this article
Zhang, Z., Zheng, L. & Weng, M.X. Dynamic parallel machine scheduling with mean weighted tardiness objective by Q-Learning. Int J Adv Manuf Technol 34, 968–980 (2007). https://doi.org/10.1007/s00170-006-0662-8