
DRL-dEWMA: a composite framework for run-to-run control in the semiconductor manufacturing process

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

This study develops a weight-adjustment scheme for the double exponentially weighted moving average (dEWMA) controller using deep reinforcement learning (DRL). Within the run-to-run (RtR) control framework, the weight adjustment of the dEWMA is formulated as a Markov decision process in which the candidate weights are treated as the DRL agent's actions. On this basis, a composite control strategy integrating DRL and dEWMA is proposed: a well-trained DRL agent acts as an auxiliary controller that produces the preferred dEWMA weights, and the resulting dEWMA acts as the master controller that provides a suitable recipe for the manufacturing process. Two classical deterministic policy-gradient algorithms are leveraged for automatic weight tuning. Simulation results show that the proposed scheme outperforms existing RtR controllers in terms of disturbance rejection and target tracking, indicating strong practical prospects for smart semiconductor manufacturing.
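For intuition, the following is a minimal sketch, not the authors' implementation, of how a dEWMA controller could consume weights proposed by a DRL agent. It uses one common single-input single-output dEWMA formulation; the process gain, the target, and the assumption that the two weights arrive from a trained policy at every run are illustrative.

```python
class DEWMAController:
    """Single-input single-output dEWMA run-to-run controller (one common formulation)."""

    def __init__(self, gain: float, target: float):
        self.b = gain         # assumed (estimated) process gain
        self.target = target  # desired process output
        self.a = 0.0          # EWMA estimate of the process intercept
        self.d = 0.0          # EWMA estimate of the per-run drift

    def next_recipe(self, y: float, u: float, lam1: float, lam2: float) -> float:
        """Update the estimates with the measured output y of the last run
        (which used recipe u) and return the recipe for the next run.
        lam1 and lam2 are the dEWMA weights, here supplied by the DRL agent."""
        prev_a = self.a
        self.a = lam1 * (y - self.b * u) + (1.0 - lam1) * self.a
        self.d = lam2 * (y - self.b * u - prev_a) + (1.0 - lam2) * self.d
        return (self.target - self.a - self.d) / self.b
```

In the composite scheme, lam1 and lam2 would be the action output by the trained DRL policy at each run rather than fixed constants.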


Data availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 62273002 and 61873113).

Author information


Contributions

Zhu Ma contributed to conceptualization, software, and writing of the original draft. Tianhong Pan contributed to supervision, validation, resources, and review and editing of the manuscript.

Corresponding author

Correspondence to Tianhong Pan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

DDPG is an off-policy actor-critic algorithm that combines the advantages of the deterministic policy gradient and deep Q-network (DQN) algorithms. In contrast to TD3, which maintains a pair of critics, only a single critic network \(Q\) and a single target critic network \(Q^{'}\) were used in the implementation.

Within minibatch N, the critic network was trained by minimizing the loss function \(L(\theta )\), denoted as

$$\begin{aligned} L(\theta ) = \frac{1}{N}\sum _{i}\left( Y_i-Q(s_i,a_i\mid \theta )\right) ^{2} \end{aligned}$$
(32)

where \(Y_i=r_i(s_{i},a_{i})+\gamma Q^{'}(s_{i+1},a_{i+1}\mid \theta ^{'})\).

In addition, the target action is computed by the target actor network as \(a_{i+1}=\mu ^{'}(s_{i+1}\mid \phi ^{'})\).
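For concreteness, here is a minimal PyTorch-style sketch of this critic update. It is illustrative only; the network modules (critic, target_critic, target_actor), the optimizer, and the minibatch tuple are assumptions rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def ddpg_critic_update(critic, target_critic, target_actor,
                       critic_optimizer, batch, gamma=0.99):
    """One gradient step on the critic loss of Eq. (32)."""
    s, a, r, s_next = batch  # minibatch tensors sampled from the replay buffer

    with torch.no_grad():
        a_next = target_actor(s_next)                  # a_{i+1} = mu'(s_{i+1} | phi')
        y = r + gamma * target_critic(s_next, a_next)  # Y_i = r_i + gamma * Q'(s_{i+1}, a_{i+1} | theta')

    loss = F.mse_loss(critic(s, a), y)                 # (1/N) * sum_i (Y_i - Q(s_i, a_i | theta))^2

    critic_optimizer.zero_grad()
    loss.backward()
    critic_optimizer.step()
    return loss.item()
```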

When training DDPG-dEWMA, we adopted the same state, action, and reward definitions as in Sect. 3. The training procedure is presented in Algorithm 3. The only difference from the TD3-dEWMA scheme lies in the DRL agent, so it is not described in further detail here.
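Because Algorithm 3 itself is not reproduced on this page, the following self-contained sketch only outlines how the agent-environment interaction of such a training loop can be arranged; the toy drifting process, the reward, and the random weight draws (which stand in for the DDPG actor's action) are all illustrative assumptions.

```python
import random

T, b = 0.0, 1.2          # target output and assumed process gain
a_hat, d_hat = 0.0, 0.0  # dEWMA intercept and drift estimates
u = 0.0                  # initial recipe

for k in range(1, 201):
    y = 0.5 + b * u + 0.01 * k + random.gauss(0.0, 0.05)  # output of run k (toy drifting process)
    # DDPG action: in the real scheme (lam1, lam2) would come from the actor network.
    lam1, lam2 = random.uniform(0.05, 0.95), random.uniform(0.05, 0.95)
    prev_a = a_hat
    a_hat = lam1 * (y - b * u) + (1.0 - lam1) * a_hat           # dEWMA intercept update
    d_hat = lam2 * (y - b * u - prev_a) + (1.0 - lam2) * d_hat  # dEWMA drift update
    u = (T - a_hat - d_hat) / b                                 # recipe for the next run
    reward = -(y - T) ** 2  # tracking reward; the transition (state, action, reward,
                            # next state) would be stored and used for the DDPG updates
```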

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ma, Z., Pan, T. DRL-dEWMA: a composite framework for run-to-run control in the semiconductor manufacturing process. Neural Comput & Applic 36, 1429–1447 (2024). https://doi.org/10.1007/s00521-023-09112-9
