DRL-dEWMA: a composite framework for run-to-run control in the semiconductor manufacturing process

Ma, Zhu; Pan, Tianhong

doi:10.1007/s00521-023-09112-9

Original Article
Published: 11 November 2023

Volume 36, pages 1429–1447, (2024)

Neural Computing and Applications

Abstract

This study develops a weight-adjustment scheme for a double exponentially weighted moving average (dEWMA) controller using deep reinforcement learning (DRL) techniques. Under the run-to-run (RtR) control framework, the weight adjustment of the dEWMA is formulated as a Markov decision process in which the candidate weights are the DRL agent’s actions. Accordingly, a composite control strategy integrating DRL and dEWMA is proposed. Specifically, a well-trained DRL agent serves as an auxiliary controller that produces the preferred weights of the dEWMA, and the resulting dEWMA serves as the master controller that provides a suitable recipe for the manufacturing process. Two classical deterministic policy-gradient algorithms, DDPG and TD3, are used for automatic weight tuning. The simulation results show that the proposed scheme outperforms existing RtR controllers in terms of disturbance rejection and target tracking. The proposed scheme is promising for practical application in smart semiconductor manufacturing.
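The composite structure described in the abstract can be illustrated with a minimal sketch. The paper's own controller equations are not part of this excerpt, so the code below assumes a commonly used textbook form of the dEWMA recursion, a linear process with deterministic drift, and a placeholder `weight_policy` standing in for the trained DRL agent. All names and numerical values (`DEWMA`, `weight_policy`, `simulate`, the process parameters) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DEWMA:
    """Master controller: assumed textbook dEWMA form, not the paper's exact equations."""
    b: float          # assumed estimate of the process gain
    target: float     # desired process output
    a: float = 0.0    # offset estimate
    d: float = 0.0    # drift estimate

    def recipe(self) -> float:
        # Recipe that would put the predicted output on target.
        return (self.target - self.a - self.d) / self.b

    def update(self, y: float, u: float, lam1: float, lam2: float) -> None:
        # Two EWMA filters, each with its own weight.
        a_prev = self.a
        self.a = lam1 * (y - self.b * u) + (1.0 - lam1) * a_prev
        self.d = lam2 * (y - self.b * u - a_prev) + (1.0 - lam2) * self.d


def weight_policy(tracking_error: float) -> Tuple[float, float]:
    """Placeholder for the auxiliary DRL agent (hypothetical).

    A trained actor network would map the state to the two weights;
    here fixed weights are returned regardless of the state.
    """
    return 0.5, 0.3


def simulate(runs: int = 100) -> List[float]:
    # Hypothetical process: y_t = alpha + beta * u_{t-1} + delta * t (noise-free).
    alpha, beta, delta = 2.0, 1.0, 0.1
    ctrl = DEWMA(b=1.0, target=10.0)
    u = ctrl.recipe()
    outputs: List[float] = []
    for t in range(1, runs + 1):
        y = alpha + beta * u + delta * t
        lam1, lam2 = weight_policy(y - ctrl.target)  # auxiliary controller picks weights
        ctrl.update(y, u, lam1, lam2)                # master controller updates estimates
        u = ctrl.recipe()                            # recipe for the next run
        outputs.append(y)
    return outputs


outputs = simulate()
```

In this noise-free setting the output settles on the target despite the drift. In the scheme described above, the fixed-weight placeholder would be replaced by a trained agent that selects the weights at each run.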

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 62273002 and 61873113).

Author information

Authors and Affiliations

School of Computer Science and Technology, School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
Zhu Ma
School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China
Tianhong Pan

Contributions

Zhu Ma contributed to conceptualization, writing (original draft), and software. Tianhong Pan contributed to supervision, validation, resources, and writing (review and editing).

Corresponding author

Correspondence to Tianhong Pan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

DDPG is a policy-based actor-critic algorithm that combines the advantages of the policy-gradient and deep Q-network (DQN) algorithms. In contrast to TD3, which maintains two critics, DDPG uses only a single critic network $Q$ and its target-critic network $Q^{'}$.

For a minibatch of size $N$, the critic network is trained by minimizing the loss function $L(\theta )$, defined as

$$L(\theta ) = \frac{1}{N}\sum _{i}\left( Y_i-Q(s_i,a_i\mid \theta )\right) ^{2} \tag{32}$$

where $Y_i=r_i(s_{i},a_{i})+\gamma Q^{'}(s_{i+1},a_{i+1}\mid \theta ^{'})$.

In addition, the target action is computed by the target-actor network as $a_{i+1}=\mu ^{'}(s_{i+1}\mid \phi ^{'})$.
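The critic loss and target defined above can be sketched in plain Python. This is an illustrative sketch only: the networks are replaced by simple callables, and the toy functions, transitions, and discount factor are hypothetical values chosen to make the arithmetic easy to follow.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
# One stored transition: (s_i, a_i, r_i, s_{i+1})
Transition = Tuple[State, Action, float, State]


def td_target(reward: float, next_state: State,
              target_actor: Callable[[State], Action],
              target_critic: Callable[[State, Action], float],
              gamma: float) -> float:
    """Y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    next_action = target_actor(next_state)  # target action from the target actor
    return reward + gamma * target_critic(next_state, next_action)


def critic_loss(batch: List[Transition],
                critic: Callable[[State, Action], float],
                target_actor: Callable[[State], Action],
                target_critic: Callable[[State, Action], float],
                gamma: float) -> float:
    """Mean squared difference between Y_i and Q(s_i, a_i) over the minibatch."""
    total = 0.0
    for s, a, r, s_next in batch:
        y = td_target(r, s_next, target_actor, target_critic, gamma)
        total += (y - critic(s, a)) ** 2
    return total / len(batch)


# Toy stand-ins for the networks (hypothetical, for illustration only).
def toy_critic(s: State, a: Action) -> float:
    return sum(s) + sum(a)


def toy_target_critic(s: State, a: Action) -> float:
    return 0.5 * (sum(s) + sum(a))


def toy_target_actor(s: State) -> Action:
    return [0.1 * x for x in s]


batch: List[Transition] = [
    ([1.0], [0.0], 1.0, [2.0]),
    ([0.0], [1.0], 0.0, [1.0]),
]
loss = critic_loss(batch, toy_critic, toy_target_actor, toy_target_critic, gamma=0.9)
```

In an actual implementation the callables would be neural networks and the loss would be minimized by gradient descent on $\theta$; the sketch only evaluates the loss for fixed functions.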

When training DDPG-dEWMA, we adopted the same state, action, and reward as described in Sect. 3. The training procedure for DDPG-dEWMA is presented in Algorithm 3. The DDPG-dEWMA and TD3-dEWMA schemes differ only in the DRL agent, so the details are not repeated here.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, Z., Pan, T. DRL-dEWMA: a composite framework for run-to-run control in the semiconductor manufacturing process. Neural Comput & Applic 36, 1429–1447 (2024). https://doi.org/10.1007/s00521-023-09112-9

Received: 28 March 2023
Accepted: 16 October 2023
Published: 11 November 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00521-023-09112-9
