Machine learning-based modeling and operation for ALD of SiO2 thin-films using data from a multiscale CFD simulation

https://doi.org/10.1016/j.cherd.2019.09.005

Highlights

  • Multiscale computational fluid dynamics (CFD) modeling of ALD reactor.

  • Machine-learning modeling using multiscale CFD model data.

  • Use of machine learning model to optimize ALD cycle time.

  • Significant reduction of ALD cycle time versus fixed-time deposition.

Abstract

Atomic layer deposition (ALD) is a widely utilized deposition technology in the semiconductor industry due to its superior ability to generate highly conformal films and to deposit materials into high aspect-ratio geometric structures. However, ALD experiments remain expensive and time-consuming, and the existing first-principles based models cannot yet compute key process outputs with the computational efficiency that on-line optimization and real-time control require. In this work, a multiscale data-driven model is proposed and developed to capture the macroscopic process domain dynamics with a linear parameter varying model, and to characterize the microscopic domain film growth dynamics with a feed-forward artificial neural network (ANN) model. The multiscale data-driven model predicts the transient deposition rate from the following four key process operating parameters that can be manipulated, measured or estimated by process engineers: precursor feed flow rate, operating pressure, surface heating, and transient film coverage. Our results demonstrate that the multiscale data-driven model can efficiently characterize the transient input-output relationship for the SiO2 thermal ALD process using bis(tertiary-butylamino)silane (BTBAS) as the Si precursor. The multiscale data-driven model reduces the computational time for each time step from the 0.6 to 1.2 h required by the first-principles based multiscale computational fluid dynamics (CFD) model to less than 0.1 s, making real-time usage feasible. The developed data-driven modeling methodology can be further generalized and used for other thermal ALD or similar deposition systems, which will greatly enhance the feasibility of industrial manufacturing processes.
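To put the reported reduction in computational time in perspective, the following back-of-the-envelope calculation uses only the two figures quoted in the abstract (0.6 to 1.2 h per time step for the multiscale CFD model, less than 0.1 s for the data-driven model). The function name is ours, and the result is a lower bound because the data-driven model is only stated to take "less than" 0.1 s.

```python
SECONDS_PER_HOUR = 3600.0


def speedup_lower_bound(cfd_hours_per_step, surrogate_seconds_per_step=0.1):
    """Ratio of CFD time to data-driven model time for one time step.

    The data-driven model is reported to need less than 0.1 s, so plugging
    in exactly 0.1 s yields a lower bound on the actual speed-up.
    """
    return cfd_hours_per_step * SECONDS_PER_HOUR / surrogate_seconds_per_step


# Both ends of the CFD range quoted in the abstract.
for hours in (0.6, 1.2):
    factor = speedup_lower_bound(hours)
    print(f"{hours} h per CFD step -> speed-up of at least {factor:,.0f}x")
```

Under these figures the speed-up is at least about 21,600 to 43,200 times per time step, that is, more than four orders of magnitude.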

Introduction

Deposition, along with photo-lithography and etching, is one of the most important building blocks in thin-film material development. In the semiconductor industry, the stringent demands on the scale and complexity of electronic devices have continuously pushed deposition techniques to be more precise and controllable. For instance, state-of-the-art memory devices, such as the 3D NAND (not-and) flash memory and the dynamic random-access memory (DRAM), require sophisticated non-planar 3D design and ultra-thin gates (<10 nm) with demanding conformity and defect formation criteria (George et al., 1996, Schuegraf et al., 2013). Originating as a derivative of chemical vapor deposition (CVD), atomic layer deposition (ALD) has been developed to build thin-films that meet the demands for more uniform films and higher aspect-ratio micro-structures (Kääriäinen et al., 2013). In a typical ALD process, two precursor gases are introduced into the reactor sequentially in pulses, with an inert purge gas pulse injected between the two precursor pulses. Each precursor pulse is called a half-cycle, and each inert gas pulse is called a purge. In each half-cycle, ideally, only one precursor will be in contact and react with the substrate surface. Given appropriate operating conditions and cycle-time, an ideal ALD half-cycle will be self-limiting, resulting in a highly conformal and fully covered substrate surface terminated with the desired element introduced by the precursor species (George, 2009). Following this alternating precursor scheme, ALD can precisely deposit materials layer-by-layer and effectively allow uniform coverage on complex geometries (Tanner et al., 2007, Foong et al., 2010, Shirazi and Elliott, 2014, Ishikawa et al., 2017). As a result, ALD has been widely utilized in the field of nanoelectromechanical systems (NEMS), especially for the manufacturing of semiconductor memory devices.
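The pulse sequence and the self-limiting character of a half-cycle can be illustrated with a toy model. The sketch below is not the model used in this work: it assumes a first-order, Langmuir-type saturation law, dθ/dt = k(1 − θ), with a hypothetical rate constant `k`, purely to show why the surface coverage θ approaches a full layer but cannot exceed it, however long the precursor pulse lasts.

```python
import math

# One ALD cycle as described in the text: two precursor pulses (half-cycles),
# each followed by an inert gas purge.
ALD_CYCLE = ("precursor A half-cycle", "purge", "precursor B half-cycle", "purge")


def coverage(t, k, theta0=0.0):
    """Closed-form solution of d(theta)/dt = k * (1 - theta) (assumed toy kinetics)."""
    return 1.0 - (1.0 - theta0) * math.exp(-k * t)


def time_to_coverage(target, k, theta0=0.0):
    """Pulse time needed to reach `target` coverage under the same toy kinetics."""
    if not theta0 <= target < 1.0:
        raise ValueError("target must satisfy theta0 <= target < 1")
    return math.log((1.0 - theta0) / (1.0 - target)) / k


k = 2.0  # hypothetical rate constant in 1/s, chosen only for illustration
for t in (0.5, 1.0, 2.0, 5.0):
    print(f"t = {t:3.1f} s  coverage = {coverage(t, k):.4f}")
print(f"time to 99% coverage: {time_to_coverage(0.99, k):.2f} s")
```

In this simplified picture, extending the pulse beyond the time needed for near-complete coverage adds almost nothing to the film, which is why the half-cycle time is a natural target for optimization.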

The past decade has witnessed an increasing research interest in ALD and a growing market in the industrial manufacturing of thin-film materials (Raaijmakers, 2011, Kääriäinen et al., 2013). The ALD industry also holds a promising future with its global market expected to reach US$3.2 billion by 2026. Novel precursors and reaction mechanisms have been discovered, making the film processing of high aspect-ratio substrate layouts more efficient and feasible. Yet, the limited throughput and the high manufacturing cost of ALD production still trouble manufacturers, especially amid the fluctuating global semiconductor market. Despite the urgent need for further enhancement of ALD processes, the overall cost of precursors and equipment hinders more extensive research efforts (Shirazi and Elliott, 2014). More specifically, one of the major roadblocks to a deeper understanding of the ALD process is the coupled effect between the reactor gas-phase development and the microscopic thin-film deposition process. Besides the apparent absence of a clear and direct theoretical explanation, it is also extremely difficult to obtain an empirical database of film growth during experiments due to the limitations of real-time in-situ sensing. Scanning electron microscopy (SEM) and scanning tunneling microscopy (STM) provide detailed film profiles but are often expensive and destructive to the deposited film, and therefore cannot be used for on-line monitoring (Chen, 1993, Goldstein et al., 2017). Quartz crystal microbalance (QCM) measures the overall deposition rate well by tracking the total mass of the substrate surface in real time, but fails to reveal local changes within the substrate (Elam et al., 2002). Another option is in-situ spectroscopic ellipsometry, which is capable of providing local structural information of the film. However, the operation of spectroscopic ellipsometry equipment is very complicated, and thus it is not prevalent in industry (Pittal et al., 1993, Dalton et al., 1994). With all the aforementioned shortcomings, the existing hardware monitoring methods, despite their fidelity and capability to report the actual film profiles, are not yet applicable for a complete measurement of local film coverage profiles during the industrial ALD process. Therefore, an accurate and all-inclusive simulation model for ALD processes could be helpful in estimating the transient film coverage and in understanding the real-time development on both microscopic and macroscopic scales, which could be of significant value to both industrial and experimental work.

Many efforts have been made in the past to develop an appropriate model for thermal ALD processes. Due to the limitation of computational power and the inherent differences in operational length scales, it is not economical or even feasible to develop a model that describes both the bulk gas-phase and the substrate surface processes using a single simulation method. As a result, multiscale models are often proposed and developed in which different simulation methods and techniques are adopted for gas-phase and surface processes, respectively (e.g., Ding et al., 2019). For the gas-phase model, one approach is to develop analytical or numerical solutions of the mass, momentum, and energy conservation equations with suitable boundary conditions (Christofides et al., 2008). Nevertheless, the assumptions in these models, particularly when analytical solutions are sought, limit the applicability of the results to industrial-scale ALD systems, where complex reactor designs are often used to enhance species transport. Therefore, computational fluid dynamics (CFD) modeling may be employed instead, and it has been demonstrated to be capable of computing highly accurate gas-phase profiles for complex reactor geometries (Pan et al., 2014, Crose et al., 2018). On the microscopic scale, surface reactions are the predominant processes that determine film deposition. Molecular dynamics (MD) has traditionally been used to simulate a variety of molecular-scale microscopic events, but again it cannot be applied to simulate an industrial-size ALD system. More recently, the kinetic Monte Carlo (kMC) method, which statistically determines and tracks the probability of each event instead of each particle, has been proven to represent ALD systems efficiently and accurately, and it is therefore a good choice for the simulation of the microscopic film-growth activities (Knoops et al., 2010, Elliott and Greer, 2004, Weckman et al., 2018, Crose et al., 2018). Therefore, by combining distinct methods for the macroscopic (gas-phase) and microscopic (film growth) simulations, we can accurately characterize the ALD system as a whole. Recently, we have investigated the thermal ALD process of SiO2 adopting BTBAS as the precursor (Ding et al., 2019). Specifically, using a standalone 3D surface deposition kMC model, we reported a growth rate that lies within the experimental range (1.4 Å to 2.1 Å per cycle); and in Zhang et al. (2019), we created a 3D multiscale CFD model integrating the CFD and the kMC model, which correctly predicted the half-cycle time needed under different precursor inlet flow rates and reduced the half-cycle time needed by 39.6%, following the ALD chamber geometry optimizations.
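To make the idea of tracking event probabilities rather than individual particles concrete, the following is a minimal kinetic Monte Carlo sketch. It is a generic rejection-free (Gillespie-style) scheme on a toy lattice with a single adsorption event type and a hypothetical rate constant; it is not the 3D surface deposition kMC model of Ding et al. (2019), which is built on a detailed reaction mechanism.

```python
import math
import random


def kmc_adsorption(n_sites, rate, t_end, seed=0):
    """Toy rejection-free kMC: every empty site adsorbs with the same rate (1/s).

    Returns a list of (time, coverage) samples. `rate` is a hypothetical value.
    """
    rng = random.Random(seed)
    empty = list(range(n_sites))          # indices of unoccupied lattice sites
    t = 0.0
    history = [(0.0, 0.0)]
    while empty:
        total_rate = rate * len(empty)    # sum of the rates of all possible events
        # The waiting time to the next event is exponentially distributed
        # with mean 1 / total_rate; 1 - random() lies in (0, 1], so log is safe.
        t += -math.log(1.0 - rng.random()) / total_rate
        if t > t_end:
            break
        # All events share the same rate, so the site is chosen uniformly.
        idx = rng.randrange(len(empty))
        empty[idx] = empty[-1]            # O(1) removal: overwrite with last, then pop
        empty.pop()
        history.append((t, 1.0 - len(empty) / n_sites))
    return history


samples = kmc_adsorption(n_sites=400, rate=2.0, t_end=10.0)
print(f"events executed: {len(samples) - 1}, final coverage: {samples[-1][1]:.3f}")
```

With several event types (adsorption, desorption, surface reaction), the uniform site choice would be replaced by a selection weighted by each event's rate; the time-advance step stays the same.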

Although the multiscale CFD model provides valuable insights into the dynamics of the ALD process, its solution is computationally intensive and thus infeasible for industrial on-line operational optimization. Data-driven modeling adopting machine learning techniques, especially neural networks, has recently received attention in the field of deposition simulations. Djurabekova et al. (2007) and Nicolas and Lorenzo (2010) have employed an artificial neural network (ANN) to characterize the result of kMC simulation for lattice microscopic structures. Chaffart and Ricardez-Sandoval (2018) and Kimaev and Ricardez-Sandoval (2019) have looked into ANN formulations of multiscale deposition simulation on a 2D geometry. However, no machine learning-based formulation has yet been investigated for an industrial-scale ALD system. Motivated by these considerations, in this work, we first construct a database using the previously developed multiscale CFD model that incorporates the gas-phase CFD model, the microscopic kMC model, and the multiscale workflow (Zhang et al., 2019). Then, a multiscale data-driven model is developed, which consists of a linear parameter varying model approximating the gas-phase CFD model and a feed-forward ANN approximating the microscopic kMC model. By fully parameterizing the film deposition rate in terms of input variables that can be manipulated, measured or estimated in real-time, we are able to preserve the model fidelity of key input-output relationships, while greatly enhancing the computational speed. Thereby, the developed multiscale data-driven model may lead to a significant economic benefit by allowing on-line prediction of the minimum cycle-time for full coverage. By contrast, a thorough experimental study of operating conditions could cost about 4.08 million dollars for the precursor alone, in addition to the operational cost of heating and running the reactor. For some other precursors, such as BDEAS, this number could double due to their smaller production scale.
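The on-line use case described above, predicting the minimum cycle-time for full coverage, amounts to integrating a fast deposition-rate model forward in time until the coverage reaches a target. The sketch below assumes a hypothetical callable `rate_fn(flow, pressure, heating, coverage)` standing in for the trained data-driven model; the toy functional form in `toy_rate`, the implied units, and the 99% coverage threshold are illustrative assumptions, not values from this work.

```python
def toy_rate(flow, pressure, heating, coverage):
    """Hypothetical stand-in for the data-driven model: d(coverage)/dt in 1/s."""
    k = 1e-3 * flow * (pressure / 100.0) * heating
    return k * (1.0 - coverage)


def min_half_cycle_time(rate_fn, flow, pressure, heating,
                        target=0.99, dt=0.01, t_max=60.0):
    """Explicit-Euler integration of the coverage until `target` is reached.

    Returns the first time (s) at which coverage >= target, or None if the
    target is not reached within `t_max` seconds.
    """
    coverage = 0.0
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        coverage += dt * rate_fn(flow, pressure, heating, coverage)
        if coverage >= target:
            return step * dt
    return None


# Compare two hypothetical precursor feed flow rates at fixed pressure and heating.
for flow in (500.0, 1000.0):
    t_half = min_half_cycle_time(toy_rate, flow, pressure=133.0, heating=1.0)
    print(f"flow = {flow:6.1f}: predicted half-cycle time = {t_half:.2f} s")
```

Because each evaluation of a data-driven model of this kind is cheap, such a loop can be repeated over many candidate operating conditions, which would be impractical if every time step required a full multiscale CFD solve.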

Section snippets

Multiscale CFD modeling, data collection and data-driven model construction

This section summarizes the process description of the SiO2 thermal ALD deposition, the set-up of the multiscale CFD model, the collection and pre-processing of simulation data, as well as the formulation of the multiscale data-driven model. The following discussion consists of four subsections. First, the macroscopic gas-phase CFD model is introduced, including the geometry development and the tuning of the gas-phase transport model. Next, the microscopic deposition model with the kMC

Multiscale data-driven model development and validation

In this section, the resulting macroscopic linear parameter varying model and the microscopic ANN model are summarized, and the combined multiscale data-driven model is compared to the first-principles based multiscale CFD model. In addition, advantages and applications adopting this multiscale data-driven model are discussed.
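As a rough illustration of what a linear parameter varying (LPV) model looks like, the sketch below uses a scalar discrete-time system x[k+1] = a(p) x[k] + b(p) u[k] whose coefficients are linearly interpolated between two operating points of a scheduling parameter p. The scheduling variable, the coefficient values, and the first-order structure are all hypothetical; the LPV model identified in this work is not reproduced in this excerpt.

```python
def interpolate(p, p_lo, p_hi, v_lo, v_hi):
    """Linear interpolation of a coefficient between two operating points."""
    w = (p - p_lo) / (p_hi - p_lo)
    w = min(1.0, max(0.0, w))             # clamp outside the identified range
    return (1.0 - w) * v_lo + w * v_hi


def lpv_step(x, u, p):
    """One step of x[k+1] = a(p) * x[k] + b(p) * u[k] (hypothetical coefficients)."""
    a = interpolate(p, 0.0, 1.0, 0.90, 0.60)   # faster dynamics at larger p
    b = 1.0 - a                                # keeps the steady-state gain at 1
    return a * x + b * u


def simulate(u_seq, p_seq, x0=0.0):
    """Simulate the LPV model for given input and scheduling sequences."""
    x, trajectory = x0, []
    for u, p in zip(u_seq, p_seq):
        x = lpv_step(x, u, p)
        trajectory.append(x)
    return trajectory


# Step response at two fixed values of the scheduling parameter.
for p in (0.0, 1.0):
    response = simulate([1.0] * 5, [p] * 5)
    print(f"p = {p}: " + ", ".join(f"{x:.3f}" for x in response))
```

The model is linear in the state and input at any fixed p, while the dependence on p captures how the dynamics change across operating conditions, for example with the precursor feed flow rate.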

Conclusion

In this work, we developed a multiscale data-driven model from a first-principles based multiscale CFD model of the thermal ALD SiO2 thin-film deposition using BTBAS as precursor. Specifically, the resulting multiscale data-driven model consisted of an ANN-based model for the microscopic film growth domain and a linear parameter varying model for the macroscopic gas-phase domain. The final trained microscopic ANN model achieved a good prediction with a normalized error of 1.0 × 10⁻³ and a precise

Acknowledgement

Financial support from the National Science Foundation is gratefully acknowledged.

References (51)

  • Q.A. Acton

    Chemical Processes—Advances in Research and Application: 2012 Edition: ScholarlyBrief

    (2012)
  • R.B. Bird et al.

    Transport Phenomena

    (2007)
  • F. Burden et al.

    Bayesian regularization of neural networks

    Artificial Neural Networks: Methods and Applications

    (2008)
  • C.J. Chen

    Introduction to Scanning Tunneling Microscopy, vol. 4

    (1993)
  • P.D. Christofides et al.

    Control and Optimization of Multiscale Process Systems

    (2008)
  • N.R. Council

    Beyond the Molecular Frontier: Challenges for Chemistry and Chemical Engineering

    (2003)
  • L.A. Curtiss et al.

    Gaussian-4 theory

    J. Chem. Phys.

    (2007)
  • T.J. Dalton et al.

    Interferometric real-time measurement of uniformity for plasma etching

    J. Electrochem. Soc.

    (1994)
  • J. Elam et al.

    Viscous flow reactor with quartz crystal microbalance for thin film growth by atomic layer deposition

    Rev. Sci. Instrum.

    (2002)
  • S.D. Elliott et al.

    Simulating the atomic layer deposition of alumina from first principles

    J. Mater. Chem.

    (2004)
  • G. Fang et al.

    Theoretical understanding of the reaction mechanism of SiO2 atomic layer deposition

    Chem. Mater.

    (2016)
  • A. Fluent

    ANSYS Fluent Theory Guide 15.0

    (2013)
  • T.R.B. Foong et al.

    Template-directed liquid ALD growth of TiO2 nanotube arrays: properties and potential in photovoltaic devices

    Adv. Funct. Mater.

    (2010)
  • A. Frisch et al.

    GaussView User Manual

    (2000)
  • S.M. George

    Atomic layer deposition: an overview

    Chem. Rev.

    (2009)