Diffusion maps based k-nearest-neighbor rule technique for semiconductor manufacturing process fault detection

https://doi.org/10.1016/j.chemolab.2014.05.003

Highlights

  • A diffusion maps based k-nearest-neighbor rule fault detection technique is proposed.

  • The low dimensional manifold feature space with the intrinsic dimensionality is extracted by diffusion maps.

  • The adapted kNN rule based fault detection method is applied to the low dimensional manifold feature space to detect potential faults.

  • The effectiveness of the proposed method is evaluated in a simulation experiment and on a semiconductor manufacturing process.

Abstract

In the semiconductor industry, traditional multivariate statistical process monitoring methods and pattern classification based detection methods have been developed to detect semiconductor process faults. However, their performance is unsatisfactory because of the limitations of these methods and the unique characteristics of semiconductor processes, such as non-linearity and multimodal batch trajectories. This paper presents a novel diffusion maps based k-nearest-neighbor rule (DM-kNN) technique that reduces data-storage costs and enhances fault detection performance by integrating diffusion maps (DM) analysis with the k-nearest-neighbor rule. DM-kNN takes full advantage of the dimensionality reduction and information-preserving properties of DM to extract a low dimensional manifold feature space that optimally preserves the intrinsic nonlinear structure of the data set. The adapted kNN rule based fault detection method is then applied in this low dimensional manifold feature space to detect potential faults. The effectiveness and robustness of DM for dimensionality reduction and feature extraction are verified in simulation experiments against other linear and nonlinear dimensionality reduction methods. In addition, DM-kNN is applied to monitor a semiconductor manufacturing process, where its fault detection results are shown to be superior to those of the MPCA, FD-kNN, PC-kNN and FS-kNN approaches.

Introduction

The semiconductor manufacturing system is considered one of the most complicated high-tech manufacturing systems. Key characteristics of the process include the high complexity of wafer processing, the non-linearity of most batch processes, multimodal batch trajectories due to product mix, large system scale and a high degree of measurement uncertainty; together, these pose great difficulties for traditional fault detection methods [1]. Designing effective fault detection technologies that improve product quality, reduce the scrap rate and wafer loss, and ensure safety during production is therefore a key concern in semiconductor manufacturing. Many advanced process control and detection methods have been proposed to obtain accurate and effective fault detection results [2], [3], [4], [5].

Data-driven statistical process control methods have been widely applied in semiconductor manufacturing, because process equipment in modern industry generates and collects large amounts of trace or machine data [6]. Among fault detection methods, multivariate statistical process control (MSPC), an effective data-driven approach, has been successfully applied to on-line process fault detection, in particular in chemical, biochemical and semiconductor processes [7], [8], [9]. The most popular MSPC methods in the semiconductor industry are multi-way PCA (MPCA) and multi-way PLS (MPLS) [6], [10], [11], [12], [13]. However, some drawbacks of MPCA, together with some unique characteristics of semiconductor manufacturing processes, prevent it from achieving ideal detection performance. The main reason is that the derivation of the threshold for the MPCA detection statistics relies on the assumption that the process data approximately follow a multivariate Gaussian distribution [14], [15]. The unique characteristics of semiconductor manufacturing processes, such as non-linearity and multimodal batch trajectories, usually result in non-Gaussian distributed process data, which deteriorates the detection performance of MPCA and MPLS. To overcome this limitation and explicitly account for these characteristics, a pattern recognition based fault detection method utilizing the k-nearest-neighbor rule (FD-kNN) has recently been developed [16]. The basic idea of FD-kNN is that a faulty sample's distance to its k nearest neighboring normal samples is expected to be greater than a normal sample's distance to its k nearest neighboring normal samples. FD-kNN offers better fault detection performance than MPCA. However, its drawback is that, for large-scale processes with thousands of variables after batch unfolding, it requires a large amount of storage space to hold the complete original training data set.
This is because, for each test sample, FD-kNN must compute the Euclidean distance between the test sample and every sample in the training data set in order to identify its k nearest neighbors. This may increase the burden of on-line process fault detection when tens of thousands of such fault detection models run simultaneously due to high-mix production. For example, IBM has more than 7000 on-line fault detection models running [17], while Intel has as many as 30,000 such models [18]. To reduce storage space, He and Wang [19] proposed the principal component based kNN method (PC-kNN). Its basic idea is to use PCA for dimensionality reduction, extracting the principal component subspace (PCs) of the original samples as the modeling samples, and then to apply the FD-kNN method in the score subspace to detect faults. However, this method ignores abnormal information occurring in the residual subspace, which leads to high Type II error (missed detection) rates. To address this flaw of PC-kNN, Guo et al. [20] proposed the feature space based k nearest neighbor method (FS-kNN), in which the feature space was the sum of the PC subspace and the squared prediction error (SPE). Compared with PC-kNN, FS-kNN improves fault detection performance, but its false alarm and missed detection rates are still high. The main reason for the suboptimal fault detection performance of FS-kNN and PC-kNN is that the principal component subspace or feature space extracted by PCA may not effectively represent the original data space. On the one hand, PCA is a linear technique, so it may not be able to extract the nonlinear relations or features of complex semiconductor data with severe non-linearity.
On the other hand, PCA is a second-order method: it considers only the mean and the variance or covariance of the data set and does not take into account the local structure information of data with complicated distributions, which are common in actual industrial processes.
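The FD-kNN rule described above can be illustrated with a minimal NumPy sketch. It assumes the usual kNN statistic $D_i^2$, the sum of squared Euclidean distances from sample $i$ to its k nearest normal training samples, and, as a simplifying assumption, sets the control limit to an empirical (1 - α) quantile of the leave-one-out training statistics instead of fitting a distribution. The function names, the mean-shift fault and all data sizes are hypothetical.

```python
import numpy as np


def knn_sq_distance_sum(X_ref, X_query, k, exclude_self=False):
    """D_i^2: sum of squared Euclidean distances from each row of X_query
    to its k nearest rows in X_ref."""
    # Pairwise squared Euclidean distances, shape (n_query, n_ref)
    d2 = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:
        # Training statistics: X_query is X_ref, so skip each sample's
        # zero distance to itself (leave-one-out neighbours)
        np.fill_diagonal(d2, np.inf)
    # The k smallest squared distances per row (their order is irrelevant)
    nearest = np.partition(d2, k - 1, axis=1)[:, :k]
    return nearest.sum(axis=1)


def fit_fd_knn(X_train, k=3, alpha=0.01):
    """Control limit = empirical (1 - alpha) quantile of training D_i^2
    (an assumption; He and Wang use a non-central chi-square fit instead)."""
    d2_train = knn_sq_distance_sum(X_train, X_train, k, exclude_self=True)
    return np.quantile(d2_train, 1 - alpha)


def detect_fd_knn(X_train, X_test, k, threshold):
    """Flag a test sample as faulty when its D^2 exceeds the control limit.
    The whole training set must be kept in memory for this search."""
    return knn_sq_distance_sum(X_train, X_test, k) > threshold


rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))        # normal operating data
X_normal = rng.normal(size=(500, 5))       # new normal samples
X_fault = rng.normal(size=(5, 5)) + 6.0    # hypothetical mean-shift fault

limit = fit_fd_knn(X_train, k=3, alpha=0.01)
normal_flags = detect_fd_knn(X_train, X_normal, 3, limit)
fault_flags = detect_fd_knn(X_train, X_fault, 3, limit)
print("false alarm rate:", normal_flags.mean())
print("faults detected:", fault_flags.sum(), "of", len(fault_flags))
```

Note that `detect_fd_knn` takes the full `X_train` matrix on every call; this is exactly the storage and search burden that motivates PC-kNN, FS-kNN and DM-kNN.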

To handle the non-linearity of industrial process data, several nonlinear methods have been developed, such as conventional nonlinear PCA, neural networks and kernel-based approaches. Dong and McAvoy [21] developed a nonlinear PCA combining principal curves and neural networks. However, such nonlinear mappings are suitable only for a limited class of nonlinear models, because the nonlinear function in the principal curve algorithm is a linear combination of univariate functions [22]. Furthermore, a nonlinear optimization problem must be solved to compute the principal curves and train the neural networks. Kernel PCA (KPCA) has also been proposed; it introduces a nonlinear kernel function and does not involve nonlinear optimization [23], [24], [25], [26]. However, KPCA does not explicitly consider the possibility that the intrinsic geometric structure of the data resides on a manifold [27]. More recently, promising nonlinear dimensionality reduction algorithms known as manifold learning have emerged, such as local tangent space alignment (LTSA) [28], locally linear embedding (LLE) [29], isometric feature mapping (ISOMAP) [30] and diffusion maps (DM) [31], [32]. Among them, the diffusion maps approach is a robust nonlinear manifold dimensionality reduction technique. It constructs a Markov chain on a graph of the data points and then performs spectral eigenanalysis of the chain's transition probability matrix to obtain a low-dimensional representation of the data points. Compared with classical linear techniques, DM can explicitly discover the nonlinear manifold geometry embedded in a complex high-dimensional space. At the same time, it preserves the local structure of the features by retaining the pairwise diffusion distances as well as possible.
Moreover, the diffusion distance is computed using all possible paths between two data points, so it is more robust to noise than the geodesic (shortest path) distance used by ISOMAP. At present, applications of diffusion maps are mostly focused on image processing and pattern recognition. For instance, Jingen Liu et al. [33] applied diffusion maps analysis to learn semantic features for action recognition; Feng Zheng et al. [34] proposed a semi-supervised algorithm based on diffusion maps to visualize the Yale face and human pose images; and R. R. Coifman et al. [35] used the diffusion maps technique to identify different tissue samples in hyperspectral pathology tissue images. Note that the target dimensionality in these applications is always restricted to at most three to enable visualization, which may lead to poor performance because too much meaningful information is discarded. Hence, it is necessary to estimate the intrinsic dimensionality of the original data during the feature extraction phase. Furthermore, diffusion maps have seldom been studied in the fault diagnosis field, and applications to industrial process fault detection and diagnosis are even rarer. Yixiang Huang et al. [36] used discriminant diffusion maps analysis for machinery condition monitoring and fault diagnosis.
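The diffusion maps construction summarized above (a Markov chain on a graph of the data points, followed by eigenanalysis of its transition matrix) can be sketched in a few lines of NumPy. This is a generic Coifman-Lafon style sketch, not the authors' implementation: the Gaussian kernel, the median heuristic for its width `eps`, the diffusion time `t` and the function name `diffusion_map` are all assumptions.

```python
import numpy as np


def diffusion_map(X, n_components=2, eps=None, t=1):
    """Minimal diffusion maps embedding of the rows of X."""
    # Pairwise squared Euclidean distances between data points
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    if eps is None:
        eps = np.median(sq)               # kernel-width heuristic (assumption)
    W = np.exp(-sq / eps)                 # Gaussian affinities on the data graph
    d = W.sum(axis=1)                     # degree of each node
    # P = D^-1 W is the Markov transition matrix. The symmetric matrix
    # S = D^-1/2 W D^-1/2 has the same eigenvalues and is safer for eigh.
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]        # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]      # right eigenvectors of P
    # Skip the trivial pair (eigenvalue 1, constant eigenvector) and scale the
    # remaining coordinates by lambda^t; Euclidean distances in this embedding
    # approximate diffusion distances at time t (up to a constant factor).
    Y = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return Y, vals


rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 300)
# Noisy circle: a 1-D manifold embedded in 3-D space
X = np.column_stack([np.cos(theta), np.sin(theta), 0.05 * rng.normal(size=300)])
Y, eigvals = diffusion_map(X, n_components=2)
print(Y.shape)  # (300, 2)
```

Diagonalizing the symmetric matrix S rather than P is a standard numerical choice: both share eigenvalues, and the right eigenvectors of P are recovered by scaling with $D^{-1/2}$.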

To enhance fault detection performance, reduce the storage of training samples and overcome the shortcomings of PCA in dealing with nonlinear data sets, this paper presents a novel fault detection method for semiconductor manufacturing processes called the diffusion maps based k-nearest-neighbor rule (DM-kNN). In the first step, the correlation dimension method is used to estimate the intrinsic dimensionality of the original data set, and diffusion maps analysis is then applied to transform the high dimensional data into a low dimensional manifold feature space of that intrinsic dimensionality. In the second step, the kNN fault detection method is applied in the low dimensional manifold feature space to detect potential faults. From the viewpoint of feature extraction and fault detection performance, DM-kNN has the following advantages:

  • 1)

    Diffusion maps analysis is capable of discovering the nonlinear structure of the manifold existing in the given data and optimally preserves the intrinsic geometry structure in a low dimensional manifold feature space.

  • 2)

    DM-kNN has relatively high robustness and stability in process fault detection, because the diffusion maps algorithm is based on the diffusion distance, which is more robust to noise than, e.g., the geodesic distance.

  • 3)

    After diffusion maps analysis, normal and abnormal samples are mapped into clearly different manifold regions, and because the kNN detection statistic is distance based, this separation gives DM-kNN strong fault detection performance.

  • 4)

    Compared with FD-kNN, DM-kNN removes redundancy in the original data sets and reduces storage space, since the low dimensional manifold feature space usually has a much lower dimension than the original data space.
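The first step of DM-kNN, estimating the intrinsic dimensionality with the correlation dimension method, can be illustrated with the standard Grassberger-Procaccia form of the estimator: $C(r)$ is the fraction of point pairs closer than $r$, and the dimension is the slope of $\log C(r)$ against $\log r$ at small $r$. This is a sketch under assumptions, not the paper's procedure: the radius range (low quantiles of the pairwise distances), the function name `correlation_dimension` and the synthetic planar data are hypothetical choices.

```python
import numpy as np


def correlation_dimension(X, q_low=0.01, q_high=0.1, n_radii=10):
    """Grassberger-Procaccia style estimate of intrinsic dimensionality."""
    n = X.shape[0]
    # Squared pairwise distances via the Gram matrix (clipped at 0 for round-off)
    G = X @ X.T
    s = np.diag(G)
    sq = np.maximum(s[:, None] + s[None, :] - 2.0 * G, 0.0)
    iu = np.triu_indices(n, k=1)
    dist = np.sqrt(sq[iu])                  # all N(N-1)/2 pairwise distances
    # Radii taken from low quantiles of the distance distribution (assumption)
    radii = np.quantile(dist, np.linspace(q_low, q_high, n_radii))
    # Correlation integral C(r): fraction of pairs closer than r
    C = np.array([(dist < r).mean() for r in radii])
    # Least-squares slope of log C(r) versus log r
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return slope


rng = np.random.default_rng(3)
latent = rng.normal(size=(800, 2))                  # 2-D latent factors
Q, _ = np.linalg.qr(rng.normal(size=(10, 2)))       # 10 x 2 orthonormal basis
X = latent @ Q.T                                    # 800 samples in 10-D
d_hat = correlation_dimension(X)
print(d_hat)  # should be close to 2 for this planar data
```

A rounded estimate such as `round(d_hat)` could then serve as the target dimensionality of the diffusion maps embedding, instead of fixing it at two or three for visualization.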

The rest of this paper is organized as follows. Section 2 reviews the FD-kNN technique. Section 3 describes the theoretical aspects of diffusion maps analysis and presents the novel DM-kNN fault detection method. In Section 4, a simulation example is first used to illustrate the dimensionality reduction and information extraction capability of the DM method; DM-kNN is then applied to a semiconductor manufacturing process, and its fault detection results are compared with those of the MPCA, FD-kNN, PC-kNN and FS-kNN approaches. Section 5 summarizes the conclusions of this work.

Section snippets

Fault detection using k-nearest-neighbor rule (FD-kNN)

The k-nearest neighbor (kNN) rule was first developed by Fix and Hodges [37]. kNN has been widely used in pattern classification, for example in near infrared (NIR) spectroscopy data classification [38], [39], where unlabeled samples are classified by searching for the k nearest labeled samples in the training data. Recently, a new fault detection method using the kNN rule (FD-kNN) has been developed. The basic idea of FD-kNN lies in the fact that a normal sample trajectory is similar to the

Diffusion maps based k-nearest-neighbor rule approach for fault detection

The disadvantage of FD-kNN is that it requires a large amount of storage space to hold the training data sets for large-scale processes, because there may be thousands of variables after batch unfolding. The reason for this large storage requirement is that, for each test sample, FD-kNN must find the sample's k nearest neighbors in the whole training data set by computing the Euclidean distances between the test sample and all training samples. Although PC-kNN was proposed in order to reduce the

Estimate the corresponding control limit of $D_i^2$ for fault detection

For the determination of the threshold, He and Wang [16] assume that the k-nearest neighbor distances $d_{ij}$ among training samples are normally distributed around a nonzero mean and that $D_i^2$ follows a non-central $\chi^2$ distribution. The threshold at a given significance level can thus be estimated with the chilimit function. However, because of the complexity of industrial processes, process samples cannot be expected to follow a rigorous statistical distribution. Also the selection of the k-nearest neighbor

Application example

In this section, two process cases are given to illustrate the performance of DM-kNN for process fault detection. First, simulation examples with nonlinear distributions are used to test the effectiveness and robustness of diffusion maps for dimensionality reduction and information extraction. Then, a semiconductor manufacturing process is used to verify the fault detection performance of the DM-kNN method and to compare it with MPCA, FD-kNN, PC-kNN and FS-kNN.

Conclusion

In this paper, a novel diffusion maps based k-nearest-neighbor rule (DM-kNN) technique is proposed and applied to fault detection in semiconductor manufacturing processes. The presented approach uses the correlation dimension estimator to assess the intrinsic dimensionality of the original process data. Diffusion maps then extract low dimensional manifold features of that intrinsic dimensionality, which optimally preserve the intrinsic nonlinear structure of the data sets.

Acknowledgment

The authors would like to acknowledge the National Natural Science Foundation of China under Grant numbers 61174119, 61034006, and 60774070, and Liaoning Province Foundation under Grant number 2009R47.

References (48)

  • Y. Huang et al.

    Discriminant diffusion maps analysis: a robust manifold learner for dimensionality reduction and its applications in machine condition monitoring and fault diagnosis

    Mech. Syst. Signal Process.

    (2013)
  • R.M. Balabin et al.

    Biodiesel classification by base stock type (vegetable oil) using near infrared spectroscopy data

    Anal. Chim. Acta

    (2011)
  • R.M. Balabin et al.

    Gasoline classification using near infrared (NIR) spectroscopy data: comparison of multivariate techniques

    Anal. Chim. Acta

    (2010)
  • B. Nadler et al.

    Diffusion maps, spectral clustering and reaction coordinates of dynamical systems

    Appl. Comput. Harmon. Anal.

    (2006)
  • J. Yu

    Local and global principal component analysis for process monitoring

    J. Process Control

    (2012)
  • S. Joe Qin

    Statistical process monitoring: basics and beyond

    J. Chemom.

    (2003)
  • J. Chen

    Removal of the effects of outliers in batch process data through maximum correntropy estimator

    Chemom. Intell. Lab. Syst.

    (2012)
  • Z. Ge et al.

    Review of recent research on data-based process monitoring

    Ind. Eng. Chem. Res.

    (2013)
  • B.M. Wise et al.

    A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process

    J. Chemom.

    (1999)
  • G.A. Cherry et al.

    Multiblock principal component analysis based on a combined index for semiconductor fault detection and diagnosis

    IEEE Trans. Semicond. Manuf.

    (2006)
  • P. Nomikos et al.

    Multivariate SPC charts for monitoring batch processes

    Technometrics

    (1995)
  • R. Chen et al.

    Plasma etch modeling using optical emission spectroscopy

    J. Vac. Sci. Technol. A Vac. Surf. Films

    (1996)
  • J. Wong

    Batch PLS analysis and FDC process control of within lot SiON gate oxide thickness variation in sub-nanometer range

  • S.J. Qin et al.

    On unifying multiblock analysis with application to decentralized process monitoring

    J. Chemom.

    (2001)