An effective approach for identifying defective critical fabrication path

Abstract
Most defect signatures on wafers are caused by faulty tools. If these defects are not captured by in-line inspection tools or sampled during Defect Review-SEM, they are carried over multiple processing steps and discovered only at the end of the fabrication procedure. Wafers with different defect signatures discovered at the end of the fabrication procedure have, most often, been processed through the same faulty tools. The fabrication paths of most defective wafers converge at some point to form a common sub-path before they disperse again. Usually, faulty tools in this common path cause most of the different defect signatures discovered at the end of the fabrication procedure. Process engineers need to trace the fabrication paths in reverse to identify this common path and repair its tools. In this paper, we introduce a defect diagnostic system called IDFP that identifies the common fabrication path that caused most of the defects discovered on wafers at the end of the fabrication procedure. We evaluated the quality of IDFP by experimentally comparing it with two other systems. Results revealed marked improvement.


PUBLIC INTEREST STATEMENT
Each wafer passes through many pieces of fabrication equipment before it is fully functional, and it is highly vulnerable to defects caused by faulty equipment at any of the fabrication processing steps. A specific type of equipment malfunction at a specific processing step can cause a specific type of defect signature on wafers. Some of these defects may not be captured by in-line inspection tools and are carried over multiple processing steps. They are then discovered at the end of the fabrication procedure, which requires process engineers to trace back to the equipment that caused them. To overcome this, we introduce in this paper a defect diagnostic system called IDFP that identifies the common fabrication path that caused most of the defects discovered on wafers at the end of the fabrication procedure.

Introduction
Relying on engineers' vision to judge defect patterns is unreliable and inconsistent, especially with the increasing sizes of wafers and the shrinking sizes of features. As alternatives, automatic defect detection (ADD) (Chou & Rao, 1997; Chen & Liu, 2000; Liu, Chen, & Lu, 2002; Zhou et al., 2002) and automatic defect classification (ADC) (Adly, Al Hussein et al., 2015; Fenner et al., 2005) methods are gaining importance. Automatic detection is carried out by comparing an image of an inspected die with a reference image of a defect-free die (Chien, Wang, & Cheng, 2007; Palma, Nicolao, & Miraglia, 2005; Pukite & Berman, 1990; Yuan & Kuo, 2008; Xie & Simske, 2011). Although most of these methods identify defects successfully, most of them have not succeeded in associating the source of a defect signature with a specific processing step or tool. This is due, in part, to the fact that some defects may not be captured by in-line inspection tools or sampled during Defect Review-SEM and are carried over multiple processing steps, which makes it difficult for these methods to pinpoint the root cause of the defects.
Each wafer passes through many pieces of fabrication equipment before it is fully functional, and it is highly vulnerable to defects caused by faulty equipment at the different fabrication processing steps. Most defect signatures on wafers are caused by faulty tools: a specific type of tool malfunction at a specific processing step can cause a specific defect signature. Some of these defects may not be captured by in-line inspection tools or sampled during Defect Review-SEM and are carried over multiple processing steps (Iwata et al., 2000). They are then discovered at the end of the fabrication procedure, which requires process engineers to trace back to the tools that caused them.
Associating the source of a defect signature with a specific processing step or piece of equipment using ADD (Chen & Liu, 2000; Liu et al., 2002; Zhou et al., 2002), ADC (Adly, Al Hussein et al., 2015; Taha, 2017), or engineers' judgment can be ineffective and inconsistent. This is due, in part, to the fact that some defect signatures may not be captured by in-line inspection tools or sampled during Defect Review-SEM and are carried over multiple processing steps, which makes it difficult for these methods to pinpoint the root causes of the defects. To overcome this, we introduce in this paper a defect diagnostic system called IDFP (Identifying Defective Fabrication Path) that identifies the common fabrication path that caused most of the defects discovered on wafers at the end of the fabrication procedure. It does so by identifying the set of equipment that likely caused the set of all distinct defect signatures discovered at the end of the fabrication procedure.
Identifying the equipment that caused all distinct defect signatures discovered by the prober equipment at the end of the fabrication procedure is more effective than identifying, one signature at a time, the equipment that caused each defect signature. This is due to the following: (1) multiple defect signatures can be caused by the same equipment, and (2) some defect signatures can be caused by a sequence of faulty equipment (i.e., a defect signature caused by one faulty piece of equipment can be altered into a different defect signature by a subsequent faulty piece of equipment).
The contributions of this paper are summarized as follows:
(1) Proposing a novel framework that identifies the common fabrication sub-path (i.e., common equipment) that caused most of the defects discovered on wafers at the end of the fabrication procedure. The framework identifies this common sub-path by pinpointing the points at which the fabrication paths of most defective wafers converge and disperse.
(2) Proposing a methodology that pinpoints the faulty equipment within the common fabrication sub-path. Identifying this equipment is crucial, since wafers with various defect signatures discovered at the end of the fabrication procedure have, most often, been processed through the same faulty equipment.
(3) Proposing a novel algorithm that can efficiently compute the critical fabrication path, which includes equipment used in the fabrication of the wafers with all defect signatures.

Literature review
Wafers usually pass through over 300 fabrication processing steps in semiconductor manufacturing. Each of these steps involves one or more pieces of equipment, so each wafer passes through many pieces of equipment before it is fully functional. A specific malfunction in a piece of equipment can cause a specific defect signature on wafers. Below are some of the common defects and the faulty equipment causing them:
• Bull's-eye and circular ring defects on wafers can be caused by either: (a) non-uniformities of thin film deposition caused by malfunction of the CVD, ECD, or PVD equipment, or (b) uneven temperature distribution caused by malfunction of the rapid thermal annealing equipment (Friedman, Hansen, & Nair, 1997).
• Edge ring defects on wafers can arise from etching problems caused by malfunction of the wet bench equipment (Burkeen et al., 2007).
• Zone defect signatures on wafers are often caused by malfunction of the thin film deposition equipment (Cunningham & Mackinnon, 1998).
For the sake of early identification of process problems, engineers need to detect the root causes of problems during the fabrication process to reduce the losses caused by excursions. Towards this, many methods have been proposed for detecting spatial defect patterns on semiconductor wafers (de Berg et al., 2008). The authors of (Lee, Yu, & Park, 2001a, 2001b) proposed an unsupervised self-organizing map (SOM) algorithm that uses a data sampling method. The algorithm clusters the chip locations that have similar defect features. The authors of (Chen & Liu, 2000; Liu et al., 2002) adopted adaptive resonance theory (ART) techniques to identify defect patterns on wafer bin maps (WBMs). The authors of (Hsu & Chien, 2007) proposed a hybrid method that integrates an ART network with spatial statistics to detect defect patterns. The authors of (Baly & Hajj, 2012) adopted SOM and ART as wafer classifiers using real data sets and extensive simulated and.
A number of methods proposed model-based clustering. The authors of (Chien, Hsu, & Chen, 2013) employed multi-way principal analysis and data mining techniques to monitor and diagnose the semiconductor fabrication process. These techniques are used to derive rules for fault classification and to detect faults. The authors of (Zhou et al., 2002) proposed a method for predicting yield using Critical Area Analysis. First, the method estimates the size of defects based on inline defect data. Then, it partitions yield loss according to the estimated defect sizes. The authors of (Yuan & Kuo, 2008) proposed Bayesian model-based clustering models for clustering spatial defective pattern on semiconductor wafers. The models can determine ellipsoidal, curvilinear, and non-uniform global defect patterns. However, these algorithms cannot differentiate between the different types of defects.
The authors of (Taha, Salah, & Yoo, 2017) proposed a method for clustering the patterns of defective chips on wafers according to the spatial dependence of these defects across all wafer maps. The method identifies the dominant defect patterns and clusters chip defects based on how dominant their spatial patterns are across all wafer maps. The author of (Taha, 2017) proposed a method for detecting defect signatures on semiconductor wafers that allows process engineers to trace back the source of a final defect identified at the end of the fabrication procedure and associate it with specific defect signatures carried over multiple process zones. The author of (Taha, 2018) proposed a method based on rules of inference for diagnosing and identifying root causes in the semiconductor fabrication process. Another method discovers defect patterns using simplified subspaced and randomized general regressions; its authors cluster and partition defects using Voronoi-based data partitioning.

Outline of the approach
Let M be a non-empty set whose elements are themselves sets. Each element of M is a set p of tools used in the fabrication of a defective wafer whose defects were discovered at the end of the fabrication procedure. We observe that the set of defective tools that caused the set of all distinct defect signatures discovered at the end of the fabrication procedure is likely to lie in the intersection of all the sets p of M. That is, this set of defective tools is likely to be within a set S, where S = ⋂_{p ∈ M} p. We call the fabrication path that passes through the set S the Critical Fabrication Path (CFP). This path is critical for identifying the root causes of most defect signatures discovered at the end of the fabrication procedure. IDFP employs an effective approach for identifying the CFP.
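As a minimal sketch of the intersection S defined above, the following Python snippet intersects the tool sets of a few defective wafers. The tool names and the function name candidate_tools are hypothetical and serve only as illustration; they do not come from the evaluation dataset.

```python
def candidate_tools(paths):
    """Return S, the intersection of all tool sets p in M."""
    tool_sets = [set(p) for p in paths]
    if not tool_sets:
        raise ValueError("M must be non-empty")
    return set.intersection(*tool_sets)


# Hypothetical M: one set of tools per defective wafer.
M = [
    {"litho_1", "etch_2", "cvd_1", "prober_1"},
    {"litho_2", "etch_2", "cvd_1", "prober_2"},
    {"litho_1", "etch_2", "cvd_1", "prober_2"},
]

S = candidate_tools(M)
print(sorted(S))  # ['cvd_1', 'etch_2']
```

The plain intersection ignores the order of the tools; the tree representation introduced in the following sections is what recovers the sequential path through S.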
Below is an overview of the sequential computational steps taken by IDFP to identify the CFP:
(1) Representing the fabrication paths of defective wafers: In the framework of IDFP, the fabrication paths of defective wafers are represented as a graph. We call such a graph the Fabrication Path Graph (FPG). The leaf nodes of the FPG represent the end-of-line prober tools. These nodes are labelled with the defect signatures identified from the electrical failures captured by the prober tools. In the framework of IDFP, Wafer Defect Maps (WDMs) are created from defects captured either by in-line inspection stations or by reverse-engineering failure analysis techniques. We leave it to the process engineers' discretion to decide the number of samples that justifies performing failure analysis. Figure 1 shows an illustrative FPG.
(2) Converting a FPG into an equivalent tree: IDFP transforms a FPG into an equivalent tree called the Fabrication Path Tree (FPT). We observe that the defective tools that caused a distinct set of all the defect signatures identified by failure analysis in response to electrical failures captured by end-of-line probers are likely to be located in the path that starts at the root of FPT and ends at the Lowest Common Ancestor (LCA) of the set N of leaf nodes labelled with the distinct set of defect signatures. This process is better illustrated in section 3.
(3) Identifying the CFP: In the framework of IDFP, the LCA of the set N is identified using the concept of existence dependency. If the existence of the set N in FPT is dependent on the existence of a LCA lca, the CFP starts from the root node and ends at the lca. This process is better illustrated in section 4.
(4) Efficiently computing the CFP: To efficiently compute the LCA of the set N, IDFP employs a combination of stack-based sort-merge algorithm and the concept of existence dependency. This process is better illustrated in section 5.

Running example
We present a running example throughout the paper using the FPG shown in Figure 1. Each node in the figure represents a tool and is identified by an alphabetical letter and a colour. All nodes representing tools of the same processing type (i.e., tools that perform the same fabrication process) are assigned the same letter and colour. For example, all plating tools are assigned the same letter and colour.
Figure 1. An illustrative FPG with a few paths (in the real world, the number of paths is much larger). Each node represents a tool and is identified by an alphabetical letter and a colour. Nodes representing tools of the same processing type share the same letter and colour. For example, the two red nodes with the letter "e" are of the same processing type.

Converting a FPG into an equivalent tree
In the IDFP implementation, Dewey IDs are assigned to nodes for the efficient computation of: (1) the existence dependencies between nodes, and (2) the LCA of nodes. A Dewey ID is a Dewey-number-like label that provides a straightforward way to locate the LCA of nodes. Since Dewey IDs work on trees rather than graphs, IDFP converts a FPG into an equivalent tree, which we call the FPT. A FPG and a FPT are equivalent if every two nodes connected by an edge in the FPG are also connected by an edge in the FPT. Let P_w be the fabrication path of a defective wafer w in the FPT. P_w starts from the root node and ends at a leaf node n. The leaf node n is labelled with the defect signature discovered on w by failure analysis in response to an electrical failure captured by prober tools at the end of the fabrication procedure. Definition 1 formalizes the concept of FPT.

Definition 1, FPT
A FPT is a tree that is equivalent to a FPG. A FPT t is a tuple t = (N, E, r, λ_t), where N is the set of nodes representing tools, E ⊆ N × N is the set of edges representing the sequences of tools in the fabrication process, r is the root node of t, and λ_t: n_0 → Σ is a node-labelling function, where n_0 is a leaf node representing an end-of-line prober and Σ is an alphabet for labelling n_0 with the defect signature mapped to an electrical failure captured by the prober. Each node in t is labelled with a Dewey-number-like label called a Dewey ID. If n_0 is labelled with a defect signature d, the Dewey ID of n_0 reveals the fabrication path of the wafers with defect d (i.e., the sequence of the ancestor tools of n_0, which includes the tools that caused the defect d).
A Dewey ID of a node n is a sequence of components, each of the form L_x, where L is an alphabetical letter that denotes a processing type, and the subscript digit x denotes the number of tools of processing type L visited before L_x in a Depth First Search (DFS) of the FPT from the root node. Recall that the same tools can be used in alternating fabrication operations of the same line (e.g., the deposition of different thin layers of metals). A Dewey ID uniquely identifies a tool and its sequential order in the fabrication process. Read from left to right, the components of the Dewey ID of node n reveal the chain of ancestors of n and their processing types, starting from the root node. The last component reveals the processing type of node n itself. For example, the Dewey ID a_0.b_0.c_0.e_0.f_0.b_2.e_2 in Figure 2 reveals the following:
• The component e_2 reveals that tool a_0.b_0.c_0.e_0.f_0.b_2.e_2 is of processing type "e", and the subscript digit 2 reveals that two tools of processing type "e" were visited prior to a_0.b_0.c_0.e_0.f_0.b_2.e_2 in the DFS from the root.
• The Dewey ID of the parent of a_0.b_0.c_0.e_0.f_0.b_2.e_2 is a_0.b_0.c_0.e_0.f_0.b_2. The component b_2 reveals that the parent is of processing type "b" and that two tools of processing type "b" were visited prior to the parent in the DFS from the root node.
Figure 2. The FPT equivalent to the FPG in Figure 1. Each node n represents a tool and is identified by a Dewey component and a Dewey ID. The Dewey component uniquely identifies n. The Dewey ID uniquely identifies the sequential order of n in the fabrication path. The Dewey ID of a leaf node labelled with a defect d reveals the path that caused d.
Example 1: Figure 2 shows the FPT that is equivalent to the FPG in Figure 1. Each tool is identified by a Dewey component and a Dewey ID. The Dewey component uniquely identifies the tool. The Dewey ID uniquely identifies the sequential order of the tool in the fabrication procedure. The Dewey ID of a leaf node labelled with a defect signature d reveals the path that caused d. For example, the Dewey IDs of the leaf nodes a_0.b_0.c_0.e_0.f_0.b_1.e_1 and a_0.c_1.d_1.e_3.f_1.g_1.h_1 reveal two paths that caused the defect OE.
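The Dewey numbering described above can be sketched in Python as follows. This is an illustrative sketch, not the IDFP implementation: the Tool class, the function assign_dewey_ids, and the small tree are assumptions, with the tree built only so that its leaves reproduce three of the Dewey IDs quoted in the text. Subscripts are written inline in the code (a0 stands for a_0).

```python
from collections import defaultdict


class Tool:
    """A node of the FPT: a tool with a processing type and child tools."""

    def __init__(self, typ, children=None):
        self.typ = typ
        self.children = children or []
        self.dewey_id = None


def assign_dewey_ids(root):
    """Label every node in DFS pre-order and return the leaf Dewey IDs."""
    seen = defaultdict(int)  # number of tools of each type visited so far
    leaf_ids = []

    def visit(node, prefix):
        component = f"{node.typ}{seen[node.typ]}"
        seen[node.typ] += 1
        node.dewey_id = prefix + [component]
        if not node.children:
            leaf_ids.append(".".join(node.dewey_id))
        for child in node.children:
            visit(child, node.dewey_id)

    visit(root, [])
    return leaf_ids


# Hypothetical fragment of an FPT (types only; leaves are probers).
root = Tool("a", [Tool("b", [Tool("c", [Tool("e", [
    Tool("f", [Tool("b", [Tool("e")]), Tool("b", [Tool("e")])]),
    Tool("d", [Tool("g", [Tool("h")])]),
])])])])

leaf_ids = assign_dewey_ids(root)
for dewey_id in leaf_ids:
    print(dewey_id)
```

With this tree, the second leaf receives the ID a0.b0.c0.e0.f0.b2.e2: it is the third tool of type "e" reached by the traversal, and its parent is the third tool of type "b".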

Notation 1, Defect Context (DC)
A DC is a leaf node in FPT. It represents an end-of-line prober tool. It is labelled with a defect signature d mapped to an electrical failure captured by the prober on a defective wafer w. The Dewey ID of the DC reveals the sequence of tools used in the fabrication of w, which includes the tools that may have caused the defect d.
Let D be the set of all distinct defect signatures discovered on wafers at the end of the fabrication procedure during a specified fabrication period. Let lca be the LCA of a set of DCs in the FPT that are collectively labelled with all the defect signatures in D. A CFP is the path in the FPT that starts from the root node and ends at lca. We observe that the CFP is a common fabrication path of the wafers with all the defect signatures in D; that is, a CFP is a sequence of tools used in the fabrication of these wafers. It is therefore highly likely that the faulty tools that caused all the defect signatures in D are located in the CFP, and investigating the CFP is crucial for identifying the root causes of defect signatures discovered at the end of the fabrication procedure. To identify the CFP, IDFP computes the LCA of each set of DCs collectively labelled with all the defect signatures in D. We call each of these LCAs a Critical Defect Context (CDC).
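Because a Dewey ID spells out the chain of ancestors of a node, the LCA of a set of DCs is simply the longest common prefix of their Dewey IDs. The sketch below illustrates this under the assumption that Dewey IDs are stored as dot-separated strings with inline subscripts (a0 for a_0); the function name lca is illustrative.

```python
def lca(dewey_ids):
    """Return the Dewey ID of the lowest common ancestor of the given nodes."""
    prefix = []
    # zip(*...) walks the IDs component by component, stopping at the shortest.
    for components in zip(*(d.split(".") for d in dewey_ids)):
        if len(set(components)) != 1:
            break
        prefix.append(components[0])
    return ".".join(prefix)


print(lca(["a0.b0.c0.e0.f0.b1.e1", "a0.b0.c0.e0.f0.b2.e2"]))
# a0.b0.c0.e0.f0
print(lca(["a0.b0.c0.e0.f0.b2.e2", "a0.b0.c0.e0.d0.g0.h0"]))
# a0.b0.c0.e0
```

The CFP is then the path spelled out by the returned ID, read from the root component to the last component.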
Investigating all tools to identify the root causes of defect signatures is time-consuming. It wastes process engineers' time and distracts them from the influential tools that are likely to have contributed to the defect signatures discovered at the end of the fabrication procedure. IDFP overcomes this problem by adopting the concepts of CDC and CFP. These concepts allow IDFP to narrow the set of tools used in the fabrication of defective wafers down to the ones that are influential in the defects discovered at the end of the fabrication procedure. That is, they allow process engineers to focus on ongoing tool malfunctions that are likely to have contributed to these defect signatures. Most of these tools are usually confined between the root tool and the CDC, inclusive.
IDFP employs the concept of existence dependency to identify the CDC of a given set of DCs labelled with all distinct defect signatures discovered at the end of the fabrication procedure. The concept of existence dependency was first proposed for Entity-Relationship modelling (Elmasri & Navathe, 2007). An object x is existence-dependent on an object y if the existence of x depends on the existence of y (Widjaya, Taniar, & Rahayu, 2003). Intuitively, for a CDC to be valid, the existence of the DCs in the FPT should depend on its existence. According to the existence dependency concept, two nodes n and m can have an existence dependency relationship only if they belong to different types (Snoeck & Dedene, 1998). Snoeck and Dedene (1998) showed that two objects of the same type can never be existence-dependent on each other and that an object type in a graph is never existence-dependent on itself. They transformed an OO schema into a graph consisting of the object types found in the schema and their relations, in which the object types are related only through associations that express existence dependency. They demonstrated that if two objects O_i and O_j belong to the same type, O_i cannot be dependent on O_j and vice versa. IDFP applies these observations to determine whether a CDC of a set of DCs is valid. Recall that a CFP starts from the root node and ends at a valid CDC. For a CDC to be valid, the existence of the DCs in the FPT should depend on the existence of the CDC, and for this to hold, the processing type of the CDC must differ from those of the DCs. Otherwise, the CDC is invalid and cannot be used for identifying the CFP.
We describe below the scenarios when a CDC and a DC can be of the same processing type. During wafer fabrication procedure, some fabrication operations of the same line are repeated several times in intervening processing steps. For example, the fabrication operations of depositing different thin layers of metals on the wafer are performed several times in intervening processing steps. These fabrication operations of the same line are performed by tools of the same processing type. For example, the fabrication operations of depositing different thin layers of metals can be performed by tools of the same processing type, such as CVD, ECD, and/or PVD. Therefore, if a CDC and a DC happen to be performing fabrication operations of the same line, the two tools are of the same processing type.

Definition 2, CDC properties
CDC properties are the constraints that permit a LCA lca to be a CDC. Let TYP(n) denote the processing type of a node n. The constraints are as follows: (1) lca is the LCA of a set S of DCs, where S = {DC | TYP(DC) ≠ TYP(lca)}, and (2) the DCs in S are collectively labelled with all distinct defect signatures discovered on wafers at the end of the fabrication procedure during a specified period.
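The two constraints of Definition 2 can be checked directly, as in the brute-force sketch below. It is an illustration only and not the FindCDC algorithm: the function names are assumptions, Dewey IDs are written with inline subscripts (a0 for a_0), and the contexts are hypothetical, modelled loosely on the running example (the entries ending in e4 and h3 and some of the labels are assumed).

```python
def typ(dewey_id):
    """Processing type of the node a Dewey ID ends at, e.g. 'e' for '...b3.e4'."""
    return dewey_id.split(".")[-1].rstrip("0123456789")


def lca(dewey_ids):
    """Longest common prefix of the given Dewey IDs."""
    prefix = []
    for components in zip(*(d.split(".") for d in dewey_ids)):
        if len(set(components)) != 1:
            break
        prefix.append(components[0])
    return ".".join(prefix)


def is_cdc(candidate, contexts, all_defects):
    """Check the two CDC properties for a candidate node."""
    # S: DCs below the candidate whose processing type differs from its own.
    s = {dc: defect for dc, defect in contexts.items()
         if dc.startswith(candidate + ".") and typ(dc) != typ(candidate)}
    return (bool(s)
            and lca(s) == candidate                       # property (1)
            and set(s.values()) == set(all_defects))      # property (2)


# Hypothetical DCs (Dewey ID -> defect signature).
contexts = {
    "a0.b0.c0.e0.f0.b1.e1": "OE",
    "a0.b0.c0.e0.f0.b2.e2": "LRA",
    "a0.c1.d1.e3.f1.b3.e4": "DM",
    "a0.c1.d1.e3.f1.g1.h1": "OE",
    "a0.c1.d1.e3.f2.b4.h2": "DM",
    "a0.c1.d1.e3.f2.b4.h3": "LRA",
}
defects = ["OE", "LRA", "DM"]

print(is_cdc("a0.c1.d1.e3", contexts, defects))   # True
print(is_cdc("a0.c1.d1", contexts, defects))      # False: not the LCA of its S
print(is_cdc("a0.b0.c0.e0", contexts, defects))   # False: all its DCs are of type "e"
```

Under this literal reading the root can also satisfy both properties when every signature appears below it, so in practice the deepest qualifying node is the informative one.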

Efficiently computing the critical fabrication path
We constructed an algorithm called FindCDC (see Figure 3) that efficiently computes the CFP by identifying the CDC of a given set of DCs labelled with all distinct defect signatures discovered on wafers at the end of the fabrication procedure. To identify the CDC, FindCDC combines a stack-based sort-merge approach with the concept of existence dependency described in section 4, which it employs by applying the CDC properties. The input to the algorithm is two arrays, called contexts and defects. Array contexts contains the Dewey IDs of the DCs. Array defects contains all distinct defect signatures discovered on wafers at the end of the fabrication procedure during a specified period of time. Thus, array defects is of length m, where m is the number of distinct discovered defect signatures.
Each iteration of the algorithm produces a new stack state. Each stack entry is a pair (Dewey component, defects[i]). Let c_i, c_j, …, c_k be the Dewey components in the stack from the bottom entry to the top entry. The top entry represents a node in the FPT whose Dewey ID is c_i.c_j.….c_k. The bottom entry represents the root node in the FPT, whose Dewey ID is c_i. Consider the FPT in Figure 2 and the stack in Figure 8-a. The top entry represents node a_0.b_0.c_0.e_0.f_0.b_1.e_1 in the FPT, and the bottom entry represents the root node a_0.
Let d_1, d_2, …, d_m be the distinct discovered defect signatures, where defects[i] represents defect signature d_i. If a DC dc_j is labelled with defect signature d_i, this is represented in the stack by storing the last Dewey component of dc_j in defects[i]. For example, consider the FPT in Figure 2 and the stack in Figure 8-b, which represents the DC a_0.b_0.c_0.e_0.f_0.b_2.e_2. This DC is labelled with the defect signature "LRA". This defect signature is represented in the stack by storing the last Dewey component of the DC (i.e., the component e_2) in the second field of array defects (i.e., defects[2]).
In each iteration of the algorithm, the DC currently being processed is called the Current Context (CC) (line 2), and the DC processed before it is called the Prior Context (PC). Line 3 computes the number q of leading Dewey components that the CC and PC have in common. For example, the CC in Figure 8-b is a_0.b_0.c_0.e_0.f_0.b_2.e_2 and the PC in Figure 8-a is a_0.b_0.c_0.e_0.f_0.b_1.e_1; therefore, q = 5. The entry stack[q] represents the Lowest Common Ancestor (LCA) of the CC and PC. If q equals the stack size (line 4), subroutine PushNode (Figure 4) pushes into the stack the top components of the CC that do not match the top components of the PC. By applying the CDC properties described in section 4, subroutine ApplyCDCproperty (Figure 5) first finds in the FPT the closest descendant node n of the LCA of the CC and PC such that n and the LCA are tools of the same processing type. It then pops from the stack the entry representing n and all entries above it, together with these entries' defect information (to satisfy the CDC properties). Subroutine PopNode (Figures 6 and 7) pops the remaining entries above stack[q], which represent tools of processing types different from that of the LCA. Lines 7-9 pass the defect information of the popped entries to the current top entry. Lines 3 and 4: if all the fields of array defects of a popped entry are occupied, the entry represents the CDC.
Second stack state (Figure 8-b): Line 2 of FindCDC: the CC is a_0.b_0.c_0.e_0.f_0.b_2.e_2. Line 3: q = 5. Line 6 calls subroutine ApplyCDCproperty, which in turn calls subroutine PopNode (line 5), which proceeds as follows. Line 2 pops the two entries above stack[q = 5] shown in Figure 8-a, and lines 7-9 pass the defect information of stack[7] (i.e., the component e_1) to stack[q = 5].defects[1]. Line 6 of ApplyCDCproperty calls subroutine PushNode. As shown in Figure 8-b, line 2 of PushNode pushes the non-matching components of the CC (i.e., the components b_2 and e_2) into the stack.
Third stack state (Figure 8-c): The CC is a_0.b_0.c_0.e_0.d_0.g_0.h_0. Line 3: q = 4. Line 6 calls ApplyCDCproperty, which proceeds as follows. Line 2: since TYP(stack[7]) = TYP(stack[4]) (see Figure 8-…
The CC is a_0.c_1.d_1.e_3.f_1.g_1.h_1. Line 3: q = 5. Line 6 calls ApplyCDCproperty, which calls PopNode (line 5), which proceeds as follows. Line 2 pops the entries above stack[q = 5] shown in Figure 8-d, and lines 7-9 pass the defect component e_4 to stack[q = 5].defects[3]. Line 6 of ApplyCDCproperty calls PushNode, which pushes the non-matching components of the CC (i.e., h_1 and g_1) into the stack and represents the defect "OE" of the CC by storing the last component of the CC (i.e., h_1) in "stack…
Sixth stack state (Figure 8-f): The CC is a_0.c_1.d_1.e_3.f_2.b_4.h_2. Line 3: q = 4. Line 6 calls ApplyCDCproperty, which calls PopNode (line 5), which proceeds as follows. Line 2 pops the entries above stack[q = 4], and lines 7-9 pass the defect component h_1 to stack[q = 4].defects[1]. Line 8: since TYP(stackEntry[5].defects[3]) = TYP(stack[q = 4]), the defect component e_4 will not be passed to stack[q = 4].defects[3]. Line 6 of ApplyCDCproperty calls PushNode, which pushes the non-matching components of the CC (i.e., h_2, b_4, f_2) into the stack and represents the defect "DM" of the CC by storing the last component of the CC (i.e., h_2) in "stack[7].defects[3]".
Eighth stack state (Figure 8-h): Line 8 of FindCDC calls PopNode, which proceeds as follows. Line 2 pops the entries above stack[4], and lines 7-9 pass the defect components h_3 and h_2 to stack[4].defects[2] and stack[4].defects[3], respectively. Line 3 of PopNode returns true, since all the fields of array defects at stack[4] are occupied. Line 4 outputs the stack entry a_0.c_1.d_1.e_3 as the CDC.
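The stack walk-through above can be approximated by the simplified Python sketch below. It is not a reproduction of FindCDC or of the subroutines in Figures 3-7: it keeps only the core idea of popping down to the common prefix, passing defect information to the entry below unless the processing types match, and reporting the first entry whose defect fields are all occupied. The function names and the input contexts are assumptions (the DCs ending in e4 and h3 and some labels are hypothetical), Dewey IDs use inline subscripts (a0 for a_0), and the contexts are assumed to be sorted in DFS order.

```python
def typ(component):
    """Processing type of a Dewey component, e.g. 'e' for 'e3'."""
    return component.rstrip("0123456789")


def find_cdc(contexts, defects):
    """Return the Dewey ID of the first entry whose defect fields fill up."""
    stack = []  # entries: (Dewey component, {defect signature: DC component})

    def pop_down_to(q):
        while len(stack) > q:
            _, info = stack.pop()
            if not stack:
                return None
            top_component, top_info = stack[-1]
            for defect, dc_component in info.items():
                # Existence dependency: same-type defect info is not passed on.
                if typ(dc_component) != typ(top_component):
                    top_info[defect] = dc_component
            if len(top_info) == len(set(defects)):
                return ".".join(component for component, _ in stack)
        return None

    for dewey_id, defect in contexts:
        cc = dewey_id.split(".")
        q = 0  # number of leading components shared with the stack (the PC)
        while q < min(len(cc), len(stack)) and stack[q][0] == cc[q]:
            q += 1
        found = pop_down_to(q)
        if found:
            return found
        for component in cc[q:]:
            stack.append((component, {}))
        stack[-1][1][defect] = cc[-1]
    return pop_down_to(0)


contexts = [
    ("a0.b0.c0.e0.f0.b1.e1", "OE"),
    ("a0.b0.c0.e0.f0.b2.e2", "LRA"),
    ("a0.c1.d1.e3.f1.b3.e4", "DM"),
    ("a0.c1.d1.e3.f1.g1.h1", "OE"),
    ("a0.c1.d1.e3.f2.b4.h2", "DM"),
    ("a0.c1.d1.e3.f2.b4.h3", "LRA"),
]
cdc = find_cdc(contexts, ["OE", "LRA", "DM"])
print(cdc)  # a0.c1.d1.e3
```

On this input the component e4 is held back when the entry for f1 is popped onto e3, because both are of type "e", so e3 is reported only once h1, h3, and h2 have reached it.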

Experimental results
We implemented IDFP in Java, run on Intel(R) Core(TM) i7 processor, with a CPU of 2.70 GHz and 16 GB of RAM, under Windows 10. We evaluated IDFP by comparing it experimentally with the following two approaches: : This approach employs decision tree analysis to identify faulty tools and process stages. The approach assigns each tool and process stage a P-value. It applies K-W tests to identify the tools with significant performance variation. Tools and process stages with low P-values are used as input for the decision tree analysis. The P-value threshold proposed in  is 0.3. The approach employs ANOVA F-test as the splitting criterion that identifies the tools with low yields. In the evaluations, we considered these low yields tools as the root causes of defects. We refer to this approach by Chien for easy reference.
• SDCA (Taha, 2017): We proposed previously in (Taha, 2017) a system called SDCA that identifies the tools that caused defect signatures on wafers. SDCA assigns a severity score for each defect signature at each processing step. The severity score of a defect signature d j at a processing step S i is computed based on the maximum Contiguity Ratio  of d j at S i . Then, SDCA assigns a severity score for each processing step and tool based on the severity scores of defect signatures. The score of a tool reflects the relative severities of all defect signatures that caused by this tool.
The dataset used in the experiments consists of real-world wafer maps from Samsung Electronics in Korea. We used the same dataset in our previous work. The dataset includes the following: (1) fabrication records of 26 lots, (2) 843 wafer maps captured during the fabrication of the 26 lots at various processing steps, (3) information about the tools that processed the 843 wafers during the fabrication procedure, and (4) the types of defect signatures captured on the wafers by in-line inspection tools during the fabrication procedure. We employed the software SAS, STATISTICS, and Scenario for data analysis, data transformation, and graph visualization. We constructed an FPG that represents the dataset by marking the fabrication paths (i.e., the sequences of tools) of the 843 wafers. IDFP transformed the FPG into an equivalent FPT. Table 2 shows the notations of the defect types in the evaluation dataset, which we used in the legends of the figures that plot the experimental results.

Evaluating the prediction accuracy in terms of precision
We divided the fabrication period of the 26 lots into 10 equal periods. For each period, we measured the precision of each model in identifying the tools that caused defect signatures during this period. Figure 9 shows the results. We computed the precision using the standard formula shown in Equation (1).
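The per-period precision measurement can be illustrated with a minimal Python sketch. It assumes the standard definition of precision, TP / (TP + FP), where a true positive is a predicted tool that is actually faulty in that period. The function name, the data layout, and the tool names are hypothetical and are not taken from the IDFP implementation or the evaluation dataset.

```python
from collections import defaultdict


def precision_per_period(predictions, actual_faulty):
    """Compute precision = TP / (TP + FP) for each fabrication period.

    predictions: iterable of (period, predicted_tool) pairs (hypothetical layout).
    actual_faulty: dict mapping period -> set of tools that were actually faulty.
    """
    true_pos = defaultdict(int)
    false_pos = defaultdict(int)
    for period, tool in predictions:
        if tool in actual_faulty.get(period, set()):
            true_pos[period] += 1
        else:
            false_pos[period] += 1
    periods = set(true_pos) | set(false_pos)
    return {p: true_pos[p] / (true_pos[p] + false_pos[p]) for p in periods}


# Illustrative data only: two periods, made-up tool names.
predictions = [
    (1, "etch_A"), (1, "cmp_B"),
    (2, "litho_C"), (2, "etch_A"), (2, "pvd_D"), (2, "cvd_E"),
]
actual = {1: {"etch_A"}, 2: {"litho_C", "etch_A", "pvd_D"}}
precision = precision_per_period(predictions, actual)
print(precision[1], precision[2])  # 0.5 0.75
```

Periods in which a model makes no prediction are omitted from the result, which avoids a division by zero.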

Evaluating the prediction accuracy in terms of the percentage of hit rate
We evaluated the effectiveness of the three models for identifying the faulty tools that contributed to the defect signatures on the 843 testing wafers in terms of percentage of hit. For each defect signature d_j on the testing wafers, we computed the percentage of hit achieved by each model for identifying the faulty tools that caused d_j. Figure 10 shows the percentages of hit.
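As a rough illustration, the percentage of hit per defect signature could be computed as in the Python sketch below. It assumes that a hit means the predicted tool is exactly the actual faulty tool and that the percentage is 100 × hits / cases for each signature. The function name, the signature labels "AB" and "CB", and the tool names are hypothetical.

```python
def hit_percentage(cases):
    """Return {defect_signature: percentage of exact hits}.

    cases: list of (defect_signature, predicted_tool, actual_tool) triples
    (hypothetical layout). A hit is counted when predicted == actual.
    """
    totals, hits = {}, {}
    for signature, predicted, actual in cases:
        totals[signature] = totals.get(signature, 0) + 1
        hits[signature] = hits.get(signature, 0) + (1 if predicted == actual else 0)
    return {s: 100.0 * hits[s] / totals[s] for s in totals}


# Illustrative data only: labels and tools are made up.
cases = [
    ("AB", "wet_etch_1", "wet_etch_1"),
    ("AB", "wet_etch_1", "wet_etch_1"),
    ("AB", "wet_etch_1", "wet_etch_1"),
    ("AB", "cmp_2", "wet_etch_1"),
    ("CB", "plating_1", "cmp_2"),
    ("CB", "plating_1", "plating_1"),
    ("CB", "litho_3", "cmp_2"),
    ("CB", "litho_3", "plating_1"),
]
hit_rates = hit_percentage(cases)
overall = sum(hit_rates.values()) / len(hit_rates)
print(hit_rates, overall)  # {'AB': 75.0, 'CB': 25.0} 50.0
```

The overall figure here is an unweighted mean over defect signatures, which is one plausible reading of an "overall average percentage of hits".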
Evaluating the prediction accuracy in terms of the closeness between the predicted and actual faulty tools in their steps in the fabrication sequence
Each wafer undergoes various fabrication processing steps in the fabrication sequence. Each of these processing steps is performed by one or more tools. Thus, each wafer passes through various tools in a sequential order. Let e_i(d) be the actual tool that caused a defect signature d, and let e_j(d) be the tool that a model m predicts as the cause of d. The closer e_j(d) is to e_i(d) in the fabrication sequence, the better the prediction of m. In this section, we evaluate the prediction accuracy of m by measuring the distance in the fabrication sequence between the steps of e_i(d) and e_j(d). The distance between e_i(d) and e_j(d) is the number of processing steps between the steps performed by e_i(d) and e_j(d) in the fabrication sequence. Intuitively, the smaller the distance, the more accurate the prediction. A zero distance represents an exact match (i.e., a hit). For all defect signatures of type d, let dis_m(d) be the average distance of model m with regard to d, which is the average distance between the steps performed by e_i(d) and e_j(d) in the fabrication sequence. We use Equation (2) to measure dis_m(d). Figure 11 shows the results.
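The averaging described above can be sketched in Python as follows. The sketch assumes that each tool maps to a single step index and that the distance is the absolute difference between the step indices of the actual and predicted tools, so that an exact match yields zero. The function name, the step numbering, and the tool names are hypothetical.

```python
def average_distance(pairs, step_of):
    """Average step distance between actual and predicted faulty tools.

    pairs: list of (actual_tool, predicted_tool) for one defect signature type.
    step_of: dict mapping tool -> index of its processing step (assumed unique).
    Distance is taken as |step(actual) - step(predicted)|; 0 means an exact hit.
    """
    distances = [abs(step_of[actual] - step_of[predicted]) for actual, predicted in pairs]
    return sum(distances) / len(distances)


# Illustrative data only: step indices and tool names are made up.
step_of = {"litho_1": 3, "etch_1": 4, "cmp_1": 6}
pairs = [
    ("etch_1", "etch_1"),   # exact hit, distance 0
    ("etch_1", "litho_1"),  # one step apart, distance 1
    ("cmp_1", "etch_1"),    # two steps apart, distance 2
]
avg = average_distance(pairs, step_of)
print(avg)  # 1.0
```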

Discussion
As Figure 10 shows, IDFP outperformed the other two models in terms of percentage of hit in identifying the exact faulty equipment that caused defect signatures. The overall average percentages of hits achieved by the models were 53%, 44%, and 34% for IDFP, SDCA (Taha, 2017), and Chien, respectively. Based on our observation of the experimental results, we attribute the advantage of IDFP over SDCA and Chien's methods to the following: (1) the strengths of IDFP's predictive techniques (i.e., its good predictive capabilities in associating defect signatures with faulty equipment), and (2) the limitations of Chien and SDCA's methods. In subsection 5.1, we outline the strengths of IDFP's method observed from the experimental results. In subsections 5.2 and 5.3, we outline the limitations of Chien and SDCA's methods, respectively, that were observed from the experimental results and that contributed to IDFP outperforming them. In subsection 5.4, we outline the limitations of IDFP's method observed from the experimental results.

Strengths of IDFP predictive techniques
In 36% of the test wafer maps, IDFP was able to achieve exact hits. That is, the faulty tools predicted by IDFP exactly matched the actual faulty tools that caused the defect signatures on 36% of the test wafer maps. As Figure 10 shows, IDFP achieved hit percentages greater than 50% in identifying the faulty tools that caused more than 50% of the defect signatures. This is indicative of the good predictive capabilities of IDFP, which would allow process engineers to prioritize their investigation of faulty tools and to pay more attention to the ones that caused the most defect signatures.
As Figure 11 shows, IDFP achieved smaller distances than the other two methods in most of the defect signatures. Intuitively, the smaller the distance, the more accurate the prediction of a method. Thus, a zero distance represents an exact prediction match. IDFP achieved zero distances in the heavy metal contamination and poor etching uniformity defect signatures. In general, and as Figure 11 shows, IDFP achieved small distances under all the defect signatures. It achieved distances less than 1 in 47% of the defect signatures. This confirmed the practical viability of IDFP's method in accurately identifying the faulty tools that cause defect signatures.

Figure 11. The average distance of each model for each defect signature d, which is the average distance between the steps performed by e_i(d) and e_j(d).
Let e_i(d) be the actual equipment that caused the defect d. The experimental results revealed that IDFP could identify either e_i(d) or a piece of equipment e_j(d) close to e_i(d) in the fabrication sequence as the root cause of d. In real-world scenarios, e_j(d) can contribute to the defect d along with e_i(d). That is, pieces of equipment that are close to one another in the fabrication sequence can all contribute to a specific defect. Therefore, not all of IDFP's incorrectly predicted faulty equipment should be considered false, since most of them are close to the actual faulty equipment. Moreover, these missed predictions can make it easier for process engineers to pinpoint the exact faulty equipment, since they are close to the actual faulty ones in the fabrication sequence.

Limitations of Chien method
Based on our observation of the experimental results, we outline the limitations of Chien's method as follows:
(a) The method achieved poor prediction accuracy in each fabrication period (recall section 4.1) that does not have a large number of test wafer maps. It also achieved poor prediction accuracy for defect signatures (recall section 4.2) that did not occur on a large number of test wafer maps. That is, the method requires a large sample size for its decision tree analysis. If the sample is not large enough, it terminates the decision tree prematurely, because the leaf nodes contain too few observations.
(b) The method was biased toward the predictors in the decision tree paths that have more levels.
(c) The method generated significantly different decision trees for the fabrication periods (recall section 4.1) that have small variations in test wafer maps. That is, the method produces very large changes in the decision trees even though the changes in wafer maps are minimal.

Limitations of SDCA method
Based on our observation of the experimental results, we outline the limitations of SDCA's method as follows:
(a) Similar to Chien's method, SDCA's method was very sensitive to small numbers of test wafer maps. It achieved poor prediction accuracy in the fabrication periods (recall section 4.1) and for the defect signatures (recall section 4.2) that do not have a large number of test wafer maps. This is because the contingency table analysis employed by SDCA cannot detect a significant association if the sample size is small. As a result, the findings obtained from a contingency table may fail to reach statistical significance even though the underlying association is real.
(b) The method produced erroneous conclusions when the expected frequency in a contingency table is small (e.g., less than 5). That is, it is very sensitive to small frequencies.
(c) The method did not permit a measured item to fall into more than one category. That is, the method requires all measured items to be independent.
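The small expected frequency issue in item (b) can be illustrated with a short Python sketch. It uses the standard expected count for a contingency table cell, row total × column total / grand total, and flags tables in which any expected count falls below 5. This is only an illustration of the general check, not SDCA's implementation; the function names and the example counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row_total * col_total / grand_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def has_small_expected_cells(table, minimum=5.0):
    """True if any expected count is below `minimum` (5 is the usual rule of thumb)."""
    return any(e < minimum for row in expected_frequencies(table) for e in row)


# Illustrative counts only. Rows: wafers processed / not processed by a tool.
# Columns: wafers with / without a given defect signature.
small_sample = [[8, 2], [1, 9]]
large_sample = [[40, 10], [10, 40]]

print(expected_frequencies(small_sample))    # [[4.5, 5.5], [4.5, 5.5]]
print(has_small_expected_cells(small_sample))  # True
print(has_small_expected_cells(large_sample))  # False
```

In the small sample, two cells have an expected count of 4.5, so a significance test on that table would be unreliable even though the observed counts look strongly associated.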

Limitations of the IDFP method
The experimental results showed that, if a defect was caused by several faulty tools, IDFP may not identify all of them. However, it can identify at least one of them. For example, while IDFP achieved acceptable hit rates in identifying the faulty tools that contributed to aluminum bridging, it achieved relatively low hit rates in identifying the faulty tools that contributed to copper bridging. This is because the aluminum bridging in the evaluation dataset was due to under-etch caused only by a wet etch tool, while copper bridging was due to several different tools, namely plating tools (which caused copper corrosion), litho-etch tools (which caused copper under-etch), and copper CMP tools (which caused faulty smoothing). Even in such a scenario, IDFP's concept of CFP can narrow down the list of suspect tools, which saves process engineers time in identifying the faulty one.
As shown in Figure 10, IDFP achieved relatively lower hit rates in identifying the faulty equipment that contributed to the corrosive residue and misalignment defect signatures. This is because a relatively large number of fabrication equipment can cause these two defects (e.g., all photolithography steppers can cause misalignment, and all CVD, ECD, and PVD equipment can cause corrosive residue). Therefore, the CFP may include multiple pieces of equipment that can cause these defects, which makes it harder for IDFP to pinpoint the exact faulty one.

Conclusions
We introduced in this paper a defect diagnostic system called IDFP that can identify the common fabrication path (i.e., the CFP) that caused most of the defectivities on wafers discovered at the end of the fabrication procedure. This path includes the set of all tools that likely caused the distinct defect signatures discovered by failure analysis in response to electrical failures captured by prober tools at the end of the fabrication procedure. Identifying all tools that caused all distinct defects is more effective than identifying individual faulty tools.
We evaluated IDFP by comparing it experimentally with Chien et al. and SDCA (Taha, 2017), in terms of: (1) the precision of identifying the tools that caused defect signatures during different fabrication periods, (2) the percentage of hit, and (3) the closeness between the predicted and actual faulty tools in their steps in the fabrication sequence. We summarize below the major findings of the experimental results:
(a) IDFP achieved very good hit rates. It achieved hits greater than 50% in identifying the faulty tools that caused more than 50% of the defect signatures.
(b) IDFP achieved small distances between its predicted faulty tools and the corresponding actual ones in the fabrication sequence that caused defect signatures. Actually, it achieved small distances in all defect signatures. It achieved distances less than 1 in 47% of the defect signatures.
The above confirmed the practical viability of IDFP's method to accurately identify the faulty tools that cause defect signatures. This allows process engineers to prioritize their investigation of faulty tools and to pay more attention to the ones that cause the most defect signatures. In general, the experimental results clearly indicated the good predictive capabilities of IDFP.
If a defect was caused by several faulty tools, IDFP may not identify all of them. Specifically, IDFP achieved relatively lower hit rates in identifying the root causes of defect signatures that are caused by a larger number of fabrication equipment. Therefore, in future work, we need to investigate approaches that can improve IDFP's performance by overcoming this limitation.

Funding
The author received no direct funding for this research.