Generating Defective Epoxy Drop Images for Die Attachment in Integrated Circuit Manufacturing via Enhanced Loss Function CycleGAN

In integrated circuit manufacturing, defects in epoxy drops for die attachments must be identified during production. Modern identification techniques based on vision-based deep neural networks require a very large number of defect and non-defect epoxy drop images. In practice, however, very few defective epoxy drop images are available. This paper presents a generative adversarial network solution to generate synthesized defective epoxy drop images as a data augmentation approach so that vision-based deep neural networks can be trained or tested using such images. More specifically, the so-called CycleGAN variation of the generative adversarial network is used, with its cycle consistency loss function enhanced by two other loss functions: learned perceptual image patch similarity (LPIPS) and a structural similarity index metric (SSIM). The results obtained indicate that when using the enhanced loss function, the quality of synthesized defective epoxy drop images is improved by 59%, 12%, and 131% for the metrics of the peak signal-to-noise ratio (PSNR), universal image quality index (UQI), and visual information fidelity (VIF), respectively, compared to the CycleGAN standard loss function. A typical image classifier is used to show the improvement in the identification outcome when using the synthesized images generated by the developed data augmentation approach.


Introduction
In integrated circuit (IC) manufacturing, vision- or camera-based inspection systems are often used to enable automatic inspection of defects encountered during IC production. There are three primary components associated with such systems: a camera sensor, a computer running the inspection algorithms, and a sorter to separate defective and non-defective ICs. In this paper, defective epoxy drop images for die attachment are studied using die substrate images captured by a camera.
Die attachment is a crucial step in the production of ICs. It is also referred to as die bonding or die mounting and involves attaching a silicon wafer die to a die pad or a substrate. Adhesive die attachments are most widely used due to their low cost [1]. In order to create a bond between a die and a substrate, an adhesive die attachment, as its name implies, uses an adhesive or epoxy. An epoxy die attachment equipment/die bonder is normally used to attach a die to its substrate using epoxy adhesive [2]. As illustrated in Figure 1, the die is set on top of the epoxy drop that is placed on the substrate and is bonded by heating.
A difficult aspect of die attachment is attaching the die to the substrate with the least amount of epoxy possible. An excessive amount of epoxy could cause the die or chip to tilt or overflow the substrate, compromising the stability of the entire IC package. On the other hand, an inadequate amount of epoxy could cause a bond line to become too thin, creating a weak bond with insufficient mechanical strength, which could lead to die cracking or die lifting. It is therefore necessary to inspect epoxy drops for proper die attachment. Due to the time and cost involved in conducting the inspection manually, the inspection process has been automated in modern IC manufacturing. The inspection process needs to be computationally efficient with high accuracy so that it is deployable in an actual production line.
A vision-based deep learning system can be utilized to conduct the inspection process automatically by examining images of an epoxy drop to assess whether a proper amount has been used. Depending on predefined quality standards, the system would accept or reject the die in real time. The use of modern deep neural networks requires the availability of a very large number of images with both adequate and inadequate amounts of epoxy. In practice, only a few images with inadequate amounts of epoxy, named defective or rejected epoxy images, are available since such cases occur quite infrequently. A good performance is normally reached when the training data are equally balanced for non-defective and defective image samples. Furthermore, although the lack of defective epoxy images can be mitigated by using conventional data augmentation techniques (such as cropping, rotating, flipping, and translating) [3], these techniques do not provide the image diversity needed for adequate training or testing of deep neural networks. The testing situation arises when only non-defective images are used for training a deep neural network and the generated defective images are used to test the trained network.
Generative adversarial network (GAN) models are being increasingly used for data augmentation purposes [4]. Many variations of GANs have been introduced in recent years. A variation of a GAN called CycleGAN [5] has been used to synthesize realistic images in different applications. In this paper, the CycleGAN standard loss function is modified or enhanced by utilizing learned perceptual image patch similarity (LPIPS) [6] and a structural similarity index metric (SSIM) [7] in order to generate more realistic defective or rejected epoxy images. Different combinations of LPIPS and the SSIM with/without the standard loss function are examined.
A data augmentation method for generating high-quality rejected epoxy images is introduced in this paper by enhancing the standard loss function CycleGAN via incorporating LPIPS and SSIM image quality metrics. This augmentation approach enables vision-based deep neural networks to be trained or tested more effectively by having equal numbers of defective and non-defective images. It is further shown that this data augmentation of synthesized defective images leads to improved identification outcomes.
The rest of the paper is organized as follows. Section 2 provides a review of previous works related to GAN models for data augmentation applications. An overview of CycleGAN is then covered in Section 3 together with the introduced enhanced loss function to address the die attachment problem of interest here. The experimental results in terms of quantitative evaluation metrics are then presented in Section 4 for the CycleGAN standard loss function and the enhanced loss function. The paper is finally concluded in Section 5.

Previous Works on Data Augmentation Using a GAN
This section provides a summary of data augmentation performed for different applications using a GAN. A GAN was used to produce many versions of an image in [8]. A review of medical image augmentation papers using a GAN was covered in [9]. A pixel-level image augmentation technique was developed in [10] based on image-to-image translation with a GAN. It was trained on a surface defect dataset of magnetic particle images to generate synthesized image samples. A variation of CycleGAN named AttenCGAN was proposed in [11] to synthesize electrical commutators and surface images with artificial defects to increase the number of image samples. A defect transfer GAN (DT-GAN) was developed in [12] to produce realistic surface defect images. The Mask2Defect GAN was suggested in [13] to create surface defect images obtained from an automobile part stamping plan. A region- and strength-controllable GAN for creating synthesized defects in metal surfaces was also proposed in [14] based on the idea of image inpainting. To produce high-quality defect images, a so-called relative mean generative adversarial network (TARGAN) was introduced in [15] using a metal gear surface defect image dataset and a hot-rolling strip defect image dataset. In [16], an MAS-GAN-based model was proposed for the production of industrial defect images by combining an attention mechanism and a data augmentation module. A framework called DefectGAN was introduced in [17] by using a compositional-layer-based architecture to generate realistic defect images. For data augmentation of surface defects on hot-rolled steel strips, three GANs were trained in [18], a new GAN called a contrastive GAN was proposed in [19], and a semi-supervised learning (SSL) defect classification approach based on two different networks of a categorized generative adversarial network (GAN) and a residual network was proposed in [20].
In order to produce defect images using a large number of defect-free images of commutator cylinder surfaces from industrial sites, a generation technique known as the surface defect generation adversarial network (SDGAN) was introduced in [21]. GAN models were also utilized for unsupervised surface inspection in [22], for anomaly detection on structured and arbitrary textured surfaces in [23], for Mura defect classification in thin-film transistor liquid crystal displays (TFT-LCDs) in [24], for enhancing the quantity and quality of images of fabric defects in [25], for the autonomous design of architectural shape sketches in [26], and for establishing the probabilistic correlations of quasi-static responses of bridges in [27].
More recently, GANs have been used for IC manufacturing applications. A multi-scale GAN with a transformer (MST-GAN) as a semi-supervised deep learning network was developed for IC metal package samples in [28]. An IC solder joint inspection approach was suggested in [29] based on a GAN model and statistical training. In [30], a GAN model was used to generate pseudo-defective wafer die images from real defective images. A GAN-based image generation technique for organic light-emitting diode (OLED) panel defect images was discussed in [31].
In this paper, our objective is to produce realistic defective or rejected epoxy substrate images based on the few available such images. To meet this objective, we make use of a large number of defect-free or good epoxy substrate images that are available in order to generate a large number of defective or rejected epoxy substrate images by using CycleGAN to translate non-defective or good images to rejected images. For this purpose, we enhanced the CycleGAN cycle consistency loss function by incorporating other loss functions, which are discussed in Section 3.

Generating Synthesized Defective Images via CycleGAN
The most widely used conditional generative adversarial network for the purpose of unpaired image-to-image translation is called CycleGAN [5]. A typical CycleGAN learns the mapping between two distributions via optimization of an objective function by using two generators and two discriminators. Two losses are incorporated into the CycleGAN optimization framework: adversarial loss and cycle consistency loss. The adversarial loss measures the difference between the generated images and the target images according to the original GAN design [4], and the cycle consistency loss is used to avoid conflicts between the learnt mappings. In our problem, we generate synthesized defective or rejected epoxy drop substrate images from non-defective or good epoxy drop substrate images. Despite the fact that the CycleGAN generates realistic synthetic images, ambiguity mapping occurs when a domain with rich information (i.e., good epoxy drop substrate images) is translated into a domain with relatively weak information (i.e., rejected epoxy drop substrate images). This ambiguity mapping is addressed in this paper by adding the loss functions of LPIPS and the SSIM to the standard cycle consistency loss function of the generator network. More details of these loss functions are stated later in this section.
Two generator networks and two discriminator networks make up the CycleGAN architecture [5]. The networks are trained adversarially against one another. The generators' objective is to convert an image from one domain to another. The discriminators' objective is to distinguish between real and synthesized images in their respective domains. Figure 2 demonstrates the CycleGAN framework that we utilized to achieve data augmentation of defective or rejected epoxy drop substrate images. The data augmentation model contains the two mapping functions $G_{g\to r}$: Good → Rejected and $G_{r\to g}$: Rejected → Good, and the associated adversarial discriminators $D_r$ and $D_g$. Here, $D_r$ distinguishes between real rejected epoxy drop substrate images $\{I_r\}$ and synthesized rejected epoxy drop substrate images $\{I^s_r\}$ generated from real good epoxy drop substrate images $\{I_g\}$, and likewise $D_g$ distinguishes between $\{I_g\}$ and $\{I^s_g\}$. The total loss function of the CycleGAN can be expressed as a summation of the adversarial losses ($\mathcal{L}_{advers}$) and the cycle consistency loss ($\mathcal{L}_{cyc}$):
$$\mathcal{L}(G_{g\to r}, G_{r\to g}, D_g, D_r) = \mathcal{L}_{advers}(G_{g\to r}, D_r, I_g, I_r) + \mathcal{L}_{advers}(G_{r\to g}, D_g, I_r, I_g) + \lambda \mathcal{L}_{cyc}(G_{g\to r}, G_{r\to g}) \tag{1}$$
where $\lambda$ weights the cycle consistency loss relative to the adversarial losses. Adversarial losses make sure that the generated images appear realistic, and the cycle consistency loss reflects the difference between the original image and the reconstructed or transformed image. The aim is to solve the following optimization problem:
$$G^{*}_{g\to r}, G^{*}_{r\to g} = \arg \min_{G_{g\to r}, G_{r\to g}} \max_{D_g, D_r} \mathcal{L}(G_{g\to r}, G_{r\to g}, D_g, D_r) \tag{2}$$
More details of the loss functions are stated next.
Adversarial loss function: For both mapping functions, the adversarial losses in [4] adopted by the standard CycleGAN are used. For the mapping function $G_{g\to r}$ and its discriminator $D_r$, the optimization problem can be written as
$$\min_{G_{g\to r}} \max_{D_r} \mathcal{L}_{advers}(G_{g\to r}, D_r, I_g, I_r) = \mathbb{E}_{i_r \sim p_{data}(i_r)}[\log D_r(i_r)] + \mathbb{E}_{i_g \sim p_{data}(i_g)}[\log(1 - D_r(G_{g\to r}(i_g)))] \tag{3}$$
where $i_r \sim p_{data}(i_r)$ and $i_g \sim p_{data}(i_g)$ denote the distributions of $I_r$ and $I_g$, respectively, and $\mathbb{E}$ denotes the expected value over all real data instances. Here, $G_{g\to r}$ attempts to generate images $G_{g\to r}(i_g)$ that look like $I_r$ images, while $D_r$ attempts to distinguish between synthesized images $I^s_r = G_{g\to r}(i_g)$ and real images $I_r$. Similarly, for the mapping function $G_{r\to g}$ and its discriminator $D_g$, the optimization problem can be written as
$$\min_{G_{r\to g}} \max_{D_g} \mathcal{L}_{advers}(G_{r\to g}, D_g, I_r, I_g) = \mathbb{E}_{i_g \sim p_{data}(i_g)}[\log D_g(i_g)] + \mathbb{E}_{i_r \sim p_{data}(i_r)}[\log(1 - D_g(G_{r\to g}(i_r)))] \tag{4}$$
Cycle consistency loss: The cycle consistency loss enforces that translated images can be converted back to their original domain, i.e., $i_g \to G_{g\to r}(i_g) \to G_{r\to g}(G_{g\to r}(i_g)) \approx i_g$, known as the forward cycle consistency loss, and $i_r \to G_{r\to g}(i_r) \to G_{g\to r}(G_{r\to g}(i_r)) \approx i_r$, known as the backward cycle consistency loss. The cycle consistency loss is defined as the combination of the following losses:
$$\mathcal{L}_{cyc}(G_{g\to r}, G_{r\to g}) = \mathcal{L}_{F\_Cycle}(G_{g\to r}) + \mathcal{L}_{B\_Cycle}(G_{r\to g}) \tag{5}$$
where $\mathcal{L}_{F\_Cycle}$ and $\mathcal{L}_{B\_Cycle}$ represent the forward and backward cycle consistency losses, respectively.
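As an illustration of how the adversarial and total losses described above fit together, the following is a minimal NumPy sketch, not the authors' TensorFlow implementation. The function names `adversarial_loss` and `total_loss` are hypothetical, the discriminator outputs are assumed to be probabilities in [0, 1], and the default weight `lam=10.0` is an assumption borrowed from the original CycleGAN paper since no value of λ is stated here.

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """GAN objective E[log D(real)] + E[log(1 - D(G(x)))].

    d_real: discriminator probabilities on real images.
    d_fake: discriminator probabilities on synthesized images.
    The discriminator tries to maximize this value; the generator
    tries to minimize it.
    """
    d_real = np.clip(np.asarray(d_real, dtype=np.float64), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def total_loss(adv_g2r, adv_r2g, cycle, lam=10.0):
    """Sum of both adversarial losses plus the weighted cycle consistency loss.

    lam is a hypothetical default (assumption), not a value from the paper.
    """
    return adv_g2r + adv_r2g + lam * cycle

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = adversarial_loss([0.5, 0.5], [0.5, 0.5])
# A perfect discriminator outputs 1 on real and 0 on synthesized images.
perfect = adversarial_loss([1.0, 1.0], [0.0, 0.0])
```

The undecided case evaluates to 2·log(0.5), while the perfect discriminator approaches the maximum value of 0, which matches the min-max reading of the objective.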
The cycle consistency loss makes sure that the features of the input images are preserved in the generated images. The choice of the loss function used to compute the cycle consistency loss matters. In this work, we consider several loss functions separately and in combination for the cycle consistency loss in order to improve the CycleGAN performance for the die attachment problem of interest here. When combining the loss functions, they are normalized so that their contributions to the combined loss function are made equal. A description of the loss functions considered is presented next.
$L_1$ loss function: Also referred to as the mean absolute error (MAE) loss, it measures the absolute distance between the generated image and the target image. In our case, it is obtained by taking the absolute difference between the real good image $I_g$ and the reconstructed good image $G_{r\to g}(G_{g\to r}(i_g))$. It can be expressed as follows:
$$\mathcal{L}^{L_1}_{F\_Cycle}(i_g, G_{r\to g}(G_{g\to r}(i_g))) = \mathbb{E}_{i_g \sim p_{data}(i_g)}\left[\left\| i_g - G_{r\to g}(G_{g\to r}(i_g)) \right\|_1\right] \tag{6}$$
Similarly, the absolute difference between the real rejected image $I_r$ and the reconstructed rejected image $G_{g\to r}(G_{r\to g}(i_r))$ is obtained as follows:
$$\mathcal{L}^{L_1}_{B\_Cycle}(i_r, G_{g\to r}(G_{r\to g}(i_r))) = \mathbb{E}_{i_r \sim p_{data}(i_r)}\left[\left\| i_r - G_{g\to r}(G_{r\to g}(i_r)) \right\|_1\right] \tag{7}$$
The $L_1$ loss function reduces the absolute difference between the images. It is mostly used to capture low-frequency details or to enforce the accuracy of low frequencies. It has been used to compute the cycle consistency loss in the standard CycleGAN [5]. The quality of the generated images can be improved by combining this kind of loss function with another loss function, as discussed in [32].
$L_2$ loss function: Also referred to as the mean squared error (MSE), it is obtained by squaring the difference between the generated image and the target image. In our case, it can be expressed as follows:
$$\mathcal{L}^{L_2}_{F\_Cycle}(i_g, G_{r\to g}(G_{g\to r}(i_g))) = \mathbb{E}_{i_g \sim p_{data}(i_g)}\left[\left\| i_g - G_{r\to g}(G_{g\to r}(i_g)) \right\|_2^2\right] \tag{8}$$
$$\mathcal{L}^{L_2}_{B\_Cycle}(i_r, G_{g\to r}(G_{r\to g}(i_r))) = \mathbb{E}_{i_r \sim p_{data}(i_r)}\left[\left\| i_r - G_{g\to r}(G_{r\to g}(i_r)) \right\|_2^2\right] \tag{9}$$
In [33], $L_1$ and $L_2$ losses were compared and no discernible difference between them was found. However, according to [34], $L_1$ loss is preferred over $L_2$ loss because it promotes less blurring. Both of these losses are pixel-wise losses, that is, they consider pixel-by-pixel variations between the images. As a result, two images that appear nearly identical to the human visual system can still yield a nonzero loss value. For example, shifting an image by just one pixel barely affects its appearance, yet because the computation is based on each pixel, the total loss value rises as a result of the aggregation of each minor difference between the corresponding pixels of the two images. Hence, adding some other loss function to the standard loss functions can help to improve the performance of the model.
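The behavior of the two pixel-wise losses can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions: the helper names `l1_cycle_loss` and `l2_cycle_loss` are hypothetical, the losses are averaged over pixels (the norm-based definitions differ from this by a constant factor), and a random array stands in for an epoxy drop image.

```python
import numpy as np

def l1_cycle_loss(real, reconstructed):
    """Mean absolute error between a real image and its reconstruction."""
    real = np.asarray(real, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    return float(np.mean(np.abs(real - reconstructed)))

def l2_cycle_loss(real, reconstructed):
    """Mean squared error between a real image and its reconstruction."""
    real = np.asarray(real, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    return float(np.mean((real - reconstructed) ** 2))

# Stand-in 128 x 128 image with values in [0, 1) (placeholder data).
rng = np.random.default_rng(0)
image = rng.random((128, 128))

# The same content shifted horizontally by one pixel (with wrap-around).
shifted = np.roll(image, shift=1, axis=1)

loss_identical = l1_cycle_loss(image, image)
loss_shifted_l1 = l1_cycle_loss(image, shifted)
loss_shifted_l2 = l2_cycle_loss(image, shifted)
```

A perfect reconstruction gives a loss of zero, whereas the one-pixel shift produces a clearly nonzero loss under both definitions even though the image content is unchanged. The effect is exaggerated here because a random array has no spatial smoothness; it would be smaller on real, spatially smooth images.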

Structural similarity index metric (SSIM): This index has been extensively utilized to assess image quality [7] and has been used as a loss function for numerous image processing applications [35,36] as well as for GAN-based solutions [32,37,38]. It was created under the presumption that the human visual system is extremely well suited for extracting structural information from a visual input. The structural information degradation between a generated image and a corresponding input image is measured by the SSIM. Luminance, contrast, and structure are three sub-indices that make up the SSIM. Luminance is reflected in the local means, contrast in the local standard deviations, and structure in the local Pearson correlation between two images. For an input image $x$ and a reconstructed image $y$, the SSIM is defined as follows:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{10}$$
where $\mu_x$ and $\mu_y$ are the mean intensities, $\sigma_x^2$ and $\sigma_y^2$ are the variances, and $\sigma_{xy}$ is the covariance of images $x$ and $y$. The constants $C_1$ and $C_2$ are used to prevent numerical singularity. More information on this index appears in [7]. Here, the SSIM is included as a loss function to produce visually acceptable images. This loss can be expressed as follows:
$$\mathcal{L}^{SSIM}_{F\_Cycle}(i_g, G_{r\to g}(G_{g\to r}(i_g))) = 1 - \mathrm{SSIM}(i_g, G_{r\to g}(G_{g\to r}(i_g))) \tag{11}$$
$$\mathcal{L}^{SSIM}_{B\_Cycle}(i_r, G_{g\to r}(G_{r\to g}(i_r))) = 1 - \mathrm{SSIM}(i_r, G_{g\to r}(G_{r\to g}(i_r))) \tag{12}$$
Learned perceptual image patch similarity (LPIPS): LPIPS indicates how similar two images appear to the human eye. In essence, LPIPS determines how comparable the activations of two image patches are for a given network. Therefore, we use it here as a loss function. Figure 3 and Equations (13) and (14) show how a pretrained network $F$ is used to compute the LPIPS score between a real input image and a reconstructed image.
$$\mathcal{L}^{LPIPS}_{F\_Cycle}(i_g, G_{r\to g}(G_{g\to r}(i_g))) = \sum_{l \in L} T\left(F^l(i_g) - F^l(G_{r\to g}(G_{g\to r}(i_g)))\right) \tag{13}$$
$$\mathcal{L}^{LPIPS}_{B\_Cycle}(i_r, G_{g\to r}(G_{r\to g}(i_r))) = \sum_{l \in L} T\left(F^l(i_r) - F^l(G_{g\to r}(G_{r\to g}(i_r)))\right) \tag{14}$$
where $F$ denotes the pretrained network with $l \in L$ layers for feature extraction, and $T$ normalizes and scales the deep embedding to a scalar LPIPS score. Then, the $L_2$ distance is computed and averaged across the dimensions and layers of the network. For feature distances, the AlexNet [39] network is used here, which is more in line with the structure of the human visual cortex [6,40].
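To make the SSIM-based loss concrete, the following NumPy sketch evaluates the SSIM from the means, variances, and covariance of two images and turns it into a loss as one minus the index. It is a simplification under stated assumptions: the statistics are taken over the whole image rather than over local sliding windows as in [7], the constants follow the commonly used choice $C_1 = (0.01R)^2$ and $C_2 = (0.03R)^2$ for a dynamic range $R$, and the names `ssim_global` and `ssim_loss` are hypothetical. LPIPS is not sketched because it depends on a pretrained network.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed from whole-image statistics (simplified, no windows)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2   # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2   # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

def ssim_loss(x, y, **kwargs):
    """Loss that is 0 for identical images and grows as structure degrades."""
    return 1.0 - ssim_global(x, y, **kwargs)

# Stand-in image with values in [0, 1) (placeholder data).
rng = np.random.default_rng(0)
image = rng.random((128, 128))

loss_same = ssim_loss(image, image)            # identical images
loss_inverted = ssim_loss(image, 1.0 - image)  # contrast-inverted copy
```

Identical images give a loss of zero, while the contrast-inverted copy is negatively correlated with the original, so its SSIM is negative and the loss exceeds one.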

Model Architecture and Training
For the generator and discriminator, the same architecture described in the standard CycleGAN [5] is used here; see Figure 4. The generator feeds its 128 × 128 input image through three convolutional layers in succession, each of which reduces the spatial size of the representation while increasing the number of channels. A set of six residual blocks follows, each with 128 filters. Transpose convolutional layers are then used to upsample the representation to the final image. Apart from the last layer, which uses a Tanh activation for reconstruction, each layer is followed by instance normalization and a rectified linear unit (ReLU) activation function. The generated image is 128 × 128 in size. For the discriminator, the Markovian discriminator (PatchGAN) [41] is utilized to determine whether image patches are real or synthesized. The discriminator is a fully convolutional network made up of five convolutional layers. In order to keep the size of the feature maps at 1/8, the stride is set to 2 only for the first four convolutional layers, and instance normalization along with Leaky ReLU as the activation function is utilized. To preserve the size of the feature maps, the stride of the final output layer is set to 1 and the filter number is set to 1 in order to produce a one-channel prediction map with values ranging from 0 to 1 for every pixel. The discriminator's input is a real or synthesized image having a size of 128 × 128, and the output is 30 × 30 in size. Each output pixel corresponds to a patch of the input image and indicates whether that patch is real or synthesized.

After defining each component of the CycleGAN, the network is trained. A pseudo code of the training is shown in Algorithm 1. A clear advantage of CycleGAN is that it can be trained with unpaired data. However, our image-generating network is trained using the paired real good epoxy drop substrate images and real rejected epoxy drop substrate images in order to test the generated images in a uniform manner. The batch size is set to 1 with 200 epochs and k = 100, which is sufficient for convergence. The model is optimized using the Adam optimizer with $\beta_1 = 0.5$ and an initial learning rate of 0.0002 for the first 100 epochs, decreasing the learning rate linearly to 0 for the last 100 epochs. Our CycleGAN with different cycle consistency losses is implemented in Python (environment version 3.8.13) using the TensorFlow (version 2.9.1) and Pillow (version 9.2.0) libraries.
Algorithm 1:
5: Generate m synthetic samples: $I^s_r : I_g \to G_{g\to r}(i_g)$ and $I^s_g : I_r \to G_{r\to g}(i_r)$
6: Compute adversarial losses (combination of the discriminator loss on both real and fake images): $\mathcal{L}_{advers}(G_{g\to r}, D_r, I_g, I_r)$ and $\mathcal{L}_{advers}(G_{r\to g}, D_g, I_r, I_g)$
7: Update the discriminators $D_g$ and $D_r$ by maximizing the adversarial losses
8: Generate m cycle samples: $I^{Cycle}_g : G_{g\to r}(i_g) \to G_{r\to g}(G_{g\to r}(i_g))$ and $I^{Cycle}_r : G_{r\to g}(i_r) \to G_{g\to r}(G_{r\to g}(i_r))$
9: Compute the cycle consistency loss: $\mathcal{L}_{cyc}(G_{g\to r}, G_{r\to g}) = \mathcal{L}_{F\_Cycle}(G_{g\to r}) + \mathcal{L}_{B\_Cycle}(G_{r\to g})$ (different loss functions, separately and in combination, were used to calculate the cycle consistency losses $\mathcal{L}_{F\_Cycle}(G_{g\to r})$ and $\mathcal{L}_{B\_Cycle}(G_{r\to g})$)
10: Compute the total generator loss: $\mathcal{L}(G_{g\to r}, G_{r\to g}, D_g, D_r) = \mathcal{L}_{advers}(G_{g\to r}, D_r, I_g, I_r) + \mathcal{L}_{advers}(G_{r\to g}, D_g, I_r, I_g) + \lambda \mathcal{L}_{cyc}(G_{g\to r}, G_{r\to g})$
11: Update the generators $G_{g\to r}$ and $G_{r\to g}$ by minimizing the total generator loss
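The learning rate schedule described above (a constant rate of 0.0002 for the first 100 epochs followed by a linear decay to 0 over the last 100 epochs) can be sketched as a plain Python function. The function name `learning_rate` and the zero-based epoch indexing are assumptions made for illustration; the exact way the decay is discretized per epoch or per step in the actual implementation is not stated.

```python
def learning_rate(epoch, base_lr=2e-4, total_epochs=200, decay_start=100):
    """Constant rate until decay_start, then linear decay reaching 0 at total_epochs.

    epoch is assumed to be zero-based (0 is the first epoch).
    """
    if epoch < decay_start:
        return base_lr
    remaining = max(total_epochs - epoch, 0)
    return base_lr * remaining / (total_epochs - decay_start)

# Learning rate at a few representative epochs.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 99, 100, 150, 199, 200)}
```

Under these assumptions the rate stays at 0.0002 through epoch 99, is halved at epoch 150, and reaches exactly 0 at epoch 200. In a TensorFlow training loop, such a function could be used to set the optimizer's learning rate at the start of each epoch.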

Experimental Results and Discussion
In this section, we first report our experiments to assess the quality of the images generated by our introduced enhanced loss function CycleGAN. Then, we report our experiments using a typical image classifier (ResNet18 [42]) to show the identification outcomes with and without using the generated synthesized defective or rejected images.
Our experiments were performed on a server running the 64-bit Windows 10 operating system with two Intel® Xeon® 2.40 GHz CPUs, two NVIDIA Tesla K40m graphics cards, and 256 GB of RAM.

Evaluation Metrics
Our goal is to create high-quality synthesized rejected epoxy drop substrate images. This requires quantitative quality evaluation metrics of the synthesized images. The metrics that are often used for this purpose include peak signal-to-noise ratio (PSNR) [7], the universal image quality index (UQI) [43], and visual information fidelity (VIF) [44]. The testing dataset consists of paired images of the same size.
Sensors 2023, 23, 4864
Peak signal-to-noise ratio (PSNR): Given its simplicity and ease of use, the PSNR is the most widely used metric for evaluating synthesized images. The PSNR indicates the difference in pixels between a synthesized and a real image. The quality of the resulting image improves with an increasing PSNR. Equation (15) indicates how the PSNR is computed:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{(\text{Peak Value})^2}{\mathrm{MSE}}\right) \tag{15}$$
where Peak Value denotes the highest value in the image data, and for an 8-bit unsigned integer data type, it is 255. The mean squared error (MSE) between two images is given by
$$\mathrm{MSE} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left(x_{ij} - y_{ij}\right)^2 \tag{16}$$
with $x$ and $y$ representing, respectively, the real and the synthesized images of size N × N. Equation (15) reflects the absolute error in dB.
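The PSNR computation described above can be sketched in a few lines of NumPy. The function names `mse` and `psnr` are hypothetical, and the small arrays below are placeholder data rather than epoxy drop images. The inputs are cast to floating point before subtraction so that 8-bit unsigned values do not wrap around.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two images of the same size."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(x, y)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))

# Placeholder 8-bit images: every pixel of `synth` differs from `real` by 1.
real = np.zeros((4, 4), dtype=np.uint8)
synth = real + 1

psnr_value = psnr(real, synth)  # MSE is 1, so PSNR equals 20 * log10(255)
```

With an MSE of 1 the result is 20·log10(255), roughly 48.13 dB, and the value decreases as the pixel differences grow.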

Universal image quality index (UQI): The UQI compares generated synthesized and real images in terms of luminance, contrast, and structure, reflecting the characteristics of the human visual system. It corresponds to the special case of the SSIM when $C_1 = C_2 = 0$ in Equation (10) and can be written as the product of three components of correlation, luminance distortion, and contrast distortion, as follows:
$$\mathrm{UQI} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2 \mu_x \mu_y}{\mu_x^2 + \mu_y^2} \cdot \frac{2 \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \tag{17}$$
where $x = \{x_i \mid i = 1, 2, \ldots, N\}$ and $y = \{y_i \mid i = 1, 2, \ldots, N\}$ denote the real and the synthesized images, respectively, $\mu_x$ and $\mu_y$ are their mean intensities, $\sigma_x^2$ and $\sigma_y^2$ are their variances, and $\sigma_{xy}$ is their covariance. The dynamic range of the UQI is [−1, 1]. The best value 1 is achieved if and only if $x_i = y_i$ for all $i = 1, 2, \ldots, N$.
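A minimal NumPy sketch of the UQI is given below. It multiplies out the correlation, luminance, and contrast components into a single ratio and evaluates it on whole-image statistics, which is a simplifying assumption: the index is usually computed over local sliding windows and averaged. The function name `uqi_global` is hypothetical, and the index is undefined for constant or zero-mean image pairs, which the sketch does not handle.

```python
import numpy as np

def uqi_global(x, y):
    """Universal image quality index from whole-image statistics.

    Equals 4*cov*mean_x*mean_y / ((var_x + var_y) * (mean_x^2 + mean_y^2)),
    i.e., the product of the correlation, luminance, and contrast terms.
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = np.sum((x - mean_x) * (y - mean_y)) / (x.size - 1)
    return float(4.0 * cov_xy * mean_x * mean_y /
                 ((var_x + var_y) * (mean_x ** 2 + mean_y ** 2)))

# Stand-in image with values in [0, 1) (placeholder data).
rng = np.random.default_rng(0)
image = rng.random((64, 64))

uqi_same = uqi_global(image, image)          # identical images
uqi_scaled = uqi_global(image, 2.0 * image)  # same structure, doubled intensity
```

Identical images give the best value of 1. Doubling the intensity keeps the correlation at 1 but lowers both the luminance and contrast terms to 0.8, so the index drops to 0.64.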
Visual information fidelity (VIF): Visual information fidelity (VIF) is a full reference image quality assessment index based on natural scene statistics and the human visual system (HVS). The HVS is used to determine the accuracy of visual information, which includes factors such as the sharpness of edges, the accuracy of color representation, and the ability to detect subtle changes in contrast. VIF measures image fidelity by comparing the information recovered from a real image x with the information lost in a synthesized image y using the HVS. It is a straightforward ratio of the real and the generated images with a value between 0 and 1 and is defined as follows:

Dataset
A dataset of O-shape epoxy drop substrate images was provided to us by Texas Instruments. In the dataset, there were 8850 good epoxy drop substrate images and only 16 rejected epoxy drop substrate images. As explained earlier, this is because defective patterns of epoxy drops rarely occur during production. To ready the dataset for processing, we cropped the region of interest (ROI), having a size of 128 × 128, from the images, as illustrated in Figure 5 (Figure 5a shows ROI cropping of non-defective or good epoxy drop substrate and Figure 5b shows ROI cropping of defective or rejected epoxy drop substrate). We selected 88 good epoxy drop substrate images with different lighting conditions/backgrounds and paired them with rejected epoxy drop substrate images. Some sample non-defective or good epoxy drop substrate images are shown in Figure 6. Additional defective or rejected epoxy drop substrate images were generated by rotation and vertical/horizontal flips for the experiments. Figure 7 shows the 16 rejected real epoxy drop substrate images.
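The rotation and flip augmentation mentioned above can be sketched with NumPy as follows. The rotation angles used in the experiments are not stated, so this sketch assumes rotations by multiples of 90°, which keep the 128 × 128 size without interpolation; the function name `dihedral_variants` is hypothetical and the small array is placeholder data.

```python
import numpy as np

def dihedral_variants(image):
    """Return the 8 variants from 90-degree rotations and horizontal flips.

    Vertical flips are included implicitly, since a vertical flip equals
    a 180-degree rotation followed by a horizontal flip.
    """
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degrees
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirrored copy of each rotation
    return variants

# Placeholder asymmetric "image" so that all variants are distinguishable.
sample = np.arange(16).reshape(4, 4)
augmented = dihedral_variants(sample)
```

For an image without symmetry, this yields eight distinct versions of each rejected sample, all of the original size.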

Results and Discussion
We started our experiments by training the model using the standard cycle consistency loss (i.e., $\mathcal{L}_1$) as well as the other loss functions (i.e., $\mathcal{L}_2$, SSIM, and LPIPS), both separately and in combination. After training, we used the model to generate synthesized images. Figure 8 shows some sample outcomes of the generated synthesized rejected epoxy drop substrate images using different loss functions as the cycle consistency loss, separately and in combination with the CycleGAN standard loss function. Table 1 shows the evaluation metrics for the different loss functions used as the cycle consistency loss ($\mathcal{L}_{Cycle}$). Because the dataset consisted of paired image data, every generated synthesized rejected image (i.e., $I_r^s$) had a corresponding real rejected epoxy drop substrate image (i.e., $I_r$) to serve as its reference. The table reports the averages and standard deviations of the metrics computed between the rejected images $I_r$ and their generated counterparts, i.e., the synthesized images $I_r^s$ translated from the good epoxy drop substrate images $I_g$ using the CycleGAN network with different loss functions as the cycle consistency loss. The number of generated images for each loss function, separately and in combination, was 1408.

From Table 1, one can see that the $\mathcal{L}_2$ loss received the lowest score for all the metrics when used alone and performed better when used in combination with LPIPS and the SSIM. The standard cycle consistency loss $\mathcal{L}_1$ performed worse than both the SSIM and LPIPS. These findings agree with the visual examination of the images shown in Figure 8. We also found that combining either the SSIM or LPIPS with $\mathcal{L}_1$ improved the scores, but combining all three together (i.e., $\mathcal{L}_1$ + SSIM + LPIPS) gave the best results and produced images that looked more realistic and more similar to the real rejected images.

Furthermore, from the results of Table 1, one can see that $\mathcal{L}_1$ performed better than $\mathcal{L}_2$ and that the output generated with the $\mathcal{L}_2$ loss exhibited a blurring effect (see Figure 8), as mentioned in [34]. Additionally, combining $\mathcal{L}_1$ and $\mathcal{L}_2$ with the other loss functions improved the performance of the model, as noted in [32]. In particular, adding the SSIM to $\mathcal{L}_1$ and $\mathcal{L}_2$ increased the performance of the model. LPIPS as the loss function was found to work better both independently and in combination since it enhanced the image quality and helped to generate more realistic images, as noted in [45].

Identification Metrics
The image classifier ResNet18 [42] was used here as a typical classifier to show the impact of the generated images when performing defect identification. As is normally done for classification problems, the confusion matrix, precision, recall, and accuracy of the classifier were found with and without using the generated images. Table 2 shows a depiction of the confusion matrix, with precision, recall, and accuracy given by

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP (true positive) indicates when a defective or rejected image is placed in the defective or rejected class, TN (true negative) indicates when a non-defective or good image is placed in the non-defective or good class, FP (false positive) indicates when a non-defective or good image is placed in the defective or rejected class, and FN (false negative) indicates when a defective or rejected image is placed in the non-defective or good class.

To ready the dataset for the classification experiments, we randomly selected 1400 real good epoxy drop substrate images and labeled them as non-defective or good. For the defective or rejected class, the 16 available real defective or rejected epoxy drop substrate images were used to generate 2800 synthesized defective or rejected epoxy drop substrate images (1400 images were generated by the standard loss function CycleGAN and 1400 images were generated by our enhanced loss function CycleGAN). Then, these datasets were divided into 60% training, 20% validation, and 20% testing subsets with no overlap among them.
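The identification metrics and the 60%/20%/20% split described above can be sketched as follows. This is an illustrative sketch only: the function names, the fixed random seed, and the example confusion-matrix counts are hypothetical and are not taken from the experiments, and the rejected class is treated as the positive class in line with the TP/TN/FP/FN definitions.

```python
import random


def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, and accuracy with 'rejected' as the positive class."""
    total = tp + tn + fp + fn
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


def split_60_20_20(items, seed=0):
    """Shuffle items and split them into disjoint train/validation/test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100  # integer arithmetic avoids rounding issues
    n_val = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=8, tn=9, fp=2, fn=1)

# Splitting 1400 image indices gives subsets of 840, 280, and 280 images.
train, val, test = split_60_20_20(range(1400))
```

Because the three subsets are slices of a single shuffled list, they cannot overlap, which mirrors the no-overlap requirement stated above.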

Identification Outcomes
While keeping the non-defective or good class images the same, the classifier ResNet18 was trained in three different ways for the defective or rejected class, as follows:
1. By using real rejected epoxy drop substrate images;
2. By using real rejected epoxy drop substrate images and generated rejected epoxy drop substrate images based on the standard loss function CycleGAN;
3. By using real rejected epoxy drop substrate images and generated rejected epoxy drop substrate images based on our enhanced loss function CycleGAN.
Then, the trained models were tested on the same testing data subset, whose rejected class consisted of a combination of real and generated rejected epoxy drop substrate images. Table 3 shows a comparison of the identification outcomes. As can be seen from this table, the addition of the synthesized images significantly improved the identification outcome. Furthermore, our enhanced loss function CycleGAN provided a better identification outcome than the standard loss function CycleGAN.

Conclusions
In this paper, the loss function of the generative adversarial network of CycleGAN was enhanced or modified to generate high-quality defective epoxy drop images for die attachment in IC manufacturing. Such images are needed for the purpose of training or testing vision-based deep neural network inspection systems. A CycleGAN network with different cycle consistency loss functions was designed to generate different sets of synthesized images. Based on three evaluation metrics, it has been shown that by incorporating the loss functions of learned perceptual image patch similarity (LPIPS) and the structural similarity index metric (SSIM) into the standard CycleGAN loss function, more realistic or higher-quality synthesized epoxy drop images are generated as compared to using the CycleGAN standard loss function. Furthermore, it has been shown that our enhanced loss function CycleGAN as a data augmentation approach leads to improved identification outcomes when using a typical image classifier. The enhancement approach developed in this paper is general purpose in the sense that it can be applied to other data augmentation scenarios involving other types of images.

Funding: The work presented in this paper was supported by a grant from Texas Instruments (project number 1416) to the University of Texas at Dallas.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data available on request due to privacy restrictions. Contact the authors.