Conditional TransGAN-Based Data Augmentation for PCB Electronic Component Inspection

Automatic recognition and positioning of electronic components on PCBs can enhance quality inspection efficiency for electronic products during manufacturing. Efficient PCB inspection requires identification and classification of PCB components as well as defects for better quality assurance. The small size of electronic component and PCB defect targets means that there are fewer feature areas for the neural network to detect, and the complex grain backgrounds of both datasets cause significant interference, making target detection challenging. Meanwhile, the detection performance of deep learning models is significantly degraded by the lack of samples. In this paper, we propose conditional TransGAN (cTransGAN), a generative model for data augmentation, which enhances the quantity and diversity of the original training set and further improves the accuracy of PCB electronic component recognition. The design of cTransGAN brings together the merits of both conditional GAN and TransGAN, allowing a trained model to generate high-quality synthetic images conditioned on class embeddings. To validate the proposed method, we conduct extensive experiments on two datasets, including a self-developed dataset for PCB component detection and an existing dataset for PCB defect detection. Also, we evaluate three existing object detection algorithms, including Faster R-CNN ResNet101, YOLO V3 DarkNet-53, and SCNet ResNet101, each validated under four experimental settings to form an ablation study. Results demonstrate that the proposed cTransGAN can effectively enhance the quality and diversity of the training set, leading to superior performance on both tasks. We have open-sourced the project to facilitate further studies.


Introduction
A printed circuit board (PCB) is the carrier of many electronic components in electronic products. Electronic components must be assembled according to their categories and placed in the right positions during manufacturing. In practice, the recognition and positioning of electronic components on PCBs has been a key technology in the manufacturing and assembly of electronic products. There have been mainly three types of recognition methods for PCB electronic components: traditional manual visual inspection, machine vision-based detection using image processing, and deep learning-based detection. Some other novel detection methods have also been developed, for example, automated X-ray detection and laser detection systems. However, they all suffer from high cost, high failure rate, and slow detection. Traditional object detection techniques basically consist of three steps: first, identify candidate regions using a sliding search window at different scales; next, retrieve the visual features of each candidate region, such as Haar or HOG features; and lastly, classify the regions. There are inherent shortcomings in traditional object detection methods: (1) using sliding windows for region identification and selection is weakly targeted, has high time complexity, and produces many redundant windows; (2) manually designed features are subjective and rely on an individual's prior knowledge, and the detection process is cumbersome; (3) the methods are time-consuming and cannot meet the needs of real-time detection.
In recent years, automatic optical inspection (AOI) technology [1] has been used to detect PCB defects during manufacturing. Compared with traditional manual inspection, it has multiple advantages, such as fast detection, low cost, and high accuracy. Over the past decade of AOI's evolution, there have been mainly three categories of methods: reference comparison, nonreference comparison, and hybrid methods. The reference comparison method matches a given image against a standard target sample image, finding regions in the given image that correlate highly with the target sample image and then using an algorithm to align the edge contours as closely as possible to locate the target frame. The main challenge of this method is to align the reference image and the test image precisely, which requires a complex configuration process. Meanwhile, light and noise greatly impact the detection process, which can easily cause false alarms [2].
PCB electronic component identification tasks mainly aim to detect different types of capacitors, optocouplers, and diodes. Traditional manual visual inspection methods and image processing-based machine vision detection methods suffer from low accuracy, poor generalization ability, poor robustness, and a lack of compatibility with multiple PCB electronic components; ultimately, they cannot meet the needs of manufacturing. Deep learning-based object detection methods have become the mainstream in the field, demonstrating an advantage in automatic feature extraction. Typical two-stage object detection algorithms based on region detection and classification include R-CNN [3], SPP-Net [4], Fast R-CNN [3], and Faster R-CNN [5]. Typical single-stage, regression-based algorithms include SSD [6], RetinaNet [7], and YOLO [8]. The main challenges of PCB electronic component identification are as follows: first, there are many types of PCBs and different design rules on the market; second, the electronic components on a PCB and their features are complex and diverse; third, the industry lacks large numbers of samples of different types, resulting in data imbalance for traditional methods. Therefore, it is of practical significance to design a method that expands the samples and increases the accuracy of the DNN prediction model.
In this paper, we propose a deep learning pipeline for PCB electronic component inspection. The core technical contribution is conditional TransGAN (cTransGAN), used for data augmentation. After extensive training, the proposed cTransGAN can generate high-quality synthetic PCB component and defect samples that enhance the quantity and diversity of the original training set, leading to notable gains in mean average precision (mAP). cTransGAN features a TransGAN generator and discriminator, both conditioned on class embeddings, which are latent representations of the classes in the dataset. These embeddings, serving as inputs, effectively guide the generator to produce an image belonging to a desired class; meanwhile, the discriminator is also guided to better distinguish real and generated images given the desired class.
To validate the proposed method, we conduct extensive experiments on two datasets, including a self-developed dataset for PCB component detection and an existing dataset for PCB defect detection. Also, we evaluate three existing object detection algorithms, including Faster R-CNN ResNet101, YOLO V3 DarkNet-53, and SCNet ResNet101, each validated under four experimental settings to form an ablation study. Results demonstrate that the proposed cTransGAN can effectively enhance the quality and diversity of the training set, leading to superior performance on both tasks. The code of this project is available at https://github.com/long-deep/pcb-detect.
The rest of this paper is organized as follows. Section 2 reviews research work related to PCB object detection and data augmentation. Section 3 explains our proposed model and datasets. In Section 4, several comprehensive experiments are conducted to evaluate the effectiveness of the proposed model. Finally, in Section 5, we conclude the paper and discuss future work.

Object Detection and PCB Electronic Components.
In recent years, computer vision has made significant progress in object detection [9,10], which has advanced the development of autonomous vehicles [11], robotics [12], and many other practical applications. The networks have achieved reliable performance, with stable, easy-to-use, open-source implementations [13] published. These implementations are also well documented, which makes it convenient for researchers to fine-tune pretrained models for specific tasks. However, almost all object detection networks need to be trained on large-scale datasets to obtain good performance. Unfortunately, for PCB component detection tasks, it is expensive to build a large-scale dataset to fine-tune such detection networks. In addition, due to intraclass variance, there is inherent ambiguity in classifying components. Therefore, we study methods that utilize the inherent structure in the data (that is, within the PCB board), which traditional detection methods cannot exploit.
As is well known, electronic components come in different categories with different shapes. CNNs simulate the brain's visual cognition principles: through dimensionality reduction, a CNN retains the characteristics of an object even when it reappears at a different scale, orientation, or position. Therefore, CNNs can be applied to the detection of electronic components. In [14], a novel graphical network block is proposed to refine the component features on each PCB, reaching 65.3% mAP for electronic component detection on the testing PCBs [14]. In [8], an improved YOLOv3 algorithm is proposed, adding an output layer that is sensitive to small objects; the paper also verifies the effectiveness of the algorithm on both real and virtual PCB pictures containing many PCB electronic components. In [15], a fast object recognition method is proposed that combines YOLOv3 and MobileNet, replacing Darknet-53, the original backbone of YOLOv3, with MobileNet to achieve a lightweight model and fast speed. However, a common issue in the above CNN-based electronic component detection methods is that the dataset is limited, which prevents the CNN from learning PCB electronic components well and results in low recognition accuracy. In this research, we build our own dataset, including images of each type of electronic component in four orientations (up, down, left, and right), which provides the model with more comprehensive recognition data.

Data Augmentation Techniques.
In industrial applications, the prediction accuracy of a deep learning model mainly relies on the size and quality of the training samples. Collecting samples for electronic component recognition takes a long time, and samples can be difficult to obtain. The generative adversarial network (GAN) [16], as a generative model, can generate new synthetic instances of data that follow a distribution rather similar, if not identical, to that of real samples through continuous confrontation between the generator and the discriminator. At present, GANs are widely used in image generation, style transfer, and many other fields [17][18][19]. Due to the limited size of the training dataset and the ambiguity of unknown electronic components, identifying unknown electronic components is still a challenging task. Deep learning-based image recognition usually requires many sample images for training, so data augmentation techniques should be adopted when only limited images are available. Based on this consideration, Abayomi-Alli et al. proposed an image augmentation technique [20] based on the random permutation of coefficients of within-class principal components obtained after applying principal component analysis (PCA). After reconstruction, the newly generated replacement images are used to train the deep network. Experimental results show that the method can improve classification accuracy and reduce classification ambiguity in applications [20]. For small datasets, data augmentation has always been an effective method to reduce overfitting. Even though a variety of augmentation techniques exist, such as horizontal flip, random crop, and Mixup, they are not directly suitable for object detection tasks because the generated images lack labeled bounding-box information. To address this issue, an unsupervised data augmentation framework using GAN is proposed in [21].
The authors proposed a two-step pipeline based on YOLOv4, which enables the generation of an image with the object lying in a specified position.
Recent advances attempt to explore the potential of generative models for image augmentation, addressing the issue of low training resources that has long been a challenge for training deep learning models with satisfying generalization ability [22]. Compared to traditional augmentation methods, which are mostly based on image processing techniques, generative models such as GANs can capture the semantic features of training images and generate similar but different images to enhance the quantity and diversity of training data. This capability of GANs has driven their use in image augmentation for various computer vision tasks, including classification [23,24], object detection [25,26], and semantic/image segmentation [27][28][29][30]. These prior studies have validated the effectiveness of GANs as an image augmentation technique. The way a GAN-based augmentation model is used in the proposed method is similar to that in the literature: a collection of training data from the original dataset is used to train a GAN; then, the generator can produce synthetic images that are selectively added to the augmented dataset. The key difference between our work and prior efforts is the proposed cTransGAN model, which inherits the merits of two powerful models, namely, cGAN [31] and TransGAN [32]. Experimental results on two datasets demonstrate the superiority of the proposed method over other GAN-based competitors in the area of image augmentation.

Dataset.
We chose two datasets to validate the proposed method. The first is a self-developed dataset for PCB component detection, and the second is an existing dataset for PCB defect detection. We aim to verify that the proposed method can achieve superior performance on both tasks.

Self-Developed Dataset
(1) Dataset Basics. Our dataset includes 2544 image samples divided into 3 categories: capacitors, diodes, and optocouplers. There are 1349 images of optocouplers, including 504 images of large optocouplers IC 1, 372 images of medium optocouplers IC100, and 473 images of small optocouplers. There are 799 images of capacitors, including 400 images of large capacitors and 399 images of small capacitors. There are 396 images of diodes. The basic information of the PCB dataset is given in Table 1. The column "Original Dataset" gives the number of sample images for each category of electronic components. The column "Portion in Original Dataset" gives the proportion of samples of each category in the original dataset. The next three columns provide the number of image samples of each category in the training set, the test set, and the generated dataset.
(2) Dataset Acquisition. Our PCB images are acquired using a BASLER a2A5320-7gcBAS camera, an OPT C1616-10M lens, and a Haoli HLFL478408K-K50 light source. The height of both the lens and the camera is 460 mm.
(3) Dataset Preprocessing. To reduce computation during training, we use a fixed-resolution color image filling algorithm to process the images. The specific steps are as follows: the X and Y values (i.e., width and height) of the original image are obtained and compared with a predefined value (here 418); the image is then adjusted based on the comparison, where there are four cases to consider. The four cases are illustrated in Figure 1. Also, Figure 2 shows three samples, representing the three classes considered in this dataset.

The DeepPCB Dataset.
DeepPCB [33] is a dataset that contains 1,500 image pairs, each of which includes a defect-free template image and an aligned testing image with annotations giving the positions of the six most common PCB defects: open, short, mousebite, spur, pin hole, and spurious copper.
All of the images in this dataset were captured using a linear-scan CCD with a resolution of around 48 pixels per millimetre. The defect-free template images were created by manually inspecting and cleaning sampled images in the manner described previously. The original size of each template and tested image is approximately 16k × 16k pixels. The images are then cropped into many smaller subimages of the same size and aligned using template-matching techniques. Following that, binarization with a carefully chosen threshold is applied to avoid illumination disturbance. Although preprocessing can differ depending on the specific PCB defect detection algorithm used, image registration and thresholding are common procedures for high-accuracy PCB defect localization and classification.
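The thresholding step described above can be sketched in a few lines; the threshold value and array contents below are illustrative stand-ins, not the actual values used in DeepPCB:

```python
import numpy as np

def binarize(gray, threshold=127):
    """Binarize a grayscale image with a fixed threshold to suppress
    illumination disturbance (threshold value is illustrative)."""
    return (gray > threshold).astype(np.uint8) * 255

# a tiny made-up grayscale patch
img = np.array([[10, 200], [130, 90]], dtype=np.uint8)
out = binarize(img)
# pixels above the threshold become 255, the rest become 0
```

In practice, the threshold would be tuned per imaging setup so that copper traces and substrate separate cleanly.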
Because a real tested image contains just a few defects, the authors augment the data by manually adding defects to each tested image in accordance with typical PCB defect patterns, resulting in around 3 to 12 defects per 640 × 640 image. After annotation, the dataset is split into a training set of 1,000 images and a test set of 500 images. Figure 3 shows the number of instances of the six defect classes in both the training and test sets of DeepPCB; the classes are relatively balanced in quantity. We generated a total of 800 synthetic images, with 600 instances of each defect class spread across the generated samples. Figure 4 shows several annotated samples from DeepPCB.

System Overview.
As shown in Figure 5, there are two stages in the workflow: data augmentation using TransGAN [32] and electronic component recognition using Faster R-CNN, YOLOv3, and SCNet [34]. TransGAN is an unsupervised deep learning method. Its generator is designed to be memory-friendly and consists of multiple stages, each formed by stacking several transformer encoders. In this paper, we use synthetic images generated by TransGAN, together with the annotations of the source images, to augment the training dataset for the Faster R-CNN, YOLOv3, and SCNet detectors.

Conditional TransGAN-Based Data Augmentation.
In this subsection, we first provide a brief introduction to GAN, cGAN, and TransGAN, followed by a detailed description of the proposed cTransGAN.

GANs.
The vanilla GAN consists of two neural networks: a generator and a discriminator. The generator takes a random vector as input and attempts to create a synthetic data point that resembles a true sample from the original dataset. The discriminator, on the other hand, is trained with both real and fake samples and predicts whether a particular sample is real. To optimize the parameters, the prediction result is back-propagated through both networks.
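The adversarial objective described above can be illustrated with a minimal NumPy sketch; the discriminator scores below are made-up stand-ins for network outputs, not results from any trained model:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on sigmoid outputs in (0, 1)."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

# hypothetical discriminator scores
d_real = np.array([0.9, 0.8])   # D(x) on real samples
d_fake = np.array([0.2, 0.1])   # D(G(z)) on generated samples

# the discriminator pushes d_real toward 1 and d_fake toward 0
d_loss = bce(d_real, np.ones(2)) + bce(d_fake, np.zeros(2))
# the generator pushes d_fake toward 1 (non-saturating form)
g_loss = bce(d_fake, np.ones(2))
```

With the scores above, the generator loss is large (the discriminator currently rejects the fakes), which is exactly the gradient signal that drives the generator to improve.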
cGAN improves on the vanilla GAN by conditioning the model on auxiliary information (e.g., class labels y) to direct data production, allowing more control over data modalities. Conditioning can be done by giving both the generator and discriminator a layer that forms a joint representation of x and y; after injecting y, the generator learns more semantic properties of samples of the given class. TransGAN [32] is an unsupervised deep learning method that uses a pure transformer with no convolution. Multiple transformer encoder blocks are used as building blocks for the discriminator and generator in TransGAN. A transformer encoder [35] contains a multihead self-attention component to capture the long-term dependence between words in a sentence and the contextual semantic information. Even though the transformer was originally designed for natural language processing, it has been adopted in computer vision [36]. To mimic the sequential input required by the original transformer, the vision transformer (ViT) divides an input image into a collection of patches, which is the basis of TransGAN. Also, to reduce the memory cost caused by the numerous image patches, TransGAN develops a multistage memory optimization strategy to gradually increase/decrease the image resolution. Furthermore, TransGAN integrates a grid self-attention module, which converts an entire feature map into a grid of nonoverlapping feature patches; local attention within each grid cell then replaces global attention, which greatly reduces the amount of computation. Figure 6 shows the architecture of cTransGAN, which can be divided into three parts. First, we change the detection head of the TransGAN discriminator to output a tensor of size N, where N is the number of classes in the dataset. This way, the discriminator is treated as a classifier and trained using the original training set.
The trained model is then used to produce an embedding for each class of samples. Specifically, the original training set is divided into multiple subsets by class. We feed the samples of class i into the trained model sequentially, extract the feature map from the last layer before the detection head, and use it as the embedding, denoted by e_i, to represent class i. One major difference between the proposed method and cGAN is that we adopt the class embedding, rather than the label y, as the additional input to the GAN. The idea is inspired by the way Word2Vec generates word embeddings, and the strategy has proven empirically effective during training. After the embeddings are produced, they serve as inputs to guide the training of cTransGAN. As shown in Figure 6(c), the embedding for class i images, e_i, is concatenated with a linearized real image of class i, and the result is fed into the original TransGAN discriminator. Similarly, the concatenation of e_i and the noise vector z is fed into the original TransGAN generator, which aims to output a generated image of class i. The output representations of the cTransGAN discriminator and generator stay unchanged, as do the internal neural structure and the optimization scheme.
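The conditioning-by-concatenation scheme can be sketched as follows; the embedding and noise dimensions are illustrative assumptions, and the random embeddings stand in for features extracted from the trained classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes, embed_dim, noise_dim = 3, 16, 128
# stand-ins for class embeddings e_i extracted from the trained classifier
class_embeddings = rng.standard_normal((num_classes, embed_dim))

def generator_input(class_id):
    """Concatenate the class embedding e_i with the noise vector z,
    mirroring how the cTransGAN generator is conditioned."""
    z = rng.standard_normal(noise_dim)
    return np.concatenate([class_embeddings[class_id], z])

def discriminator_input(class_id, image):
    """Concatenate e_i with a linearized (flattened) image of class i."""
    return np.concatenate([class_embeddings[class_id], image.ravel()])

g_in = generator_input(1)                        # shape: (16 + 128,)
d_in = discriminator_input(1, np.zeros((8, 8)))  # shape: (16 + 64,)
```

The downstream TransGAN blocks are unchanged; only the input vectors grow by the embedding dimension.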

cTransGAN.
Conditioned on the class embedding e_i, the trained cTransGAN can generate high-quality synthetic images belonging to class i. The model can be trained end to end, which speeds up the process of data augmentation.

Data Augmentation via GAN.
A trained cTransGAN can be used to produce high-quality synthetic samples that look similar to the real ones. For the object detection task, the process involves the following steps. First, the marked bounding boxes in an annotated sample are warped, rescaled to the same size, and saved as images. This way, a collection of labeled images can be gathered to train a cTransGAN. After that, the trained cTransGAN generator can produce synthetic defect images that are plugged back into the original sample image at the positions the bounding boxes were warped from. Therefore, an augmented sample with synthetic defect annotations can be produced. For the task considered in DeepPCB, images belonging to the same class present clear patterns, which can be effectively captured by the cTransGAN.
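The warp-and-plug-back procedure can be sketched minimally; the image, box coordinates, and constant-valued "synthetic" patches below are illustrative stand-ins for real data and GAN outputs:

```python
import numpy as np

def crop_patches(image, boxes):
    """Cut labeled bounding boxes out of an annotated sample; the
    resulting patches form the training images for cTransGAN."""
    return [image[y1:y2, x1:x2].copy() for (x1, y1, x2, y2) in boxes]

def paste_patches(image, boxes, patches):
    """Plug synthetic patches back into the positions the real patches
    were warped from, yielding an augmented sample whose annotations
    (the same boxes) are known for free."""
    out = image.copy()
    for (x1, y1, x2, y2), patch in zip(boxes, patches):
        out[y1:y2, x1:x2] = patch
    return out

img = np.zeros((32, 32), dtype=np.uint8)
boxes = [(2, 2, 10, 10), (20, 20, 28, 28)]
real_patches = crop_patches(img, boxes)
# constant patches stand in for cTransGAN generator outputs
fake_patches = [np.full_like(p, 255) for p in real_patches]
aug = paste_patches(img, boxes, fake_patches)
```

Because the synthetic patch reuses the original box position, the augmented image inherits its bounding-box annotation without manual relabeling.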

Models.
This subsection covers the models evaluated in this study for object detection.

YOLOv3.
YOLO (you only look once) is a fast object detection algorithm. As a good option for real-time detection without sacrificing too much accuracy, it provides fast detection even though it is no longer the most accurate algorithm. It identifies specific objects in videos, images, or live feeds. YOLOv3 (YOLO version 3), published in 2018, is the third improved version of YOLO and the model we use in this research. It uses Darknet-53 instead of Darknet-19 as its backbone network for feature extraction, inspired by SSD and ResNet. The framework of Darknet-53 is shown in Figure 7. It is mainly composed of convolutional and residual structures; as illustrated in Figure 7, neither a pooling layer nor a fully connected layer is found in Darknet-53. The last three layers (avgpool, connected, and softmax) are used only for classification training on the ImageNet dataset. The main components of Darknet-53 are 3 × 3 and 1 × 1 filters. It has 53 convolutional layers, more powerful and more efficient than the previous 19, and residual connections, just as in ResNet. In forward propagation, changing the stride of the convolution kernel changes the size of the tensor; for example, a stride of 2 halves the side length of the feature map. YOLOv3 splits the image into S × S small cells. Each grid cell predicts three components: (1) the coordinates of B bounding boxes (x, y, w, h); (2) the confidence score P(object); and (3) C conditional class probabilities, conditioned on the presence of an object in the grid cell. YOLOv3 makes predictions over three different scales and uses nine anchor boxes, three per scale. There can still be a loss of precision on small structures because of the low resolution of deeper feature maps. In this research, YOLOv3 is trained for 450 epochs with a decreasing learning rate.
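The grid-cell assignment used by YOLO-style detectors can be sketched as follows; the 13 × 13 grid on a 416 × 416 input is an illustrative configuration, not necessarily the one used in this paper:

```python
def encode_box(cx, cy, grid_size=13, img_size=416):
    """Assign a box center to its grid cell and compute the in-cell
    offsets, as in YOLO-style detectors (sizes are illustrative)."""
    cell = img_size / grid_size          # pixel width of one grid cell
    col, row = int(cx // cell), int(cy // cell)
    tx, ty = cx / cell - col, cy / cell - row  # offsets in [0, 1)
    return row, col, tx, ty

# a box centered at (100, 200) falls in cell (row 6, col 3)
row, col, tx, ty = encode_box(100.0, 200.0)
```

The cell at (row, col) becomes responsible for predicting this object; (tx, ty) are the normalized offsets the network regresses.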

SCNet.
SCNet establishes a convolutional neural network for semantic correspondence between images of different instances of the same object or scene category. It is used to learn geometrically plausible models for semantic correspondence. In SCNet, region proposals are used as matching primitives, and geometric consistency is explicitly added to its loss function. Image pairs from the PASCAL VOC 2007 keypoint dataset are used to train SCNet. In this research, we use ResNet101 [37] as the backbone of SCNet. ResNet101 [37] is a residual network with 101 layers, composed of identity blocks and convolution blocks. The skip connections in ResNet provide an alternate shortcut for the gradient to flow through and allow the model to learn identity functions, which ensures that a higher layer performs at least as well as, and never worse than, a lower layer.

Faster R-CNN.
Compared with earlier R-CNN variants, the main advantage of Fast R-CNN is that it computes convolutional features once per image and shares them across region proposals, which greatly saves time and improves the accuracy of object detection. Faster R-CNN uses a region proposal network (RPN) to replace the selective search module in Fast R-CNN, which further improves the time efficiency and accuracy of object detection. The RPN uses a fully convolutional network to generate rectangular object proposals from the input image. It places anchor boxes over the image and predicts whether each anchor belongs to the background or the foreground, then selects the best-scoring region boxes as proposals. This improves the efficiency of region proposals and accurately detects objects. The anchor with the highest overlap with a ground-truth box is marked as foreground, and anchors with the lowest overlap are marked as background; therefore, each anchor receives a predicted foreground or background label. After the RPN, region proposals are obtained with feature maps of various sizes; however, feature maps of different sizes are difficult to process, which Fast R-CNN addresses with RoI pooling. In R-CNN, pixel-level region proposals are used as input, while in Faster R-CNN, feature-map-level region proposals are used. As explained in [37], combining Faster R-CNN with ResNet101 can improve the performance of the network. The framework of Faster R-CNN is illustrated in Figure 8.
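The anchor-labeling rule described above can be sketched as follows; the boxes are made up, and the 0.7/0.3 thresholds follow the original Faster R-CNN paper rather than any setting confirmed in this work:

```python
def iou(a, b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def label_anchors(anchors, gt, pos_thr=0.7, neg_thr=0.3):
    """Label each anchor foreground (1), background (0), or ignored (-1)
    by its best overlap with any ground-truth box."""
    labels = []
    for a in anchors:
        best = max(iou(a, g) for g in gt)
        labels.append(1 if best >= pos_thr else 0 if best < neg_thr else -1)
    return labels

gt = [(10, 10, 50, 50)]
anchors = [(12, 12, 48, 48),      # high overlap  -> foreground
           (100, 100, 140, 140),  # no overlap    -> background
           (20, 20, 60, 60)]      # mid overlap   -> ignored
labels = label_anchors(anchors, gt)
```

Ignored anchors (label -1) contribute no gradient, which keeps the RPN objective focused on clearly positive and clearly negative examples.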

Experiment Settings.
We conduct experiments on a computer running Windows 10, equipped with 16 GB RAM and an Intel Core i7-8700 CPU. The TensorFlow framework and an Nvidia GeForce RTX 2070 GPU are used to train the DCNN models. The programs use Python 3.6.7.

Model Training.
To train cTransGAN, we use the Adam optimizer, a batch size of 64 for both the discriminator and the generator, and a learning rate of 0.0001. The cTransGAN model is trained for 240 epochs. We train Faster R-CNN ResNet101, YOLO V3 DarkNet-53, and SCNet ResNet101 models on both the original training set and the augmented training set. Combining the three deep learning models with the four augmentation settings yields 12 models in total for the comparative study, as shown in Table 2 in Subsection 4.4. Each trained model takes an image as input and detects the bounding boxes of the PCB objects within it; each bounding box carries a predicted category and a confidence score. The chosen hyperparameters include a momentum of 0.7, a verification period of 4000, a learning rate of 0.004, a weight decay of 0.0004, and a batch size of 32. Training was conducted for 2000 epochs.

Evaluation Metrics.
The mean average precision (mAP) is the primary performance metric and a commonly used indicator in object detection. In addition, we consider four classification metrics: accuracy (Acc), recall (Rec), specificity (Spe), and F1-score (F1). The mAP is calculated based on Intersection over Union (IoU). IoU measures how similar a predicted bounding box is to the ground-truth bounding box: it is the ratio of the intersection area to the union area of the prediction and the ground truth. Normally, if IoU > 0.5, the detection is a true positive (TP); otherwise, it is a false positive (FP). Furthermore, if there is (1) no detection at all or (2) a detection with IoU > 0.5 but a misclassified object, it is counted as a false negative (FN). Precision (Pre) is the ratio of true positives to the total number of predicted positives. Recall (Rec) is the ratio of TP to the total number of ground-truth positives.

(1) Results on the Self-Developed Dataset. Table 2 shows the results of the three models under the four experimental settings in mAP. We list the mAP for each object class as well as the overall mAP in the last column; the highest score in each column is marked in bold. Table 2 gives the metrics for the best models trained by Faster R-CNN ResNet101, YOLOv3 DarkNet, and SCNet ResNet101 under the different augmentations, where the metrics include the AP value for each subtarget and the mean AP (mAP). AP values range from 0% to 100%, with higher values indicating better detection of the target. As can be seen in Table 2, the detection results using cTransGAN are almost always optimal. We have the following findings.
(i) IPAug was slightly worse than the GAN-based augmentation methods for all three models. However, the difference between IPAug and TransGAN was minor, with a gap of less than 1%. (ii) The four settings adopted for each model form an ablation study that evaluates the effect of each added module.

(2) Results on the DeepPCB Dataset. The results are reported in Table 3, where the first three sections cover the three models used in this study, and the last section shows the state of the art (SOTA). As with the self-developed dataset, each model has been validated under four settings. We provide the following observations.
(i) Similar observations hold for IPAug, which presented scores comparable to TransGAN but worse than cGAN and cTransGAN. Our experiments validate that GAN-based augmentation methods are generally superior to IPAug. (ii) Despite the differences in task characteristics and object patterns between the two datasets, the GAN-based data augmentation strategy consistently improved the mAP for both tasks, with cGAN and TransGAN bringing smaller gains on the DeepPCB dataset. As in Table 2, Table 3 reports the AP for each subtarget and the mAP of the best models under each augmentation; the detection results using cTransGAN are almost always optimal.
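The IoU-based true-positive matching that underlies the mAP numbers above can be sketched as follows; the boxes are illustrative, and this greedy matching is a simplification of a full mAP evaluation (which additionally sorts detections by confidence and averages precision over recall levels):

```python
def iou(a, b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def precision_recall(detections, gt_boxes, thr=0.5):
    """Match detections to ground truth at IoU > thr: matched detections
    are TP, unmatched detections FP, unmatched ground truth FN."""
    matched, tp = set(), 0
    for d in detections:
        for j, g in enumerate(gt_boxes):
            if j not in matched and iou(d, g) > thr:
                matched.add(j)
                tp += 1
                break
    fp = len(detections) - tp
    fn = len(gt_boxes) - tp
    return tp / (tp + fp), tp / (tp + fn)

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(1, 1, 10, 10), (50, 50, 60, 60)]  # one good match, one miss
prec, rec = precision_recall(dets, gt)
```

Here one detection overlaps a ground-truth box well (IoU ≈ 0.81 > 0.5, a TP) while the other matches nothing (an FP), and one ground-truth box is left undetected (an FN).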

Conclusion
PCBs are prone to open circuits, short circuits, or magnetic leakage during manufacturing. To automate the identification of PCB electronic components, we have established an image dataset that includes three categories of PCB components, namely, optocouplers, capacitors, and diodes. In addition, we propose cTransGAN to generate synthetic samples, which effectively enhance the scale and diversity of the original training set. Three deep learning models, Faster R-CNN ResNet101, YOLO V3 DarkNet-53, and SCNet ResNet101, are trained and evaluated on the datasets. We also design extensive comparative experiments to verify the effectiveness of object detection. The results demonstrate that cTransGAN-based augmentation makes the image samples more diverse, so that the models can capture better semantic features and thereby obtain significant performance improvements. Based on the experimental results, SCNet ResNet101 with cTransGAN-based data augmentation achieves the best detection accuracy. In addition to the self-developed dataset, we also evaluated cTransGAN on DeepPCB, a dataset for PCB defect detection, with similar observations.
In summary, the superiority of cTransGAN has been validated on two datasets to handle two diferent PCB object detection tasks.
Meanwhile, there are some limitations in this study, which will be addressed in future work. First, classic image processing-based augmentation can be used together with cTransGAN-based augmentation to quickly obtain larger and more diverse datasets. It will be interesting to explore the individual and combined effects of these two types of augmentation and how they complement each other to maximize the benefits. Second, this paper studies only the cTransGAN architecture, because its performance in generating high-quality synthetic images has been verified; as future work, many other GAN variants can be studied. Third, the samples generated by GAN-based augmentation need to be manually selected and labeled before they can be used for training, which is very time-consuming. It is worthwhile to develop additional supporting algorithms to facilitate the use of the generated samples.

Data Availability
The data used to support the findings of this study are available at https://github.com/long-deep/pcb-detect and https://github.com/tangsanli5201/DeepPCB.

Conflicts of Interest
The author(s) declare(s) that there are no conflicts of interest regarding the publication of this article.