Applying deep learning to defect detection in printed circuit boards via the newest You-Only-Look-Once model

In this paper, a new model known as YOLO-v5 is applied to detect defects in PCBs. In the past, many models and different approaches have been implemented for quality inspection and defect detection in PCBs. This algorithm is specifically selected for its efficiency, accuracy, and speed. It is well known that the traditional YOLO models (YOLO, YOLO-v2, YOLO-v3, YOLO-v4, and Tiny-YOLO-v2) are the state of the art in the artificial intelligence industry. In the electronics industry, the PCB is the core and most basic component of any electronic product. PCBs are used in almost every electronic product in our daily life, not only for commercial purposes but also in sensitive applications such as defense and space exploration. These PCBs should be inspected and quality-checked to detect any kind of defect during the manufacturing process. Most electronics companies focus on the quality of their products; a small error during manufacturing or quality inspection of electronic products such as PCBs can lead to a catastrophic end. Therefore, a huge revolution is under way in the manufacturing industry, where object detection methods like YOLO-v5 are a game changer for industries such as electronics.

The weight file for YOLO-v5 small is about 27 megabytes, whereas the weight file for YOLO-v4 is 244 megabytes, making YOLO-v5 nearly 90% smaller than YOLO-v4. This means YOLO-v5 can be deployed to embedded devices much more easily.

PCB data set
Initially, data is collected from the AVI machine, which provides RGB-format panel images from 4 different cameras. Images from the 4 cameras are extracted separately and then further processed; cropping and testing are done separately for each camera. The R, G, and B channels are combined, and the image is cropped to the exact defect location provided by the AOI machine in a text file. Each image is cropped to a size of 400 × 400 around that location and saved. This method is used to collect around 23,000 defective PCB images. After collection, the images are labelled using a tool created for the quality inspection engineers. This tool is designed to extract images of defective PCBs, with coordinates, from the AVI and AOI machines and to label the defects. The quality inspection engineer draws a box around the defective region and labels it as DEFECT; the locations of the two corners of the box, (x1, y1) and (x2, y2), are saved in txt format and used for the training process. After collecting and labeling the images, three different models, YOLO-v5 small, YOLO-v5 medium, and YOLO-v5 large, are trained and their results are compared.
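The cropping and label-saving steps can be sketched as follows. This is a minimal illustration only: the function names, the clamping behaviour at panel edges, and the exact txt layout are our assumptions, not the production tool's code.

```python
import numpy as np

def crop_defect(panel, cx, cy, size=400):
    """Crop a size x size window centred on the defect location (cx, cy),
    clamped so the window stays fully inside the panel image."""
    h, w = panel.shape[:2]
    x1 = min(max(cx - size // 2, 0), w - size)
    y1 = min(max(cy - size // 2, 0), h - size)
    return panel[y1:y1 + size, x1:x1 + size]

def save_label(path_stem, x1, y1, x2, y2):
    """Write the class name and the two box corners (x1, y1), (x2, y2)
    to a txt file, one defect per line (assumed format)."""
    with open(path_stem + ".txt", "w") as f:
        f.write(f"DEFECT {x1} {y1} {x2} {y2}\n")
```

The clamping keeps all crops at exactly 400 × 400 even when the defect lies near a panel border, so the training images have a uniform size.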
A total of 23,000 images are used in this experiment, divided into 20,700 images for training and 2,300 images for testing. After training all three models using the 10-fold cross-validation method, 30 models are generated and tested. The quality control testing procedure is fully automated, with a user interface that reports the time and the number of NG and OK images. This visualization interface is effective and helps in monitoring the testing process without human interaction. The interface is developed using Python and the Tkinter library. Figure 1 shows a snapshot of the developed interface used by the quality inspection operators.
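The 90/10 split above can be reproduced with a simple shuffled partition; a sketch, where the function name and seed handling are illustrative:

```python
import random

def split_dataset(paths, test_frac=0.1, seed=0):
    """Shuffle the image paths and split them into train/test sets
    (90/10 by default, giving 20,700/2,300 for 23,000 images)."""
    rng = random.Random(seed)
    paths = paths[:]            # copy so the caller's list is untouched
    rng.shuffle(paths)
    n_test = int(len(paths) * test_frac)
    return paths[n_test:], paths[:n_test]
```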

Architecture of YOLO-v5
The structures of YOLO-v5 and YOLO-v4 are very similar, but there are some differences. Figure 2 shows the YOLO-v5 network structure diagram; it is still divided into 4 parts, namely Input, Backbone, Neck, and Prediction.

Input a) Mosaic data enhancement
The input end of YOLO-v5 uses the same Mosaic data enhancement method. During typical project training, the average precision for small targets is generally lower than for medium and large targets. Our dataset also contains a large number of small targets, and, more troublesomely, the small targets are unevenly distributed. Mosaic enhancement offers several advantages: by randomly selecting, randomly scaling, and randomly arranging four pictures for splicing, it greatly enriches the detection dataset; in particular, random scaling adds many small targets, making the network more robust. It also reduces GPU requirements. Some may argue that random scaling and ordinary data enhancement can achieve the same effect, but considering that many researchers may have only one GPU, Mosaic enhancement trains on the data of 4 pictures at once, so the mini-batch size does not need to be very large. Therefore, a single GPU can achieve good results.
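A minimal sketch of the Mosaic idea: four images are combined into one training picture around a random centre point. The canvas size, the grey fill value of 114, and the crop-to-fit rule are illustrative assumptions; the real implementation also remaps the bounding-box labels into the mosaic.

```python
import numpy as np

def mosaic(imgs, out_size=608, seed=0):
    """Splice 4 images into one mosaic around a random centre point.
    Each image fills one quadrant and is cropped to fit its region."""
    rng = np.random.default_rng(seed)
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    # random centre somewhere in the middle half of the canvas
    cx = int(rng.uniform(out_size * 0.25, out_size * 0.75))
    cy = int(rng.uniform(out_size * 0.25, out_size * 0.75))
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(imgs, regions):
        h, w = y2 - y1, x2 - x1
        canvas[y1:y2, x1:x2] = img[:h, :w]  # crop top-left corner to fit
    return canvas
```

Because every mosaic contains four rescaled pictures, each mini-batch effectively sees 4× the images, which is why the batch size can stay small on a single GPU.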

b) Adaptive anchor frame calculation
In the YOLO algorithm, for each data set there is an anchor frame with an initial length and width. During the network training stage, the network outputs prediction frames based on the initial anchor frames; these are compared with the ground truth, the difference between the two is calculated, and the network parameters are then updated iteratively through backpropagation. In YOLO-v3 and YOLO-v4, when training with different data sets, the initial anchor box values are computed by a separate program. In YOLO-v5, this function is embedded in the code, so the best anchor box values are calculated and updated adaptively during each training run.
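The adaptive anchor calculation can be approximated by k-means clustering of the ground-truth box widths and heights under an IoU-based similarity. The sketch below is a simplification: the sorted-area initialisation and plain-mean update are our choices, and YOLO-v5's autoanchor reportedly refines its clusters further with a genetic algorithm.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=50):
    """Cluster ground-truth (width, height) pairs into k anchor shapes with
    k-means, using the IoU of co-centred boxes as the similarity measure."""
    order = np.argsort(wh.prod(axis=1))
    # initialise with k boxes evenly spaced by area, smallest to largest
    anchors = wh[order[np.linspace(0, len(wh) - 1, k).astype(int)]].astype(float)
    for _ in range(iters):
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh.prod(axis=1)[:, None] + anchors.prod(axis=1)[None, :] - inter
        assign = np.argmax(inter / union, axis=1)  # nearest anchor = highest IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]
```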

c) Adaptive image scaling
In commonly used target detection algorithms, different pictures have different lengths and widths, so the common approach is to uniformly scale the original pictures to a standard size and then feed them to the detection network. During inspection, many defect images have different aspect ratios, so after scaling and filling, the black borders differ in size. If more filling is needed, there is information redundancy, which affects inference speed. Therefore, in the YOLO-v5 code, the letterbox function is modified to scale the original image with the least black border added adaptively, which differs from our previous study [27]. The black edges at both ends of the image height are reduced, which reduces the computation during inference and thereby improves target detection speed. Through this simple improvement, the inference speed has been increased by 37%, which is very effective.
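The adaptive letterbox idea, padding only up to the next multiple of the network stride instead of to a full square, can be sketched as below. The nearest-neighbour resize and the grey value 114 are illustrative choices, not the exact library code.

```python
import numpy as np

def letterbox(img, new_size=640, stride=32, pad_value=114):
    """Resize with unchanged aspect ratio, padding each side only to the
    nearest multiple of `stride` (minimal border) rather than a square."""
    h, w = img.shape[:2]
    r = new_size / max(h, w)                 # scale so the longest side fits
    nh, nw = round(h * r), round(w * r)
    # minimal padding: only up to the next multiple of the network stride
    ph = (stride - nh % stride) % stride
    pw = (stride - nw % stride) % stride
    # nearest-neighbour resize via index arrays
    resized = img[(np.arange(nh) / r).astype(int)][:, (np.arange(nw) / r).astype(int)]
    out = np.full((nh + ph, nw + pw, 3), pad_value, dtype=img.dtype)
    out[ph // 2:ph // 2 + nh, pw // 2:pw // 2 + nw] = resized
    return out, r, (pw // 2, ph // 2)
```

A 500 × 640 image, for example, is padded only to 512 × 640 instead of 640 × 640, so far fewer border pixels are pushed through the network at inference time.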

Backbone a) Focus structure
Taking the structure of YOLO-v5 as an example, an original 608 × 608 × 3 image is fed into the Focus structure as shown in Figure 3; a slicing operation first changes the image into a 304 × 304 × 12 feature map, followed by a convolution operation with 32 kernels, which produces a final feature map of 304 × 304 × 32.

b) CSP structure
The full name of CSPNet is Cross Stage Partial Network, which mainly addresses the large amount of computation in inference from the perspective of network structure design. The authors of CSPNet attribute the excessive inference computation to the repetition of gradient information during network optimization. Therefore, the CSP module first divides the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy, which reduces the amount of computation while ensuring accuracy. Two CSP structures are designed in the YOLO-v5 network: the CSP1_X structure is applied to the Backbone network, and an additional CSP2_X structure is applied to the Neck.
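The Focus slicing operation described above can be sketched in a few lines: every second pixel is sampled into four sub-images that are stacked on the channel axis, turning 608 × 608 × 3 into 304 × 304 × 12 before the 32-kernel convolution. A NumPy sketch in channels-last layout:

```python
import numpy as np

def focus_slice(x):
    """Focus slicing: sample every second pixel into 4 sub-images and
    stack them on the channel axis, (H, W, C) -> (H/2, W/2, 4C)."""
    return np.concatenate([x[::2, ::2], x[1::2, ::2],
                           x[::2, 1::2], x[1::2, 1::2]], axis=-1)
```

No information is lost: every input pixel appears exactly once in the output, only rearranged so that the spatial resolution is halved and the channel depth quadrupled.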

Neck
The current Neck of YOLO-v5, like YOLO-v4, uses the FPN + PAN structure; when YOLO-v5 first came out, only the FPN structure was used, the PAN structure was added later, and other parts of the network were adjusted as well. In the Neck structure of YOLO-v5, shown in Figure 4, the CSP2 structure designed in CSPNet is adopted to strengthen the network's feature fusion ability.

b) Non-maximum suppression
In the post-processing of target detection, screening the many candidate target frames usually requires a non-maximum suppression (NMS) operation. In this research, the DIoU-NMS method is adopted, which differs from our previous study [27]: under the same parameters, the IoU criterion in NMS is changed to DIoU. For some overlapping and occluded targets, this does bring improvements. With parameters consistent with ordinary IoU-NMS, modifying it to DIoU-NMS allows both of two overlapping targets to be detected. Although the effect is similar in most cases, there is a slight improvement without any increase in computation cost, which is also good.
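A sketch of DIoU-NMS: the usual greedy suppression loop, but the criterion subtracts the normalised centre distance from the IoU, so overlapping boxes whose centres are well separated are less likely to be suppressed. The threshold value and helper names below are illustrative.

```python
import numpy as np

def diou(box, boxes):
    """DIoU between one box and an array of boxes, format (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    iou = inter / (area(box) + area(boxes) - inter)
    # squared distance between box centres
    d2 = ((box[0] + box[2]) / 2 - (boxes[:, 0] + boxes[:, 2]) / 2) ** 2 + \
         ((box[1] + box[3]) / 2 - (boxes[:, 1] + boxes[:, 3]) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    c2 = (np.maximum(box[2], boxes[:, 2]) - np.minimum(box[0], boxes[:, 0])) ** 2 + \
         (np.maximum(box[3], boxes[:, 3]) - np.minimum(box[1], boxes[:, 1])) ** 2
    return iou - d2 / c2

def diou_nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop its neighbours
    whose DIoU with it exceeds the threshold, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        i = order[0]
        keep.append(i)
        order = order[1:][diou(boxes[i], boxes[order[1:]]) <= thresh]
    return keep
```

Because the centre-distance term is precomputable from the same coordinates, the cost is essentially the same as ordinary IoU-NMS, matching the paper's observation that no extra computation is needed.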

Results
In comparison with other object detection algorithms, the implementation of YOLO-v5 on embedded devices is very easy. An Nvidia TITAN V GPU is used for this experiment, which reduces the training time to about 10% (i.e., from 34 to 4 h). YOLO-v5 only requires the installation of PyTorch and some lightweight Python libraries. With the NVidia TITAN V GPU, a Linux operating system, and PyTorch, the experimental costs have been reduced, because we just need to install Torch with lightweight libraries. YOLO-v5 can run inference on individual images, batches of images, video feeds, or webcam ports. The file folder layout is intuitive and easy to navigate during development, and YOLO-v5 can easily be translated from PyTorch weights to ONNX weights to CoreML to iOS. Three types of YOLO-v5 models are used in this experiment and compared with each other: the YOLO-v5 small, medium, and large models. The networks were trained and tested, and the settings and parameters were adjusted and tuned gradually by trial and error. YOLO-v5 includes four different models, ranging from the smallest YOLO-v5 with 7.5 million parameters (plain 7 MB, COCO pre-trained 14 MB) and 140 layers, to the largest YOLO-v5x with 89 million parameters and 284 layers (plain 85 MB, COCO pre-trained 170 MB). The approach considered in this paper is based on the pre-trained YOLO-v5x model. The YOLO-v5x model consists of a Cross Stage Partial Network (CSPNet) backbone and a model head using a Path Aggregation Network (PANet) for instance segmentation. Each Bottleneck CSP unit consists of two convolutional layers with 1 × 1 and 3 × 3 filters. The backbone incorporates a Spatial Pyramid Pooling (SPP) network, which allows for dynamic input image size and is robust against object deformations. In this experiment, we have used 23,000 images labeled by skilled quality control engineers; all the different types of defects have been labeled as a single class, DEFECT.
The epoch size is based on the training dataset. After deciding the parameters for the model, training begins. A 10-fold cross-validation method [28] has been used to validate the results and evaluate the trained models: the data is randomly divided into 10 equal parts, 9 of which are used for training while the remaining part is used for testing. Applying 10-fold cross-validation to the 3 different models generates 30 cross-validation models to justify the results. In each run of the YOLO-v5 model, the data used to evaluate the model is never seen by the model during training. The procedure is repeated 10 times by shuffling the training and validation datasets, and after the training process is complete, every model is tested on a different dataset.
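The 10-fold procedure can be sketched as an index generator; the shuffling and fold assignment below are illustrative, and the paper's exact partitioning may differ.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle the n sample indices once, split them into k folds, and
    yield (train, test) index lists with each fold used as the test set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # round-robin assignment to folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds if f is not test for j in f]
        yield train, test
```

Every sample appears in exactly one test fold across the k runs, so the k accuracy figures together cover the whole dataset, which is what the per-fold tables report.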
As demonstrated in Tables 1-3, the results gradually improve as the structure changes. This performance also shows that the YOLO-v5 model is more efficient than other YOLO models. After every epoch, the accuracy of the training process increases, which gradually drives the model towards better performance, and the final model is saved once the accuracy reaches a stable state. The results of the 10 cross-validations are shown in Table 1 for the YOLO-v5 small model, Table 2 for the YOLO-v5 medium model, and Table 3 for the YOLO-v5 large model. Table 4 displays the testing results in the form of a confusion matrix. In detail, the YOLO-v5 small model's detection accuracy is approximately 97.52%, the YOLO-v5 medium model's accuracy is 99.16%, and the YOLO-v5 large model's is approximately 99.74%, as reported in Tables 1-3, respectively. In total, 30 cross-validations have been trained on the 3 different model sizes, as shown in Table 4. The cells are shaded red and green, representing True Positive and True Negative respectively. The categories are labeled as NG (not good/damaged) and OK.
According to the results, it can be concluded that YOLO-v5 large provides the best output, with a highest accuracy of 99.95% in detecting defects and an average accuracy over the 10 cross-validations of 99.74% (Table 3). The other measures, such as the misclassification rate, True Positive Rate, False Positive Rate, True Negative Rate, and Prevalence, are also consistent for YOLO-v5 large: across all 10 cross-validations there is no large difference between them. Figure 5 shows sample images for True Positives, in which the model detects the defects with high confidence. Figure 7 shows False Positive defects; in these images the model makes misclassified predictions, but with low confidence. To avoid such misclassification, the model needs to be fine-tuned by inspecting the size of the bounding boxes in the training data. As can be seen, the detection results of YOLO-v5 large are appreciable, with an average accuracy of 99.74% and an evaluation precision that is consistently 0.99 (Table 3). In addition, measures such as the misclassification rate, True Positive Rate, False Positive Rate, True Negative Rate, and Prevalence are best for YOLO-v5 large compared to YOLO-v5 medium and YOLO-v5 small, which indicates stability and consistency. For most machine learning algorithms, it is believed that a large and balanced dataset makes the difference in performance. In order to compare with our previous study using the Tiny-YOLO-v2 version [27], five-fold cross-validation with a batch size of 32 was used in this experiment, where Tiny-YOLO-v2 was tested using 765 images. The average defective PCB detection accuracy (batch size of 32) was found to be 98.79%, and the evaluation precision was consistently 0.99.
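All of the reported measures can be derived directly from the confusion-matrix counts; a sketch, with NG (defective) treated as the positive class:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive the reported measures from confusion-matrix counts,
    with NG (defective) as the positive class and OK as the negative."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "true_positive_rate": tp / (tp + fn),     # recall on defective boards
        "false_positive_rate": fp / (fp + tn),
        "true_negative_rate": tn / (fp + tn),
        "prevalence": (tp + fn) / total,          # fraction of NG samples
        "precision": tp / (tp + fp),
    }
```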
In addition, other measures such as the misclassification rate, true positive rate, false positive rate, true negative rate, and prevalence for a batch size of 32 were not up to the mark in comparison with YOLO-v5, as reported in Tables 5 and 6; the mean accuracy for YOLO-v5 large is 99.52%. Table 7 displays the confusion matrices of the five different cross-validations and the comparison between the Tiny-YOLO-v2 and YOLO-v5 large methods. Finally, to compare the training time and memory size of the three proposed YOLO-v5 models (i.e., small, medium, and large), Table 8 shows the results for running 150 epochs.

Discussion
One major advantage of YOLO-v5 over other models in the YOLO series is that YOLO-v5 is coded in PyTorch from the ground up. This makes it useful for machine learning engineers, as there is an active and vast PyTorch community to support researchers. YOLO-v5 is also much faster than all the previous versions of YOLO [29][30][31]. In addition, YOLO-v5 is nearly 90% smaller than YOLO-v4, which means it can be deployed to embedded devices much more easily. Among the advantages of YOLO-v5, mosaic augmentation is an included technique in the improved model. Previous YOLO models were developed using Darknet, which was not very flexible for research work and not well suited for industrial use; iterating on YOLO-v5 may be easier for the broader research community. Apart from that, YOLO-v5 runs fast on the NVidia TITAN V, reaching 140 FPS, while the other YOLO models are restricted to 50 FPS. YOLO-v5 is accurate: after training for 1000 epochs it can reach roughly 0.935 mean average precision, an accuracy rarely achieved without loss in other YOLO or object detection models. Finally, regarding model size, the weight file of YOLO-v5 small is only 27 megabytes and that of YOLO-v5 large is 192 megabytes, while the weight file of YOLO-v4 is 244 megabytes. As mentioned in Section 2, adaptive image scaling and non-maximum suppression are the changes we have made relative to our previous work [27]. In this study we have also implemented an automatic testing process and a user interface that automatically extracts the images from the AVI machine and runs the testing with parallel computing, which saves time during the testing process and runs the tests without human intervention.
Furthermore, to illustrate the difference between the two models, Figure 9(a) shows the structure of Tiny-YOLO-v2 used in the previous study, and Figure 9(b) shows the structure of YOLO-v5 used in this research. As can be seen in these figures, the main difference between the two structures is the Neck section. In the field of target detection, in order to better extract fused features, some layers are usually inserted between the Backbone and the output layer; this part is called the Neck, and it is a very critical part of the target detection network. YOLO-v5 uses CSPDarknet53 as the backbone, plus the SPP module, PANet as the Neck, and the head of YOLO-v3. Compared with Tiny-YOLO-v2, the structure diagram of YOLO-v5 has additional CSP and PAN structures. At first glance the structure looks convoluted, but it becomes clear after examining the diagram: the overall structure is the same, with various new algorithmic ideas improving each substructure. The YOLO-v5 structure also adopts a neighboring positive-sample anchor matching strategy. Through flexible configuration parameters, models of different complexity can be obtained, and overall performance is improved through some built-in hyperparameter optimization strategies; for example, mosaic enhancement is used to improve the detection performance of small objects.
The datasets used were collected by our research team, who have decades of experience in the quality inspection of PCBs. Moreover, traditional deep learning methods [32][33][34][35] are based on classifying or detecting particular objects in the image. In this paper, three different types of models have been deployed and compared with each other. The training times have also been compared: YOLO-v5 small takes the least time, almost 3-4 h to train, YOLO-v5 medium takes 12-14 h, and YOLO-v5 large takes 31-32 h. However, with respect to overall accuracy, YOLO-v5 large is more accurate than the other two models: YOLO-v5 small has an average accuracy of 97.52%, YOLO-v5 medium 99.16%, and YOLO-v5 large 99.74%. Apart from accuracy, comparing the mAP, YOLO-v5 large has the highest at 94%, while YOLO-v5 medium and YOLO-v5 small have 84% and 82%, respectively.

Conclusions
This research shows that YOLO-v5 large can detect defects in PCBs with a plausible accuracy of 99.74%, saving a great deal of skilled manpower and time while also increasing accuracy. In future work, the accuracy can be increased further by considering several types of defects: the grouping of classes must be done in a more balanced way, and more types of defects should be included as the amount of data grows. Further, we will try to develop fully automatic training without human interference and use transfer learning and meta-learning to improve the accuracy. Eventually, a transfer learning approach [36,37] can be applied to a pre-trained YOLO model.