PCB Defect Images Super-Resolution Reconstruction Based on Improved SRGAN

Abstract: Image super-resolution reconstruction technology can boost image resolution and aid in the discovery of PCB flaws. The traditional SRGAN algorithm produces reconstructed images with great realism, but it suffers from insufficient feature extraction ability, a large number of model parameters, and a poor reconstruction effect on fine-grained images. To that end, this paper proposes an SRGAN-based super-resolution reconstruction algorithm for PCB defect images that is the first to add a VIT network to the generator network to extend the receptive field and improve the model's ability to extract high-frequency information. A high-frequency feature extraction module is then used to enhance the generator's extraction of high-frequency information from the feature map while reducing the complexity of the network. Finally, an inverted residual module and a VIT network are combined to form the discriminator's backbone network, which extracts and summarizes shallow features while synthesizing global features for higher-level features, achieving the discrimination effect with lower spatial complexity. On the test set, compared with the SRGAN algorithm, the improved algorithm increases PSNR by 0.82 dB and SSIM by 0.03, while the number of discriminator parameters and the model size of the SRVIT algorithm are reduced by 2.01 M and 7.5 MB, respectively. The improved PCB defect image super-resolution reconstruction algorithm thus not only enhances the image reconstruction effect but also lowers the model's space complexity.


Introduction
Nowadays, contemporary electronic communication devices have a significant effect on societal development, and the quality of electronic devices is determined by the quality of printed circuit boards [1]. However, due to technical limitations, boards with various defects are inevitably produced, so PCB defect detection [2,3] has become an urgent issue in the sector. PCB images are often limited by the imaging equipment and can be blurry and low-resolution, making it difficult to identify defective images and satisfy the application requirements of PCB defect detection. Super-resolution image reconstruction algorithms [4], however, can convert low-resolution images into high-resolution ones; this is a current research hotspot in the field of computer vision and plays an important role in supporting the growth of the PCB defect detection field.
Super-resolution image reconstruction algorithms have gone through three phases of development [5,6]. Early methods primarily included interpolation-based algorithms [7] and reconstruction-based algorithms [8], which work in the pixel and frequency domains, respectively; the final reconstruction results are not ideal because both methods are limited by the image itself. With the brisk growth of deep learning in recent years, an increasing number of scholars have devoted themselves to introducing deep learning techniques into the field of image super-resolution. In 2014, Dong proposed the SRCNN algorithm [9], the first application of deep learning techniques to image reconstruction. It builds an end-to-end network model that nonlinearly maps images using a three-layer convolutional neural network. Compared with previous algorithms, the high-resolution images generated by SRCNN are considerably improved in every respect, but the method still suffers from slow speed and high computational cost. Dong proposed the FSRCNN algorithm in the same year to address these issues [10]; it optimized the SRCNN network structure to reduce training cost and speed up operation while maintaining good performance. Tai introduced DRRN [11], a deep recursive residual network-based image reconstruction algorithm, in 2017, which reduced network parameters and optimized network performance. In 2022, Lu proposed the Efficient Super-Resolution Transformer (ESRT) for SISR; ESRT is a hybrid model consisting of a lightweight CNN backbone (LCB) and a lightweight Transformer backbone (LTB) [12]. In 2023, Ariav proposed a fully transformer-based depth-map super-resolution network that incorporates a novel cross-attention mechanism to seamlessly and continuously guide color images into the depth upsampling process [13].
Following that, different deep learning network structures were applied to the field of super-resolution image reconstruction [14] with remarkable results.
Influenced by the Generative Adversarial Network (GAN) [15], in 2017, Ledig introduced the SRGAN (Super-Resolution Generative Adversarial Network) algorithm, which combined generative adversarial networks with the super-resolution domain [16]. This algorithm trains a generative network and a discriminative network against each other and introduces a perceptual loss function, making the reconstructed image texture clearer and enhancing the visual effect. The SRGAN algorithm, however, has flaws such as inadequate feature extraction capability and a large number of model parameters. To address these issues, this paper improves the SRGAN network and proposes an efficient PCB defect image super-resolution reconstruction algorithm based on the principles of enlarging the model's receptive field and fusing features, in order to obtain higher-resolution PCB defect images.

Generative Adversarial Network
GAN consists of two parts, a generator and a discriminator, and the structure is shown in Figure 1. Given a low-resolution image as input, the generator network produces a fake super-resolution image. The generator's fake output, along with the real image, is fed into the discriminator network. The discriminator outputs a true/false judgment, and this feedback is returned to the generator, which optimizes its network weights to produce a more realistic high-resolution image. Through the continuous adversarial game between generator and discriminator, the network learns to reconstruct the HD image, until the discriminator can no longer judge whether the image produced by the generator is real or fake. The GAN objective function is shown in Equation (1):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]  (1)

where D denotes the discriminator, G denotes the generator, V(D, G) denotes the binary cross-entropy objective, E denotes the expectation over the indicated distribution, x denotes the real data, and z denotes the noise input used to produce fake data. When fed real data, the larger the discriminator's output the better; when fed fake data, the smaller the better. Training alternates between fixing the generator and maximizing the objective over the discriminator weights, and fixing the discriminator and minimizing it over the generator weights.
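As an illustration, the value function in Equation (1) can be estimated over a batch of discriminator outputs. The following is a minimal numerical sketch; the toy probabilities are hypothetical, not taken from the paper:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the GAN objective V(D, G):
    E[log D(x)] + E[log(1 - D(G(z)))], with the expectations replaced
    by means over a batch of discriminator outputs."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminator that is confident on real inputs (outputs near 1) and
# on fake inputs (outputs near 0) yields a larger V than one that
# cannot tell the two apart (outputs near 0.5 for both).
v_good = gan_value([0.9, 0.95], [0.1, 0.05])
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])
assert v_good > v_confused
```

This mirrors the alternating objective: the discriminator is trained to push this value up, the generator to pull it down.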

SRGAN
The core of the SRGAN network is a GAN, and the generator part consists of a deep residual network, as shown in Figure 2. The generator network is split into four stages: low-level feature extraction, high-level feature extraction, deconvolution (transposed convolution), and CNN reconstruction. The primary component is a deep network made of B ResNet blocks, a structure that improves the flow of information across layers while preventing gradient vanishing as the network deepens. The discriminator component is a standard CNN, as shown in Figure 3. Its activation function is Leaky ReLU, which avoids neurons dying on negative inputs. Owing to this design, the texture of the generated image is closer to the original image, more consistent with human perception, and has a stronger sense of realism.
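The identity skip at the heart of those B ResNet blocks can be sketched with a toy 1-D stand-in for the convolutional layers; the kernels and PReLU slope below are illustrative assumptions, not the paper's actual layer shapes:

```python
import numpy as np

def conv1d_same(x, w):
    # 1-D "same" convolution as a toy stand-in for the 3x3 conv layers
    return np.convolve(x, w, mode="same")

def res_block(x, w1, w2, slope=0.25):
    """Toy residual block: conv -> PReLU -> conv, plus identity skip.
    The skip term means the block only has to learn a correction to x,
    which eases gradient flow in deep generators."""
    h = conv1d_same(x, w1)
    h = np.where(h > 0, h, slope * h)   # PReLU with a fixed slope
    h = conv1d_same(h, w2)
    return x + h                        # identity skip connection

x = np.array([1.0, -2.0, 3.0, 0.5])
# With the second conv zeroed out, the block reduces to the identity map,
# which is why deep stacks of such blocks remain trainable.
assert np.allclose(res_block(x, np.ones(3), np.zeros(3)), x)
```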

Proposed Algorithm Improvements
The SRGAN network is used as the model framework in this paper, but the model has limitations such as a restricted receptive field, an inability to extract high-frequency information effectively, and a large number of model parameters. Consequently, this paper improves it and proposes SRVIT (Super-Resolution Vision Transformer), a super-resolution reconstruction algorithm. First, the VIT network structure is introduced to broaden the receptive field and improve the ability to extract high-frequency information. Second, a DW (depthwise) convolutional residual block is added to the generator, and the size of the convolutional kernel is increased, decreasing the number of network parameters while improving the high-frequency extraction capability. Finally, the discriminator network applies the convolutional inverted residual block first, followed by the VIT network, which benefits the discrimination effect and reduces spatial complexity.

Generator Improvements
To address the limited high-frequency information extraction capability of the SRGAN network, this paper incorporates the VIT [17] structure in the generator stage. The VIT performs global computation and its receptive field covers the entire feature map, making it superior at extracting high-frequency information and image details, which are crucial in fine-grained images such as PCB circuit diagrams. Moreover, to allow the model to focus on both the deep and shallow layers of the feature map, the VIT is added to both the shallow and deep layers of the generator. However, since the full VIT network is complex, some of its parts have minimal impact on the super-resolution reconstruction network, and the space complexity of that network should be kept as low as possible, the VIT structure used in this paper takes only the form of a Transformer Encoder, as illustrated in Figure 4. The incoming information first passes through a Norm layer, which normalizes it along the channel direction, and then through a multi-head self-attention mechanism followed by a Dropout layer, whose purpose is network regularization to prevent overfitting. The output finally passes through another Norm layer into the MLP network; the multilayer perceptron strengthens the adaptive, self-learning capability of the model. In a complete VIT pass, the multi-head self-attention mechanism is a very important step. It is a stack of single-head self-attention mechanisms, which can extract different information from the feature map and search for parameters in multiple parameter spaces, making the reconstruction network more accurate.
The expression for the self-attention mechanism is shown in Equation (2):

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{QK^T}{\sqrt{d_k}} \right) V  (2)

where Q represents the query, K the key, V the value, QK^T the inner product of the two feature matrices, softmax the normalization operation, and d_k the dimension of the key. Dividing by \sqrt{d_k} keeps the variance of the scaled scores near 1, decoupling the normalized distribution from the dimension so that gradient values remain stable during training. The Q, K, and V values are obtained by first mapping the input through three transformation matrices, W_q, W_k, and W_v, to get the corresponding q_i, k_i, and v_i, which are then combined, as shown in Equation (3):

q_i = W_q a_i, \quad k_i = W_k a_i, \quad v_i = W_v a_i  (3)

where a_i represents the value mapped when the input passes through the input channel. The values obtained from each single-head self-attention mechanism are concatenated to obtain the output of the multi-head self-attention mechanism; the VIT structure of the SRVIT super-resolution reconstruction algorithm proposed in this paper uses an eight-head self-attention module.
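A minimal NumPy sketch of Equations (2) and (3) with an eight-head module follows; the token count and per-head dimension are hypothetical, chosen only to make the shapes concrete:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention of Equation (2):
    softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

def multi_head_attention(a, Wq, Wk, Wv, heads=8):
    """Each head maps the input 'a' through its own W_q, W_k, W_v
    (Equation (3)) and attends independently; the head outputs are then
    concatenated, as in the eight-head module used by SRVIT."""
    outs = []
    for h in range(heads):
        q, k, v = a @ Wq[h], a @ Wk[h], a @ Wv[h]
        outs.append(attention(q, k, v))
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(0)
tokens, d_model, d_head, heads = 4, 16, 2, 8
a = rng.normal(size=(tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(heads, d_model, d_head)) for _ in range(3))
out = multi_head_attention(a, Wq, Wk, Wv, heads)
assert out.shape == (tokens, heads * d_head)   # (4, 16)
```

Because every token attends to every other token, the receptive field of this operation covers the whole feature map, which is what motivates inserting it into the generator.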
In addition to incorporating the VIT network structure, the proposed algorithm employs an HFB (High-Frequency Block) in the generator stage, whose structure is depicted in Figure 5. The module uses a larger convolutional kernel: because the variability between adjacent pixels of PCB images is not very large, increasing the kernel size lets the model better extract the high-frequency information of the feature map from the shallow features. Meanwhile, DW convolution is used to reduce the number of model parameters, the introduced LayerNorm layer better suppresses the model's overfitting, and the residual structure prevents gradient explosion and enhances gradient propagation, thus improving the model's learning ability.
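The parameter saving from pairing a large kernel with DW convolution can be checked by simple counting; the channel and kernel sizes below are assumed for illustration and are not the HFB's actual dimensions:

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: every output channel mixes all input channels
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    # Depthwise (one k x k filter per input channel) + 1x1 pointwise mix
    return c_in * k * k + c_in * c_out

# Hypothetical layer sizes, chosen only for illustration
c_in, c_out, k = 64, 64, 7
std = conv_params(c_in, c_out, k)          # 64 * 64 * 49 = 200704
dws = dw_separable_params(c_in, c_out, k)  # 64 * 49 + 64 * 64 = 7232
assert dws < std / 25  # the large-kernel DW conv stays cheap
```

This is why the kernel can be enlarged for better high-frequency extraction without the parameter count ballooning.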
The MLP network in the HFB module consists of a Linear module and the GELU activation function. It performs dimension-raising followed by dimension-reduction to extract high-dimensional information, giving the model the ability to process information in parallel; its highly nonlinear global action gives the model good fault tolerance and enhances its generalization ability, while the drop-path regularization module enhances robustness. The overall structure of the generator of the improved SRGAN-based super-resolution reconstruction network for PCB defect images is shown in Figure 6. The LR image block obtained by downsampling serves as the input, which first enters a convolutional layer for shallow feature extraction and a PReLU function for nonlinear mapping. The output feature map then passes through the HFB module and VIT module in turn. The output of the last VIT module enters the upsampling convolution module, where the Pixel Shuffle layer recovers the image and performs upsampling to improve the image resolution. After the upsampling convolution, a CNN reconstruction layer completes the generator, which outputs the reconstructed super-resolution PCB defect image.
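The Pixel Shuffle rearrangement used in the upsampling stage can be sketched directly in NumPy: it folds r² groups of channels into an r-times larger spatial grid. The tiny 4-channel input here is a toy example:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into (C, H*r, W*r),
    as done by the Pixel Shuffle upsampling layer in the generator."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)     # split channels into r x r groups
    x = x.transpose(0, 3, 1, 4, 2)   # interleave groups with pixels
    return x.reshape(c, h * r, w * r)

x = np.arange(16, dtype=float).reshape(4, 2, 2)  # 4 channels, 2x2 map
y = pixel_shuffle(x, 2)
assert y.shape == (1, 4, 4)   # one channel, upscaled by a factor of 2
```

Each output 2x2 neighborhood draws one value from each of the four input channels, so resolution is gained purely by rearrangement, with the preceding convolution supplying the extra channels.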

Discriminator Improvements
The improved discriminator network in this paper consists of the IRB (Inverted Residual Block) module and the VIT module. The IRB module is shown in Figure 7. The IRB adopts an inverted residual structure: it first performs a projective convolution to raise the dimensions, then passes through a depthwise separable convolution, and finally uses a projective convolution to reduce the dimensions. Because each depthwise convolution kernel operates on only one channel of the feature layer, the computation is not large. Each IRB contains three convolutions, three BN layers, and two SiLU activation functions to fully combine the features. The residual module cuts a large number of parameters and reduces the space occupied by the model, while preventing overfitting. The overall structure of the improved discriminator is shown in Figure 8. The input first goes through shallow feature extraction by convolution, then enters the backbone network composed of IRB and VIT modules after a BN layer and SiLU activation function. Compared with traditional convolution, the DW-PW convolutional inverted residual blocks in the backbone can quickly extract and summarize shallow features, and the subsequent VIT synthesizes global features from the high-level features, which improves the discrimination effect and reduces spatial complexity by making full use of the features. The backbone is followed by a pooling layer and a fully connected layer, and the final output completes the discrimination between HD images and reconstructed images.
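The expand/depthwise/project pattern of the IRB keeps the parameter count modest even though the intermediate dimension is raised. A rough count (biases and BN parameters omitted; the channel width and expansion factor are assumed for illustration) looks like:

```python
def irb_params(c, expand, k):
    """Parameter count of an inverted residual block:
    1x1 expansion conv, k x k depthwise conv, 1x1 projection conv."""
    c_mid = c * expand
    expand_p = c * c_mid           # 1x1 projective conv, raise dims
    depthwise_p = c_mid * k * k    # one k x k kernel per channel
    project_p = c_mid * c          # 1x1 projective conv, lower dims
    return expand_p + depthwise_p + project_p

def plain_conv_params(c, k):
    # A single standard k x k conv with the same in/out channel count
    return c * c * k * k

# With c = 64: IRB = 16384 + 2304 + 16384 = 35072 parameters, versus
# 36864 for even a single plain 3x3 conv of the same width.
assert irb_params(64, expand=4, k=3) < plain_conv_params(64, k=3)
```

Despite touching a 4x wider intermediate representation, the block is cheaper than one plain convolution, which is how the discriminator gains depth without growing in size.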

Loss Function
The performance of GAN networks is inextricably linked to the definition of the loss function. One of the advantages of SRGAN is the introduction of the perceptual loss function, which enhances realistic detail after super-resolution. To solve the problem that optimizing the traditional pixel loss function (MSE) lacks high-frequency detail extraction and leads to a lack of realism in the reconstruction, the SRGAN network uses a content loss function with perceptual-similarity-driven VGG feature mapping. The overall perceptual loss of the SRGAN network is shown in Equation (4):

l^{SR} = l^{SR}_X + 10^{-3}\, l^{SR}_{Gen}  (4)

where l^{SR}_X is the content loss, l^{SR}_{Gen} is the adversarial loss, and 10^{-3} is the weighting factor of the adversarial loss. The content loss includes the MSE loss and the VGG loss, with the VGG loss weighted by 0.006. The MSE loss, i.e., the pixel-by-pixel loss between HR and SR, is shown in Equation (5):

l^{SR}_{MSE} = \frac{1}{r^2 W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left( I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y} \right)^2  (5)

where H is the height of the input image, W is the width of the input image, r denotes the image reconstruction scale factor, I^{HR}_{x,y} denotes the value of pixel (x, y) of the high-resolution image, and G_{\theta_G}(I^{LR})_{x,y} denotes the value of pixel (x, y) of the reconstructed image. The VGG loss, i.e., the pixel-by-pixel loss between the deep features of the HR image and the SR image, is shown in Equation (6):

l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}(G_{\theta_G}(I^{LR}))_{x,y} \right)^2  (6)

where W_{i,j} and H_{i,j} denote the dimensions of the feature maps within the VGG network, \phi_{i,j}(I^{HR})_{x,y} denotes the feature value at point (x, y) of the deep feature map of the high-resolution image, and \phi_{i,j}(G_{\theta_G}(I^{LR}))_{x,y} denotes the feature value at point (x, y) of the deep feature map of the reconstructed image. Using the loss on deep features as an optimization objective complements the high-frequency information of the reconstructed image.
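A schematic version of Equations (4)-(6), with plain arrays standing in for the VGG feature maps and a scalar for the discriminator output, might look like the following. This is a sketch of the loss arithmetic only, not the actual VGG-backed implementation:

```python
import numpy as np

def mse_loss(a, b):
    # Pixel-wise (or feature-wise) MSE, averaged over all positions
    return np.mean((a - b) ** 2)

def perceptual_loss(hr, sr, feat_hr, feat_sr,
                    vgg_w=0.006, adv_w=1e-3, d_fake=0.5):
    """Sketch of the SRGAN perceptual loss of Equation (4): content loss
    (pixel MSE of Eq. (5) plus weighted feature MSE of Eq. (6)) plus
    10^-3 times the adversarial loss of Eq. (7). feat_hr / feat_sr stand
    in for VGG feature maps, and d_fake for D(G(I_LR))."""
    content = mse_loss(hr, sr) + vgg_w * mse_loss(feat_hr, feat_sr)
    adversarial = -np.log(d_fake)   # single-sample adversarial term
    return content + adv_w * adversarial

hr = np.ones((4, 4))
# Perfect reconstruction with a fully fooled discriminator gives zero loss.
assert abs(perceptual_loss(hr, hr, hr, hr, d_fake=1.0)) < 1e-12
```

The small weights (0.006 and 10^-3) keep the pixel term dominant while still letting the feature and adversarial terms steer texture realism.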
The generative adversarial loss function is shown in Equation (7):

l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D_{\theta_D}(G_{\theta_G}(I^{LR}))  (7)

where G_{\theta_G}(I^{LR}) represents the reconstructed image and D_{\theta_D}(G_{\theta_G}(I^{LR})) represents the probability that the reconstructed image is the true image.

Experimental Environment and Data Set
The experimental environment of the proposed PCB defect image super-resolution reconstruction algorithm SRVIT is shown in Table 1; the comparison networks in the experiments are run in the same environment to ensure fairness. In deep learning-based PCB defect image super-resolution experiments, a dataset with a large number of images is the basis for good training results. The dataset for this experiment is the online open-source PCB defect dataset [18], which contains a total of 10,668 PCB defect images covering six defect types: open circuit, spur, spurious copper, mouse bite, missing hole, and short circuit. In this paper, 8534 images from the dataset are randomly selected as the training set, and the remaining 2134 images are used as the test set.

Evaluation Indicators
For training, the HR circuit board defect images are randomly cropped into patches of size 320 × 320, and the corresponding LR patches of size 80 × 80 are obtained by bicubic-interpolation downsampling. Each batch sent to the model contains four circuit board defect images. The Adam optimizer is chosen to minimize the loss function, with β1 set to 0.9; the learning rate of the network training is 0.001, and a total of 100 epochs are trained.
The objective evaluation indices selected in this paper are the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM). PSNR is a pixel-level image quality criterion that reflects the degree of image degradation, as shown in Equation (8):

PSNR = 10 \log_{10} \frac{MAX^2}{MSE}  (8)

where MAX is the maximum value an image pixel can take and MSE is the mean square error between the two images. A larger PSNR represents a lower degree of image distortion and a higher quality of reconstruction. SSIM evaluates image quality from three perspectives: the brightness, contrast, and structure of the image, as defined in Equation (9):

SSIM(x, y) = [L(x, y)]^{\alpha} [C(x, y)]^{\beta} [S(x, y)]^{\gamma}  (9)

where x is the reference image; y is the reconstructed image; L, C, and S represent the brightness, contrast, and structure comparison functions, respectively; and α, β, and γ are all numbers greater than 0, used to adjust the relative importance of each term. For subjective evaluation, the reconstruction of the missing-hole defect in the test-set images is demonstrated visually, and the MOS method is used: multiple evaluators score the same reconstructed PCB defect image and the scores are averaged. The scoring criteria range from 1 to 5, with the original HD image assigned an MOS value of 5. The experiments in this paper also compare the improved super-resolution algorithm with the SRGAN network in terms of the number of model parameters and the model space size.
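Equations (8) and (9) can be computed directly. The SSIM below uses a single global window with α = β = γ = 1 for simplicity, whereas standard implementations average the same statistic over local windows:

```python
import numpy as np

def psnr(ref, rec, max_val=255.0):
    """PSNR of Equation (8): 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    if mse == 0:
        return float("inf")   # identical images: no distortion
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM with alpha = beta = gamma = 1; c1 and c2 are
    the usual stabilizing constants for 8-bit images."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

x = np.random.default_rng(1).uniform(0, 255, size=(8, 8))
assert psnr(x, x) == float("inf") and np.isclose(ssim_global(x, x), 1.0)
```

For example, a uniform offset of 10 gray levels gives an MSE of 100 and hence a PSNR of 10 log10(255²/100) ≈ 28.1 dB.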

Experimental Results
The experiments in this paper compare the images reconstructed by the SRVIT super-resolution model for PCB defect images with those generated by five other reconstruction algorithms, namely Bicubic [7], SRCNN [9], FSRCNN [10], VDSR [11], and SRGAN [14], with all network training environments kept consistent. The Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Mean Opinion Score (MOS) values are shown in Table 2, and reconstructions of some test-set images under the different models are shown in Figure 9. According to the quantitative results in Table 2 and the images in Figure 9, the image reconstructed by the Bicubic method is poor, with the lowest PSNR and SSIM values, a blurred overall visual effect, and a low MOS score. SRCNN, FSRCNN, and VDSR improve on Bicubic in the objective evaluation metrics, and the general outline of the image is clearer, thanks to the CNN's strength at capturing low-frequency information. However, because they are not good at capturing high-frequency information, image details are severely lost and distortion occurs, and their MOS scores are all very low. Both the SRGAN and SRVIT algorithms show significant improvements in reconstruction quality. Looking closely at the defective solder joint in Figure 9, the reconstructed image of the improved model is sharper at the edge of the inner hole, and the SRGAN reconstruction shows slight distortion at the copper deposit below the solder hole, whereas the improved model's reconstruction shows no distortion there. In addition, in Figure 10, the improved model's reconstruction is sharper at the inner wire edge at the defective notch. SRVIT thus remedies SRGAN's shortcomings, such as inadequate local feature extraction and distortion.
Improvements were achieved both in the objective PSNR and SSIM values, where PSNR improved by 0.82 dB and SSIM by 0.03, and in the subjective MOS scores. The comparison of the spatial complexity of SRVIT and SRGAN is shown in Table 3. As can be seen from Table 3, the SRVIT algorithm proposed in this paper effectively reduces both the number of parameters and the model size in the discriminator stage compared with SRGAN: the number of parameters is reduced by 2.01 M and the model size by 7.5 MB. The generator is also smaller than SRGAN's, with the number of parameters reduced by 0.02 M and the model size by 0.1 MB. Combining the PSNR and SSIM evaluation metrics, the improved SRGAN-based super-resolution reconstruction algorithm for PCB defect images not only improves pixel-level image quality with higher human-eye evaluation scores but also effectively reduces spatial complexity.

Conclusions
This article proposes SRVIT, an improved super-resolution reconstruction algorithm for PCB defect images, to address the problems of the SRGAN algorithm. The VIT network is introduced into the improved generator together with a DW convolutional residual block design, and the discriminator backbone is built from an inverted residual module and a VIT. The improved SRVIT algorithm outperformed the original SRGAN algorithm in the objective assessment indices PSNR and SSIM on the test set, reduced the number of model parameters and the model space size to minimize spatial complexity, and improved the MOS score, pointing the way forward in the area of PCB defect image super-resolution reconstruction.
