Unleashing the power of AI in detecting metal surface defects: an optimized YOLOv7-tiny model approach

The detection of surface defects on metal products during the production process is crucial for ensuring high-quality products, as such defects can lead to significant losses in high-tech industries. To address the slow detection speed and low accuracy of traditional metal surface defect detection, an improved algorithm based on the YOLOv7-tiny model is proposed. Firstly, to enhance the feature extraction and fusion capabilities of the model, the depth aware convolution module (DAC) is introduced to replace all ELAN-T modules in the network. Secondly, the AWFP-Add module is added after the Concat modules in the network's Head section to strengthen the network's ability to adaptively distinguish the importance of different features. Finally, to accelerate model convergence and alleviate the imbalance between positive and negative samples, a new loss function, Focal-SIoU, replaces the original model's CIoU loss function. To validate the effectiveness of the proposed model, two industrial metal surface defect datasets, GC10-DET and NEU-DET, were employed in our experiments. Experimental results demonstrate that the improved algorithm achieved detection frame rates exceeding 100 fps on both datasets. Furthermore, the enhanced model achieved an mAP of 81% on the GC10-DET dataset and 80.1% on the NEU-DET dataset; compared to the original YOLOv7-tiny algorithm, this represents an increase in mAP of nearly 11% and 9.2%, respectively. Moreover, compared to other recent algorithms, our improved model demonstrated higher detection accuracy and significantly faster detection speed. These results collectively indicate that our proposed model effectively fulfills the industry's demand for rapid and efficient detection and recognition of metal surface defects.


INTRODUCTION
Metals, as essential industrial materials, are widely used in sectors such as machinery, aerospace, automotive, defense, and light industry. However, factors such as raw material quality, production environment, equipment condition, and human error often lead to surface defects during industrial production, including creases, water spots, punching holes, and inclusions; product surfaces are also easily damaged during handling. To prevent the supply of such substandard products, reliable surface inspection is required. The etiology of surface defects in metals is multifaceted, and the defects exhibit a range of complex morphologies. Based on their geometric characteristics, they can generally be categorized into three main types: point-like, linear, and planar. Typical defects include punching hole, welding line, crescent gap, water spot, oil spot, silk spot, inclusion, rolled pit, crease, and waist folding. Notably, water spots and oil spots usually present low contrast, making them easily confounded with other defect types; water spots are especially prone to misidentification. Given the diverse and intricate nature of these defects, failure to detect them adequately during production and processing can have incalculable adverse effects on the structural integrity and functionality of metal products. Consequently, accurately identifying surface defects in metals is both necessary and critically important. However, defect detection still faces limitations: directly applying existing object detection models may not detect certain defect types effectively. The characteristics of metal surfaces pose specific challenges to the detection task. The first challenge is variation in defect shape and size. Some metal surface defects are very small while others are large, and there is significant intra-class variation among defects. These obstacles make small defects difficult to detect, while relatively large defects are also challenging because the shape of the same defect type varies. Addressing this challenge requires extracting effective feature information from defects of many different scales and improving the robustness of the detection model. The second challenge lies in detection efficiency, which is critical for industrial applications: the model's detection accuracy must be maximized while meeting real-time requirements. Low detection speed can create bottlenecks on the production line, slowing manufacturing, increasing costs, and demanding expensive hardware and substantial computational resources. Conversely, low detection accuracy results in elevated false-positive rates, where non-defective regions are flagged as defective, and substantially increases the risk of missing small or hard-to-detect defects, undermining the reliability of quality control. Hence, achieving efficient and accurate identification of metal surface defects presents a formidable challenge.
To solve the above-mentioned problems, this study proposes an improved method for detecting metal surface defects based on YOLOv7-tiny. The method deploys a fast and accurate model in the production process to achieve unmanned, rapid, and precise localization and classification of metal surface defects.
The main contributions of this study are as follows: (1) An improved metal surface defect detection model based on YOLOv7-tiny is proposed. (2) To further enhance the network's ability to extract defect features and the interaction between spatial and channel information, the ELAN-T modules in the YOLOv7-tiny model are replaced with DAC modules tailored to this dataset, based on experimental comparison. (3) To enable the network to automatically determine the importance of features and increase its feature fusion capability, an Adaptive Weighted Feature Path module (AWFP-Add) is proposed. (4) To address the slow convergence, inaccurate regression results, and the sample imbalance in bounding box regression that traditional loss functions generally ignore, a new loss function, Focal-SIoU, replaces the original bounding box loss function of YOLOv7-tiny.

RELATED WORK
Surface defect detection is an important link in the metal production process. However, defect scales vary greatly during detection. To improve detection accuracy, Beskopylny et al. (2023) proposed a feature extraction network that uses depth-wise separable convolution to enhance detection. They also introduced dilated convolutions in the spatial pyramid pooling (SPPF) module to enlarge the receptive field and incorporate contextual information, and proposed a novel attention mechanism, the Multi-scale Enhanced Context Attention (MECA), to facilitate the extraction of multi-scale detailed information. The results demonstrate a 6.5% improvement in mean average precision (mAP) and a 5.75% improvement in F1 score over the original model. Chen et al. (2023) introduced the coordinate attention (CA) mechanism module to replace the spatial pyramid pooling (SPP) structure in YOLOX's Backbone, proposed a novel EDE block to capture the complete features of surface defects, and addressed the low contrast of steel surface defect images with the CLAHE data augmentation method. Their best model achieved an accuracy of 82.6% at a frame rate of 100.2 fps on the NEU-DET dataset. Cheng & Yu (2020) proposed RetinaNet with differential channel attention and adaptive spatial feature fusion; the new network achieved 78.25% mAP, a 2.92% improvement over RetinaNet. Xing & Jia (2021) proposed a convolutional network classification model with symmetric modules for feature extraction and designed an optimized IoU (XIoU). Their model achieved an mAP of 79.89% on NEU-DET and 78.44% on a self-made detection dataset. Yu, Cheng & Li (2021) proposed a new anchor-free detection model called CABF-FCOS. This deep network utilizes a channel attention mechanism (CAM) and a bidirectional feature fusion network (BFFN) for bidirectional feature fusion, enabling the identification of specific categories and precise locations of steel defects. The experimental results showed that the new network achieved an mAP of 76.68% on the NEU-DET dataset, an improvement of 4.43% compared to FCOS. Tang et al. (2023) introduced the Transformer structure as an alternative to the commonly used CNN architecture. They utilized the self-attention mechanism to capture global information and employed parallel computation to enhance computational efficiency. Through the Swin Transformer, they extracted multi-scale features from the images, and the combination of FPN and RPN facilitated the integration of features across scales. Finally, they improved the ROI head to obtain the category and precise localization of defects. The final model achieved a detection accuracy of 81.1%, surpassing numerous classical CNN-based detection methods.
To meet current industrial production needs, researchers are increasingly studying how to improve defect detection accuracy while ensuring real-time monitoring. This research primarily focuses on single-stage object detection networks, particularly the YOLO series algorithms. Liu & Ma (2023) improved the utilization of defect features by adjusting the receptive field at different scales and the attention weight preferences through an expanded and weighted cross-stage feature pyramid network in the Neck. They maximized the extraction of useful information by enhancing the cross-stage partial connection with ResNet in the Backbone, and the Head section adopted a decoupled head to increase robustness. As a result, their algorithm achieved 79.93% and 72.76% mAP on the GC10-DET and NEU-DET datasets, respectively. Liu & Jia (2023) proposed a new model, ST-YOLO, for detecting defects in steel. This model utilizes a streamlined fusion network structure to meet the computational requirements of the classification and localization tasks. To optimize label assignment, a self-adjusting label assignment algorithm is introduced, which guides the model to train flexibly. This method achieves an average detection accuracy of 80.3% at 46 frames per second and has demonstrated excellent performance in real defect detection applications. Zhang et al. (2023) introduced and optimized a weighted bidirectional feature pyramid network with an embedded residual module in YOLOv5s and preprocessed the images using Laplacian sharpening. Their best model achieved an mAP of 86.8% on the NEU-DET dataset while processing 640 × 640 RGB images at 51 FPS. Liu et al. (2022) enhanced the representation of dense small defects in YOLOv3's DarkNet53 backbone by adding an extra scale prediction layer on top of the existing three and densely linking multi-scale feature maps across layers. This method achieved an average detection accuracy of 89.24% and could process nearly 26 images of size 416 × 416 pixels per second. Xu et al. (2023) developed an end-to-end steel surface defect detection and size measurement system based on YOLOv5. They employed BiFPN in the Neck section to enhance feature fusion and introduced the CA mechanism in the Head section to strengthen the spatial correlation of the steel surface. Furthermore, they proposed an adaptive anchor box generation method based on defect shape difference features. The improved YOLOv5 achieved a high detection accuracy of 93.6% at a fast detection speed of 133 FPS and exhibited remarkable accuracy in locating small defective objects. In conclusion, YOLO-based object detection algorithms effectively address surface defect detection but require a trade-off between speed and accuracy. Building on this research, this paper proposes an improved metal surface defect detection model based on YOLOv7-tiny. From an optimization standpoint, our solution offers a practical way to detect metal surface defects, and the excellent performance of the improved YOLOv7-tiny is demonstrated.

METHOD
The YOLOv7-tiny algorithm is a simplified version of YOLOv7 that retains the model scaling strategy based on the cascade idea. It also improves the efficient layer aggregation network (ELAN) for higher detection accuracy, with smaller parameter sizes and faster detection speeds. The main differences between the two models lie in their internal components and in the depth and width of those components. The backbone of the YOLOv7-tiny network is composed primarily of standard convolution layers, ELAN-T modules, and max-pooling downsampling layers. The process of defect detection using the YOLOv7-tiny model (Fig. 1) is as follows: 1. Load the relevant dataset based on the configuration file. 2. Preprocess the dataset to meet the input requirements of the YOLOv7-tiny model.
3. Input the preprocessed data into the YOLOv7-tiny model and start the iterative training process, updating the parameter values. Finally, when model training finishes, the best-performing weights are saved for inference.

Improved network structure based on YOLOv7-tiny model
The improved YOLOv7-tiny network architecture proposed in this study is highlighted in the boxes in Fig. 2; detailed information on each improvement is given in the following three subsections.

Depth Aware Convolution module
In the YOLOv7-tiny network, the ELAN-T module is used to extract features from feature maps of various regions. However, in our metal surface defect datasets, defects within the same category vary greatly in size and shape, and there may also be similarities between defects of different categories. Moreover, owing to variations in sample materials and the influence of lighting conditions, the grayscale values of intra-class defect images also vary, which disrupts the accuracy of the detection results. These factors collectively make it challenging for the network to extract meaningful features. The ELAN-T module is a simplified version of the ELAN module in YOLOv7; since the ELAN module itself is not very effective at feature extraction (Wang et al., 2023b), the feature extraction capability of ELAN-T is also unsatisfactory. Therefore, we improve on the ELAN-T module of YOLOv7-tiny by introducing our depth aware convolution module (DAC) (Fig. 3). We replace all ELAN-T modules in the entire model structure with DAC modules to further enhance defect feature extraction and fusion in the YOLOv7-tiny architecture while maintaining inference speed.
The inspiration for the DAC module comes from comparing YOLOv7 and YOLOv7-tiny themselves. Because of the simple structure of the ELAN-T (C5) module in YOLOv7-tiny, its feature extraction capability is significantly limited compared with other models in the YOLOv7 series. The ELAN-T module consists of five convolutions: three 1 × 1 convolutions and two 3 × 3 convolutions. It is known that 1 × 1 convolutions only facilitate information exchange and feature fusion between channels, lacking interaction and fusion between neighboring pixels; only the two 3 × 3 convolutions enhance feature fusion between neighboring pixels. Consequently, the feature extraction capability of the ELAN-T module is expectedly poor. We therefore enhance the C5 module by increasing the number of 1 × 1 and 3 × 3 convolutions: the added 1 × 1 convolutions strengthen information exchange among channels, while the added 3 × 3 convolutions widen the receptive field, promoting better interaction of spatial information and feature fusion. Additionally, we employ the Concat operation to increase the number of features, thereby improving the non-linear transformation capacity of the network.
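Since the exact layer layout of the DAC block is specified only in Fig. 3, the PyTorch sketch below is an illustrative approximation of the idea described above: 1 × 1 convolutions for channel mixing, chained 3 × 3 convolutions for spatial interaction, and a Concat over the intermediate outputs followed by a 1 × 1 fusion. The channel sizes, branch count, and activation function are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DAC(nn.Module):
    """Hypothetical sketch of a Depth Aware Convolution block."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        c_mid = c_out // 2
        def conv(ci, co, k):
            # Conv + BN + LeakyReLU, the usual YOLOv7-tiny building block
            return nn.Sequential(
                nn.Conv2d(ci, co, k, 1, k // 2, bias=False),
                nn.BatchNorm2d(co),
                nn.LeakyReLU(0.1, inplace=True),
            )
        self.branch1 = conv(c_in, c_mid, 1)   # 1x1: channel interaction only
        self.branch2 = conv(c_in, c_mid, 1)
        self.conv3a = conv(c_mid, c_mid, 3)   # 3x3: neighboring-pixel fusion
        self.conv3b = conv(c_mid, c_mid, 3)   # chained 3x3 widens receptive field
        self.fuse = conv(4 * c_mid, c_out, 1) # fuse the concatenated features

    def forward(self, x):
        y1 = self.branch1(x)
        y2 = self.branch2(x)
        y3 = self.conv3a(y2)
        y4 = self.conv3b(y3)
        # Concat increases the number of features before the fusion conv
        return self.fuse(torch.cat((y1, y2, y3, y4), dim=1))

x = torch.randn(1, 64, 80, 80)
out = DAC(64, 128)(x)
print(tuple(out.shape))  # (1, 128, 80, 80)
```

Like ELAN-T, the block keeps the spatial resolution unchanged and only remaps channels, so it can be dropped into the backbone wherever ELAN-T appeared.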

AWFP-add module
YOLOv7-tiny incorporates the feature fusion network of the YOLOv5 series, which consists of the Feature Pyramid Network (FPN) and the Path Aggregation Network (PAN) (Wang, Bochkovskiy & Liao, 2023a). The FPN (Lin et al., 2017) transfers strong semantic information top-down from deeper feature layers to shallower ones (Chen et al., 2021), while the PAN transmits accurate localization information from bottom to top. By combining FPN and PAN, different detection layers from the backbone are parameterized together, enhancing the feature fusion capability of the network. However, this combination introduces a drawback: the PAN structure takes as input features that have already been processed by the FPN, so some defect features extracted from the backbone's original information are lost along the way. The lack of original information for learning can bias training and consequently affect detection accuracy. To address this issue, we incorporate the concepts of the bi-directional weighted feature pyramid network (BiFPN) and its fast normalized fusion into the conventional Add module, giving the Add module its own learnable parameters. Structurally, we insert the Add module after each of the three Concat modules in the feature fusion network. Combining the practical requirements of surface defect detection with the concept of residual connections, we connect the Add modules and the Backbone to the three feature maps fed to the Head, enabling the network to retain more shallow semantic information without losing too much deep semantic information. At the same time, different weights are assigned according to the importance of the different input features, so that the network can adjust the weight parameters adaptively during back-propagation and attend to the importance of each feature. This innovation is the second improvement described in this article, AWFP-Add; its formula and structure are shown in Fig. 4. Eq. (1) is the mathematical expression of fast normalized fusion, O = Σ_i (w_i / (ε + Σ_j w_j)) · I_i, where the I_i are the input features, the w_i ≥ 0 are learnable weights, and ε is a small constant for numerical stability. For this study, directly introducing fast normalized fusion improves model accuracy only to a limited extent; moreover, it introduces more learnable parameters, increasing the model's computational complexity and noticeably affecting its stability. For this reason, we adopt the simple and efficient sigmoid function σ to restrict each weight to the range (0, 1), so that the network learns the importance of different features by itself through Eq. (3), O = Σ_i σ(w_i) · I_i; the complete computational process is illustrated in Fig. 5.
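The sigmoid-restricted weighting described above can be sketched in a few lines. This is a minimal NumPy illustration of the fusion rule only; in the actual module the weights are learnable parameters updated by back-propagation, not fixed scalars as here.

```python
import numpy as np

def sigmoid(w: float) -> float:
    return 1.0 / (1.0 + np.exp(-w))

def awfp_add(features, weights):
    """Adaptive weighted fusion sketch: each input feature map is scaled
    by its weight squashed through a sigmoid into (0, 1), then the maps
    are summed element-wise."""
    assert len(features) == len(weights)
    out = np.zeros_like(features[0])
    for f, w in zip(features, weights):
        out += sigmoid(w) * f
    return out

f1 = np.ones((2, 4, 4))      # e.g., deep feature from the fusion path
f2 = 2 * np.ones((2, 4, 4))  # e.g., shallow feature from the Backbone
fused = awfp_add([f1, f2], weights=[0.0, 0.0])  # sigmoid(0) = 0.5
print(fused[0, 0, 0])  # 1.5
```

Because the sigmoid bounds each weight independently, no normalization across weights is needed, which is what keeps this variant cheaper and more stable than fast normalized fusion.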

Focal-SIoU
The loss function of YOLOv7-tiny consists of three components: the bounding box regression (BBR) loss, the confidence loss, and the class loss. The bounding box regression loss measures the error in the predicted box's coordinate localization. The confidence loss reflects the confidence error of the predicted box, while the class loss captures the prediction box's error in predicting the target class. Specifically, the class loss uses binary cross-entropy (BCE) loss and only calculates the classification loss of positive samples. The confidence loss is also a BCE loss, but it measures the confidence between the predicted bounding box and the ground truth box using complete intersection over union (CIoU) and is calculated for all samples. The bounding box loss also uses the CIoU loss, but only calculates the position loss of positive samples. The CIoU loss takes into account three important geometric factors: overlap area, the distance between center points, and aspect ratio, which makes bounding box regression more stable, as shown in Fig. 6. Given the predicted box b and the ground truth box b^gt, the CIoU loss is defined by Eq. (4):

L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αν

where ρ²(b, b^gt) denotes the squared Euclidean distance between the center points of the prediction box and the ground truth box (denoted by d in Fig. 6), and c denotes the diagonal length of the smallest enclosing rectangle containing both boxes. α is defined by Eq. (5) and ν by Eq. (6):

α = ν / ((1 − IoU) + ν)

ν = (4/π²) (arctan(w^gt/h^gt) − arctan(w/h))²

where w^gt and h^gt represent the width and height of the ground truth box, and w and h represent the width and height of the prediction box. However, there is still room to improve the bounding box loss. For instance, CIoU does not take into account the orientation between the ground truth and predicted bounding boxes, which slows convergence. To address this, we replace the original CIoU loss with scylla intersection over union (SIoU), which introduces the vector angle between the ground truth and predicted bounding boxes to redefine the correlation (Gevorgyan, 2022). The SIoU loss consists of four components: angle cost, distance cost, shape cost, and IoU cost.
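As a concrete reference, the CIoU formula of Eqs. (4)–(6) can be sketched as a small Python function. This is an illustrative implementation of the standard CIoU definition, not the YOLOv7-tiny training code; boxes are assumed to be in (x1, y1, x2, y2) format.

```python
import math

def ciou_loss(box, box_gt):
    """CIoU loss sketch for axis-aligned (x1, y1, x2, y2) boxes."""
    # Intersection and union areas -> IoU
    ix1, iy1 = max(box[0], box_gt[0]), max(box[1], box_gt[1])
    ix2, iy2 = min(box[2], box_gt[2]), min(box[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = box_gt[2] - box_gt[0], box_gt[3] - box_gt[1]
    iou = inter / (w * h + wg * hg - inter)
    # Squared distance between box centers (d in Fig. 6)
    d2 = ((box[0] + box[2]) / 2 - (box_gt[0] + box_gt[2]) / 2) ** 2 + \
         ((box[1] + box[3]) / 2 - (box_gt[1] + box_gt[3]) / 2) ** 2
    # Squared diagonal of the smallest enclosing rectangle (c in Fig. 6)
    cw = max(box[2], box_gt[2]) - min(box[0], box_gt[0])
    ch = max(box[3], box_gt[3]) - min(box[1], box_gt[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term (Eq. 6) and its weight (Eq. 5)
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + d2 / c2 + alpha * v

print(round(ciou_loss([0, 0, 2, 2], [0, 0, 2, 2]), 6))  # 0.0
```

A perfectly aligned prediction gives zero loss; any center offset or aspect-ratio mismatch adds a positive penalty on top of 1 − IoU.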
(1) Angle cost. The angle cost is defined by Eq. (7); its diagram is shown in Fig. 7:

Λ = 1 − 2 sin²(arcsin(x) − π/4), x = c_h/σ = sin(α)

where σ is the distance between the center points of the ground truth box and the prediction box, and c_h is the difference between their center heights.

(2) Distance cost. The distance cost is defined by Eq. (10); its diagram is shown in Fig. 8:

Δ = Σ_{t=x,y} (1 − e^(−γρ_t)), γ = 2 − Λ

ρ_x = ((b^gt_cx − b_cx)/cw)², ρ_y = ((b^gt_cy − b_cy)/ch)²

where (cw, ch) are the width and height of the minimum enclosing rectangle of the ground truth box and the prediction box.

(3) Shape cost. The shape cost is defined by Eq. (13):

Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ = (1 − e^(−ω_w))^θ + (1 − e^(−ω_h))^θ

ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

where (w, h) are the width and height of the prediction box, (w^gt, h^gt) are the width and height of the ground truth box, and θ controls the degree of attention paid to the shape loss.

(4) IoU cost. IoU = A/B, where A represents the intersection of the ground truth box and the prediction box, and B represents their union.
In bounding box regression, the problem of imbalanced training samples also arises: the sparsity of target objects in the images leads to a scarcity of high-quality examples with small regression errors relative to low-quality examples. To focus the SIoU loss on high-quality examples, this study combines the focal loss, which specifically deals with the imbalance between positive and negative samples, with the SIoU loss (Zhang et al., 2022). We therefore propose the Focal-SIoU loss to enhance the performance of the SIoU loss; it is defined by Eq. (17) as L_Focal-SIoU = IoU^γ · L_SIoU, where γ controls the degree of focusing on high-quality examples.
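Putting the four SIoU components and the focal weighting together, a minimal sketch might look as follows. It assumes the IoU^γ focal form of Zhang et al. (2022); the values of theta and gamma_focal are illustrative, not the paper's tuned settings.

```python
import math

def focal_siou_loss(box, box_gt, theta=4.0, gamma_focal=0.5):
    """Focal-SIoU loss sketch for (x1, y1, x2, y2) boxes."""
    eps = 1e-9
    # IoU = A / B (A: intersection area, B: union area)
    ix1, iy1 = max(box[0], box_gt[0]), max(box[1], box_gt[1])
    ix2, iy2 = min(box[2], box_gt[2]), min(box[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = box_gt[2] - box_gt[0], box_gt[3] - box_gt[1]
    iou = inter / (w * h + wg * hg - inter + eps)
    # Center offsets and enclosing-box size
    dcx = (box_gt[0] + box_gt[2]) / 2 - (box[0] + box[2]) / 2
    dcy = (box_gt[1] + box_gt[3]) / 2 - (box[1] + box[3]) / 2
    cw = max(box[2], box_gt[2]) - min(box[0], box_gt[0])
    ch = max(box[3], box_gt[3]) - min(box[1], box_gt[1])
    # Angle cost (Eq. 7): x = c_h / sigma
    sigma = math.hypot(dcx, dcy) + eps
    x = min(1.0, abs(dcy) / sigma)
    angle = 1 - 2 * math.sin(math.asin(x) - math.pi / 4) ** 2
    # Distance cost (Eq. 10), modulated by the angle cost
    gamma = 2 - angle
    dist = sum(1 - math.exp(-gamma * rho)
               for rho in ((dcx / (cw + eps)) ** 2, (dcy / (ch + eps)) ** 2))
    # Shape cost (Eq. 13)
    shape = sum((1 - math.exp(-omega)) ** theta
                for omega in (abs(w - wg) / max(w, wg), abs(h - hg) / max(h, hg)))
    siou = 1 - iou + (dist + shape) / 2
    return iou ** gamma_focal * siou  # focal weighting (Eq. 17)

print(round(focal_siou_loss([0, 0, 2, 2], [0, 0, 2, 2]), 6))  # 0.0
```

The IoU^γ factor down-weights low-overlap (low-quality) examples, so the gradient mass concentrates on predictions that are already close to the ground truth.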

EXPERIMENT AND RESULT ANALYSIS
In this section, the datasets, evaluation metrics, comparison methods, and experimental procedure are described, and the experimental results are analyzed to confirm the validity of the improved model.

Datasets
In our experiments, we used two popular public datasets to validate the utility of the proposed method, namely GC10-DET (see Fig. 9) and NEU-DET (see Fig. 10).

Experimental setup
We use the PyTorch deep learning framework to train and test our proposed model. The experimental setup consists of an AMD 15vCPU, an RTX A5000 GPU, and 24 GB of memory, with the SGD optimizer used for model optimization. Within memory limits, a larger batch size benefits the model's detection performance; therefore, we use a batch size of 32 and train for 500 epochs with the image size set to 640 × 640. During training, we employ data augmentation techniques such as random flipping, contrast adjustment, cropping, and scaling transformations to enhance the model's robustness.
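As a minimal illustration of one of the listed augmentations, a horizontal flip for a detection task must mirror both the image and its defect boxes. The helper below is a hypothetical sketch, not the actual training pipeline, which also applies contrast, cropping, and scaling transforms.

```python
import numpy as np

def hflip_with_boxes(img, boxes):
    """Horizontally flip an image and its (x1, y1, x2, y2) defect boxes."""
    h, w = img.shape[:2]
    flipped = img[:, ::-1].copy()
    out = []
    for x1, y1, x2, y2 in boxes:
        # Mirror the x-coordinates; y-coordinates are unchanged
        out.append((w - x2, y1, w - x1, y2))
    return flipped, out

img = np.arange(12).reshape(3, 4)
fimg, fboxes = hflip_with_boxes(img, [(0, 0, 2, 2)])
print(fboxes)  # [(2, 0, 4, 2)]
```

Geometric augmentations that forget to transform the annotations are a common source of silent label corruption, which is why the box update is kept in the same function as the image flip.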

Performance evaluation
In industrial production, the accuracy and speed of defect detection are two critical factors. Incorrect results regarding defect type or location can lead to machine misjudgment, while slow inspection speeds significantly reduce detection efficiency and could even cause accidents. To address these issues, three metrics, namely AP, mAP, and FPS, are used to evaluate the defect detection model. AP represents the average precision for each defect category, mAP represents the average precision over all categories, and FPS represents the number of frames processed per second; together, these metrics determine whether the model meets real-time monitoring requirements. They are calculated as follows: P = TP/(TP + FP), R = TP/(TP + FN), AP = ∫₀¹ P(R) dR, and mAP = (1/N) Σ_i AP_i over the N categories, where TP is the number of defect samples correctly detected, FP is the number of non-defect samples incorrectly detected as defects, FN is the number of defect samples that are missed, and P and R denote precision and recall, respectively.
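The precision/recall/mAP computations defined above can be sketched directly; the counts and per-class AP values below are toy numbers for illustration, not results from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision P = TP/(TP+FP); recall R = TP/(TP+FN)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r

def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class average precisions."""
    return sum(ap_per_class) / len(ap_per_class)

p, r = precision_recall(tp=80, fp=20, fn=20)
print(p, r)  # 0.8 0.8
print(mean_average_precision([1.0, 0.5, 0.75]))  # 0.75
```

In practice each class's AP is itself obtained by integrating the precision–recall curve over detections ranked by confidence; the mean over classes is then the single mAP figure reported in the tables.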

Comparisons with other methods on GC10-DET
The experimental results on the metal surface defect dataset using the improved YOLOv7-tiny model are shown in Table 1. The detection results, illustrated in Fig. 11, display each predicted defect area enclosed in a box, along with the corresponding defect category and confidence level. As depicted in the figure, the improved model accurately locates and classifies defects, predicts their sizes, and exhibits relatively good detection performance even for small defects.
To evaluate the effectiveness of our proposed model, we compared our approach with some recently published and highly effective methods, as shown in Table 2. Our model achieved the highest mAP and the fastest detection speed among the compared methods, and it also outperformed them in accuracy on three commonly challenging defect categories (Os, Ss, In). The improved model reached an mAP of 81, a 6.9-point increase over the improved YOLOv5. Compared to the improved YOLOv3 model, it had a 9.7-point higher mAP, and it outperformed YOLOXD by 2.55 mAP. In comparison to LFF-YOLO, the improved model showed an almost 21-point improvement in mAP. Compared to DCC-CenterNet, our model achieved a 19.07-point increase in mAP and a 4.6-fold improvement in FPS. Besides, compared to EDDN, our model showed significant improvements in both mAP (from 65.10 to 81) and FPS (from 30.3 to 144.9). Although FANet achieved the highest accuracy on the defect categories Wf, Pu, Cg, and Ws, its mAP was 0.5 lower than that of our proposed model, and its detection speed was far inferior. Compared to RDD-YOLO, our model increased the mAP from 75.2 to 81 and the FPS from 57.5 to 144.9. The detection accuracy and speed of the baseline model are detailed in Table 5 (mAP: 70.2, FPS: 181.8). Finally, as shown in Fig. 12 (Comparison with some of the latest target detection algorithms on GC10-DET, Full-size DOI: 10.7717/peerjcs.1727/fig-12), after weighing the trade-off between the two key metrics, mAP and FPS, our proposed model achieves the best overall performance in terms of speed and accuracy.

Comparisons with other methods on NEU-DET
In order to further investigate the effectiveness and generalization performance of the improved model, experiments were conducted on NEU-DET; the results are shown in Table 3, the detection results are illustrated in Fig. 13, and detailed comparisons with the latest methods can be found in Table 4. It is evident that our model achieves an mAP of 80.1 while maintaining a high detection speed. In comparison to the improved YOLOv3 model, our model shows significant advantages in mAP and in accuracy for most defect categories. Our model's mAP is 7.7 higher than EDDN, 2.6 higher than the improved YOLOv5 model, 0.87 higher than LFF-YOLO, 0.69 higher than DCC-CenterNet, and 0.7 higher than MSC-DNET. Additionally, our model demonstrates the best detection speed among all models listed in the table, surpassing the second-highest FPS by nearly 35 points. When compared to RDD-YOLO, our method remains competitive, with slightly lower accuracy but an FPS increased from 57.8 to 106.4. Furthermore, Table 4 includes the FANet model, which has the highest mAP and the best accuracy across all defect categories. However, this model has a fatal drawback: its FPS is only 34. In practical production environments, device performance can be affected by external interference, leading to fluctuations in detection speed; consequently, the margin between 34 fps and the industry-recognized real-time threshold of 30 fps is insufficient. This level of performance falls short of true real-time detection in an industrial context, making deployment in a genuine industrial setting unfeasible.
The above data indicates that our model is effective in detecting various defects while maintaining a fast detection speed.

Ablation study
Our experiments were conducted on the two datasets mentioned above, and ablation experiments were used to demonstrate the effectiveness of each improvement. To validate the contribution of each component, the following variants are compared:
1. YOLOv7-tiny with the DAC module is referred to as C-YOLO.
2. YOLOv7-tiny with the DAC module and the Focal-SIoU loss function is referred to as CF-YOLO.
3. YOLOv7-tiny with the DAC module and the AWFP-Add module is referred to as CA-YOLO.
4. YOLOv7-tiny with the DAC module, AWFP-Add module, and Focal-SIoU loss function is referred to as CAF-YOLO.

Experiments on GC10-DET
From Table 5, it is evident that our improvements are effective. The original YOLOv7-tiny model (baseline) achieved an mAP of 70.2. In comparison, C-YOLO outperforms it with a 4.3-point higher mAP and demonstrates superior detection accuracy in all defect categories except Pu. CF-YOLO and CA-YOLO achieve mAPs of 75.3 and 77.1 respectively, surpassing the original model by 5.1 and 6.9 mAP. CAF-YOLO attains an mAP of 81%, the highest among them, and delivers the best performance on all defect types except Wl. The comparison of our improvements with the baseline model on the PR curve is shown in Fig. 14, confirming that our modifications can identify various defects. CF-YOLO raises the gain from +4.3 to +5.1 mAP, indicating that combining Focal Loss with SIoU mitigates, to some extent, the imbalance between positive and negative examples in bounding box regression as well as inaccurate regression boxes. By introducing learnable parameters into the Add module and incorporating the adaptive weighted fusion path architecture, the improvement expands from +4.3 to +6.9, reflecting the positive impact of the AWFP-Add module.

Experiments on NEU-DET
To further investigate the effectiveness and robustness of our proposed method, we conducted ablation experiments on NEU-DET. Based on the displayed detection results, we conclude that low contrast is one of the reasons for detection failure. In some cases, although the network accurately identifies defects, the boundaries of these defects are blurry, resulting in a single defect area being detected as two or more adjacent defects, or multiple adjacent defects being detected as one defect. To address such situations, optimization of the defect annotation information in the dataset itself is necessary to avoid unnecessary problems caused by the quality of dataset labeling. Additionally, enhancing the feature extraction capability of the network can alleviate boundary confusion. Moreover, YOLOv7-tiny also possesses instance segmentation capability, which can be used for pixel-level defect detection.

CONCLUSION
In this article, we propose a novel detector for defect detection on metal surfaces based on the YOLOv7-tiny model, with improvements to the Backbone and Head sections. The improved YOLOv7-tiny model demonstrates excellent performance in terms of both detection accuracy and speed. However, for some obscure and minor defects, the model's performance still needs improvement; in future work, we will further refine the network structure, for example with attention mechanisms and more powerful feature extraction methods.

Figure 2
Figure 2 The architecture of the proposed defect detection network. The proposed improvements are marked using colored boxes: the AWFP-Add module (light purple box) strengthens the model's feature extraction and cross-scale fusion abilities within the Head network, and the DAC module replaces the ELAN-T structure. Full-size DOI: 10.7717/peerjcs.1727/fig-2

Figure 5
Figure 5 Add module design. (A) The computational flow of the ordinary Add module. (B) The computational flow of the Add module with learnable parameters. Full-size DOI: 10.7717/peerjcs.1727/fig-5

Figure 16
Figure 16 Failure cases of defect detection. (A, B) Cases where a region annotated as a single defect is detected as multiple defects. Figure source credit: GC10-DET database. Full-size DOI: 10.7717/peerjcs.1727/fig-16

Table 2
According to Table 2, our proposed model achieved the highest mAP and the fastest detection speed.

Table 2 Detection results of state-of-the-art methods on GC10-DET.
Bold values indicate the maximum value of the column. The larger the value, the better the detection effect.

Table 4 Detection results of state-of-the-art methods on NEU-DET.
Bold values indicate the maximum value of the column. The larger the value, the better the detection effect.

Table 5 The Ablation experiments on GC10-DET.
Bold values indicate the maximum value of the column. The larger the value, the better the detection effect.
Table 6 presents the results, showing that the baseline model YOLOv7-tiny achieved an mAP of 70.9, while C-YOLO, CF-YOLO, and CA-YOLO achieved mAP scores of 74.2, 76.3, and 76.7, respectively. The highest mAP is achieved by CAF-YOLO; its comparison with the base model on the PR curve, shown in Fig. 15, indicates that it most accurately identifies defects such as Patches and Scratches. The results show that the DAC module increases the accuracy of most defect types; in addition, the AWFP-Add module contributes a 2.5 mAP boost and Focal-SIoU a 2.1 mAP improvement, which fully demonstrates the effectiveness of our modified Backbone and Head components.

Table 6 The ablation experiments on NEU-DET.
Bold values indicate the maximum value of the column. The larger the value, the better the detection effect.