An improved Faster R-CNN for defect recognition of key components of transmission line

Abstract: In a national power grid system, transmission lines must be kept secure, so the components of transmission towers must be inspected and identified regularly. In this paper, we propose a deep-learning-based defect recognition method for key components of transmission lines. First, based on the characteristics of transmission line images, the defect images are preprocessed and a defect dataset is created. Then, on the TensorFlow platform, the traditional Faster R-CNN model is improved by using the Inception-ResNet-v2 network as the basic feature extraction network, with adjustments to the network structure and optimization of its parameters. Through feature extraction, target localization, and target classification on aerial images of transmission line defects, a target detection model is obtained. The model improves feature extraction for transmission line targets and for defects of small components. The experimental results show that the proposed method can effectively identify defects of key components of transmission lines, with an accuracy of 98.65%.


Introduction
Electricity has gradually become an indispensable energy source in our lives and production. Scholars worldwide have carried out considerable research work on fault detection and key accuracy of network model recognition, the optimal network model for transmission line defect recognition based on Faster R-CNN is obtained.
In this paper, according to the characteristics of defect images of key components of transmission lines, the images are preprocessed and a defect dataset of the key components is built. On the TensorFlow platform, the Inception-ResNet-v2 network, a strong classification network, is optimized and used as the basic network of the two-stage detector Faster R-CNN for defect recognition of key components of transmission lines. In the experiments it is the most effective of the algorithms compared; the method can effectively improve the accuracy and reliability of defect identification of key components in transmission line inspection.

Basic principles
Fast R-CNN [16] must process a large amount of fault feature data in application, which leads to a long process and low fault recognition efficiency. To strengthen the fault detection ability and further improve the detection efficiency, reference [17] improved the method and proposed the Faster R-CNN detection method, which uses a region proposal network (RPN) to generate candidate regions, thus greatly improving the efficiency of defect identification. The combination of the RPN and Fast R-CNN forms the Faster R-CNN detection pipeline, as shown in Figure 1.
As seen in Figure 1, Faster R-CNN first feeds the image to be detected into a convolutional neural network (such as VGG-16). After the shared convolution layers, the feature map is passed in two directions: one serves as the input of the RPN to generate candidate regions, and the other continues through the remaining convolution layers to generate a higher-dimensional feature map. In the RPN, a sliding window is moved over the feature map to obtain candidate fault regions of insulators, and each region is assigned a score. Non-maximum suppression is then applied to the scored regions, and approximately 300 candidate regions are selected and sent to the ROI pooling layer. The ROI pooling layer combines the RPN candidate regions with the high-dimensional feature map from the convolution layers, so that the key features of each target region can be extracted. Finally, bounding-box regression and region classification are used to locate the fault and identify its type.
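The screening of scored candidate regions by non-maximum suppression can be illustrated with a minimal NumPy sketch. It is not the implementation used in the paper: the function names, the (x1, y1, x2, y2) box layout, and the default threshold of 0.7 and limit of 300 proposals (taken from the training settings reported later in this paper) are assumptions made for illustration.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.7, max_proposals=300):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_proposals:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]   # drop boxes that overlap too much
    return keep

# Hypothetical example: the second box overlaps the first almost completely.
boxes = np.array([[0, 0, 10, 10], [0, 0, 10, 9], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

In this example the second box has an IoU of 0.9 with the first and is suppressed, while the distant third box is kept.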

2.2. Region proposal network optimization
The RPN is essentially a sliding window over the feature map. At each window position, anchors of several scales and aspect ratios are generated, giving multi-scale anchors. Because the self-explosion defects identified in this paper are small targets, three scales (16 × 16, 32 × 32, 64 × 64) and three aspect ratios (1:1, 1:2, 2:1) are adopted, giving nine kinds of anchors in total. The position of each anchor in the original image is obtained by translating the anchor coordinates. Each of the nine anchors has specific position coordinates and independent foreground and background scores. The window regression layer outputs, for the nine anchors at every position, the parameters (x, y, w, h) for translating and scaling the window, where x and y are the coordinates of the window center and w and h are its width and height, as shown in Figure 2. To train the RPN better, this paper assigns a class label {foreground, background} to each anchor and uses non-maximum suppression to obtain the prediction boxes. An anchor receives a positive label if it has the highest IOU with a labelled box or if its IOU with any labelled box is greater than 0.7; it receives a negative label if its IOU with every labelled box is less than 0.3. Anchors that are neither positive nor negative are not used in training, and anchors that cross the image boundary are ignored. Finally, RPN
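The anchor construction and the IOU-based labelling rule described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's code: the function names are hypothetical, the aspect ratio is interpreted as height divided by width, and the label values 1, 0, and -1 (positive, negative, ignored) are a convention chosen here.

```python
import numpy as np

def make_anchors(cx, cy, scales=(16, 32, 64), ratios=(1.0, 0.5, 2.0)):
    """Nine anchors (x1, y1, x2, y2) centered at (cx, cy); each has area scale**2."""
    anchors = []
    for s in scales:
        for r in ratios:                 # r is taken as height / width (assumption)
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

def pairwise_iou(a, b):
    """IoU matrix of shape (len(a), len(b)) for boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_iou=0.7, neg_iou=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor."""
    ious = pairwise_iou(anchors, gt_boxes)
    max_iou = ious.max(axis=1)
    labels = np.full(len(anchors), -1)
    labels[max_iou < neg_iou] = 0
    labels[max_iou > pos_iou] = 1
    # The anchor with the highest IoU for each labelled box is also positive.
    best_per_gt = ious.argmax(axis=0)
    has_overlap = ious.max(axis=0) > 0
    labels[best_per_gt[has_overlap]] = 1
    return labels

# Hypothetical example: one 32 x 32 labelled box centered on the anchor position.
anchors = make_anchors(cx=50, cy=50)
gt_boxes = np.array([[34.0, 34.0, 66.0, 66.0]])
labels = label_anchors(anchors, gt_boxes)
```

With this labelled box, only the 32 × 32 anchor with ratio 1:1 is positive; the two other anchors of the same scale fall between the two thresholds and are ignored, and the remaining six are negative.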

Inception-ResNet-V2 structure
In this paper, Inception-ResNet-V2 is introduced into defect identification of key components of transmission lines and serves as the basic network structure. Inception-ResNet [18] combines the Inception and ResNet network structures. The Inception [19] series of models was proposed by Google. Rather than committing to a single convolution kernel size, an Inception module applies kernels of several sizes in parallel and concatenates the resulting feature maps. To improve computational efficiency, a 1 × 1 convolution kernel is placed before the 3 × 3 convolution to reduce the number of channels. After several convolutions, the stem structure of Inception V4 also applies pooling twice, using the parallel convolution and pooling structure proposed in Inception-v3 [20], which greatly reduces redundant convolution. ResNet [21] is a 152-layer residual neural network proposed by Kaiming He et al. for the ImageNet competition. More complex problems call for deeper networks, but deeper networks bring problems such as overfitting, vanishing gradients, and exploding gradients. The residual network introduces a shortcut structure that adds the input of a block to the output of its convolution layers, which alleviates the vanishing gradient problem caused by excessive depth. The residual unit is shown in Figure 3.
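The shortcut structure of the residual unit can be written as y = ReLU(F(x) + x), where F denotes the stacked weight layers. The following sketch illustrates this with two dense weight layers in NumPy; the use of dense layers instead of convolutions and all names are simplifying assumptions, not the structure used in the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, w1, w2):
    """y = relu(F(x) + x), where F is two weight layers with a ReLU in between."""
    fx = relu(x @ w1) @ w2      # residual branch F(x)
    return relu(fx + x)         # shortcut adds the input back

# With zero weights the residual branch vanishes and the unit passes relu(x) through,
# which shows why the shortcut keeps a direct path for signals and gradients.
x = np.array([1.0, -2.0, 3.0])
zeros = np.zeros((3, 3))
y = residual_unit(x, zeros, zeros)
```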

Inception-ResNet-V2 not only optimizes and improves Inception V4 but also introduces ResNet. The architecture of the model is divided into the input block, the Inception-ResNet-A, Inception-ResNet-B, and Inception-ResNet-C blocks, and the Reduction blocks. With this structure, the operation efficiency is almost the same as that of Inception V4, while the accuracy is higher. The overall structure is shown in Figure 4.

Multi-scale feature fusion
In the conventional Faster R-CNN structure, pooling layers divide the network into multiple levels, each with its own scale, which helps recognition accuracy. However, because depth determines the receptive field of each layer, the feature information extracted at different levels differs. Low-level features retain more spatial detail and are therefore more useful for target localization, whereas high-level features carry more semantic information and are more useful for target classification, so there is a large semantic gap between high and low levels. Therefore, features from as many layers as possible should be fused to achieve the best target detection results.
Commonly used multi-scale learning structures fall into four types: image pyramid, multi-level feature map, multi-scale feature fusion, and feature pyramid. The image pyramid expresses scale information by varying the image resolution, which has great time and space costs; for the Faster R-CNN network, the training time may be too long or the memory may be insufficient. predict the change trend and direction of image features and needs very few image feature points to be calculated. According to the feature pyramid principle, this paper optimizes the structure of the traditional Inception-ResNet network, as shown in Figure 5. The output feature maps of Inception-ResNet-A and Inception-ResNet-B are projected by a 1 × 1 convolution kernel to 1,792 channels, the dimension of Inception-ResNet-C. The feature map output by Inception-ResNet-C is upsampled by a factor of two and fused with the features of Inception-ResNet-B. Finally, after fusion with P1, the mixed features are processed by a 3 × 3 convolution and fed into the RPN layer.
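The feature-pyramid style fusion described above (a 1 × 1 convolution to match channel counts, upsampling of the deeper feature map, and element-wise addition) can be sketched in NumPy as follows. The function names, the nearest-neighbour upsampling, and the tiny tensor sizes are assumptions for illustration; the final 3 × 3 convolution is omitted for brevity.

```python
import numpy as np

def conv1x1(feat, weight):
    """1 x 1 convolution: feat (H, W, C_in), weight (C_in, C_out) -> (H, W, C_out)."""
    return np.einsum("hwc,cd->hwd", feat, weight)

def upsample2x(feat):
    """Nearest-neighbour upsampling by a factor of two in height and width."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def fuse(top, lateral, lateral_weight):
    """Add the upsampled deeper map to the channel-matched shallower map."""
    return upsample2x(top) + conv1x1(lateral, lateral_weight)

# Hypothetical sizes: 8 channels stand in for the 1,792 channels in the text.
top = np.ones((4, 4, 8))          # deeper, coarser feature map
lateral = np.ones((8, 8, 3))      # shallower, finer feature map
weight = np.ones((3, 8))          # 1 x 1 kernel projecting 3 -> 8 channels
fused = fuse(top, lateral, weight)
```

Each fused value here equals 1 (from the upsampled map) plus 3 (the sum over the three input channels of the projected map).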

Convolution layer optimization
By comparing and observing the defects of the key components in the images, it can be found that the target defects are small, as shown in Figure 6. When the convolution layer uses only the smallest convolution kernels, it is difficult to identify the defect features. To improve the recognition accuracy, the following structural optimizations are proposed for the traditional Inception-ResNet-V2 network. In the Inception-ResNet-A module, the 3 × 3 convolution kernel is decomposed into two separate kernels of 1 × 3 and 3 × 1, which simplifies the computation of the defect features, as shown in Figure 7. For Inception-ResNet-B, the original 1 × 7 convolution kernel is changed to a 1 × 5 kernel and a new branch with a 1 × 3 convolution is added, as shown in Figure 8. This widens the convolution layer so that multiple receptive fields capture feature information at different levels, and the branch outputs are finally combined. Since a large number of 3 × 3 and 5 × 5 convolution kernels are used in Inception-ResNet-A and Inception-ResNet-B, we expect Inception-ResNet-C to extract higher-order abstract features. Therefore, the 3 × 3 kernel in the original Inception-ResNet-C is replaced by a 7 × 7 kernel, and a 5 × 5 branch is added to aggregate feature information over large areas of the image, as shown in Figure 9.
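The saving obtained by decomposing a 3 × 3 kernel into 1 × 3 and 3 × 1 kernels can be illustrated with a small NumPy sketch. It shows that applying a 1 × 3 kernel followed by a 3 × 1 kernel gives the same result as a single 3 × 3 kernel equal to their outer product, while using 6 weights instead of 9. This equivalence holds only for such separable kernels; in the actual network the two layers are learned independently and separated by nonlinearities, so the sketch is an illustration of the idea, and the function name is hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation without padding (as used in CNN layers)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
row = rng.normal(size=(1, 3))     # 1 x 3 kernel
col = rng.normal(size=(3, 1))     # 3 x 1 kernel
full = col @ row                  # equivalent separable 3 x 3 kernel

factorized = conv2d_valid(conv2d_valid(image, row), col)
direct = conv2d_valid(image, full)

weights_factorized = row.size + col.size   # 6
weights_direct = full.size                 # 9
```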

Image preprocessing
During outdoor high-altitude shooting, the UAV is easily affected by wind, rain, light, and other factors, which causes blur, uneven brightness, noise, and other problems that lower the image quality. If such images are used directly as the input of the subsequent deep learning recognition, they often harm the experiment because they cannot provide enough features. The images obtained in this paper are affected by the weather and vary in brightness, and outdoor shooting inevitably introduces noise and light interference. To reduce these negative effects on image quality, the aerial images are preprocessed.
In histogram equalization, the most critical step is to calculate the probability of each grey value in the image and then remap the grey values accordingly. Combined with image stretching, the number of pixels in each grey range becomes approximately the same, which enhances contrast at the peaks of the histogram and weakens it in the valleys. Histogram equalization of a color image works in the same way as for a grey image, with the red, green, and blue channels processed separately. The process is as follows. If the grey level of the original image at $(x, y)$ is $r$ and the grey level after equalization is $s$, the process can be expressed as mapping the grey level $r$ at $(x, y)$ to $s$, as defined in formula (1).

$$s = T(r) \tag{1}$$

The mapping function $T(r)$ needs to meet the following two conditions at the same time: (1) $T$ increases monotonically over the grey levels $0$ to $L-1$; (2) the value range of $T$ is between $0$ and $L-1$, where $L = 256$.
An important function in image processing, shown in formula (2), meets the above conditions. It is the cumulative distribution function (CDF), which describes the probability distribution of a variable over a given range.

$$s = T(r) = (L-1)\int_{0}^{r} p_r(w)\,dw \tag{2}$$

Since the pixel distribution of an image is discrete, formula (2) can be written as formula (3):

$$s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j) \tag{3}$$

where $p_r(r_j)$ represents the probability of grey level $r_j$ occurring in the image. Assuming that the total number of pixels in the image is $n$ and the number of pixels at grey level $r_j$ in the histogram is $h(r_j)$, the mapping is finally expressed as formula (4).

$$s_k = \frac{L-1}{n}\sum_{j=0}^{k} h(r_j) \tag{4}$$
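The discrete mapping built from the histogram h and the cumulative distribution can be sketched in a few lines of NumPy. This is a minimal illustration rather than the preprocessing code used in the paper; the function name is hypothetical, the input is assumed to be an 8-bit grey image, and rounding to the nearest grey level is an implementation choice.

```python
import numpy as np

def equalize_histogram(gray, levels=256):
    """Map each grey level through the scaled cumulative histogram (the mapping T)."""
    hist = np.bincount(gray.ravel(), minlength=levels)   # pixels per grey level
    cdf = np.cumsum(hist) / gray.size                    # cumulative probability
    mapping = np.round((levels - 1) * cdf).astype(np.uint8)
    return mapping[gray]

# Hypothetical dark image using only grey levels 0..3.
gray = np.array([[0, 0, 1, 1],
                 [2, 2, 3, 3]], dtype=np.uint8)
equalized = equalize_histogram(gray)
```

In this example the four dark grey levels are spread over the full range up to 255. For a color image, the same function could be applied to each of the red, green, and blue channels in turn.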
The images are processed by the above method, and the result after histogram equalization is shown in Figure 10. Compared with the original image, the characteristics of the insulator are more obvious after enhancement and stand out more clearly from the background. By adjusting the image brightness, some small and unimportant parts of the background can be removed, thereby weakening the background.

Dataset preparation
The  Figure 11.
In this paper, the open source LabelMe software is used to annotate all the collected defect images, as shown in Figure 12. A bounding box is drawn around each defect, its coordinates are recorded, and a category label is assigned. Of the images, 80% are used as the training set and 20% as the test set. In total, 1,008 key component defects were annotated.
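An 80/20 division of the annotated images into training and test sets can be sketched as follows. The paper does not state how the split was drawn, so the random shuffle, the fixed seed, the function name, and the file names below are all assumptions for illustration.

```python
import random

def split_dataset(image_paths, train_fraction=0.8, seed=0):
    """Shuffle the paths reproducibly and split them into training and test lists."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(round(len(paths) * train_fraction))
    return paths[:cut], paths[cut:]

# Hypothetical file names standing in for the annotated defect images.
images = [f"defect_{i:04d}.jpg" for i in range(10)]
train_set, test_set = split_dataset(images)
```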

Description of defect identification
This paper mainly identifies three kinds of defects in transmission lines: self-explosion of insulators, corrosion of anti-vibration hammers, and bird nests. The specific description and the number of each defect in the dataset are shown in Table 1.

Training process and parameter setting
The training of the improved Faster R-CNN model in this paper follows four steps.
(1) The improved pre-trained Inception-ResNet-V2 model is used to initialize the convolution network, and the RPN is trained independently. The parameter first_stage_nms_iou_threshold is set to 0.7 and first_stage_max_proposals is set to 300, so that 300 candidate boxes are obtained. The learning rate of the Faster R-CNN network is set to 0.0002.
(2) The pre-trained Inception-ResNet-V2 model is used to initialize the convolution network, and the candidate regions from step (1) are used as the input to train the Faster R-CNN network. At this point the two networks do not share convolution layers.
(3) The convolution layer parameters updated in step (2) are used to reinitialize the RPN. The learning rate is again set to 0.0002, and only the RPN is updated.
(4) The candidate regions generated in step (3) are taken as the input, and the convolution layers updated in step (2) are used by the Faster R-CNN network, forming a unified network. After the features are extracted, the RPN generates the candidate regions, and the Faster R-CNN network identifies and locates the targets in them.
Through the above process, the RPN and the Faster R-CNN network share convolution layers. Other parameter settings used in training are shown in Table 2.
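The four training steps and the quoted parameter values can be summarized in a small Python structure. The parameter names and values are those given in the text; the dictionary layout, the keys `train`, `init_from`, and `proposals_from`, and the helper `check_schedule` are hypothetical and serve only to make the dependencies between the steps explicit.

```python
# Values quoted in the text; the layout itself is an illustrative assumption.
RPN_SETTINGS = {
    "first_stage_nms_iou_threshold": 0.7,
    "first_stage_max_proposals": 300,
}
LEARNING_RATE = 0.0002

# For each step: which part is trained, where its convolution weights come from,
# and which earlier step supplies the candidate regions.
SCHEDULE = [
    {"step": 1, "train": "rpn", "init_from": "pretrained", "proposals_from": None},
    {"step": 2, "train": "detector", "init_from": "pretrained", "proposals_from": 1},
    {"step": 3, "train": "rpn", "init_from": 2, "proposals_from": None},
    {"step": 4, "train": "detector", "init_from": 2, "proposals_from": 3},
]

def check_schedule(schedule):
    """Return True if every step depends only on results of earlier steps."""
    for entry in schedule:
        for key in ("init_from", "proposals_from"):
            ref = entry[key]
            if isinstance(ref, int) and ref >= entry["step"]:
                return False
    return True
```

Steps 3 and 4 both start from the convolution layers of step 2, which is how the RPN and the detection network end up sharing them.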

Experimental environment
The experiment is based on TensorFlow, one of the most popular deep learning frameworks. It is convenient, efficient, extensible, and lightweight, and runs on multiple platforms. It is easy to install and learn, is developed and maintained by Google, and has a strong community that helps solve problems arising during project implementation. It supports multiple GPUs to accelerate the training process. Its built-in TensorBoard visualization tool can track the topology and performance of the network, so that model problems can be identified and solved early. The software and hardware configuration used in the experiment is shown in Table 3.

5.2. Evaluation index
When comparing the results of target detection methods, visual inspection of the detections is not enough; objective evaluation indicators must also be selected as the basis for judging the validity of the results. In most cases, AP (average precision), mAP (mean average precision), and the recall rate are used as indicators.
Assuming that there are only two types of samples, positive and negative, the test results can be divided into four cases, as shown in Table 4. Accuracy (precision) indicates how many of the results predicted to be positive are truly positive, and reflects the ability of the detection model to determine the target category. Its calculation is shown in formula (5), where $TP$ is the number of true positives and $FP$ is the number of false positives.

$$P = \frac{TP}{TP + FP} \tag{5}$$

The recall rate indicates how many of the positive samples are correctly predicted, and measures the ability of the model to find the targets. Its calculation is shown in formula (6), where $FN$ is the number of false negatives.

$$R = \frac{TP}{TP + FN} \tag{6}$$

When there are multiple target categories in the detection task, the precision for each category is often different. Therefore, to evaluate the detection effect comprehensively, the mean over all categories is used. The area under the PR curve is defined as AP, as shown in formula (7). The mean value of AP over all $N$ categories gives the mAP value, as shown in formula (8).

$$AP = \int_{0}^{1} P(R)\,dR \tag{7}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i \tag{8}$$
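The indicators above can be computed as in the following NumPy sketch. It is an illustration under assumptions rather than the evaluation code of the paper: the function names are hypothetical, AP is computed as the plain (non-interpolated) area under the precision-recall curve, and the paper does not state which AP protocol was used.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one category."""
    order = np.argsort(scores)[::-1]                       # rank detections by score
    hits = np.asarray(is_true_positive, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    recall = tp / num_ground_truth
    precision = tp / (tp + fp)
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum(precision * (recall - recall_prev)))

def mean_average_precision(ap_per_class):
    """Mean of the AP values over all categories."""
    return float(np.mean(list(ap_per_class.values())))

# Hypothetical detections for one category with two labelled defects.
ap = average_precision(scores=[0.9, 0.8, 0.7],
                       is_true_positive=[1, 0, 1],
                       num_ground_truth=2)
```

In this example the first and third detections are correct, so the precision at the two recall steps is 1 and 2/3, and the AP is 1 × 0.5 + 2/3 × 0.5 = 5/6.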

5.3. Comparative analysis of experimental results
In this paper, Faster R-CNN with a pre-trained Inception-ResNet-V2 network is used for image feature recognition of key components of transmission lines, and the best performance of Faster R-CNN is obtained. The results show that the recognition accuracy of this method is 98.65% and the recall rate is 96.45%. The test time for a single image is 676 ms, which shows that the method is accurate and efficient and has strong practicability.
To verify the accuracy and effectiveness of the algorithm, different combinations of feature extraction networks and detection models are applied to the transmission line defect images, and the accuracy and recall rate of each combination are compared, as shown in Table 5. In terms of test time, the SSD model is faster than Faster R-CNN, but its recognition accuracy is lower. The algorithm in this paper is more accurate than the other combined models, and its test time is within the acceptable range, so the combination of Faster R-CNN and Inception-ResNet-v2 obtains a better recognition effect.
Table 5. Accuracy comparison of different model combinations.
The loss function reflects the learning quality of the model. By observing the convergence of the loss curve, problems such as vanishing and exploding gradients can be detected. TensorBoard is used to observe the loss curve during training to determine when training is complete. Figure 13 shows the loss curve obtained after 200,000 iterations. In the early stage of training the loss value drops quickly, and when it reaches 0.08 it begins to fluctuate. As the number of iterations increases, the loss value decreases overall, and after 10,000 iterations the fluctuation becomes smaller. Although some values fluctuate strongly, the impact on the overall convergence is very limited. Therefore, the learning effect of the model can be evaluated from the behavior of the loss curve. Figure 14 shows the test time and accuracy of each comparison model in this paper.

Conclusion
This paper first analyses the traditional Faster R-CNN target recognition method and then improves it. Considering the characteristics of the defect images of the key components of transmission lines, Inception-ResNet-V2 is used as the pre-trained feature extraction network of Faster R-CNN, and its network structure and parameters are optimized. The experimental results show that the improved network effectively improves the operation efficiency, and the recognition accuracy for small targets among transmission line faults is also significantly improved, reaching 98.65%, which provides a new idea for improving the efficiency of power grid inspection and ensuring normal operation.