Abstract
Non-ferrous metals are important strategic resources, and electrolysis is an essential step in refining them. In the electrolysis process, a plate short circuit is the most common fault, and it seriously affects output and energy consumption. The rapid and accurate detection of faulty plates is therefore of great significance to the metal refining process. Traditional object detection algorithms generalize poorly and require complex hand-designed feature rules, while existing deep learning models perform poorly on infrared images with many interference factors. To address this, an improved Mask R-CNN-based fault detection algorithm is proposed, which improves the generation strategy and the non-maximum suppression algorithm for proposals to reduce missed detections. We also propose a globally generalized intersection over union loss function to better characterize the position and scale relationship between the predicted box and the target box, which benefits bounding box regression. The experimental results show that the improved model reaches an accuracy of 86.8%, 10.4 percentage points higher than the original model. Compared with common one- and two-stage object detection models, the improved model has a stronger detection ability. This algorithm has some reference value for the accurate detection and location of electrolytic cell faults.
1. Introduction
Nowadays, non-ferrous metals have become necessary for developing a country's economy, technology, and national defense industry. The electrolysis process is an essential step in refining many non-ferrous metals, which directly affects the productivity and quality of finished metals [1].
In the electrolysis process of copper, lead, zinc, and other metals, cathode and anode plates are arranged in parallel in a cell [2]. As shown in figure 1, the plates are uniformly conductive through the busbar. The two plates are usually only 2–4 cm apart, so the anode and cathode plates may come into contact as electrolysis proceeds. As shown in figure 2, the most common situation is that raised particles form on the surface of a plate due to impurity attachment, resulting in abnormal contact between the cathode plate and the anode plate. This abnormal contact leads to a short circuit [3–5], which not only consumes a lot of electric energy but also reduces the output of the metal. Therefore, it is of great significance to detect faults in time.
The current fault detection methods include indirect and direct detection. The indirect detection methods mainly include sprinkling tanks and supporting meters to detect magnetic fields; the temperature or magnetic field is used to reflect short-circuit conditions. Although these methods are simple in principle, convenient in operation, and low in cost, their sensitivity, accuracy, and real-time performance struggle to meet the requirements. The direct detection method measures the voltage and current of the electrolytic cell with a sensor. Although it can directly reflect the operating status of the electrode plates, the sensors and detection lines are easily corroded in the harsh environment. The plate's temperature increases during a short circuit, and infrared imaging technology can be used to obtain the temperature of the electrolytic cell surface in all weather, without contact, and in real time [6, 7]. Therefore, fault detection, classification, and location based on infrared images are relatively efficient and straightforward.
The temperature of the electrolytic cell is directly reflected in the infrared image, but it is difficult to detect faulty plates in the image quickly, accurately, and completely because of complex factors such as irregular covers, acid fog, spreading water vapor, uneven heating of the electrode plates, and local heating of the busbar [8]. The traditional methods rely on region selection based on sliding windows and manually designed feature extraction rules [9, 10]. Ojala et al [11] proposed the local binary pattern (LBP) method, which takes the pixel at the center of a local area as the threshold and compares it with the surrounding pixel values to form a binary LBP code. Applying this operation to each pixel of the original image yields an LBP map. The map is divided into small regions, and the histograms of these regions are combined into a feature vector, which is input into a classifier. Based on the scale-invariant feature transform, Ke et al [12] used principal component analysis to compute the feature descriptor and make its features more prominent, and their experimental results show higher detection performance. However, these methods are not tailored to the target and generalize poorly, which can easily cause false and missed detections.
Object detection technology based on neural networks has been widely used in engineering and achieves excellent detection results [13]. At present, the mainstream object detection algorithms based on deep learning include one-stage algorithms such as YOLOv3 [14] and SSD [15], and two-stage algorithms such as Faster R-CNN [16] and Mask R-CNN [17]. Generally, the accuracy of two-stage algorithms is better than that of one-stage algorithms, so Mask R-CNN is selected as the basic model.
For many specific object detection problems, the feature input and bounding box regression stages need to be modified according to the actual scene [18]. To improve the detection accuracy, Bodla et al [19] proposed the Soft-NMS algorithm, a soft variant of non-maximum suppression (NMS) that reduces the confidence of highly overlapping proposals. However, because the proposals of all positive samples are retained, the amount of computation increases. To achieve accurate positioning and speed up loss convergence, Yu et al [20] proposed using the intersection over union (IoU) loss as the bounding box regression loss, which greatly improves the detection effect. However, regression training cannot be performed when the proposal and the target box do not intersect or overlap completely.
Given these problems, combined with the specific characteristics of electrode plate faults, we propose a fault detection method based on Mask R-CNN. This algorithm has three primary improvements as follows:
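- (a) Improve the proposal generation strategy of the RPN layer, so that the initial scales and aspect ratios of the proposals match the relatively fixed size and shape of the fault targets.
- (b) Propose a multi-stage Gaussian penalty NMS algorithm to reduce missed detection when adjacent plates fail.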
- (c) Propose a globally generalized IoU loss function. When adjusting the regression parameters of the bounding box, the sensitivity of the loss function to the scale and distance relationship between the target box and the predicted box is increased, and the loss can be calculated according to the coincidence state, relative position, and scale relation of the two boxes.
2. Approach
2.1. Mask R-CNN architecture
The architecture of Mask R-CNN, shown in figure 3, is mainly composed of a feature extraction network, a region proposal network (RPN), a RoIAlign layer, and classification and regression layers.
The process is: (a) obtain the feature map by ResNet50 and FPN, the size is m × n × d; (b) divide the input image corresponding to the feature map into m × n regions and generate k proposals in each region; (c) filter proposals according to the NMS algorithm; (d) project the proposals on the feature map to obtain the corresponding feature matrix; and (e) scale each feature matrix to a uniform size through the RoIAlign layer. Then, the prediction results are obtained through a series of fully connected layers.
2.2. Feature extraction network
Extracting deep image features is an important step in the entire model training. This paper uses ResNet50 combined with FPN as the feature extraction network. Multi-layer convolution extracts image features more efficiently, and the residual structure mitigates the vanishing gradient problem. Compared with VGG [24] and ZF [25], the overall network performance has great advantages. FPN generates a new set of feature layers based on ResNet50, and each feature layer is the result of fusing different stages of convolutional layers in ResNet50.
FPN combines deep and shallow feature fusion with multi-resolution prediction by connecting high-resolution, low-semantic shallow features to low-resolution, high-semantic deep features to achieve good detection results for multi-scale targets. Its structure is shown in figure 4. The output of the residual block of levels 3–5 of ResNet50 on the left is taken as the feature of FPN, denoted as (C3, C4, C5). (M5, M4, M3) are the intermediate layers obtained by upsampling from the deepest layer, and then connecting them to (C5, C4, C3) to fuse multi-scale features. In order to reduce the impact of upsampling, the layer must perform a 3 × 3 convolution operation. Finally, a multi-scale feature map is obtained, denoted as (P3, P4, P5).
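The top-down fusion described above can be illustrated with a toy sketch. It assumes single-channel maps stored as nested lists, nearest-neighbour 2× upsampling, and element-wise addition as the fusion step; the lateral convolutions and the 3 × 3 smoothing convolution are deliberately left out, and the function names are hypothetical rather than taken from the paper's implementation.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def add_maps(a, b):
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(c3, c4, c5):
    """Build (M3, M4, M5) from (C3, C4, C5), deepest layer first."""
    m5 = c5
    m4 = add_maps(c4, upsample2x(m5))
    m3 = add_maps(c3, upsample2x(m4))
    return m3, m4, m5


# Toy inputs: each level has twice the resolution of the next deeper one.
c5 = [[1]]
c4 = [[10] * 2 for _ in range(2)]
c3 = [[100] * 4 for _ in range(4)]
m3, m4, m5 = top_down(c3, c4, c5)
```

In this toy case every entry of `m3` carries a contribution from all three levels, which is the point of the fusion: the high-resolution map inherits the deeper semantic signal.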
2.3. RPN working principle
In early two-stage object detection algorithms, the selective search algorithm [26] was used to generate predicted boxes by clustering similar regions. This process is very time-consuming and generates many redundant boxes, which seriously affects the detection efficiency. The RPN proposed in Faster R-CNN effectively reduces the generation time and number of proposals, and there is no need to repeatedly calculate the feature map corresponding to each predicted box, which greatly improves the recognition efficiency. As shown in figure 5, each pixel on the feature map is used as an anchor point to generate k proposals. In the infrared image of the electrode plate, the size and shape of the faulty target are relatively fixed. To give the proposals a higher initial matching degree with the target fault and reduce the amount of subsequent bounding box regression calculation, the initial scales of the proposals are set to (32 × 32, 64 × 64, 128 × 128) and the initial aspect ratios are set to (1:4, 1:8), so k in this paper is 6 (three scales × two aspect ratios). Because FPN is used to fuse the deep and shallow features, the three initial scales correspond directly to (P3, P4, P5).
In figure 5, k proposals are generated at the corresponding pixel of each feature layer, and the proposals are then input into the classification layer and the regression layer, which yield 2k class scores and 4k bounding box regression parameters. Based on the class scores, the NMS algorithm is used to eliminate the redundant proposals, and a part of the remaining proposals is sampled for RPN training.
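As a rough illustration of the proposal settings above, the sketch below derives anchor shapes from the stated scales and aspect ratios. Several points are assumptions: the ratios are read as width:height, each anchor keeps the area of its scale, and k = 6 is read as three scales times two ratios. The names `SCALES`, `RATIOS`, `anchor_shapes`, and `anchors_at` are hypothetical.

```python
import math

SCALES = {"P3": 32, "P4": 64, "P5": 128}  # scale per FPN level, from the text
RATIOS = [(1, 4), (1, 8)]                 # from the text; read as width:height (assumption)


def anchor_shapes(scale, ratios=RATIOS):
    """Return (width, height) pairs whose area is scale**2 for each aspect ratio."""
    shapes = []
    for rw, rh in ratios:
        r = rw / rh
        shapes.append((scale * math.sqrt(r), scale / math.sqrt(r)))
    return shapes


def anchors_at(cx, cy, scale):
    """Anchors centred at (cx, cy), returned as (x1, y1, x2, y2) boxes."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            for w, h in anchor_shapes(scale)]


k = sum(len(anchor_shapes(s)) for s in SCALES.values())
num_class_scores = 2 * k   # object / background score per proposal
num_box_params = 4 * k     # four regression parameters per proposal
```

With these assumptions, a scale of 32 and a ratio of 1:4 gives a 16 × 64 anchor, which suits tall, narrow targets such as a single plate.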
2.4. Improved NMS algorithm
The NMS algorithm is the most commonly used post-processing algorithm in object detection and is mainly used to eliminate redundant proposals. Its flow is shown in figure 6. When the initial proposals generated by the RPN layer are screened, the proposals are sorted by score and the proposal with the highest score is retained. The IoU between each remaining proposal and the highest-score proposal is then computed, and any proposal whose IoU is greater than the threshold is eliminated. This is repeated on the remaining proposals until all of them have been processed.
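The greedy procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the threshold of 0.5 is an assumed value that the text does not specify.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining score is always kept
        keep.append(best)
        # drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 40), (1, 0, 11, 40), (30, 0, 40, 40)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.82 and is removed, while the distant third box survives.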
The traditional NMS algorithm uses a greedy strategy to eliminate proposals, which may cause missed detection. As shown in figure 7, when multiple plates fail, the predicted box of an adjacent target may be eliminated as a redundant box, and the retained predicted box is not necessarily the most appropriate one.
To solve this problem, we propose a multi-stage Gaussian penalty NMS algorithm. When the IoU of the current box and the highest-score box exceeds the upper threshold ($N_u$), the two boxes are considered to overlap completely and the current box is eliminated. The remaining proposals are kept, but a Gaussian penalty is applied to their scores according to the degree of overlap. When the IoU is less than the lower threshold ($N_l$), the score is not adjusted (different target). The improved NMS algorithm is as follows:

$$
s_i' =
\begin{cases}
s_i, & \mathrm{IoU}(M, b_i) < N_l \\
s_i \, e^{-\mathrm{IoU}(M, b_i)^2 / \sigma}, & N_l \le \mathrm{IoU}(M, b_i) \le N_u \\
0, & \mathrm{IoU}(M, b_i) > N_u
\end{cases}
\tag{1}
$$

where $M$ is the proposal with the highest score, $b_i$ is the current proposal to be screened, $s_i$ is the original score of the current box, $s_i'$ is the score after reassignment, and $\sigma$ is the Gaussian penalty coefficient, set to 0.65. When $N_l$ is 0.4 and $N_u$ is 0.85, the precision and recall rate can reach a high level, and the redundant boxes can be efficiently eliminated.
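A possible reading of the multi-stage rule is sketched below, using the thresholds 0.4 and 0.85 from the text. Two points are assumptions rather than details confirmed by the source: the Gaussian penalty is taken in the Soft-NMS form exp(-IoU²/σ) with the value 0.65 used as σ, and the final cut-off `min_score` is a hypothetical parameter added so that heavily penalized boxes can be dropped.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def multistage_gaussian_nms(boxes, scores, lower=0.4, upper=0.85,
                            sigma=0.65, min_score=0.05):
    """Return (index, final_score) pairs in the order the boxes were selected."""
    scores = list(scores)                 # work on a copy
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append((best, scores[best]))
        survivors = []
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > upper:
                continue                  # treated as a duplicate: eliminated
            if overlap >= lower:
                scores[i] *= math.exp(-(overlap * overlap) / sigma)  # Gaussian penalty
            if scores[i] >= min_score:    # below `lower` the score is left unchanged
                survivors.append(i)
        remaining = survivors
    return keep


boxes = [(0, 0, 10, 40), (0.5, 0, 10.5, 40), (4, 0, 14, 40), (30, 0, 40, 40)]
scores = [0.9, 0.85, 0.8, 0.7]
kept = multistage_gaussian_nms(boxes, scores)
```

In this example the near-duplicate second box is eliminated, the partially overlapping third box is kept with a reduced score, and the distant fourth box is untouched. A greedy NMS with a threshold of 0.4 would have discarded the third box, which is the missed-detection case the multi-stage rule is meant to avoid.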
2.5. G2-IoU loss mechanism
The original loss function of the Mask R-CNN model is shown in equation (2). It consists of three parts, namely class loss, bounding box regression loss, and mask loss. Since there is no need to perform semantic segmentation on the target in this paper, mask segmentation is not trained, which reduces the amount of calculation.

$$ L = L_{cls} + L_{box} + L_{mask} \tag{2} $$

where $L_{cls}$ is the class loss, and it can be computed by:

$$ L_{cls}(p, u) = -\log p_u \tag{3} $$

where $p$ is the class probability distribution predicted by the classifier and $u$ is the corresponding real class label.

$L_{box}$ is the bounding box regression loss, and is computed by the smooth L1 norm. The following formulas are used to compute $L_{box}$:

$$ L_{box}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t_i^u - v_i) \tag{4} $$

$$
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5 x^2, & |x| < 1 \\
|x| - 0.5, & \text{otherwise}
\end{cases}
\tag{5}
$$

where $t^u$ is the regression parameter of the corresponding class predicted by the bounding box regressor, and $v$ is the bounding box regression parameter of the real target.
It should be noted that when the smooth L1 norm is used as the bounding box regression loss, a smaller loss does not necessarily mean a higher IoU. As shown in figure 8, when the smooth L1 norm is 3 in all cases, the IoU values are quite different.
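The mismatch between the smooth L1 norm and IoU can be reproduced with a small numeric example. The sketch below is illustrative only: it applies the smooth L1 norm directly to corner coordinates instead of the normalized regression parameters (an assumption made to keep the example short), and the boxes are made-up values, not those of figure 8.

```python
def smooth_l1(x):
    """Smooth L1 norm of a scalar difference."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def box_smooth_l1(pred, target):
    """Sum of smooth L1 terms over the four box coordinates."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


target = (0.0, 0.0, 10.0, 10.0)
pred_a = (-2.5, 0.0, 10.0, 10.0)   # one edge off by 2.5
pred_b = (1.5, 0.0, 8.5, 10.0)     # two edges off by 1.5 each

loss_a, loss_b = box_smooth_l1(pred_a, target), box_smooth_l1(pred_b, target)
iou_a, iou_b = iou(pred_a, target), iou(pred_b, target)
```

Both predictions have the same smooth L1 loss of 2.0, yet their IoU with the target differs (0.8 versus 0.7), so minimizing the coordinate loss does not directly optimize the evaluation metric.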
IoU loss directly performs bounding box regression according to the evaluation criterion and describes the distance and scale relationship between the two boxes more accurately. Compared with the smooth L1 norm, it performs well, but it also has problems. When the predicted box and the target box do not overlap, or when one fully contains the other, the relationship between the two boxes cannot be correctly reflected, so no gradient is propagated back and network performance is weakened. Based on this, we propose the G2-IoU loss mechanism, namely the Globally Generalized IoU loss. G2-IoU considers not only the coincidence rate of the target box and the predicted box but also the distance and scale factors between the two boxes. G2-IoU is shown in equation (6). In G2-IoU, penalty terms based on the Euclidean distance between the center points of the two boxes and the diagonal distance of the smallest bounding box [27], and on the scale of the two boxes, are added. Thus, the sensitivity of the loss function to distance and scale is increased, which is more conducive to the accurate positioning of the predicted box.

$$ \mathrm{G2\text{-}IoU} = \mathrm{IoU} - R_d - R_s \tag{6} $$

The second term, $R_d$, is the normalized Euclidean distance penalty term, and its formula is as follows:

$$ R_d = \frac{\rho^2(b, b^{gt})}{c^2} \tag{7} $$

where $\rho(b, b^{gt})$ is the Euclidean distance between the center points of the target box and the predicted box, while $c$ is the diagonal length of the smallest box enclosing the two boxes.

Figure 9 shows that no matter what state the two boxes are in, the bounding box loss can be quickly regressed through $R_d$. Even if one box completely contains the other and the IoU is fixed, the loss can be calculated by $R_d$.
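The effect of the distance penalty can be checked numerically. The sketch below covers only the IoU term and the center-distance term; the scale penalty is omitted. It assumes the squared-distance form used by distance-based IoU losses (squared center distance divided by the squared diagonal of the smallest enclosing box), and the names `distance_penalty` and `partial_loss` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def distance_penalty(pred, target):
    """Squared center distance over squared diagonal of the enclosing box."""
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    ew = max(pred[2], target[2]) - min(pred[0], target[0])
    eh = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = ew ** 2 + eh ** 2
    return rho2 / c2 if c2 > 0 else 0.0


def partial_loss(pred, target):
    """1 - IoU + distance penalty (the scale penalty is not modelled here)."""
    return 1.0 - iou(pred, target) + distance_penalty(pred, target)


target = (0, 0, 10, 10)
near, far = (12, 0, 22, 10), (40, 0, 50, 10)      # both disjoint from the target
off_center, centered = (1, 1, 5, 5), (3, 3, 7, 7)  # both fully inside the target
```

The two disjoint boxes both have an IoU of 0, and the two contained boxes both have an IoU of 0.16, so a plain IoU loss cannot rank them. With the distance term, the nearer box and the centered box receive the smaller loss, which gives the regression a usable signal in both situations.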
The third term, $R_s$, is the penalty term based on the scale of the predicted box and the target box (scale includes shape and area). It is computed from the main diagonal coordinates of the predicted box and of the target box, where λ is the penalty equilibrium coefficient, δ is the shape similarity coefficient of the two boxes, and τ is the area similarity coefficient of the two boxes. When the difference in shape is great, the priority is to calculate the loss based on IoU, Euclidean distance, and shape similarity, so τ is effective only when δ is greater than 0.5.
When the centers of the two boxes coincide, the penalty based on the distance between the two boxes becomes invalid. In this case, the loss can still be calculated according to the scale of the two boxes.
In summary, the final bounding box regression loss function, the G2-IoU loss, is:

$$ L_{\mathrm{G2\text{-}IoU}} = 1 - \mathrm{G2\text{-}IoU} $$

When the two boxes coincide with each other, the loss is 0. When the two boxes do not coincide at all, the loss is non-negative and bounded.
3. Experiments
The experiments were run with PyTorch 1.8 on Windows 10. The server has a Xeon Gold 6230R CPU at 2.1 GHz, 256 GB of memory, and an NVIDIA RTX 3090 GPU with 24 GB of memory.
3.1. Experimental data
The data was collected from the electrolysis workshop of a smelter in Hunan Province, China. After rectifying and segmenting the original infrared images, 2000 electrolytic cell images with a size of 155 × 548 × 3 were obtained. Figure 10 shows the segmented single-cell image.
The dataset is divided into a training set, a validation set, and a test set, containing 1280, 320, and 400 images, respectively.
As shown in figure 11, the faults are divided into three levels: first-level (F1), second-level (F2), and third-level (F3). In order to enrich the target types and facilitate fault location, the electrolyte inlet (In) and outlet (Out) are also marked in the dataset.
3.2. Experimental model
In order to verify the fault detection effect of Mask R-CNN and the effectiveness of the proposed improvements, this experiment compares three algorithms: (a) the original Mask R-CNN; (b) the original Mask R-CNN with the bounding box loss function replaced by the G2-IoU loss, denoted as Mask R-CNN + G2-IoU; and (c) the improved Mask R-CNN. ResNet50 and FPN are used to extract feature maps for all three models. The SGD optimizer is used for training, with a learning rate of 0.005, a momentum factor of 0.9, and a weight decay of 0.0005. The batch size is 4, the number of epochs is 25, and the number of iterations is 8000. To facilitate training, the images are resized to 800 × 800 at input.
In addition, the improved model proposed in this paper is compared with other popular object detection models on the same test set, such as Faster R-CNN and YOLOv3, so as to show the detection capability of the improved model more comprehensively.
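The training settings listed above are mutually consistent, which a short calculation shows. The snippet below gathers them in a plain dictionary whose key names are hypothetical, and derives the iteration count from the training-set size given in section 3.1.

```python
import math

# Hyperparameters as stated in the text; the dictionary layout is illustrative only.
config = {
    "optimizer": "SGD",
    "learning_rate": 0.005,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "batch_size": 4,
    "epochs": 25,
    "input_size": (800, 800),
}

train_images = 1280  # training-set size from section 3.1

iters_per_epoch = math.ceil(train_images / config["batch_size"])
total_iters = iters_per_epoch * config["epochs"]
```

With 1280 training images and a batch size of 4, one epoch takes 320 iterations, so 25 epochs correspond to the 8000 iterations reported.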
3.3. Evaluation index
The evaluation index of this experiment adopts the evaluation index of the COCO data set commonly used in the target detection field, that is, the average precision (AP) and the mean average precision (mAP) under different IoU thresholds.
AP is an important indicator for evaluating the detection effect of a single class. It can be calculated through precision (P) and recall (R). P refers to the percentage of the number of targets correctly detected by the model in the total number of detected targets, and R refers to the percentage of the number of targets correctly detected by the model in the total number of targets of this type. mAP is the average AP of all target categories. The calculation formula of each indicator is as follows:
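$$ P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} $$

$$ AP = \int_0^1 P(R)\, dR, \qquad mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i $$

where $TP$ is the number of targets detected correctly, $FP$ is the number of non-targets detected incorrectly, $FN$ is the number of targets not detected, and $N$ is the number of target categories.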
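The indicators can be computed as in the following sketch. It is a simplified illustration: AP is taken as the area under the precision-recall curve with all-point interpolation, which is an assumption and differs from the 101-point sampling of the official COCO evaluation. The function names and the sample numbers are hypothetical.

```python
def precision(tp, fp):
    """Share of detections that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of ground-truth targets that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation)."""
    pts = sorted(zip(recalls, precisions))
    r = [0.0] + [pt[0] for pt in pts]
    p = [0.0] + [pt[1] for pt in pts]
    # make precision non-increasing as recall grows (upper envelope)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
ap = average_precision(recalls=[0.5, 1.0], precisions=[1.0, 0.5])
```

For the sample counts, 8 correct detections out of 10 give a precision of 0.8, and 8 detected targets out of 16 give a recall of 0.5.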
3.4. Result analysis
The loss convergence curves of the three models are shown in figure 12. It can be seen that the improved model starts to stabilize after 600 iterations and finally converges at about 0.08. Its loss converges faster than that of the original model and reaches a smaller value, which supports the effectiveness of the G2-IoU loss function in the improved model.
The mAP curves are shown in figure 13. The mAP of the improved model increased rapidly in the first seven epochs, then rose slightly as training continued, and eventually stabilized at around 0.86. The improved model is significantly better than the original model.
Table 1 shows the detection results of the three models used in the experiment on the test set, including the average precision (AP) of each class when the IoU threshold is 0.5 (AP0.5) and the mAP under different IoU thresholds (mAP0.5:0.95, mAP0.75, mAP0.5).
Table 1. Performance comparison of Mask R-CNN-based models.
Model | AP0.5 In (%) | AP0.5 Out (%) | AP0.5 F1 (%) | AP0.5 F2 (%) | AP0.5 F3 (%) | mAP0.5:0.95 (%) | mAP0.75 (%) | mAP0.5 (%) |
---|---|---|---|---|---|---|---|---|
Original Mask R-CNN | 99.8 | 99.8 | 40.9 | 79.6 | 61.7 | 50.4 | 58.2 | 76.4 |
Mask R-CNN + G2-IoU | 99.8 | **100** | 52.7 | 91.4 | 72.4 | 63.3 | **75.9** | 83.3 |
Improved Mask R-CNN | **100** | 99.9 | **61.9** | **92.3** | **79.8** | **64.1** | 72.5 | **86.8** |
Note: bold text indicates the best result under each evaluation index.
As shown in table 1, the detection accuracy of the Improved Mask R-CNN is greatly improved compared with the Original Mask R-CNN. When the IoU threshold is 0.5, the mAP reaches 86.8%, an increase of 10.4 percentage points. When only the loss function is improved, the mAP still reaches 83.3%, an increase of 6.9 percentage points. Among the classes, the detection accuracy of F1 and F3 improves the most.
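The mAP0.5 column of table 1 can be cross-checked against the per-class AP0.5 values, since mAP is the mean of the per-class AP. The short script below does this with the numbers from the table; the variable names are illustrative.

```python
# Per-class AP0.5 (%) from table 1, in the order In, Out, F1, F2, F3.
ap50 = {
    "Original Mask R-CNN": [99.8, 99.8, 40.9, 79.6, 61.7],
    "Mask R-CNN + G2-IoU": [99.8, 100.0, 52.7, 91.4, 72.4],
    "Improved Mask R-CNN": [100.0, 99.9, 61.9, 92.3, 79.8],
}

# mAP0.5 is the mean over the five classes, rounded to one decimal as in the table.
map50 = {name: round(sum(v) / len(v), 1) for name, v in ap50.items()}

baseline = map50["Original Mask R-CNN"]
gain_full = round(map50["Improved Mask R-CNN"] - baseline, 1)
gain_loss_only = round(map50["Mask R-CNN + G2-IoU"] - baseline, 1)
```

The recomputed means match the reported 76.4, 83.3, and 86.8, and the differences reproduce the stated gains of 10.4 and 6.9 percentage points.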
The detection results based on different object detection models are shown in table 2. Compared with Faster R-CNN, a common two-stage model, the mAP and detection speed of the Improved Mask R-CNN are improved. Although YOLOv3 has the best real-time performance and the fastest detection speed, its accuracy and recall rate are very low. Overall, the effect of Improved Mask R-CNN in plate fault detection is greatly improved.
Table 2. Performance comparison of different models.
Model | mAP0.5 (%) | Detection time^a (s) |
---|---|---|
YOLOv3 | 62.2 | **0.65** |
Faster R-CNN | 70.7 | 1.24 |
Mask R-CNN^b | 76.4 | 0.88 |
Improved Mask R-CNN | **86.8** | 0.71 |
^a Average detection time of 100 images.
^b Mask R-CNN does not include mask training.
Note: bold text indicates the best result under each evaluation index.
Figure 14 shows the detection results based on the improved Mask R-CNN model. It can be found that the faults can be identified with high confidence and accurately located in a variety of complex situations such as continuous plate faults, irregular covering, and local busbar heating.
However, there are also a small number of false detections, mainly when the background brightness is high. In this case, some background characteristics are similar to F1, so a non-faulty plate is incorrectly identified as F1. Nevertheless, the recognition accuracy of F2 and F3, which seriously affect production, is high, and in an industrial production environment the priority is to identify these more severe faults accurately.
4. Conclusion
In order to improve the detection of electrolytic cell plate faults and reduce the false detection rate and missed detection rate, a detection method based on an improved Mask R-CNN and infrared images is proposed in this paper. The feature input layer and regression prediction layer of the original Mask R-CNN model are modified in a targeted manner, mainly by improving the proposal generation strategy and the NMS algorithm of the RPN layer, as well as the loss function of bounding box regression. The model is optimized according to the characteristics of the electrolytic cell, is suited to harsh environments and continuous fault conditions, and can be trained more effectively. The experimental results show that the improved model has an accuracy rate of 86.8%, which is 10.4 percentage points higher than the original Mask R-CNN model, and its loss converges faster. It also has a higher recognition accuracy under complex working conditions, which supports the effectiveness of the proposed algorithm. Compared with the Faster R-CNN and YOLOv3 models, the improved model has better recognition accuracy. For a two-stage object detection model, its recognition speed is also greatly improved.
Acknowledgments
This work is supported by the Projects of International Cooperation and Exchanges NSFC (Grant No. 61860206014), the National Natural Science Foundation of China (Grant No. 92167105), and the Natural Science Foundation of Hunan Province (Grant Nos. 2019JJ50823 and 2021JJ30880).
Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.