Fault detection of electrolyzer plate based on improved Mask R-CNN and infrared images

Published 9 May 2022 © 2022 IOP Publishing Ltd
Citation: Hongqiu Zhu et al 2022 Meas. Sci. Technol. 33 085405, DOI 10.1088/1361-6501/ac5b29

0957-0233/33/8/085405

Abstract

Non-ferrous metals are important strategic resources, and electrolysis is an essential step in refining them. In the electrolysis process, a plate short circuit is the most common fault, and it seriously affects output and energy consumption. The rapid and accurate detection of faulty plates is therefore of great significance to the metal refining process. Traditional object detection algorithms generalize poorly and require complex hand-designed feature rules, and existing deep learning models perform poorly on infrared images with many interference factors. To address these issues, an improved Mask R-CNN-based fault detection algorithm is proposed that improves the proposal generation strategy and the non-maximum suppression algorithm to reduce missed detections. We also propose a globally generalized intersection over union loss function to better characterize the position and scale relationship between the predicted box and the target box, which benefits bounding box regression. The experimental results show that the improved model reaches an accuracy of 86.8%, which is 10.4 percentage points higher than the original model. Compared with common one- and two-stage object detection models, the improved model has a stronger detection ability. This algorithm has some reference value for the accurate detection and location of electrolytic cell faults.

1. Introduction

Nowadays, non-ferrous metals have become necessary for developing a country's economy, technology, and national defense industry. The electrolysis process is an essential step in refining many non-ferrous metals, which directly affects the productivity and quality of finished metals [1].

In the electrolysis process of copper, lead, zinc, and other metals, cathode and anode plates are arranged in parallel in a cell [2]. As shown in figure 1, the plates are uniformly conductive through the busbar. The two plates are usually 2–4 cm apart, and as electrolysis proceeds, the anode and cathode plates may come into contact with each other. As shown in figure 2, the most common situation is that raised particles form on the surface of the plate due to impurity attachment, resulting in abnormal contact between the cathode plate and the anode plate. Abnormal contact between the two plates leads to a short circuit [3–5], which not only consumes a lot of electric energy but also reduces the output of the metal. Therefore, it is of great significance to detect faults in time.

Figure 1. Diagram of the electrolytic cell.

Figure 2. Particles on the cathode plate.

The current fault detection methods include indirect and direct detection. The indirect detection methods mainly include sprinkling tanks and supporting meters to detect magnetic fields, so that the temperature or magnetic field reflects short-circuit conditions. Although these methods are simple in principle, convenient to operate, and low in cost, their sensitivity, accuracy, and real-time performance rarely meet the requirements. The direct detection method measures the voltage and current of the electrolytic cell with a sensor. Although it directly reflects the operating status of the electrode plates, the sensors and detection lines are easily corroded in a harsh environment. The plate's temperature increases during a short circuit, and infrared imaging technology can obtain the temperature of the electrolytic cell surface in all weather, without contact, and in real time [6, 7]. Therefore, fault detection, classification, and location based on the infrared image are relatively efficient and straightforward.

The temperature of the electrolytic cell can be directly reflected in the infrared image, but it is difficult to detect faulty plates in the infrared image quickly, accurately, and completely because of complex factors such as irregular cover, acid fog, water vapor spreading, uneven heating of the electrode plate, and local heating of the busbar [8]. The traditional methods rely on region selection based on sliding windows and manually designed feature extraction rules [9, 10]. Ojala et al [11] proposed a local binary pattern (LBP) method, which takes the pixel in the center of a local area as the threshold and compares it with the surrounding pixel values to form a binary LBP code. Applying this operation to each pixel of the original image yields an LBP map. The map is divided into small regions, and the histograms of these regions are combined into a feature vector, which is input into the classifier for classification. Based on the scale-invariant feature transform, Ke et al [12] used principal component analysis to calculate the feature descriptor to make its features more prominent. The experimental results show that this method has higher detection performance. However, these methods are not tailored to the target and generalize weakly, which can easily cause false and missed detections.

Object detection technology based on neural networks has been widely used in engineering and has an excellent detection effect [13]. At present, the mainstream object detection algorithms based on deep learning mainly include one-stage algorithms such as YOLOv3 [14] and SSD [15], and two-stage algorithms such as Faster R-CNN [16] and Mask R-CNN [17]. Generally, the accuracy of two-stage algorithms is better than that of one-stage algorithms, so Mask R-CNN is selected as the basic model.

For many specific object detection problems, the feature input and bounding box regression stages need to be modified according to the actual scene [18]. To improve the detection accuracy, Bodla et al [19] proposed a soft non-maximum suppression (Soft-NMS) algorithm that reduces the confidence of highly overlapping proposals. However, because the proposals of all positive samples are retained, the amount of computation increases. To achieve accurate positioning and speed up loss convergence, Yu et al [20] proposed using the intersection over union (IoU) loss as the bounding box regression loss, which greatly improves the detection effect. However, regression training cannot be performed when the proposal and the target box do not intersect, or when one box fully contains the other.

Given these problems, combined with the specific characteristics of electrode plate faults, we propose a fault detection method based on Mask R-CNN. This algorithm has three primary improvements as follows:

  • (a) Obtain the reduced multi-scale feature layer based on ResNet50 [21] and feature pyramid network (FPN) [22], and then adaptively modify the preset size of the proposal according to the length–width ratio of the electrode plate.
  • (b) Improve the NMS algorithm [16, 23] to reduce the missed detection rate of continuous faulty plates.
  • (c) Propose a globally generalized IoU loss function. When adjusting the regression parameters of the bounding box, the sensitivity of the loss function to the scale and distance relationship between the target box and the predicted box is increased, and the loss can be calculated according to the coincidence state, relative position, and scale relation of the two boxes.

2. Approach

2.1. Mask R-CNN architecture

The architecture of Mask R-CNN, shown in figure 3, is mainly composed of a feature extraction network, a region proposal network (RPN), an RoIAlign layer, and a classification and regression layer.

Figure 3. Architecture of Mask R-CNN. The backbone is the feature extraction network, Conv is convolution, FCN is fully connected network.

The process is: (a) obtain the feature map of size m × n × d by ResNet50 and FPN; (b) divide the input image corresponding to the feature map into m × n regions and generate k proposals in each region; (c) filter proposals according to the NMS algorithm; (d) project the proposals onto the feature map to obtain the corresponding feature matrix; and (e) scale each feature matrix to a uniform size through the RoIAlign layer. Then, the prediction results are obtained through a series of fully connected layers.

2.2. Feature extraction network

Extracting deep image features is an important step in the entire model training. This paper uses ResNet50 combined with FPN as the feature extraction network. Multi-layer convolution extracts image features more efficiently, and the residual structure solves the vanishing gradient problem. Compared with VGG [24] and ZF [25], the overall network performance has great advantages. FPN generates a new set of feature layers based on ResNet50, and each feature layer is the result of fusing different stages of convolutional layers in ResNet50.

FPN combines deep and shallow feature fusion with multi-resolution prediction by connecting high-resolution, low-semantic shallow features to low-resolution, high-semantic deep features to achieve good detection results for multi-scale targets. Its structure is shown in figure 4. The output of the residual block of levels 3–5 of ResNet50 on the left is taken as the feature of FPN, denoted as (C3, C4, C5). (M5, M4, M3) are the intermediate layers obtained by upsampling from the deepest layer, and then connecting them to (C5, C4, C3) to fuse multi-scale features. In order to reduce the impact of upsampling, the layer must perform a 3 × 3 convolution operation. Finally, a multi-scale feature map is obtained, denoted as (P3, P4, P5).

Figure 4. FPN structure.

2.3. RPN working principle

In the early two-stage object detection algorithms, the selective search algorithm [26] was used to generate predicted boxes by clustering similar regions. This process is very time-consuming and generates many redundant boxes, which seriously affects the detection efficiency. The RPN proposed in Faster R-CNN effectively reduces the generation time and number of proposals, and there is no need to repeatedly calculate the feature map corresponding to each predicted box, which greatly improves the recognition efficiency. As shown in figure 5, each pixel on the feature map is used as an anchor point to generate k proposals. In the infrared image of the electrode plate, the size and shape of the faulty target are relatively fixed. To give the proposals a higher initial matching degree with the target fault and reduce the amount of subsequent bounding box regression computation, the initial scale of the proposal is set to (32 × 32, 64 × 64, 128 × 128) and the initial aspect ratio is set to (1:4, 1:8), so k in this paper is 6. Because FPN is used to fuse the deep and shallow features, the three initial scales correspond directly to (P3, P4, P5).
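The proposal shapes described above can be enumerated with a short Python sketch. It is an illustration only: the helper name `make_anchors` is hypothetical, each anchor is assumed to keep the area of its nominal scale, and the ratio is assumed to mean width:height, since the text gives 1:4 and 1:8 without naming the order.

```python
import math

def make_anchors(cx, cy, scales=(32, 64, 128), ratios=(1 / 4, 1 / 8)):
    """Return (x1, y1, x2, y2) anchor boxes centred on (cx, cy).

    Assumptions (not stated in the paper): each anchor keeps the area
    scale * scale, and `ratio` is interpreted as width / height.
    """
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:
            w = math.sqrt(area * ratio)  # width for this area and ratio
            h = area / w                 # height so that w * h == area
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(100.0, 100.0)
print(len(anchors))  # 3 scales x 2 ratios = 6
```

With FPN, each of the three scales would be attached to one pyramid level, so the six shapes are spread over (P3, P4, P5) rather than all generated on a single feature map.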

Figure 5. RPN structure.

In figure 5, k proposals are generated in total at the corresponding pixel of each feature layer. The proposals are then input into the classification layer and the regression layer, yielding 2k class scores and 4k bounding box regression parameters. Based on the class scores, the NMS algorithm is used to eliminate the redundant proposals, and a part of the remaining proposals is sampled for RPN training.

2.4. Improved NMS algorithm

The NMS algorithm is the most commonly used post-processing algorithm in object detection, and is mainly used to eliminate redundant proposals. The NMS algorithm flow is shown in figure 6. When the initial proposals generated by the RPN layer are screened, the proposals are sorted by score and the proposal with the highest score is retained. The IoU between each of the other proposals and the highest-score proposal is then computed, and any proposal whose IoU is greater than the threshold is eliminated. This continues until all the proposals have been filtered.

Figure 6. NMS algorithm flow.

The traditional NMS algorithm uses a greedy strategy to eliminate proposals, which may cause missed detections. As shown in figure 7, when multiple plates fail, the predicted box of an adjacent target may be eliminated as a redundant box, and the retained predicted box may not be the most appropriate one.

Figure 7. Adjacent fault missed detection. As the distance between plates is small, the proposals are easily stacked with each other.

To solve this problem, we propose a multi-stage Gaussian penalty NMS algorithm. When the IoU of the current box and the highest score box exceeds the upper threshold (${T_{\text{u}}}$), it can be considered as a complete overlap and the current box is eliminated. The remaining proposals are kept, but the Gaussian penalty is given according to the degree of overlap. When IoU is less than the lower threshold (${T_{\text{l}}}$), the score is not adjusted (different target). The improved NMS algorithm is as follows:

Equation (1)

where $M$ is the proposal with the highest score, ${b_i}$ is the current proposal to be screened, ${s_i}$ is the original score of the current box, and $s^{*}_i$ is the score after reassignment. $\sigma$ is set to 0.65. When ${T_{\text{l}}}$ is 0.4 and ${T_{\text{u}}}$ is 0.85, the precision and recall rate can reach a high level, and the redundant boxes can be efficiently eliminated.
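The three-regime rule described above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the Gaussian penalty is assumed to take the Soft-NMS form $s_i e^{-\text{IoU}^2/\sigma}$, and the function names and the `min_score` cut-off are hypothetical additions.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def gaussian_penalty_nms(boxes, scores, t_l=0.4, t_u=0.85, sigma=0.65, min_score=0.001):
    """Multi-stage NMS sketch: drop above t_u, decay between t_l and t_u, keep below t_l."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # M: the highest-scoring proposal still under consideration.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        m_box, m_score = remaining.pop(best)
        kept.append((m_box, m_score))
        survivors = []
        for box, score in remaining:
            overlap = iou(m_box, box)
            if overlap > t_u:
                continue  # treated as a complete overlap with M: eliminated
            if overlap >= t_l:
                # Assumed Gaussian decay (Soft-NMS style).
                score *= math.exp(-(overlap ** 2) / sigma)
            if score >= min_score:  # hypothetical cut-off, not from the paper
                survivors.append((box, score))
        remaining = survivors
    return kept

boxes = [(0, 0, 10, 10), (0, 0, 10, 10.5), (4, 0, 14, 10), (20, 0, 30, 10)]
scores = [0.9, 0.85, 0.8, 0.7]
kept = gaussian_penalty_nms(boxes, scores)
print([box for box, _ in kept])
```

In this toy input the second box overlaps the first almost completely and is removed, the third overlaps it partially and is kept with a reduced score, and the fourth does not overlap and keeps its score.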

2.5. G2-IoU loss mechanism

The original loss function of the Mask R-CNN model is shown in equation (2). It consists of three parts, namely class loss, bounding box regression loss, and mask loss. Since there is no need to perform semantic segmentation on the target in this paper, mask segmentation is not trained to reduce the amount of calculation

Equation (2)

where ${L_{\text{cls}}}$ is the class loss, which can be computed by:

Equation (3)

where $p$ is the class probability distribution predicted by the classifier and $u$ is the corresponding real class label.

${L_{\text{loc}}}$ is the bounding box regression loss, and is computed by the $\text{smooth}_{L_1}$ norm. The following formulas are used to compute ${L_{{\text{loc}}}}$

Equation (4)

Equation (5)

where $b$ is the regression parameter $\left( {{b_x},{b_y},{b_w},{b_h}} \right)$ of the corresponding class predicted by the boundary regressor, and ${b^{{\text{gt}}}}$ is the bounding box regression parameter $(b_x^{{\text{gt}}},b_y^{{\text{gt}}},b_w^{{\text{gt}}},b_h^{{\text{gt}}})$ of the real target.
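As an illustration of the regression loss just described, the following Python sketch uses the standard smooth L1 form from Fast R-CNN (quadratic for small errors, linear otherwise), summed over the four regression parameters. It is assumed, not confirmed by this text, that equations (4) and (5) take this standard form; the function names are hypothetical.

```python
def smooth_l1(x):
    """Standard smooth L1: 0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def loc_loss(b, b_gt):
    """Sum of smooth L1 over the (x, y, w, h) regression parameters."""
    return sum(smooth_l1(p - t) for p, t in zip(b, b_gt))

# Predicted vs. target regression parameters (toy values).
print(loc_loss((0.0, 0.0, 0.0, 0.0), (0.5, 2.0, 0.0, -3.0)))
```

Because each coordinate contributes independently, two predictions with the same total loss can overlap the target very differently, which is the weakness that an IoU-based loss is meant to address.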

It should be noted that when the $\text{smooth}_{L_1}$ norm is used as the bounding box regression loss, a smaller loss does not necessarily mean a higher IoU. As shown in figure 8, the three cases all have a $\text{smooth}_{L_1}$ norm of 3, yet their IoU values are quite different.

Figure 8. Different IoU with the same $\text{smooth}_{L_1}$ norm. The solid line box is the target box, and the dashed box is the predicted box.

IoU loss performs bounding box regression directly according to the evaluation criterion and accurately describes the distance and scale relationship between the two boxes. Compared with $\text{smooth}_{L_1}$, it performs well, but it also has problems. When the predicted box and the target box do not overlap, or when one fully contains the other, the relationship between the two boxes cannot be correctly reflected, so no gradient is back-propagated and network performance is weakened. Based on this, we propose the G2-IoU loss mechanism, namely the Globally Generalized IoU loss. G2-IoU considers the coincidence rate of the target box and the predicted box as well as the distance and scale factors between the two boxes. G2-IoU is shown in equation (6). In G2-IoU, two kinds of penalty terms are added: one based on the Euclidean distance between the center points of the two boxes and the diagonal distance of the smallest bounding box [27], and one based on the scale of the two boxes. Thus, the sensitivity of the loss function to distance and scale is increased, which is more conducive to the accurate positioning of the predicted box

Equation (6)

The second term, ${L_{\text{d}}}\left( {b,{b^{{\text{gt}}}}} \right)$, is the normalized Euclidean distance penalty term, and its formula is as follows:

Equation (7)

where $\rho $ is the Euclidean distance of the coordinates of the center point between the target box and the predicted box, while $c$ is the diagonal distance of the smallest bounding box of the two boxes.

Figure 9 shows that no matter what state the two boxes are in, the bounding box loss can be quickly regressed through ${L_{\text{d}}}\left( {b,{b^{{\text{gt}}}}} \right)$. Even if the two boxes are completely contained and the $c$ is fixed, the loss can be calculated by $d$.
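The behaviour of the distance penalty can be illustrated with a small Python sketch. It assumes the DIoU-style form $L_{\text{d}} = \rho^2 / c^2$, which matches the symbols defined above but is an assumption about equation (7); the function names are hypothetical, and the scale term ${L_{\text{s}}}$ is not modelled here.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def distance_penalty(b, b_gt):
    """Assumed form rho^2 / c^2: squared centre distance over squared
    diagonal of the smallest box enclosing both boxes."""
    cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    gx, gy = (b_gt[0] + b_gt[2]) / 2, (b_gt[1] + b_gt[3]) / 2
    rho2 = (cx - gx) ** 2 + (cy - gy) ** 2
    ex1, ey1 = min(b[0], b_gt[0]), min(b[1], b_gt[1])
    ex2, ey2 = max(b[2], b_gt[2]), max(b[3], b_gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return rho2 / c2 if c2 > 0 else 0.0

target = (0.0, 0.0, 4.0, 4.0)
inside_off_centre = (1.0, 1.0, 2.0, 2.0)  # contained, centre offset
inside_centred = (1.5, 1.5, 2.5, 2.5)     # contained, same centre as target
disjoint = (5.0, 0.0, 7.0, 4.0)           # no overlap with target

for box in (inside_off_centre, inside_centred, disjoint):
    print(iou(box, target), distance_penalty(box, target))
```

The two contained boxes have the same IoU, so an IoU-only loss cannot tell them apart, while the distance penalty is non-zero only for the off-centre one. The disjoint box has an IoU of zero but still receives a usable penalty.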

Figure 9. Normalized Euclidean distance penalty term, where $b$ is the predicted box, ${b^{{\text{gt}}}}$ is the target box, $b_{\text{c}}^{{\text{gt}}}$ is the center point of the target box, ${b_{\text{c}}}$ is the center point of the predicted box, $d$ is the Euclidean distance between the two center points, and $c$ is the diagonal length of the smallest bounding box of the two boxes.

The third term, ${L_{\text{s}}}\left( {b,{b^{{\text{gt}}}}} \right)$, is the penalty term based on the scale of the predicted box and the target box (scale includes shape and area). Assuming that the main diagonal coordinate of the predicted box is $\left( {{b_{x1}},{b_{y1}},{b_{x2}},{b_{y2}}} \right)$ and that of the target box is $(b_{x1}^{{\text{gt}}},b_{y1}^{{\text{gt}}},b_{x2}^{{\text{gt}}},b_{y2}^{{\text{gt}}})$, the following formulas are used to compute ${L_{\text{s}}}\left( {b,{b^{{\text{gt}}}}} \right)$

Equation (8)

Equation (9)

Equation (10)

where λ is the penalty equilibrium coefficient, δ is the shape similarity coefficient of the two boxes, and τ is the area similarity coefficient of the two boxes. When the difference in shape is great, the priority is to calculate the loss based on IoU, Euclidean distance, and shape similarity, so τ is effective only when δ is greater than 0.5.

When the centers of the two boxes overlap, ${\rho ^2}\left( {b,{b^{{\text{gt}}}}} \right)$ and $c$ become fixed values, and the penalty based on the distance between the two boxes becomes invalid. In this case, the loss can still be calculated according to the scale of the two boxes.

In summary, the final bounding box regression loss function G2-IoU loss is:

Equation (11)

When the two boxes coincide with each other, the loss is 0. When the two boxes do not coincide at all, the loss is non-negative and bounded.

3. Experiments

The experiments were run with PyTorch 1.8 on Windows 10. The server has a Xeon Gold 6230R CPU at 2.1 GHz, 256 GB of memory, and an NVIDIA RTX 3090 GPU with 24 GB of memory.

3.1. Experimental data

The data was collected from the electrolysis workshop of a smelter in Hunan Province, China. After rectifying and segmenting the original infrared images, 2000 electrolytic cell images with a size of 155 × 548 × 3 were obtained. Figure 10 shows the segmented single-cell image.

Figure 10. Infrared image of a single cell. The electrolyte outlet is on the left, the electrolyte inlet is on the right, and the highlighted part is the faulty plate.

The dataset is divided into a training set of 1280 images, a validation set of 320 images, and a test set of 400 images.

As shown in figure 11, the faults are divided into three levels: first-level (F1), second-level (F2), and third-level (F3). In order to enrich the target types and facilitate fault location, the electrolyte inlet (In) and outlet (Out) are also marked in the dataset.

Figure 11. Data label diagram.

3.2. Experimental model

In order to verify the fault detection effect of Mask R-CNN and the effectiveness of the proposed improved algorithm, this experiment uses three algorithms for comparison: (a) original Mask R-CNN; (b) original Mask R-CNN algorithm but the bounding box loss function uses G2-IoU loss, denoted as Mask R-CNN + G2-IoU; (c) improved Mask R-CNN. ResNet50 and FPN are used to extract feature maps for all three models. The SGD optimizer is used for training. The learning rate is 0.005, momentum factor is 0.9, and weight decay value is 0.0005. The batch size is 4, epoch is 25, and number of iterations is 8000. In order to facilitate training, the image size will be adjusted to 800 × 800 during input.
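As a quick consistency check on these settings, the stated iteration count follows from the training-set size, batch size, and number of epochs. The variable names below are illustrative.

```python
train_images = 1280  # training-set size from section 3.1
batch_size = 4
epochs = 25

iterations_per_epoch = train_images // batch_size
total_iterations = iterations_per_epoch * epochs

print(iterations_per_epoch, total_iterations)  # 320 8000
```

So one epoch corresponds to 320 iterations, which is useful when relating curves plotted per iteration to curves plotted per epoch.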

In addition, the improved model proposed in this paper is compared with other popular object detection models on the same test set, such as Faster R-CNN and YOLOv3, so as to show the detection capability of the improved model more comprehensively.

3.3. Evaluation index

The evaluation index of this experiment adopts the evaluation index of the COCO data set commonly used in the target detection field, that is, the average precision (AP) and the mean average precision (mAP) under different IoU thresholds.

AP is an important indicator for evaluating the detection effect of a single class. It can be calculated through precision (P) and recall (R). P refers to the percentage of the number of targets correctly detected by the model in the total number of detected targets, and R refers to the percentage of the number of targets correctly detected by the model in the total number of targets of this type. mAP is the average AP of all target categories. The calculation formula of each indicator is as follows:

Equation (12)

Equation (13)

Equation (14)

Equation (15)

where ${\text{TP}}$ is the number of targets detected correctly, ${\text{FP}}$ is the number of non-targets incorrectly detected as targets, ${\text{FN}}$ is the number of targets not detected, and $r$ is the recall $R$.
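The indicators can be sketched in Python as follows. Precision and recall follow the definitions above. For AP, the sketch integrates the precision-recall curve using a monotone precision envelope, which is one common approximation and an assumption here (the COCO toolkit uses a sampled interpolation instead). All function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(recalls, precisions):
    """Area under P(r); `recalls` must be sorted in ascending order."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision non-increasing from right to left (envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_average_precision(aps):
    """mAP: the mean of the per-class AP values."""
    return sum(aps) / len(aps)

print(precision_recall(tp=8, fp=2, fn=2))
print(average_precision([0.5, 1.0], [1.0, 0.5]))
```

In the toy curve, precision is 1.0 up to a recall of 0.5 and 0.5 from there to a recall of 1.0, so the area is 0.5 × 1.0 + 0.5 × 0.5.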

3.4. Result analysis

The loss convergence curves of the three models are shown in figure 12. The improved model starts to stabilize after about 600 iterations and finally converges at about 0.08. Its loss converges faster than that of the original model and reaches a smaller value, which supports the effectiveness of the G2-IoU loss function of the improved model.

Figure 12. Training loss curves.

The mAP curves are shown in figure 13. The mAP of the improved model increased rapidly in the first seven epochs, then rose slightly as the number of iterations increased, and eventually stabilized at around 0.86. The improved model is significantly better than the original model.

Figure 13. mAP curves.

Table 1 shows the detection results of the three models used in the experiment on the test set, including the average precision (AP) of each class when the IoU threshold is 0.5 (AP0.5) and the mAP under different IoU thresholds (mAP0.5:0.95, mAP0.75, mAP0.5).

Table 1. Performance comparison of Mask R-CNN-based models.

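| Model | AP0.5 In (%) | AP0.5 Out (%) | AP0.5 F1 (%) | AP0.5 F2 (%) | AP0.5 F3 (%) | mAP0.5:0.95 (%) | mAP0.75 (%) | mAP0.5 (%) |
|---|---|---|---|---|---|---|---|---|
| Original Mask R-CNN | 99.8 | 99.8 | 40.9 | 79.6 | 61.7 | 50.4 | 58.2 | 76.4 |
| Mask R-CNN + G2-IoU | 99.8 | **100** | 52.7 | 91.4 | 72.4 | 63.3 | **75.9** | 83.3 |
| Improved Mask R-CNN | **100** | 99.9 | **61.9** | **92.3** | **79.8** | **64.1** | 72.5 | **86.8** |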

Note: Under the current evaluation index, the bold text indicates that the corresponding model has the best effect. This allows for a more dramatic comparison of the effects of the models.

As shown in table 1, the detection accuracy of the Improved Mask R-CNN is greatly improved compared with the Original Mask R-CNN. When the IoU threshold is 0.5, the mAP reaches 86.8%, an increase of 10.4 percentage points. When only the loss function is improved, the mAP also reaches 83.3%, an increase of 6.9 percentage points. Among the classes, the detection accuracy of F1 and F3 improves the most.
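The figures quoted from table 1 can be cross-checked with a few lines of Python: the mAP0.5 of each model is the mean of its five per-class AP0.5 values (In, Out, F1, F2, F3), and the reported gains are differences in percentage points. The dictionary below simply transcribes the table.

```python
# Per-class AP0.5 (%) in the order In, Out, F1, F2, F3, transcribed from table 1.
ap50 = {
    "Original Mask R-CNN": [99.8, 99.8, 40.9, 79.6, 61.7],
    "Mask R-CNN + G2-IoU": [99.8, 100.0, 52.7, 91.4, 72.4],
    "Improved Mask R-CNN": [100.0, 99.9, 61.9, 92.3, 79.8],
}

# mAP0.5 is the mean over classes, rounded to one decimal as in the table.
map50 = {name: round(sum(v) / len(v), 1) for name, v in ap50.items()}
base = map50["Original Mask R-CNN"]
gains = {name: round(value - base, 1) for name, value in map50.items()}

print(map50)
print(gains)
```

The per-class differences also show where the gain comes from: F1 rises from 40.9 to 61.9 and F3 from 61.7 to 79.8, while In and Out were already close to 100.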

The detection results based on different object detection models are shown in table 2. Compared with Faster R-CNN, a common two-stage model, the mAP and detection speed of the Improved Mask R-CNN are improved. Although YOLOv3 has the best real-time performance and the fastest detection speed, its accuracy and recall rate are very low. Overall, the effect of Improved Mask R-CNN in plate fault detection is greatly improved.

Table 2. Performance comparison of different models.

| Model | mAP0.5 (%) | Detection time^a (s) |
|---|---|---|
| YOLOv3 | 62.2 | **0.65** |
| Faster R-CNN | 70.7 | 1.24 |
| Mask R-CNN^b | 76.4 | 0.88 |
| Improved Mask R-CNN | **86.8** | 0.71 |

^a Average detection time of 100 images.
^b Mask R-CNN does not include Mask training.
Note: Under the current evaluation index, the bold text indicates that the corresponding model has the best effect. This allows for a more dramatic comparison of the effects of the models.

Figure 14 shows the detection results based on the improved Mask R-CNN model. It can be found that the faults can be identified with high confidence and accurately located in a variety of complex situations such as continuous plate faults, irregular covering, and local busbar heating.

Figure 14. Detection results. (a) Continuous plate faults. (b) Acid fog and water vapor diffusion. (c) Bright background and irregular covering. (d) Local busbar heating.

However, there are also a small number of false detections, mainly when the background brightness is high. At this time, some background characteristics are similar to the F1, so the non-faulty plate is incorrectly identified as the F1. However, the recognition accuracy of the F2 and the F3 that seriously affect production is high. In an industrial production environment, the priority is to identify the more severe F2 and F3 accurately.

4. Conclusion

To improve the detection of electrolytic cell plate faults and reduce the false detection rate and missed detection rate, a detection method based on improved Mask R-CNN and infrared images is proposed in this paper. The feature input layer and regression prediction layer of the original Mask R-CNN model have been modified in a targeted manner, mainly by improving the proposal generation strategy and NMS algorithm of the RPN layer, as well as the loss function of bounding box regression. The model is optimized according to the characteristics of the electrolyzer, is mainly suited to harsh environments and continuous fault conditions, and can be trained more effectively. The experimental results show that the improved model has an accuracy rate of 86.8%, which is 10.4 percentage points higher than the original Mask R-CNN model, and that its loss converges faster. It also has a higher recognition accuracy under complex working conditions, which supports the effectiveness of the proposed algorithm. Compared with the Faster R-CNN and YOLOv3 models, the improved model has better recognition accuracy. As a two-stage object detection model, it also shows a clear improvement in recognition speed.

Acknowledgments

This work is supported by the Projects of International Cooperation and Exchanges NSFC (Grant No. 61860206014), the National Natural Science Foundation of China (Grant No. 92167105), and the Natural Science Foundation of Hunan Province (Grant Nos. 2019JJ50823 and 2021JJ30880).

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.
