Transformer graded fault diagnosis based on neighborhood rough set and XGBoost

To address the uncertainty of inferring fault types from fault data in transformer fault diagnosis models, this paper proposes a hierarchical diagnosis model based on neighborhood rough set and XGBoost. The model preprocesses the DGA data with an arctangent transformation, which reduces the distribution span of the data features and the complexity of model training. Five characteristic gases and 16 gas ratios serve as the candidate input features of the XGBoost models at all levels. Attribute reduction is performed on these 21 features: those that contribute strongly to fault classification are retained, and redundant ones are removed to improve the accuracy and efficiency of model prediction. The model takes advantage of XGBoost's suitability for data with few features. Its output is the superposition of leaf node scores for each fault type; the type with the maximum score is the fault type assigned to the sample, and its value also serves as the probability value. The obtained probability is used as one of the evidence sources for information fusion with D-S evidence theory to verify the reliability of the model. Experiments show that the proposed XGBoost graded diagnosis model has the highest overall accuracy among the compared models, reaching 93.01%; the accuracy of the XGBoost models at every level exceeds 90%; the average accuracy is more than 2.7% higher than that of the traditional models; and the average time consumption is only 0.0695 s. After D-S multi-source information fusion, the reliability of the prediction results of the proposed model is improved.


Introduction
Transformers are important equipment for transforming and transmitting electrical energy. To address the inaccuracy of existing transformer fault diagnosis knowledge and to ensure the normal operation of transformers, scholars have studied transformer faults from multiple directions.
Reference [1] introduces a correction factor into the nearest neighbor component analysis algorithm and maps the K nearest neighbors in combination with the trained metric matrix, thereby improving the classification performance of the K nearest neighbor algorithm on unbalanced data sets. Reference [2] injects a DC transient excitation into the transformer winding, takes the oscillating wave response at the end of the winding as the analysis object, and proposes a winding fault diagnosis technique. Reference [3] quantifies how condition monitoring data change over time, calculates the control limits of the $T^2$ and Q statistics, and treats samples that exceed the control limits as fault samples, thereby proposing a transformer fault detection method based on unsupervised concept drift recognition and dynamic graph embedding. Reference [4] uses a deep belief network for unsupervised training, extracts features from DGA data, and combines them with D-S evidence theory to address the uncertainty of transformer fault diagnosis. However, reference [1] requires a Bayesian algorithm for hyperparameter tuning, and the overall model structure is complex and consumes substantial computing resources. Reference [2] requires an excitation source with a high signal-to-noise ratio, and the sensitivity of the oscillating wave method is low in high frequency bands. Reference [3] only studies the boundary between the fault data set and the normal data set and does not identify specific fault types. Reference [4] requires lengthy pre-training and parameter tuning of the deep belief network.
To detect incipient faults inside the transformer, and drawing on the advantages of DGA-based diagnosis, which is real-time, online, and free from electric and magnetic field interference [5], this paper proposes a multi-level XGBoost transformer fault diagnosis method based on neighborhood rough set. XGBoost [6] is an extreme gradient boosting algorithm that forms a strong classification model by integrating multiple CART [7] trees. This paper then uses D-S evidence theory [8] for information fusion to address the uncertainty [9] and imprecision of diagnosis knowledge and methods, thereby improving the reliability of model diagnosis.
, that is, there are no redundant attributes in the condition subset B;
then B is a conditional reduct of A.

Introduction to XGBoost
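XGBoost uses the CART tree as its base model and is developed from the gradient boosting decision tree. After a second-order Taylor expansion of the loss, the objective function at iteration t can be written, in the standard XGBoost notation, as:

$$ Obj^{(t)} \approx \sum_{j=1}^{T} \left[ G_j \omega_j + \frac{1}{2} (H_j + \lambda) \omega_j^2 \right] + \gamma T $$

where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ are the sums of the first and second derivatives of the loss over leaf j, $I_j$ is the set of samples contained on the leaf with index j, $\omega_j$ is the weight of leaf j, T is the number of leaves, and $\lambda$ and $\gamma$ are regularization coefficients. The objective is a sum of independent quadratic functions of $\omega_j$, so it reaches its minimum when each quadratic does. Setting the derivative with respect to $\omega_j$ to 0 gives the extreme value:

$$ \omega_j^{*} = -\frac{G_j}{H_j + \lambda} $$

The structure score is:

$$ Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T $$

When a leaf node is split, the gain after splitting is:

$$ Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma $$

where the subscripts L and R denote the left and right child nodes. The gain is calculated for all candidate split points of all features, and the split with the largest gain, that is, the one that makes the objective function decline fastest, is selected for branching. When the gain falls below the set threshold, the tree stops growing and the best tree structure is obtained.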
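The split search described above can be illustrated with a minimal sketch. It assumes the standard XGBoost expressions: optimal leaf weight $-G/(H+\lambda)$ and split gain $\frac{1}{2}[G_L^2/(H_L+\lambda) + G_R^2/(H_R+\lambda) - (G_L+G_R)^2/(H_L+H_R+\lambda)] - \gamma$. The function names, the toy data, and the choice of squared loss with an initial prediction of 0 (so that $g_i = -y_i$ and $h_i = 1$) are illustrative assumptions, not values taken from the paper.

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight for gradient sum G and Hessian sum H."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Reduction of the objective obtained by splitting one leaf in two."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Scan every threshold of one feature and return (best gain, threshold)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G_total, H_total = sum(g), sum(h)
    GL = HL = 0.0
    best = (float("-inf"), None)
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        GL += g[i]
        HL += h[i]
        if x[i] == x[nxt]:
            continue  # no threshold can separate equal feature values
        gain = split_gain(GL, HL, G_total - GL, H_total - HL, lam, gamma)
        if gain > best[0]:
            best = (gain, 0.5 * (x[i] + x[nxt]))
    return best


# Hypothetical toy data: targets y = [0, 0, 1, 1], squared loss, prediction 0.
x = [1.0, 2.0, 3.0, 4.0]
g = [0.0, 0.0, -1.0, -1.0]   # first derivatives
h = [1.0, 1.0, 1.0, 1.0]     # second derivatives
gain, threshold = best_split(x, g, h, lam=1.0, gamma=0.0)
right_weight = leaf_weight(-2.0, 2.0, lam=1.0)
```

On this toy data the best threshold falls between the two classes. Raising `gamma` above the best gain would make every split unprofitable, which corresponds to the stopping rule in the text.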

1) Confidence function:
If the proposition A is a subset of the frame of discernment $\Theta$, the sum of the basic probability assignments of all subsets of A is the confidence function of A:

$$ Bel(A) = \sum_{B \subseteq A} m(B) \quad (5) $$

Equation (5) gives the minimum degree of trust in proposition A.

2) Likelihood function:

$$ Pl(A) = \sum_{B \cap A \neq \varnothing} m(B) \quad (6) $$

Equation (6) gives the maximum degree of trust in proposition A.
The confidence interval is $[Bel(A), Pl(A)]$, which indicates the degree of confidence in a certain proposition.

3) Synthesis rule:
Assuming that there are n information sources, the fused mass function value on $\Theta$ can be obtained through the synthesis rule, where $A_i \subseteq \Theta$ and K represents the conflict factor. The larger K is, the more intense the conflict between the evidence.
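The three D-S quantities above can be sketched for two evidence sources as follows. This is a minimal illustration that assumes the standard form of Dempster's rule: products of masses are accumulated on the intersection of their focal elements, mass falling on empty intersections is treated as conflict, and the rest is renormalized by 1 minus that conflict. The function names and the two mass functions are hypothetical and are not taken from the paper.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset focal elements.

    Returns the fused mass function and the total conflicting mass.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to an empty intersection
    norm = 1.0 - conflict
    if norm <= 0.0:
        raise ValueError("evidence is totally conflicting")
    return {k: v / norm for k, v in fused.items()}, conflict


def bel(m, A):
    """Confidence function: total mass of focal elements contained in A."""
    return sum(w for B, w in m.items() if B <= A)


def pl(m, A):
    """Likelihood function: total mass of focal elements intersecting A."""
    return sum(w for B, w in m.items() if B & A)


# Hypothetical evidence on the frame {N (normal), F (fault)}.
N, F = frozenset({"N"}), frozenset({"F"})
THETA = N | F
m1 = {N: 0.10, F: 0.80, THETA: 0.10}
m2 = {N: 0.20, F: 0.70, THETA: 0.10}

fused, conflict = combine(m1, m2)
belief_F, plaus_F = bel(fused, F), pl(fused, F)
```

For more than two sources, the same function can be applied repeatedly, fusing the running result with the next source. The interval `[belief_F, plaus_F]` is the confidence interval of the proposition "fault", and its width equals the mass left on the whole frame.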

Choice of input feature vector and fault type
The first level (XGBoost1) diagnoses normal or fault. The second level (XGBoost2) diagnoses overheating (H), complex (M) or discharge (C). The third level includes three models, XGBoost3 to XGBoost5, which respectively diagnose the fault types proposed by the non-code ratio method. The volume fractions of five characteristic gases are selected as the reduction object of the first-level model, and the characteristic gas ratios in Table 1 are taken as the reduction object of the second-level and third-level models, where TH represents total hydrocarbons and $D = CH_4 + C_2H_2 + C_2H_4$. The 9 types proposed by the non-code ratio method are selected as output: low energy discharge and overheating (MF1), high energy discharge and overheating (MF2), partial discharge (PD), low energy discharge (D1), high energy discharge (D2), low temperature overheating (T1), medium temperature overheating (T2), high temperature overheating (T3), and normal (N).
This article uses 2093 pieces of data from a 330 kV oil-immersed transformer as samples, with 1079 pieces as the test set and 1014 pieces as inspection data. To mitigate the highly skewed distribution of DGA data and the large gaps between characteristic gas ratios, the paper applies an arctangent transformation to the 5 characteristic gases and 16 characteristic gas ratios. To reduce the impact of initialization, the data are then normalized again.
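The three-level structure described above amounts to a routing cascade, which the following minimal sketch illustrates. The classifiers are stand-in callables rather than trained XGBoost models, and all names are hypothetical. The assignment of PD, D1 and D2 to the discharge group follows the case study; grouping T1 to T3 under overheating and MF1, MF2 under complex is an assumption inferred from the fault names.

```python
from typing import Callable, Dict, List, Mapping, Tuple

Features = Mapping[str, float]
Classifier = Callable[[Features], str]

# Third-level label sets per second-level group (partly assumed, see lead-in).
LEVEL3_LABELS: Dict[str, Tuple[str, ...]] = {
    "H": ("T1", "T2", "T3"),
    "M": ("MF1", "MF2"),
    "C": ("PD", "D1", "D2"),
}


def diagnose(features: Features,
             level1: Classifier,
             level2: Classifier,
             level3: Dict[str, Classifier]) -> List[str]:
    """Route one sample through the cascade and return the decision path."""
    state = level1(features)              # "N" (normal) or "F" (fault)
    if state not in ("N", "F"):
        raise ValueError(f"unexpected level I output: {state}")
    if state == "N":
        return ["N"]                      # normal samples stop at level I
    group = level2(features)              # "H", "M" or "C"
    if group not in LEVEL3_LABELS:
        raise ValueError(f"unexpected level II output: {group}")
    fault = level3[group](features)       # one third-level model per group
    if fault not in LEVEL3_LABELS[group]:
        raise ValueError(f"{fault} is not a {group} fault type")
    return ["F", group, fault]


# Stand-in classifiers with fixed answers, used only to exercise the routing.
stub_level3 = {
    "H": lambda f: "T2",
    "M": lambda f: "MF1",
    "C": lambda f: "D2",
}
sample = {"C2H2/C2H4": 1.8}  # hypothetical feature value
fault_path = diagnose(sample, lambda f: "F", lambda f: "C", stub_level3)
normal_path = diagnose(sample, lambda f: "N", lambda f: "C", stub_level3)
```

In a real pipeline each callable would wrap a trained model together with its own reduced feature subset, since the attribute reduction is performed per level.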

Attribute reduction
To enable the model to identify the mapping between DGA data and fault types more efficiently and accurately, this paper applies neighborhood rough set to the 1079-sample test set to reduce attributes, keeping important attributes and eliminating redundant ones. The reduction adopts a forward greedy algorithm and does not need to discretize the DGA data, thus maintaining the integrity of the feature data. The lower limit of attribute importance is set to 0.001 so that the reduction is accurate.
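A forward greedy reduction of this kind can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's implementation: the neighborhood of a sample is taken as all samples within Euclidean distance `delta` on the selected attributes, the dependency is the fraction of samples whose neighborhood contains a single class, and the importance of an attribute is the dependency gain from adding it. The radius `delta = 0.15` and the toy data are hypothetical; only the importance lower limit of 0.001 comes from the text.

```python
import math


def dependency(X, y, attrs, delta):
    """Fraction of samples whose delta-neighborhood on attrs is label-pure."""
    if not attrs:
        return 0.0
    pure = 0
    for i in range(len(X)):
        consistent = True
        for j in range(len(X)):
            dist = math.sqrt(sum((X[i][a] - X[j][a]) ** 2 for a in attrs))
            if dist <= delta and y[j] != y[i]:
                consistent = False
                break
        if consistent:
            pure += 1
    return pure / len(X)


def forward_greedy_reduct(X, y, delta=0.15, min_importance=0.001):
    """Add the most important attribute until the gain drops below the limit."""
    remaining = list(range(len(X[0])))
    reduct, gamma = [], 0.0
    while remaining:
        best_attr, best_gamma = None, gamma
        for a in remaining:
            candidate = dependency(X, y, reduct + [a], delta)
            if candidate > best_gamma:
                best_attr, best_gamma = a, candidate
        if best_attr is None or best_gamma - gamma < min_importance:
            break
        reduct.append(best_attr)
        remaining.remove(best_attr)
        gamma = best_gamma
    return reduct, gamma


# Hypothetical normalized data: attribute 0 separates the classes,
# attribute 1 is noise, attribute 2 duplicates attribute 0.
X = [
    [0.10, 0.5, 0.10],
    [0.15, 0.9, 0.15],
    [0.20, 0.1, 0.20],
    [0.80, 0.6, 0.80],
    [0.85, 0.2, 0.85],
    [0.90, 0.8, 0.90],
]
y = [0, 0, 0, 1, 1, 1]
reduct, gamma = forward_greedy_reduct(X, y)
```

Because distances are computed on the continuous values directly, no discretization step is needed. The duplicate attribute is dropped as redundant because adding it does not increase the dependency.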

The definition of probability distribution function in D-S evidence theory
This paper uses the output of the XGBoost model as one of the evidence sources, and its basic probability value is calculated by equations (10) and (11), whose terms are the output probability value of the XGBoost model, the error $E_n$, the fault category, and the total number of fault categories $N_1$. The other evidence uses statistical methods to determine its basic probability value.
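One way to turn a model output into a basic probability distribution is sketched below. The construction is an assumption and is not confirmed to be the paper's equations (10) and (11): the error is taken as half the sum of squared differences between expected and actual outputs, each output is divided by the sum of the outputs plus the error, and the remainder is assigned to the whole frame as uncertainty. Applied to the level I and level III outputs quoted in the case study, it agrees with the reported distributions to within about 1e-4. The function name is hypothetical.

```python
def basic_probability(outputs, expected):
    """Convert class probabilities into masses plus a residual uncertainty.

    Assumed construction: error = 0.5 * sum of squared deviations,
    mass_i = y_i / (sum(y) + error), uncertainty = error / (sum(y) + error).
    """
    error = 0.5 * sum((t - y) ** 2 for t, y in zip(expected, outputs))
    denom = sum(outputs) + error
    masses = [y / denom for y in outputs]
    uncertainty = error / denom
    return masses, uncertainty


# Level I output and expected output quoted in the case study.
masses_1, theta_1 = basic_probability([0.0221, 0.9779], [0.0, 1.0])

# Level III output and expected output quoted in the case study.
masses_3, theta_3 = basic_probability([0.0281, 0.9507, 0.0212], [0.0, 1.0, 0.0])
```

The masses and the uncertainty always sum to 1, so the result can be passed directly to an evidence combination step.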

The algorithm flow of XGBoost
1) Divide the 1014 inspection samples into a training set of 811 and a test set of 203 at a ratio of 4:1, use the reduced attributes as input to train the XGBoost model, and predict on the 203 test samples to verify the performance of the XGBoost model.
2) Input the data to be diagnosed into the trained XGBoost model according to the reduced attributes to obtain the output probability value.
3) According to equations (10) and (11), calculate the basic probability value of the network output, and use statistical analysis to obtain the basic probability value of the other evidence.
4) Use equations (5)-(8) to calculate the confidence function, likelihood function, and conflict factor K of each focal element.

Taking the level I and level II diagnostic models as examples, the ROC [10] curves are shown in Figures 1 and 2, respectively. The first level only uses the XGBoost model. Figure 1 shows that after arctangent processing, the AUC of the level I diagnostic model increases by 0.0209 to 0.9764. Similarly, Figure 2 shows that the AUC of the three models XGBoost, RandomForest, and BPNN increases by 0.0217 on average after arctangent processing, and that the AUC values become clearly ranked: the AUC of XGBoost is larger than that of RandomForest, which in turn is larger than that of BPNN. This shows that mapping the input data to $[0, \pi/2]$ through the arctangent transformation and then normalizing it can effectively address the long-tailed [11] distribution problem of DGA data, so that the model can better express the nonlinear relationship between input and output.
E3S Web of Conferences 243, 01002 (2021) ICPEME 2021 https://doi.org/10.1051/e3sconf/202124301002
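The effect of the arctangent preprocessing on long-tailed data can be seen in a short sketch. It assumes that the second step is a per-feature min-max normalization, which the text does not specify, and the sample values are hypothetical rather than measured DGA data.

```python
import math


def minmax(values):
    """Scale a list of numbers linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def arctan_minmax(column):
    """Map non-negative values into [0, pi/2) with arctan, then rescale to [0, 1]."""
    return minmax([math.atan(v) for v in column])


# Hypothetical long-tailed feature column (for example a gas concentration).
column = [0.0, 1.0, 10.0, 1000.0]

raw_scaled = minmax(column)            # the single large value dominates
atan_scaled = arctan_minmax(column)    # small values remain distinguishable
```

With plain min-max scaling, the first three values are squeezed into the bottom 1% of the range, whereas after the arctangent step they are spread across it. This is the span reduction the text attributes to the transformation.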

Comparison of accuracy after feature reduction
Let the five characteristic gases be input feature set $S_1$, the 9 characteristic gas ratios of the non-code ratio method be $S_2$, the five characteristic gases together with the 16 characteristic gas ratios be $S_3$, and the features obtained by reducing $S_3$ with neighborhood rough set theory be $S_4$. Feeding these four feature sets into the XGBoost graded diagnosis model gives Tables 2 and 3. Table 2 shows that, for all four feature sets, the level I diagnostic accuracy is higher than that of levels II and III: all values are above 90%, and the highest is 95.40%. This is because the level I XGBoost model can easily learn the difference between normal and fault data. The accuracy of the four feature sets as input to the level II and III models is lower, indicating that finer subdivision of fault types places higher demands on the generalization ability of the model. In level II diagnosis, the large sample size effectively improves the generalization ability of the model, so the accuracy remains high: the average accuracy of level II is 90.38%, and the highest is 92.62%, for $S_4$. Because the five characteristic gases are used directly to distinguish fault types from DGA data, they contain more redundant information, so the level III accuracy with the five characteristic gases is low, with an average of only 86.21%. When the non-code ratio method supplies the input features, only 9 ratios are selected, which is neither comprehensive nor detailed and easily causes uncertainty in reasoning. Therefore, the average level III accuracy of the non-code ratio method is only 87.76%.
When the five characteristic gases and the 16 characteristic gas ratios are used as input, the fault characteristic information is relatively complete and accurate, so the average level III accuracy reaches 90%. When the features after attribute reduction are used as input, each level's diagnostic model has its own selected features, so the accuracy of each level is the highest, and the average level III accuracy is 91.44%. The accuracy comparison shows that the XGBoost diagnostic models at all levels predict best after reduction by neighborhood rough set theory. The average time was also calculated over the five models for each of the four feature sets. The longest average time, 0.07094 s, is for the five characteristic gases and 16 characteristic gas ratios ($S_3$). The shortest, 0.0551 s, is for the non-code ratio method ($S_2$). The average time with the reduced features ($S_4$) as input to all levels of models is 0.0695 s, which is shorter than for $S_3$ and meets practical requirements.

Comparison of different diagnostic methods
Taking the hierarchical diagnosis proposed in this paper as the framework and using the reduced feature parameters as the input of BPNN, RandomForest and XGBoost, respectively, the accuracy comparison histogram of each classifier under different methods is obtained. Figure 3 shows that the XGBoost model has the highest accuracy compared with BPNN and RandomForest at the same diagnosis level: it is 0.06046 higher than BPNN on average and 0.02774 higher than RandomForest on average. In terms of the area under the ROC curve, the XGBoost model has the largest AUC, 0.9791, which is 0.0865 and 0.0127 higher than BPNN and RandomForest, respectively. The XGBoost model proposed in this paper therefore performs best among these classifiers. Figure 3 also shows that the accuracy decreases as the fault types are subdivided level by level, which is due to the limited sample size.
For a further comparison, the sample labels are set to the nine fault types proposed by the non-code ratio method, and the nine characteristic gas ratios proposed by the non-code ratio method are used as the input of BPNN and RandomForest without a graded model. Comparing their prediction accuracy with that of the XGBoost graded diagnosis model proposed in this paper gives Table 4. Table 4 indicates that, because the non-code ratio method contains limited fault feature information, its accuracy is only 88.46%. BPNN easily falls into a local optimum, and its accuracy is only 90.14%. XGBoost is better suited than RandomForest to data with few features; it also uses the second derivative, adds regularization terms to the loss function, and pre-prunes the decision trees, so its accuracy is 1.1% higher than that of RandomForest, reaching 93.01%.
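The AUC values compared above summarize how well a classifier ranks positive samples ahead of negative ones. A minimal sketch of the computation for a binary task follows, using the rank interpretation of AUC: the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as one half. The scores and labels are hypothetical, and the paper's own ROC procedure is not specified in the text.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical fault probabilities (label 1 = fault, 0 = normal).
scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]
area = auc(scores, labels)
```

For the multi-class levels, the same function can be applied one class against the rest and the results averaged. The pairwise loop is quadratic in the number of samples, which is acceptable for a test set of a few hundred samples.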

Case study
Taking the DGA data in Table 5 as an example, the fault type is high-energy discharge. This article uses D-S evidence theory to conduct information fusion [12] analysis. The reduced characteristic parameters are input into the hierarchical XGBoost diagnosis model level by level. The actual output of the level I XGBoost1 model is [0.0221, 0.9779], and the expected output is [0, 1]. According to equations (10) and (11), the basic probability distribution of evidence source $e_1$ is [0.0220, 0.9774]. The analysis of transformer fault statistics shows that the basic probability distribution of the other evidence source $e_2$ is [0.0986, 0.9000], which gives Table 6. According to equations (5)-(8), evidence fusion is carried out and the conflict factor K is calculated to be 0.8838. The confidence and likelihood of the operating conditions are shown in Table 7. The level I decision fusion leads to the following conclusion: the operating condition is a fault, the uncertainty is $9.5 \times 10^{-7}$, the confidence interval is (0.9974, 0.99740095), and the reliability of the level I model in diagnosing the operating condition as fault F is improved.
On further analysis, there are three possible fault types: discharge (C), overheating (H), and complex (M). The reduced feature parameters are input into the level II XGBoost2 model. The actual output of the level II model is [0.9389, 0.0359, 0.0252], and the expected output is [1, 0, 0]. According to equations (10) and (11), Table 8 is obtained. Using equations (5)-(8) of the D-S evidence theory for fusion, the conflict factor K is obtained as 0.4775. The confidence and likelihood of the fault types are shown in Table 9. The level II decision leads to the following conclusion: the fault type is discharge, the uncertainty is 0.0005, the confidence interval is (0.9495, 0.95), and the reliability of the level II model in diagnosing the fault type as discharge is improved.
Since the upper level diagnoses the fault as discharge, there are three possibilities at this level: low-energy discharge (D1), high-energy discharge (D2) and partial discharge (PD). The reduced features are input into the level III XGBoost5 model. The actual output is [0.0281, 0.9507, 0.0212], and the expected output is [0, 1, 0]. According to equations (10) and (11), the basic probability distribution of evidence $e_5$ is calculated as [0.0280, 0.949, 0.0211], and the basic probability distribution of evidence $e_6$, for which the non-code ratio method serves as the evidence source, is [0.27, 0.51, 0.21], which gives Table 10 (probability distribution of the level III diagnostic model). In the same way, the conflict factor K is 0.5079, and the confidence and likelihood of the fault types are shown in Table 11. The level III decision leads to the following conclusion: the fault type is high-energy discharge, the uncertainty is $3.7 \times 10^{-5}$, the confidence interval is (0.9735, 0.973537), and the reliability of the level III model in diagnosing the fault type as high-energy discharge is improved.
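The level I and level III fusions above can be checked numerically with a short sketch. It assumes that every listed probability is the mass of a single operating condition or fault type, that the unassigned remainder of each distribution is mass on the whole frame (the uncertainty), and that the two sources are combined with the standard Dempster rule. Under these assumptions, the normalization constant (the total mass on non-empty intersections, equal to 1 minus the conflicting mass) evaluates to roughly 0.8838 and 0.5079, matching the values reported as K in the case study. The function name is hypothetical.

```python
def fuse(m1, m2):
    """Dempster combination for masses on singletons plus the whole frame.

    m1, m2: lists of singleton masses in the same order; the remainder
    1 - sum(m) of each source is treated as mass on the whole frame.
    Returns (fused singleton masses, fused frame mass, normalization constant).
    """
    t1 = 1.0 - sum(m1)
    t2 = 1.0 - sum(m2)
    # A singleton survives when both sources agree on it or when one
    # source supports it and the other is uncommitted.
    raw = [a * b + a * t2 + t1 * b for a, b in zip(m1, m2)]
    raw_frame = t1 * t2
    norm = sum(raw) + raw_frame
    return [r / norm for r in raw], raw_frame / norm, norm


# Level I: [normal, fault] from evidence e1 and e2.
level1, theta1, norm1 = fuse([0.0220, 0.9774], [0.0986, 0.9000])

# Level III: [D1, D2, PD] from evidence e5 and e6.
level3, theta3, norm3 = fuse([0.0280, 0.949, 0.0211], [0.27, 0.51, 0.21])
```

Here `level1[1]` and `level3[1]` are the lower ends of the confidence intervals for "fault" and "high-energy discharge", and adding `theta1` or `theta3` gives the upper ends, so the narrow intervals in the text follow from the very small frame masses.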

Conclusion
Based on the XGBoost model, this paper uses neighborhood rough set theory to reduce its input and uses its output as one of the evidence sources for information fusion with D-S evidence theory, thereby establishing the XGBoost graded diagnosis model. Compared with the non-code ratio method, BPNN and RandomForest, the prediction accuracy of the proposed transformer graded diagnosis model is more than 2.7% higher on average, and the average time from model training to prediction is less than 0.07 s. In the case analysis, D-S evidence theory was used to verify the efficiency and reliability of the proposed diagnosis model at all levels. The following conclusions are drawn:
1) Preprocessing the data with an arctangent transformation followed by normalization increases the AUC of the model and addresses the long-tailed distribution problem of DGA data.
2) Feature attribute reduction with neighborhood rough set theory retains the features that contribute most to distinguishing fault types and removes redundant features, which improves the prediction accuracy and shortens the model training time.
3) D-S evidence fusion resolves the ambiguity of the diagnosis. In the future, more data will be used to establish the XGBoost hierarchical diagnosis model in order to address the problem of data imbalance, so that the models at all levels achieve high diagnostic accuracy without the decline observed at the lower levels.