Research on imbalanced data fault diagnosis of on-load tap changers based on IGWO-WELM

Abstract: Aiming at the problem of on-load tap changer (OLTC) fault diagnosis under imbalanced data conditions (the number of fault samples is far smaller than that of normal samples), this paper proposes an OLTC fault diagnosis method based on an Improved Grey Wolf Optimization algorithm (IGWO) and a Weighted Extreme Learning Machine (WELM). Firstly, the proposed method assigns a different weight to each sample according to WELM and measures the classification ability of WELM with G-mean, so as to model the imbalanced data. Secondly, the method uses IGWO to optimize the input weights and hidden-layer biases of WELM, avoiding slow search and local optima and achieving high search efficiency. The results show that IGWO-WELM can effectively diagnose OLTC faults under imbalanced data conditions, with an improvement of at least 5% over existing methods.


Introduction
An on-load tap changer (OLTC) is the core component of a load-ratio voltage transformer and the only movable component in a transformer. Because the mechanical structure of an OLTC is complicated and the voltage is regulated frequently, it experiences frequent faults. According to international transformer fault data, faults caused by OLTCs account for more than 20% of total transformer faults. Previous studies have shown that the classification accuracy of an optimized WELM is significantly improved compared with the unmodified WELM.
Following the idea of ref [21], the WELM method is used to solve the OLTC imbalanced-data problem. Based on the advantages of GWO in parameter optimization, this paper uses the GWO algorithm to optimize the input weights and hidden-layer biases of WELM. However, as with similar methods, in the late iterations of GWO the search speed of individual grey wolves gradually decreases and the overall convergence is premature, which increases the probability of falling into a local optimum. Considering that the PSO algorithm has strong search ability and high execution efficiency, this paper introduces PSO into the update equation of the grey wolf algorithm and proposes an improved grey wolf optimization (IGWO) combined with a WELM-based OLTC fault diagnosis method (IGWO-WELM for short). As an improvement strategy, IGWO-WELM improves the exploration and exploitation ability of the whole algorithm and reduces the probability of falling into a local optimum.
This paper presents an OLTC fault diagnosis method based on the improved Grey Wolf Optimization (IGWO) algorithm and WELM. The research consists of three parts: 1) aiming at the inaccurate classification results of traditional machine learning algorithms on imbalanced data, a WELM-based OLTC fault diagnosis model is proposed; 2) because WELM is easily affected by its input weights and hidden-layer biases, GWO is used to optimize WELM; 3) considering that GWO easily falls into local optima and converges slowly, the particle swarm optimization algorithm is used to improve it, and the IGWO-WELM fault diagnosis model is proposed. Analysis of simulation and experimental data shows that the proposed fault diagnosis model has high accuracy.

GWO Method
Mirjalili et al. proposed a new swarm intelligence algorithm based on the tightly organized social system and hunting behaviour of grey wolves, which includes three stages: tracking, encircling, and attacking prey, summarized as follows [26][27][28]: 1) Rank stratification of the wolf pack: grey wolves live in groups that follow a social hierarchy, as shown in Figure 1. The α wolf is the leader of the group and is mainly responsible for decisions about activities such as predation, while the rest of the wolves obey its command. At level 2, the β wolf obeys and assists the α wolf and can dominate all wolves except the α. At level 3, the δ wolf obeys both the α and β wolves and dominates the rest of the pack, while the ω rank is the lowest level. The overall predation behaviour of grey wolves is led by the α wolf, and the task of the other wolves is to besiege the prey.
2) Surrounding prey: grey wolves encircle their prey as they hunt. The mathematical model of encircling prey is

D = |C · X_p(t) − X(t)|   (1)
X(t + 1) = X_p(t) − A · D   (2)

where X(t) represents the position vector of a grey wolf and X_p represents the position vector of the prey. A and C are coefficient vectors, calculated as

A = 2a · r_1 − a   (3)
C = 2 · r_2   (4)

where t represents the current iteration number, a = 2(1 − t/T_max) decreases linearly from 2 to 0 over the iterations, and r_1, r_2 are random vectors with components in [0, 1].
3) Hunting prey: grey wolves can identify prey and surround it. The search process is commanded and led by the α wolf, with the β and δ wolves sometimes taking part in the hunt. Assuming that the α, β and δ wolves have the best knowledge of the potential location of the prey, the three best individuals of the current population are saved at each iteration and labelled α, β and δ. The remaining ω wolves then update their positions according to these three leaders, with the mathematical model

D_α = |C_1 · X_α − X|, D_β = |C_2 · X_β − X|, D_δ = |C_3 · X_δ − X|   (5)
X_1 = X_α − A_1 · D_α, X_2 = X_β − A_2 · D_β, X_3 = X_δ − A_3 · D_δ
X(t + 1) = (X_1 + X_2 + X_3) / 3   (6)

where X represents the position of a grey wolf. When |A| > 1, the grey wolves disperse over the search space to look for prey; when |A| < 1, the wolves converge on the prey within a narrowing area.
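The update rules of Eqs (1)−(6) can be sketched in a few lines of NumPy; the function name and population handling below are illustrative, not taken from the original paper:

```python
import numpy as np

def gwo_step(wolves, alpha, beta, delta, t, t_max, rng):
    """One GWO position update (Eqs (1)-(6)): each wolf moves to the mean
    of three positions guided by the alpha, beta and delta wolves."""
    a = 2.0 * (1.0 - t / t_max)                     # a decreases linearly 2 -> 0
    new_wolves = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        guided = []
        for leader in (alpha, beta, delta):
            A = 2.0 * a * rng.random(x.shape) - a   # coefficient vector A, Eq (3)
            C = 2.0 * rng.random(x.shape)           # coefficient vector C, Eq (4)
            D = np.abs(C * leader - x)              # distance to the leader, Eq (5)
            guided.append(leader - A * D)           # X1, X2, X3
        new_wolves[i] = np.mean(guided, axis=0)     # Eq (6): average of X1..X3
    return new_wolves
```

Iterating this step while re-ranking the pack by fitness at each iteration reproduces the plain GWO search on, for example, the sphere function.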

IGWO method
The GWO algorithm has been successfully applied to job-shop scheduling, power-system analysis, economic forecasting, and other fields. However, like other algorithms, GWO is prone to falling into local optima and converges slowly [28]. Therefore, to improve global convergence and convergence speed, this paper uses the Particle Swarm Optimization (PSO) algorithm to improve the grey wolf algorithm, yielding IGWO [27]. PSO is chosen mainly because its search process is simple and easy to implement, and its convergence and search speeds are fast. The PSO-style update is given in Eqs (7)−(9), where ω_max is the maximum inertia weight, ω_min is the minimum inertia weight, and T_max is the maximum number of iterations.
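Assuming Eqs (7)−(9) follow the standard PSO velocity rule with a linearly decreasing inertia weight, one update step might look like the following sketch (the parameter values b_1 = b_2 = 2 and ω ∈ [0.4, 0.9] are common PSO defaults, not values taken from the paper):

```python
import numpy as np

def igwo_velocity_step(x, v, p_best, g_best, t, t_max,
                       b1=2.0, b2=2.0, w_max=0.9, w_min=0.4, rng=None):
    """Sketch of the assumed PSO-style update in Eqs (7)-(9):
    inertia term plus cognitive (personal-best) and social (global-best) pulls."""
    rng = rng or np.random.default_rng()
    w = w_max - (w_max - w_min) * t / t_max          # Eq (9): inertia weight
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = (w * v
             + b1 * r1 * (p_best - x)                # pull toward personal best
             + b2 * r2 * (g_best - x))               # pull toward global best
    return x + v_new, v_new                          # Eq (8): position update
```

In IGWO this replaces the plain averaging move of GWO, so that each wolf keeps a memory of its own best position in addition to following the pack leaders.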

Verification of the IGWO algorithm
To verify the effectiveness of the proposed algorithm, eight common standard test functions are used to compare the IGWO, GOA, PSO, MFO, GWO and SCA algorithms [26,29]. The test function expressions are shown in Table 1. The average value, minimum value, best fitness value, standard deviation, precision rate and optimization success rate (SR) are used as evaluation indexes.
Each function in the table is tested 20 times, with a population size of 30 and 500 iterations. The final results for IGWO, GOA, PSO, MFO, GWO and SCA are shown in Table 2. From the overall SR statistics in Table 2, IGWO succeeds on 6 functions with an average SR of 75%, PSO succeeds on 3 with an average SR of 37.5%, and the SR of the other four algorithms is 0, indicating that the optimization ability of the proposed method is the best of the six. The table also shows that the standard deviation and mean value of IGWO are the smallest of the six algorithms. A Friedman test on the means and standard deviations of all the algorithms ranks them as IGWO < PSO < GOA < SCA < MFO < GWO. A Wilcoxon test shows that the asymptotic significance between IGWO and each of the five other optimization algorithms in the same dimension is less than 0.05, which proves that there is a significant difference between IGWO and the five algorithms. These results show that IGWO has excellent optimization accuracy and stability. Figure 2 records the convergence curves of IGWO, GWO, PSO, MFO, GOA and SCA on each test function. The eight plots show that the proposed method converges faster than the other five methods. The F1−F3 and F6−F8 iteration curves show that IGWO significantly outperforms the other algorithms in convergence speed and reaches the optimal value by the end of the iterations. As the number of iterations increases, IGWO has the fastest convergence rate of all the algorithms except on F4 and F5, which further indicates that the PSO-based improvement of GWO is effective.

WELM algorithm
The Weighted Extreme Learning Machine (WELM) was proposed by Zong et al. [30] in 2013. This method retains the advantages of ELM, such as easy implementation and a wide class of mapping functions, and can be applied directly to data-imbalance problems.
The WELM principle is based on the cost-sensitive idea. Each sample x_i is weighted by introducing an N_s × N_s diagonal weighting matrix W, whose diagonal elements are the weight values of the corresponding samples. If x_i belongs to the majority class, a smaller weight is assigned; conversely, if x_i belongs to a minority class, a larger weight is assigned. After the weights are introduced, the optimization problem of WELM follows the solution idea of the extreme learning machine in the previous section and can be modelled as [31]:

min (1/2)‖Φ‖² + (C/2) Σ_{i=1}^{N_s} W_ii ξ_i²   (11)

subject to the constraints

h(x_i)Φ = t_i − ξ_i, i = 1, …, N_s   (12)

The corresponding Lagrangian form is

L_p = (1/2)‖Φ‖² + (C/2) Σ_{i=1}^{N_s} W_ii ξ_i² − Σ_{i=1}^{N_s} λ_i (h(x_i)Φ − t_i + ξ_i)   (13)

According to KKT theory, setting the partial derivatives of L_p with respect to Φ, λ and ξ to zero yields the output weights [30]:

Φ = H^T (I/C + W H H^T)^(−1) W T   (14)

where I represents the identity matrix, H the hidden-layer output matrix, T the target matrix, and L the number of hidden-layer nodes.
For binary classification problems, the decision function of the WELM classifier is f(x) = sign(h(x)Φ); substituting Eq (14), its specific expression is

f(x) = sign( h(x) H^T (I/C + W H H^T)^(−1) W T )   (15)
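A minimal NumPy sketch of WELM training and prediction, assuming the common W_ii = 1/n_class weighting scheme of Zong et al.; the function names, sigmoid activation and one-vs-all target encoding are illustrative choices, not specified by this paper:

```python
import numpy as np

def welm_train(X, y, n_hidden=50, C=1.0, seed=0):
    """WELM sketch: random hidden layer, per-class weights W_ii = 1/n_class,
    then the closed-form output weights of Eq (14)."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-1, 1, (X.shape[1], n_hidden))        # random input weights
    b = rng.uniform(-1, 1, n_hidden)                      # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))                # sigmoid hidden output
    classes = np.unique(y)
    T = np.where(y[:, None] == classes, 1.0, -1.0)        # one-vs-all targets
    counts = {c: int(np.sum(y == c)) for c in classes}
    W = np.diag([1.0 / counts[c] for c in y])             # minority -> larger weight
    n = H.shape[0]
    # Eq (14): Phi = H^T (I/C + W H H^T)^(-1) W T
    beta = H.T @ np.linalg.solve(np.eye(n) / C + W @ H @ H.T, W @ T)
    return a, b, beta, classes

def welm_predict(X, a, b, beta, classes):
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return classes[np.argmax(H @ beta, axis=1)]           # Eq (15) argmax rule
```

On a toy imbalanced two-class problem (40 majority vs 8 minority samples), this closed-form solution separates the classes without any iterative training.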

IGWO-WELM imbalance diagnostic model
Although the WELM algorithm is widely used for imbalanced data, WELM, as a weighted variant of ELM, inherits similar problems: the randomly selected hidden-layer biases and input weights may lead to an ill-conditioned model and an unsatisfactory diagnosis. To solve these problems and further improve the fault diagnosis accuracy of WELM, this paper uses IGWO to optimize the input weights and hidden-layer biases of WELM and establishes an OLTC imbalanced-data fault diagnosis model based on IGWO-WELM (WELM optimized by the improved grey wolf algorithm).
(a) Design of the fitness function. To evaluate and select the next generation of grey wolf individuals, appropriate evaluation criteria must be chosen as the fitness function of IGWO. The commonly used performance index of conventional machine learning algorithms is accuracy (ACC). However, when ACC is used to evaluate imbalanced-data classifiers, the results are biased toward the majority classes, so a high classification accuracy can coexist with a high false-negative rate. ACC is therefore not suitable as an index for imbalanced data, and an index that accounts for both majority- and minority-class results is needed. For binary classification problems, the minority class is usually defined as the positive class and the majority class as the negative class. To evaluate the classification results, assume the sample set consists of P positive and N negative samples, and define TP, FN, TN and FP as follows: TP is the number of correctly classified positive samples, FN the number of misclassified positive samples, TN the number of correctly classified negative samples, and FP the number of misclassified negative samples. The resulting confusion matrix is shown in Table 3 [32,33].
Two indexes are derived from Table 3, Recall and G-mean, to evaluate the classification results of the positive class. A larger Recall value means that more positive samples are detected, and G-mean is a good index of overall performance. They are calculated as follows [32,33]:

Recall = TP / (TP + FN)   (16)
G-mean = sqrt( TP/(TP + FN) × TN/(TN + FP) )   (17)

Extending Eq (16) to the multi-class case, the fitness function of IGWO-WELM is

fitness = ( Π_{i=1}^{c} Recall_i )^(1/c)   (18)

where c represents the number of categories, Recall_i the recall of the i-th category, and fitness the fitness function.
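The fitness of Eq (18) reduces to the geometric mean of per-class recalls, which can be computed directly from predictions; this helper is an illustrative sketch, not code from the paper:

```python
import numpy as np

def g_mean_fitness(y_true, y_pred):
    """Multi-class G-mean (Eqs (16)-(18)): geometric mean of per-class recalls.
    A single class with zero recall drives the whole fitness to zero."""
    classes = np.unique(y_true)
    recalls = []
    for c in classes:
        mask = y_true == c
        recalls.append(np.mean(y_pred[mask] == c))    # Recall = TP / (TP + FN)
    return float(np.prod(recalls) ** (1.0 / len(classes)))
```

Because the geometric mean punishes any poorly detected class, maximizing it prevents the optimizer from trading minority-class recall for majority-class accuracy.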

(b) Steps of the IGWO-WELM algorithm
The main steps of the IGWO-WELM algorithm are as follows: 1) Set the initial parameters A, C and a of the algorithm and the maximum number of iterations T_max, and select an appropriate number of wolves N; 2) Calculate the fitness of each wolf by Eq (18) and sort in descending order; the individuals with the three best fitness values are labelled α, β and δ, with corresponding positions X_α, X_β and X_δ, respectively.
3) Calculate A and C according to Eqs (3) and (4); 4) According to Eq (6), calculate the positions X_1, X_2 and X_3 of each grey wolf under the guidance of the α, β and δ wolves; 5) According to Eqs (7)−(9), update the velocity v_i and position X_i of each grey wolf with the particle swarm idea; 6) Judge whether t has reached T_max; if so, output the optimal input weights and hidden-layer biases, otherwise return to step 2; 7) Test the WELM model on the test set with the optimal weights and hidden-layer biases to obtain the final classification results. The overall flow of the diagnosis model is shown in Figure 3.
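Steps 1)−7) can be combined into one hedged sketch of the search loop. Blending the GWO-guided target of Eq (6) into a PSO velocity term is one plausible reading of the update in Eqs (7)−(9); all parameter values and names here are illustrative:

```python
import numpy as np

def igwo_welm_search(fitness_fn, dim, n_wolves=10, t_max=30, seed=0):
    """IGWO search sketch over a flat parameter vector (e.g. the concatenated
    WELM input weights and hidden biases); fitness_fn is minimized."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, (n_wolves, dim))          # step 1: initialise the pack
    V = np.zeros_like(X)
    p_best = X.copy()
    p_fit = np.array([fitness_fn(x) for x in X])
    for t in range(t_max):
        fit = np.array([fitness_fn(x) for x in X])   # step 2: evaluate fitness
        better = fit < p_fit
        p_best[better], p_fit[better] = X[better], fit[better]
        leaders = X[np.argsort(fit)[:3]].copy()      # alpha, beta, delta
        a = 2.0 * (1.0 - t / t_max)                  # steps 3-4: GWO coefficients
        w = 0.9 - 0.5 * t / t_max                    # Eq (9): inertia weight
        for i in range(n_wolves):
            guided = []
            for leader in leaders:
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                guided.append(leader - A * np.abs(C * leader - X[i]))
            target = np.mean(guided, axis=0)         # Eq (6): X1..X3 averaged
            r1, r2 = rng.random(dim), rng.random(dim)
            V[i] = (w * V[i]
                    + 2.0 * r1 * (p_best[i] - X[i])  # step 5: PSO-style velocity
                    + 2.0 * r2 * (target - X[i]))
            X[i] = np.clip(X[i] + V[i], -1.0, 1.0)   # step 6: loop until t_max
    return p_best[np.argmin(p_fit)].copy()           # best parameter vector found
```

In the diagnosis model, `fitness_fn` would train a WELM with the candidate weights and return 1 − G-mean on a validation split (step 7 then tests the final WELM).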

Analysis of experimental results of imbalanced datasets based on KEEL
To examine the generalization ability of IGWO-WELM, eight datasets from the KEEL database, containing both binary and multi-class datasets, all with data-imbalance problems, are used to verify the proposed method. The specific characteristics of the datasets are shown in Table 4; to illustrate the generalization ability, the imbalance ratio in the table increases gradually from top to bottom.
To comprehensively analyze the IGWO-WELM method, it is compared with GWO-WELM [34], GOA-WELM [34], GA-WELM [34], WOA-WELM [34], PSO-WELM [24], WELM, an oversampling algorithm with Support Vector Machine (SVM) as the base classifier (SMOTE-SVM, SSVM), an oversampling algorithm with Kernel Extreme Learning Machine (KELM) [32] as the base classifier (SMOTE-KELM, SKELM) [35], and an improved oversampling algorithm, Borderline-SMOTE with random forest (RF) as the base classifier (Borderline-SMOTE-Random Forest, BSRF). The population size of each algorithm is N = 10, the maximum number of iterations is T_max = 30, the kernel parameter g of SVM is 1, and the penalty factor c is 2. The specific parameters of each algorithm are shown in Table 5.
The number of decision trees is s_1 = 10 and the maximum number of features is c_5 = 42. 80% of each category in the eight datasets is randomly selected as the training set and 20% as the test set. To avoid the randomness of the algorithms, each algorithm is run 30 times and the average G-mean value is reported. Note that for SKELM, SSVM and BSRF, the training set is first oversampled so that the different classes in the training set are balanced; then KELM, SVM and RF are trained on the balanced training set to build the classification models; finally, the test samples are input into the trained models to verify the performance of the oversampling algorithms.
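The oversample-then-train protocol above relies on SMOTE-style interpolation. Below is a minimal sketch of the core interpolation step (illustrative only; the BSRF baseline actually uses Borderline-SMOTE, which additionally restricts interpolation to borderline minority points, and the function name is hypothetical):

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    rng = rng or np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]                  # k nearest neighbours (skip self)
        j = rng.choice(nn)
        gap = rng.random()                           # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because every synthetic sample lies on a segment between two real minority samples, the balanced training set stays inside the minority class's convex hull.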
To verify the optimization performance of IGWO, the IGWO, GWO, GOA, WOA, MFO and GA algorithms are used to optimize WELM. The iteration curves are shown in Figure 4. As the number of iterations increases, the advantages of IGWO gradually stand out, and it is optimal on all eight datasets. Table 6 shows the imbalanced classification results of the different algorithms. IGWO-WELM achieves the best evaluation indicators on the KEEL datasets, except on the contraceptive dataset, where it falls below BSRF. In summary, IGWO-WELM outperforms the other nine methods in imbalanced data classification.
To evaluate the impact of data imbalance on the proposed model, IGWO-WELM is verified with training-data imbalance ratios of 2:1, 3:1, 4:1, 5:1, 6:1 and 7:1. When constructing the imbalanced training data, 48 samples are randomly selected from the normal-state feature set constructed as in ref [2], and 24 samples are randomly selected from each of the other fault feature sets, forming the 2:1 imbalanced monitoring data used for training. Similarly, for testing, 24 samples are randomly selected from the normal feature set, 24 samples from each of the other fault feature sets, and all the remaining samples form the test set. Furthermore, the proportion of fault samples mistaken for normal samples among all fault samples is calculated and defined as the false alarm rate, as shown in Table 8. As shown in Table 7, the overall performance of the IGWO-WELM algorithm in OLTC imbalanced-data diagnosis is better than that of the other nine methods, and as the imbalance deepens, its advantage becomes increasingly obvious. The G-mean values of IGWO-WELM under all ratios are higher than those of the other nine methods, and on average it is higher than PSO-WELM, GA-WELM, GOA-WELM, WOA-WELM, GWO-WELM, WELM, SSVM, SKELM and BSRF by 11.89, 25.3, 5.67, 37.58, 13.17, 9.53 and 9.41%, respectively. BSRF is the next best, which shows that changing the training set by oversampling is feasible but still not as effective as IGWO-WELM. The worst method is WELM; its unoptimized hidden-layer biases and input weights degrade the model and lower the diagnosis accuracy, underscoring the importance of WELM parameter optimization. Table 7 further shows that WOA-WELM has the worst optimization effect among the six optimization methods, followed by GOA-WELM, which is caused by performance defects of the algorithms themselves.
From Table 8, IGWO-WELM has the lowest false alarm rate among all the methods and keeps it at a consistently low level, significantly below the other nine algorithms. The table further shows that WOA-WELM has the highest false alarm rate of the ten methods, followed by GA-WELM and GOA-WELM. Figure 5 further shows that IGWO-WELM performs best among all classification methods, followed by PSO-WELM, GA-WELM, GWO-WELM, SSVM, SKELM and BSRF, with WOA-WELM and WELM performing worst; WELM is not optimized, which leads to its poor results. In summary, the WOA-WELM algorithm is not suitable for OLTC imbalanced-data fault diagnosis.

Conclusions
Aiming at the problems of classification bias and model failure when traditional machine learning algorithms deal with imbalanced OLTC data distributions, this paper proposes a fault diagnosis method for imbalanced OLTC data based on IGWO-WELM. The main conclusions are as follows: 1) The particle swarm optimization algorithm is used to improve GWO, yielding the IGWO algorithm, which overcomes GWO's tendency to fall into local optima and its slow convergence.
2) The IGWO-WELM algorithm is proposed by using IGWO's good global search ability and fast convergence to optimize the input weights and hidden-layer biases of WELM, with G-mean as the fitness function of IGWO-WELM.
3) Comparing the proposed method with other classical imbalanced-data fault diagnosis methods on the KEEL datasets and the OLTC dataset, the proposed method shows an improvement of at least 5%, which has theoretical research and practical engineering significance.

Figure 3. Imbalanced data fault diagnosis model based on IGWO-WELM.

Figure 4. Iteration of the different algorithms on the KEEL datasets: (a) wine; (b) contraceptive; (c) newthyroid2; (d) dermatology; (e) segment; (f) zoo; (g) lymphography; (h) shuttle.

where b_1 and b_2 are learning factors, P_ibest,t and P_gbest,t are the best positions experienced by the i-th grey wolf individual and by the whole pack, respectively, and ω is the inertia weight, computed as ω = ω_max − (ω_max − ω_min) · t / T_max.

Table 3. The confusion matrix.

Table 4. The characteristics of the KEEL datasets.

Table 6. Classification results of the different algorithms.
Each algorithm is run 30 times under each proportion, and the average value is calculated as the final test result.

Table 7. Classification results of the different algorithms.

Table 8. False alarm rate of the different algorithms.