Article

Feature Selection and Classification of Transformer Faults Based on Novel Meta-Heuristic Algorithm

by El-Sayed M. El-kenawy 1, Fahad Albalawi 2, Sayed A. Ward 3,4, Sherif S. M. Ghoneim 2, Marwa M. Eid 5, Abdelaziz A. Abdelhamid 6,7, Nadjem Bailek 8 and Abdelhameed Ibrahim 9,*
1 Department of Communications and Electronics, Delta Higher Institute of Engineering and Technology, Mansoura 35111, Egypt
2 Electrical Engineering Department, College of Engineering, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
3 Electrical Engineering Department, Shoubra Faculty of Engineering, Benha University, 108 Shoubra St., Cairo 11629, Egypt
4 Faculty of Engineering, Delta University for Science and Technology, Mansoura 11152, Egypt
5 Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura 11152, Egypt
6 Department of Computer Science, College of Computing and Information Technology, Shaqra University, Shaqra 11961, Saudi Arabia
7 Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt
8 Energies and Materials Research Laboratory, Faculty of Sciences and Technology, University of Tamanghasset, Tamanrasset 11001, Algeria
9 Computer Engineering and Control Systems Department, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(17), 3144; https://doi.org/10.3390/math10173144
Submission received: 29 June 2022 / Revised: 21 August 2022 / Accepted: 29 August 2022 / Published: 1 September 2022

Abstract:
Detecting transformer faults is critical to avoid the undesirable loss of transformers from service and to ensure utility service continuity. Transformer fault diagnosis can be performed using dissolved gas analysis (DGA). Traditional DGA techniques, such as the Duval triangle, Key gas, Rogers' ratio, Dornenburg, and IEC code 60599 methods, suffer from poor diagnostic accuracy. Therefore, recent research has combined traditional DGA methods with artificial intelligence and optimization methods to improve transformer fault diagnosis. This paper used a novel meta-heuristic technique, based on the Gravitational Search and Dipper Throated Optimization algorithms (GSDTO), to enhance the transformer fault diagnostic accuracy, which was considered a novelty of this work as it reduces the misinterpretation of transformer faults. The robustness of the constructed GSDTO-based model was addressed by a statistical study using Wilcoxon's rank-sum and ANOVA tests. The results revealed that the constructed model enhanced the diagnostic accuracy up to 98.26% for all test cases.

1. Introduction

The power transformer is considered one of the most vital elements in the electrical power system, as its wrong or repeated disconnection leads to a considerable loss of profit for electricity companies [1,2]. The undesirable outage of a transformer from the electrical power system occurs as a result of the exposure of its insulation system, the insulating oil and insulating paper, to various electrical, thermal, or mechanical stresses. These stresses lead to insulation damage and rapid deterioration, which makes early prediction of the deterioration of the insulation condition inside the transformer necessary. Dissolved gas analysis (DGA) is a common technique used for detecting and exploring transformer faults based on the dissolved gas concentrations [3,4,5].
Several traditional DGA techniques, such as Key gases, Dornenburg, Rogers' ratios, IEC code, and the Duval triangle, contribute to interpreting the cause of transformer faults [6,7,8,9,10]. The DGA methods above were developed to diagnose transformer faults based on ratios of the combustible gases [4]. The combustible gases are Hydrogen ($H_2$), Methane ($CH_4$), Ethane ($C_2H_6$), Ethylene ($C_2H_4$), and Acetylene ($C_2H_2$). Some of these methods use four ratios between the five gases, such as the Dornenburg and Rogers' four-ratio methods, and other methods use three ratios, such as the IEC 60599 code [11,12,13]. The Duval triangle method, one of the most common DGA techniques and still in use for diagnosing transformer faults, bases its diagnosis on three gases: Methane ($CH_4$), Ethylene ($C_2H_4$), and Acetylene ($C_2H_2$) [8,9]. The limitation of the traditional DGA techniques is their failure to interpret transformer faults when the cases fall outside the code. New graphical DGA techniques, such as the pentagon [14,15,16] and heptagon [3], were developed to enhance the diagnostic accuracy of transformer faults.
Researchers have recently merged artificial intelligence with traditional DGA techniques to enhance their diagnostic accuracy. Artificial Neural Networks (ANNs) were utilized to build a smart system that enhances diagnostic accuracy by comparing the output of five traditional DGA techniques to identify the dominant transformer fault; ANNs were also used for the same purpose in [17,18,19,20,21]. Fuzzy logic was likewise merged with traditional DGA techniques to enhance diagnostic accuracy, and several publications were reported [22,23]. A neuro-fuzzy system was developed in [24] to enhance transformer fault diagnostic accuracy. Support vector machines (SVMs) and other classifiers were also used, based on feature selection for each fault, to diagnose transformer faults [25,26].
Several studies addressed the utilization of optimization techniques with DGA to adapt the ratio limits of the traditional IEC code and Rogers' ratio methods and thereby enhance the diagnostic accuracy of transformer faults. The Particle Swarm Optimization (PSO) algorithm was used in [27] to modify the limits of the ratios among the gases to overcome diagnostic failures; PSO was also combined with DGA techniques in [28,29] to enhance transformer fault diagnostic accuracy. Other optimization algorithms merged with DGA to increase fault diagnostic accuracy include teaching-learning-based optimization [30], adaptive dynamic meta-heuristics [2], and Grey Wolf Optimization [12].
This paper uses a novel meta-heuristic algorithm, based on the Gravitational Search and Dipper Throated Optimization algorithms (GSDTO), to enhance diagnostic accuracy for the transformer faults misinterpreted by other techniques, whether traditional DGA methods or those merging AI with traditional DGA. Because the Dipper Throated Optimization (DTO) algorithm relies on numerous variables in the optimization process, its performance degrades, and its convergence can be premature; however, its satisfactory balance between exploration and exploitation is a considerable advantage, which the suggested approach exploits. The Gravitational Search Algorithm (GSA), despite its simplicity and outstanding balance between exploration and exploitation, has disadvantages such as a low exploration rate and decreased performance when a large number of local optima exist. This study therefore combines GSA with the dipper throated optimizer to take advantage of the benefits of both algorithms while compensating for their limitations.
The proposed model uses the raw gas concentrations rather than the gas ratios considered in the conventional DGA techniques, which produce a wrong fault diagnosis for out-of-code cases. ANOVA statistical analysis is applied to investigate the stability of the proposed model under the uncertainty that occurs in the input data. The results and the statistical analysis indicate the robustness of the constructed algorithm and its enhanced diagnostic accuracy of transformer faults for the testing samples. The dataset employed in this work consists of 460 samples collected from the literature and the central chemical laboratory of the Egyptian Electricity Holding Company.
The proposed GSDTO algorithm tunes the parameters of the Long Short-Term Memory (LSTM) classification method. Excellent diagnostic accuracy of transformer faults is achieved with the proposed GSDTO+LSTM classification model. A binary version of the proposed GSDTO algorithm is first used for feature selection from the tested dataset. The binary GSDTO (bGSDTO) algorithm is first tested against PSO [31], the Grey Wolf Optimizer (GWO) [32], the Whale Optimization Algorithm (WOA) [33], the Biogeography-Based Optimizer (BBO) [34], the Firefly Algorithm (FA) [35], the Genetic Algorithm (GA) [36], and the Bat Algorithm (BA) [37]. Then, a classifier based on the suggested GSDTO algorithm and the LSTM method is examined on the tested dataset. A comparative analysis is performed between the GSDTO+LSTM algorithm and the WOA+LSTM, GWO+LSTM, GA+LSTM, and PSO+LSTM algorithms. The GSDTO+LSTM algorithm's diagnostic accuracy is also examined using data randomly selected from the total number of samples.
This work’s main contributions can be expressed as follows:
  • A novel Gravitational Search Dipper Throated Optimization Algorithm (GSDTO) is proposed.
  • A binary GSDTO algorithm, a binary version of the proposed algorithm, is applied for feature selection from the tested dataset.
  • A GSDTO+LSTM classifier, based on the proposed GSDTO algorithm and LSTM method, is developed to improve the tested dataset classification accuracy.
  • The GSDTO algorithm’s statistical difference is tested by Wilcoxon’s rank-sum and ANOVA tests.
  • The GSDTO algorithm is used to improve the performance of the LSTM classification method for classification purposes, which can be applied in a new high-voltage engineering application.
  • The binary GSDTO algorithm and the LSTM-based classification algorithm can be generalized and tested for various types of datasets.
The organization of this paper is as follows. Materials and methods are presented in Section 2. Section 3 discusses the proposed GSDTO algorithm, the binary GSDTO algorithm, and the GSDTO+LSTM-based model. Section 4 shows the experimental results and analysis, including the validation and discussion of the proposed GSDTO-based model compared to state-of-the-art models. In Section 6, the conclusion of this work and future directions are presented.

2. Materials and Methods

2.1. Distribution of the Data

A total of 460 samples were used in this study: 386 samples for training and 74 samples for testing, to investigate the robustness of the constructed model. The samples were collected from the central chemical laboratory of the Egyptian Electricity Holding Company and from the literature. The dataset has several attributes, including Hydrogen ($H_2$), Methane ($CH_4$), Ethane ($C_2H_6$), Ethylene ($C_2H_4$), Acetylene ($C_2H_2$), Power factor, Interfacial V, Dielectric rigidity, Water content, Health index, Life expectation, Status, Category (ACT), and Time. Table 1 shows the distribution of the training data samples. The data samples in the training process are categorized as 43 samples for Partial Discharge (PD), 69 samples for low energy discharge (D1), 115 samples for high energy discharge (D2), 81 samples for a low thermal fault (T1), 24 samples for a medium thermal fault (T2), and 54 samples for a high thermal fault (T3). Table 2 illustrates the distribution of the testing samples based on the source of the data and the fault types. The data samples in the testing process are categorized as seven samples for PD, 13 samples for D1, 24 samples for D2, 16 samples for T1, 4 samples for T2, and 10 samples for T3.

2.2. Machine Learning

One type of classification and prediction model is the artificial neural network (ANN). Complex interactions between data patterns or sets of cause-and-effect variables are modeled with ANNs; applications include transient detection and pattern recognition. An ANN is an information processing paradigm composed of neurons that work together to solve problems in a manner similar to the brain. A neural network is useful when developing an algorithmic solution and extracting structure from existing data [48].
Random forest, a technique based on statistical learning theory, offers a number of benefits, including fewer configurable parameters, greater prediction precision, and increased generalization capacity. It draws several samples from the original sample using the bootstrap sampling method, constructs a decision tree model on each bootstrap sample, integrates the predictions of the multiple decision trees, and determines the outcome through a voting procedure [48].
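The bootstrap-and-vote procedure described above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it substitutes 1-NN base learners for full decision trees, and the number of learners and the random seed are arbitrary choices.

```python
import random
from collections import Counter

def nn1_predict(train, x):
    """1-NN base learner (a simple stand-in for a decision tree):
    return the label of the closest training point."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def bagging_predict(train, x, n_learners=15, seed=42):
    """Bootstrap aggregation: resample the training set with replacement,
    fit one base learner per bootstrap sample, and take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_learners):
        boot = [rng.choice(train) for _ in train]  # bootstrap sample, same size
        votes.append(nn1_predict(boot, x))
    return Counter(votes).most_common(1)[0][0]     # majority vote decides
```

With real decision trees in place of `nn1_predict`, this is exactly the resample, build, vote loop the paragraph describes.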
Predictions made with the k-NN technique are based on historical instances similar to the current state according to a distance measure. Forecasts are generated from the k-NN output values by simple or weighted averaging, so specialists can readily evaluate the method's outcomes. In k-NN numerical prediction, the object's predicted value is the average value of its k nearest neighbors. The k-NN algorithm is a fundamental and effective tool for machine learning [48].
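In code, the k-NN numerical prediction described above reduces to averaging the k closest training values; a minimal sketch (the distance metric and k value are illustrative):

```python
def knn_predict(train, x, k=3):
    """Predict the value at x as the simple average of its k nearest
    neighbours, ranked by squared Euclidean distance."""
    ranked = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return sum(y for _, y in ranked[:k]) / k
```

The weighted-averaging variant mentioned in the text would weight each neighbour's value by (for example) the inverse of its distance before summing.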
The LSTM model is an improved artificial neural network model that may be used to solve a variety of problems, as detailed in [49]. The primary advantage of the LSTM is its ability to retain information for an extended period of time. Figure 1 illustrates the LSTM design. The LSTM model’s initial step is to determine which data from the cell state should be disregarded. As seen in Equation (1), this is accomplished using a forget gate or sigmoid layer.
$f_t = \sigma(b_f + W_f \cdot [h_{t-1}, x_t])$ (1)
The new data to be stored in the cell state should then be determined. The sigmoid layer determines which values require updating, and the tanh layer produces the new candidate state, as described in Equations (2) and (3).
$i_t = \sigma(b_i + W_i \cdot [h_{t-1}, x_t])$ (2)
$\tilde{C}_t = \tanh(b_C + W_C \cdot [h_{t-1}, x_t])$ (3)
The old cell state, $C_{t-1}$, is updated into the new cell state, $C_t$, using Equations (1)–(3) as follows.
$C_t = i_t \times \tilde{C}_t + f_t \times C_{t-1}$ (4)
The cell state is then passed through tanh, forcing its values into [−1, 1], and multiplied by the output of the sigmoid output gate.
$h_t = o_t \times \tanh(C_t), \quad o_t = \sigma(b_o + W_o \cdot [h_{t-1}, x_t])$ (5)
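Equations (1)–(5) can be traced directly in NumPy. The sketch below implements a single LSTM time step with one weight matrix and bias per gate; the weight shapes and any initialization are illustrative assumptions, not the trained model from this paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W and b hold one weight matrix / bias vector per
    gate, keyed 'f' (forget), 'i' (input), 'c' (candidate), 'o' (output);
    each W[g] acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # Eq. (1): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # Eq. (2): input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])     # Eq. (3): candidate state
    c_t = i_t * c_hat + f_t * c_prev         # Eq. (4): cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # Eq. (5): hidden state
    return h_t, c_t
```

Because $o_t < 1$ and $|\tanh(\cdot)| < 1$, every component of the returned hidden state lies strictly inside (−1, 1).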

2.3. Dipper Throated Optimization (DTO)

Among passerines, dipper throated birds are unusual: they are excellent swimmers and divers and hunt underwater. Additionally, their flexible and small wings enable them to fly straight and fast without glides or pauses. The Dipper Throated Optimization (DTO) algorithm, based on the behavior of these birds, assumes that birds fly and swim in search of food supplies, with $N_{fs}$ representing the number of birds. Birds' positions are denoted by $\mathbf{BP}$, while $\mathbf{BV}$ denotes their velocities. The parameters $\mathbf{BP}$ and $\mathbf{BV}$ are defined as follows [50]:
$$\mathbf{BP} = \begin{bmatrix} BP_{1,1} & BP_{1,2} & BP_{1,3} & \cdots & BP_{1,d} \\ BP_{2,1} & BP_{2,2} & BP_{2,3} & \cdots & BP_{2,d} \\ BP_{3,1} & BP_{3,2} & BP_{3,3} & \cdots & BP_{3,d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ BP_{n,1} & BP_{n,2} & BP_{n,3} & \cdots & BP_{n,d} \end{bmatrix}$$
$$\mathbf{BV} = \begin{bmatrix} BV_{1,1} & BV_{1,2} & BV_{1,3} & \cdots & BV_{1,d} \\ BV_{2,1} & BV_{2,2} & BV_{2,3} & \cdots & BV_{2,d} \\ BV_{3,1} & BV_{3,2} & BV_{3,3} & \cdots & BV_{3,d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ BV_{n,1} & BV_{n,2} & BV_{n,3} & \cdots & BV_{n,d} \end{bmatrix}$$
where $BP_{i,j}$ denotes the position of the $i$th bird in the $j$th dimension and $BV_{i,j}$ denotes the velocity of the $i$th bird in the $j$th dimension. The initial positions $BP_{i,j}$ are distributed uniformly between the lower and upper boundaries. The fitness values $f_n$ are calculated as shown in the following array.
$$\mathbf{f} = \begin{bmatrix} f_1(BP_{1,1}, BP_{1,2}, BP_{1,3}, \ldots, BP_{1,d}) \\ f_2(BP_{2,1}, BP_{2,2}, BP_{2,3}, \ldots, BP_{2,d}) \\ f_3(BP_{3,1}, BP_{3,2}, BP_{3,3}, \ldots, BP_{3,d}) \\ \vdots \\ f_n(BP_{n,1}, BP_{n,2}, BP_{n,3}, \ldots, BP_{n,d}) \end{bmatrix}$$
These values are then sorted in ascending order. $\mathbf{BP}_{best}$ is declared the first best solution, and the remaining solutions are treated as regular (follower) birds $\mathbf{BP}_{nd}$. $\mathbf{BP}_{Gbest}$ is declared the global best solution. The term $R$ is a random value within $[0, 1]$ employed in the DTO algorithm to switch between swimming and flying birds. If $R < 0.5$, the birds are considered swimming birds, and their positions are updated as
$\mathbf{BP}_{nd}(t+1) = \mathbf{BP}_{best}(t) - C_1 \cdot |C_2 \cdot \mathbf{BP}_{best}(t) - \mathbf{BP}_{nd}(t)|$
where $\mathbf{BP}_{best}(t)$ is the best bird's position at iteration $t$ and $\mathbf{BP}_{nd}(t)$ is a regular bird's position. The parameters $C_1$ and $C_2$ are calculated as follows.
$C_1 = 2C \cdot r_1 - C, \quad C_2 = 2 r_1, \quad C = 2\left(1 - \frac{t}{T_{max}}\right)^2$
where $C$ changes exponentially from 2 to 0, $r_1$ is a random value within $[0, 1]$, and $T_{max}$ denotes the maximum number of iterations.
If $R \geq 0.5$, the birds are considered flying birds and their positions are updated as follows:
$\mathbf{BP}_{nd}(t+1) = \mathbf{BP}_{nd}(t) + \mathbf{BV}(t+1)$
The updated bird velocity, $\mathbf{BV}(t+1)$, is calculated as
$\mathbf{BV}(t+1) = C_3 \mathbf{BV}(t) + C_4 r_2 (\mathbf{BP}_{best}(t) - \mathbf{BP}_{nd}(t)) + C_5 r_2 (\mathbf{BP}_{Gbest} - \mathbf{BP}_{nd}(t))$
where $C_3$ is a weight value and $C_4$ and $C_5$ are constants. $\mathbf{BP}_{Gbest}$ represents the global best position (the best of all agents in the population), and $r_2$ is a random value within $[0, 1]$. The DTO algorithm is explained step by step in Algorithm 1 [50].
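One DTO iteration over the whole population can be sketched in NumPy as follows. The values chosen for $C_3$–$C_5$ and the per-iteration sampling of $r_1$, $r_2$, and $R$ are illustrative assumptions, not the tuned settings of this paper.

```python
import numpy as np

def dto_step(pos, vel, best, gbest, t, T_max, rng, C3=0.5, C4=0.5, C5=0.5):
    """One DTO iteration. pos, vel: (n, d) arrays of bird positions and
    velocities; best, gbest: (d,) best and global-best positions.
    R < 0.5 applies the swimming update, otherwise the flying update."""
    C = 2.0 * (1.0 - t / T_max) ** 2          # decays from 2 toward 0
    r1, r2, R = rng.random(3)
    C1, C2 = 2.0 * C * r1 - C, 2.0 * r1
    if R < 0.5:                               # swimming birds
        pos = best - C1 * np.abs(C2 * best - pos)
    else:                                     # flying birds
        vel = C3 * vel + C4 * r2 * (best - pos) + C5 * r2 * (gbest - pos)
        pos = pos + vel
    return pos, vel
```

In a full optimizer this step sits inside the while-loop of Algorithm 1, with the fitness evaluation and best-agent bookkeeping around it.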

2.4. Gravitational Search Algorithm (GSA)

The GSA algorithm was proposed based on Newton's laws of motion and gravity and was applied to solar radiation forecasting in [51]. The position and the gravitational masses (inertial, active, and passive) are the main attributes of each agent in this algorithm. A problem's solution is described by these attributes and evaluated by a fitness function. In GSA, an agent's position is defined as in the following equation.
$x_i = (x_i^1, \ldots, x_i^d, \ldots, x_i^N), \quad i = 1, 2, \ldots, n$
where $x_i^d$ indicates the position of the $i$th agent in the $d$th of $N$ dimensions and $n$ is the number of agents (masses). The position of each agent, $x_i^d(t)$, is updated as
$x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)$
where $v_i^d(t)$, the velocity of an agent, is calculated as
$v_i^d(t+1) = rand_i \cdot v_i^d(t) + a_i^d(t)$
where $rand_i$ is a random value in $[0, 1]$. The acceleration of agent $i$ in the $d$th direction, denoted $a_i^d(t)$, is updated using the inertial mass of the $i$th agent, denoted $M_{ii}(t)$, as
$a_i^d(t) = \frac{F_i^d(t)}{M_{ii}(t)}$
Algorithm 1 DTO Algorithm.
1: Initialize positions of agents $\mathbf{BP}_i$ ($i = 1, 2, \ldots, n$) with $n$ agents, velocities of agents $\mathbf{BV}_i$ ($i = 1, 2, \ldots, n$), maximum iterations $T_{max}$, objective function $f_n$, parameters $r_1$, $r_2$, $C_1$, $C_2$, $C_3$, $C_4$, $C_5$, $C$, $R$, $t = 1$
2: Obtain $f_n$ for each agent $\mathbf{BP}_i$
3: Find best agent $\mathbf{BP}_{best}$
4: while $t \leq T_{max}$ do
5:       for ($i = 1 : i < n + 1$) do
6:           if ($R < 0.5$) then
7:               Update current swimming agent position as
                     $\mathbf{BP}_{nd}(t+1) = \mathbf{BP}_{best}(t) - C_1 \cdot |C_2 \cdot \mathbf{BP}_{best}(t) - \mathbf{BP}_{nd}(t)|$
8:           else
9:               Update current flying agent velocity as
                     $\mathbf{BV}(t+1) = C_3 \mathbf{BV}(t) + C_4 r_2 (\mathbf{BP}_{best}(t) - \mathbf{BP}_{nd}(t)) + C_5 r_2 (\mathbf{BP}_{Gbest} - \mathbf{BP}_{nd}(t))$
10:             Update current flying agent position as
                     $\mathbf{BP}_{nd}(t+1) = \mathbf{BP}_{nd}(t) + \mathbf{BV}(t+1)$
11:          end if
12:      end for
13:      Obtain $f_n$ for each agent $\mathbf{BP}_i$
14:      Update $C_1$, $C_2$, $C$, $R$
15:      Find best agent $\mathbf{BP}_{best}$
16:      Set $\mathbf{BP}_{Gbest} = \mathbf{BP}_{best}$
17:      Set $t = t + 1$
18: end while
19: Return $\mathbf{BP}_{Gbest}$
The total gravitational force, denoted $F_i^d(t)$, is calculated as
$F_i^d(t) = \sum_{j=1, j \neq i}^{N} rand_j \, F_{ij}^d(t)$
where $rand_j$ is a random value in $[0, 1]$. The force acting on mass $i$ from mass $j$, denoted $F_{ij}^d(t)$, is updated by the following equation.
$F_{ij}^d(t) = G(t) \, \frac{M_{pi}(t) \times M_{aj}(t)}{\|x_i(t), x_j(t)\|_2 + \varepsilon} \, (x_j^d(t) - x_i^d(t))$
where $M_{pi}$ represents the passive gravitational mass of agent $i$ and $M_{aj}$ represents the active gravitational mass of agent $j$. The parameter $G(t)$ indicates the gravitational constant at time $t$, and $\varepsilon$ is a small constant. The term $\|x_i(t), x_j(t)\|_2$ denotes the Euclidean distance between agent $i$ and agent $j$.
The fitness evaluation computes the gravitational and inertia masses. The values of the masses are computed using the fitness map, assuming that the gravitational and inertia masses are identical. The gravitational and inertial masses are revised using the following equations:
$M_{pi} = M_{aj} = M_{ii} = M_i, \quad i = 1, 2, \ldots, N$
$M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}, \quad m_i(t) = \frac{fit_i(t) - worst(t)}{best(t) - worst(t)}$
where $fit_i(t)$ indicates the fitness value of agent $i$ at iteration $t$. For a minimization problem, $worst(t)$ and $best(t)$ are calculated as in the following equations.
$best(t) = \min_{j \in \{1, \ldots, N\}} fit_j(t)$
$worst(t) = \max_{j \in \{1, \ldots, N\}} fit_j(t)$
The GSA algorithm is explained step by step in Algorithm 2. Since all possible solutions are used to update the position of each solution, the GSA algorithm exhibits an extremely exploratory nature. In each iteration, a solution can affect the others depending on their distances and quality. However, the precision of this approach is frequently suboptimal.
Algorithm 2 GSA Algorithm.
1: Initialize positions of agents $x_i$ ($i = 1, 2, \ldots, n$) with $n$ agents, maximum iterations $T_{max}$, objective function $f_n$, parameters $rand_i$, $rand_j$, $t = 1$
2: Obtain $f_n$ for each agent $x_i$
3: Find best agent $x_{best}$
4: while $t \leq T_{max}$ do
5:       for ($i = 1 : i < n + 1$) do
6:           Update gravitational and inertia masses by Equations (19) and (20)
7:           Update acceleration of current agent by
                 $a_i^d(t) = F_i^d(t) / M_{ii}(t)$
8:           Update velocity of current agent by
                 $v_i^d(t+1) = rand_i \cdot v_i^d(t) + a_i^d(t)$
9:           Update position of current agent by
                 $x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)$
10:      end for
11:      Obtain $f_n$ for each agent $x_i$
12:      Update $rand_i$, $rand_j$, $t = t + 1$
13:      Find best agent $x_{best}$
14:      Set $x_{Gbest} = x_{best}$
15: end while
16: Return $x_{Gbest}$
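For concreteness, the body of one GSA iteration might look like the sketch below. The gravitational constant $G$, the constant $\varepsilon$, and the small guard terms added to the denominators are illustrative choices, not values from this paper.

```python
import numpy as np

def gsa_step(X, V, fit, G=1.0, eps=1e-9, rng=None):
    """One GSA iteration for a minimisation problem.
    X, V: (n, d) position/velocity arrays; fit: (n,) fitness values."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    best, worst = fit.min(), fit.max()
    m = (fit - worst) / (best - worst + 1e-12)     # m_i from fitness
    M = m / (m.sum() + 1e-12)                      # normalised masses M_i
    A = np.zeros((n, d))
    for i in range(n):
        F = np.zeros(d)
        for j in range(n):
            if i == j:
                continue
            diff = X[j] - X[i]
            dist = np.linalg.norm(diff)
            # rand_j-weighted pairwise gravitational pull toward agent j
            F += rng.random() * G * M[i] * M[j] / (dist + eps) * diff
        A[i] = F / (M[i] + 1e-12)                  # a_i = F_i / M_ii
    V = rng.random((n, 1)) * V + A                 # velocity update
    X = X + V                                      # position update
    return X, V
```

The double loop over agent pairs is what gives GSA its strongly exploratory character noted above: every agent pulls on every other agent.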

3. Proposed Methodology

3.1. Proposed GSDTO Algorithm

The proposed Gravitational Search Dipper Throated Optimization (GSDTO) algorithm is presented step by step in Algorithm 3. The GSDTO algorithm covers the disadvantages of the DTO and GSA algorithms and combines their advantages to reach the best global solution. The algorithm starts by initializing the positions of the predetermined $n$ agents $x_i$ ($i = 1, 2, \ldots, n$) and their velocities $v_i$ ($i = 1, 2, \ldots, n$). It sets the allowed number of iterations for the execution process, $T_{max}$, the objective function $f_n$, the DTO parameters $r_1$, $r_2$, $C_1$, $C_2$, $C_3$, $C_4$, $C_5$, $C$, $R$, and the GSA parameters $rand_i$ and $rand_j$. The term $rand_{GSDTO}$ is a random value within $[0, 1]$.
Algorithm 3 Proposed GSDTO Algorithm.
1: Initialize positions of agents $x_i$ ($i = 1, 2, \ldots, n$) with $n$ agents, velocities of agents $v_i$ ($i = 1, 2, \ldots, n$), maximum number of iterations $T_{max}$, objective function $f_n$, parameters $r_1$, $r_2$, $C_1$, $C_2$, $C_3$, $C_4$, $C_5$, $C$, $R$, $rand_i$, $rand_j$, $rand_{GSDTO}$, $t = 1$
2: Obtain $f_n$ for each agent $x_i$
3: Find best agent $x_{best}$
4: Set $x_{Gbest} = x_{best}$
5: while $t \leq T_{max}$ do
6:    if ($rand_{GSDTO} > 0.5$) then
7:      for ($i = 1 : i < n + 1$) do
8:         if ($R < 0.5$) then
9:           Update current swimming agent position by
                $x(t+1) = x_{best}(t) - C_1 \cdot |C_2 \cdot x_{best}(t) - x(t)|$
10:        else
11:           Update current flying agent velocity by
                 $v(t+1) = C_3 v(t) + C_4 r_2 (x_{best}(t) - x(t)) + C_5 r_2 (x_{Gbest} - x(t))$
12:           Update current flying agent position by
                 $x(t+1) = x(t) + v(t+1)$
13:         end if
14:      end for
15:    else
16:      for ($i = 1 : i < n + 1$) do
17:         Update gravitational and inertia masses by Equations (29) and (30)
18:         Update acceleration of current agent by $a(t) = F(t) / M_{ii}(t)$
19:         Update velocity of current agent by Equation (27)
20:         Update position of current agent by Equation (26)
21:      end for
22:    end if
23:    Obtain $f_n$ for each agent $x_i$
24:    Update $C_1$, $C_2$, $C$, $R$, $rand_i$, $rand_j$, $rand_{GSDTO}$
25:    Find best agent $x_{best}$
26:    Set $x_{Gbest} = x_{best}$
27:    Set $t = t + 1$
28: end while
29: Return best agent $x_{Gbest}$
If $rand_{GSDTO} > 0.5$, the GSDTO algorithm updates the positions and velocities of the agents as in the following equations. If $R < 0.5$, the positions are updated as for swimming agents by
$x(t+1) = x_{best}(t) - C_1 \cdot |C_2 \cdot x_{best}(t) - x(t)|$
Otherwise, the agents are considered flying agents, and the positions are updated as
$x(t+1) = x(t) + v(t+1)$
where $v(t+1)$, the updated velocity of each agent, is calculated as
$v(t+1) = C_3 v(t) + C_4 r_2 (x_{best}(t) - x(t)) + C_5 r_2 (x_{Gbest} - x(t))$
If $rand_{GSDTO} \leq 0.5$, the GSDTO algorithm updates the positions and velocities of the agents according to the following equations. The position of each agent at iteration $t$, $x(t)$, is calculated as follows.
$x(t+1) = x(t) + v(t+1)$
where $v(t)$, the velocity of each agent at iteration $t$, is changed as
$v(t+1) = rand_i \cdot v(t) + a(t)$
where $a(t)$, the acceleration of each agent at iteration $t$, changes as
$a(t) = \frac{F(t)}{M_{ii}(t)}$
The values of the masses are computed using the fitness map for a minimization problem, assuming that the gravitational and inertia masses are identical, by the following equations:
$M_{pi} = M_{aj} = M_{ii} = M_i, \quad i = 1, 2, \ldots, N$
$M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}$
$m_i(t) = \frac{fit_i(t) - \max_{j \in \{1, \ldots, N\}} fit_j(t)}{\min_{j \in \{1, \ldots, N\}} fit_j(t) - \max_{j \in \{1, \ldots, N\}} fit_j(t)}$
where $fit_i(t)$ indicates the fitness value of agent $i$ at iteration $t$.
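Putting the two stages together, the hybrid loop of Algorithm 3 can be sketched end-to-end on an arbitrary objective. This is a heavily simplified, self-contained sketch, not the paper's implementation: the GSA stage is reduced to a mass-weighted pull toward the current best agent instead of the full pairwise force sum, and $C_3$–$C_5$, the search bounds, and the population size are illustrative assumptions.

```python
import numpy as np

def gsdto_minimize(f, n=20, d=5, T_max=100, seed=1):
    """Minimal GSDTO-style loop: each iteration a coin flip (rand_GSDTO)
    selects either the DTO-style update or a simplified GSA-style update."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5.0, 5.0, (n, d))           # agent positions
    V = np.zeros((n, d))                         # agent velocities
    gbest = min(X, key=f).copy()                 # global best so far
    for t in range(T_max):
        fit = np.array([f(x) for x in X])
        best = X[fit.argmin()].copy()            # iteration best
        if f(best) < f(gbest):
            gbest = best.copy()
        C = 2.0 * (1.0 - t / T_max) ** 2
        r1, r2 = rng.random(2)
        C1, C2 = 2.0 * C * r1 - C, 2.0 * r1
        if rng.random() > 0.5:                   # DTO stage
            if rng.random() < 0.5:               # swimming update
                X = best - C1 * np.abs(C2 * best - X)
            else:                                # flying update
                V = 0.5 * V + 0.5 * r2 * (best - X) + 0.5 * r2 * (gbest - X)
                X = X + V
        else:                                    # GSA stage (simplified)
            m = (fit.max() - fit) / (fit.max() - fit.min() + 1e-12)
            M = m / (m.sum() + 1e-12)            # normalised masses
            V = rng.random((n, 1)) * V + (best - X) * M[:, None]
            X = X + V
        X = np.clip(X, -5.0, 5.0)                # keep agents in bounds
    return gbest

sphere = lambda x: float((x ** 2).sum())         # toy test objective
```

The key point the sketch illustrates is structural: the exploratory GSA stage and the exploitative DTO stage share one population and alternate randomly, which is how the hybrid compensates for each component's weaknesses.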
The GSDTO algorithm's computational complexity in this work is expressed as follows. For $T_{max}$ iterations and $n$ agents, the complexity of each step is:
  • Initialize parameters of the GSDTO algorithm, $x_i$ ($i = 1, 2, \ldots, n$), $v_i$ ($i = 1, 2, \ldots, n$), $T_{max}$, $C_1$, $C_2$, $C_3$, $C_4$, $C_5$, $r_1$, $r_2$, $C$, $R$, $rand_i$, $rand_j$, $rand_{GSDTO}$, and $t = 1$: O(1).
  • Calculate $f_n$ for each agent $x_i$: O($n$).
  • Obtain the best agent $x_{best}$: O($n$).
  • Update current swimming agent position: O($T_{max} \times n$).
  • Update current flying agent velocity: O($T_{max} \times n$).
  • Update current flying agent position: O($T_{max} \times n$).
  • Update acceleration of current agent $a_i(t)$: O($T_{max} \times n$).
  • Update velocity of current agent $v_i(t+1)$: O($T_{max} \times n$).
  • Update position of current agent $x_i(t+1)$: O($T_{max} \times n$).
  • Calculate $f_n$ for each agent $x_i$: O($T_{max}$).
  • Update $C_1$, $C_2$, $C$, $R$, $rand_i$, $rand_j$, $rand_{GSDTO}$: O($T_{max}$).
  • Obtain best agent $x_{best}$: O($T_{max}$).
  • Set $x_{Gbest} = x_{best}$: O($T_{max}$).
  • Set $t = t + 1$: O($T_{max}$).
  • Obtain global best agent $x_{Gbest}$: O(1).
Based on the above analysis, the overall computational complexity of the GSDTO algorithm is O($T_{max} \times n$), and O($T_{max} \times n \times d$) for $d$ dimensions.

3.2. Proposed Binary GSDTO Algorithm

For feature selection problems, the solutions of the GSDTO algorithm must be strictly binary, with values of 0 or 1. Thus, the continuous values of the proposed GSDTO algorithm are transformed to binary {0, 1} to facilitate the feature selection process on the dataset. This study employs the following equation, which is based on the $Sigmoid$ function [52].
$x_d^{t+1} = \begin{cases} 1 & \text{if } Sigmoid(m) \geq 0.5 \\ 0 & \text{otherwise} \end{cases}, \quad Sigmoid(m) = \frac{1}{1 + e^{-10(m - 0.5)}}$
where $x_d^{t+1}$ denotes the binary solution at iteration $t$ and dimension $d$. The $Sigmoid$ function scales the output solutions to binary ones: if $Sigmoid(m) \geq 0.5$, the value becomes 1; otherwise, it becomes 0. The parameter $m$ reflects the algorithm's selected features. Figure 2 shows how the $Sigmoid$ function scales the output solutions to binary [0, 1]. The binary GSDTO algorithm is described in detail in Algorithm 4. By the same analysis as above, the computational complexity of the bGSDTO algorithm is O($T_{max} \times n$), and O($T_{max} \times n \times d$) for $d$ dimensions.
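The thresholded $Sigmoid$ mapping above is a one-liner in NumPy; a minimal sketch:

```python
import numpy as np

def binarize(m):
    """Map continuous solution values to {0, 1}: a dimension is selected (1)
    when the steep sigmoid of (m - 0.5) reaches 0.5, i.e. when m >= 0.5."""
    s = 1.0 / (1.0 + np.exp(-10.0 * (np.asarray(m, dtype=float) - 0.5)))
    return (s >= 0.5).astype(int)
```

The factor of 10 makes the sigmoid steep around 0.5, so values only slightly above or below the midpoint map decisively to 1 or 0.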
Algorithm 4 Proposed Binary GSDTO Algorithm.
1: Initialize parameters of the GSDTO
2: Convert solution to binary [0 or 1]
3: Obtain fitness for each agent and find best agent
4: while $t \leq T_{max}$ do
5:    if ($rand_{GSDTO} > 0.5$) then
6:      for ($i = 1 : i < n + 1$) do
7:         if ($R < 0.5$) then
8:           Update current swimming agent position
9:         else
10:           Update current flying agent velocity and position
11:         end if
12:      end for
13:    else
14:      for ($i = 1 : i < n + 1$) do
15:         Update gravitational and inertia masses
16:         Update acceleration of current agent
17:         Update velocity and position of current agent
18:      end for
19:    end if
20:    Convert updated solution to binary
21:    Obtain fitness for each agent and find best agent
22:    Update parameters
23: end while
24: Return best solution

3.3. Proposed GSDTO+LSTM Based Model

Figure 3 shows the proposed model based on the presented GSDTO algorithm. The proposed GSDTO+LSTM-based model is constructed from two main phases: feature selection and classification. The first phase focuses on preprocessing and feature selection. The preprocessing concerns cleaning and normalizing the tested dataset, including correlation analysis, feature scaling, and removing null values. Feature selection in this phase is performed after cleaning the dataset: the binary version of the proposed GSDTO algorithm is employed to select the best subset of features from the total attributes of the tested dataset. The validation of this phase, to confirm the quality of the bGSDTO algorithm in feature selection, is performed in the experiments using feature selection performance metrics, including average error and standard deviation.
The second phase, the intermediate phase in Figure 3, which is utilized to compare results, applies the base models of NN, k-NN, and RF to the features selected in phase one. The output results of these three base classifiers are recorded for comparison with the GSDTO+LSTM model classification results in the third phase. The last stage of the proposed GSDTO+LSTM-based model, the classification phase, optimizes the LSTM model parameters with the presented GSDTO algorithm. The classification in this third phase is applied to the features selected in phase one, and its output is compared with the results of phase two to obtain the final results of the model.

4. Experimental Results

This section explains the results of this study in detail. The experiments are divided into three scenarios. The first scenario discusses the feature selection ability of the proposed bGSDTO algorithm for the tested dataset, while the second scenario shows the presented algorithm’s ability for classification purposes. In the third and last scenario, the GSDTO+LSTM algorithm’s diagnostic accuracy is examined using randomly selected data from the total number of samples.

4.1. Feature Selection Scenario

The binary version of the proposed GSDTO algorithm is used for feature selection from the tested dataset, as shown in Figure 3. In this first scenario, the feature selection results of the presented GSDTO algorithm are discussed. Table 3 lists the configuration of all GSDTO parameters used in the experiment, while Table 4 presents the configuration of the compared algorithms. The binary GSDTO (bGSDTO) algorithm is compared with PSO [31], GWO [32], WOA [33], BBO [34], FA [35], GA [36], and BA [37].
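A common way to derive a binary optimizer from a continuous one, consistent with the sigmoid scaling shown in Figure 2, is to pass each agent coordinate through a sigmoid and threshold it against a random number. The sketch below illustrates this general scheme only; it is an assumption, not the paper's exact update rule:

```python
import math
import random

def sigmoid(x):
    """Squash a continuous coordinate into (0, 1), as in Figure 2."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng=random.Random(0)):
    """Map a continuous agent position to a 0/1 feature mask.
    A feature is selected when sigmoid(x) exceeds a random threshold,
    one common transfer scheme for binary metaheuristics."""
    return [1 if sigmoid(x) > rng.random() else 0 for x in position]

mask = binarize([2.5, -3.0, 0.4, 5.0])  # one binary decision per feature
```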
The objective function, f_n, determines the quality of a solution in the binary GSDTO algorithm. It combines a classifier's error rate, Err, the number of selected features, |s|, and the total number of features, |S|, as follows.
f_n = α · Err + β · |s| / |S|
where α ∈ [0, 1] and β = 1 − α control the relative importance of the classification error and the size of the selected feature subset. An approach is satisfactory if it can provide a subset of features that produces a low classification error rate. K-nearest neighbor (k-NN) is a widely used, straightforward classification technique, and it is used here as the classifier to assess the goodness of the chosen features. Classification is based solely on the smallest distance between the query instance and the training examples; no explicit model is built for k-NN in this experiment.
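The objective function can be sketched directly from the equation above; α = 0.99 and β = 0.01 follow Table 3, while the error value and the feature mask in the example are illustrative:

```python
def fitness(err, selected_mask, alpha=0.99):
    """f_n = alpha * Err + beta * |s|/|S|, with beta = 1 - alpha
    (alpha = 0.99 and beta = 0.01 as listed in Table 3)."""
    beta = 1.0 - alpha
    s = sum(selected_mask)   # number of selected features |s|
    S = len(selected_mask)   # total number of features |S|
    return alpha * err + beta * s / S

# A 7-feature mask with 4 features kept and a 20% k-NN error rate.
f = fitness(0.20, [1, 0, 1, 1, 0, 1, 0])
```

Because β is small, the term β·|s|/|S| acts only as a tie-breaker that prefers smaller feature subsets among solutions with similar error rates.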
Table 5 presents the performance metrics employed to measure the feature selection results of the proposed algorithm. The variables in Table 5 are: M, the number of runs of the optimizer; g_j*, the best solution at run j; size(g_j*), the size of the vector g_j*; D, the number of features; N, the number of points; C_i, the classifier output label for point i; L_i, the class label for point i; and Match, the function that counts the matches between its two inputs.
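For example, the average error and standard deviation over M optimizer runs (two of the Table 5 metrics) can be computed as follows; the error values below are illustrative, not results from the paper:

```python
import statistics

def average_error(run_errors):
    """Mean classification error over M optimizer runs (Table 5 metric)."""
    return sum(run_errors) / len(run_errors)

# Hypothetical per-run best-solution error rates for M = 4 runs.
errors = [0.19, 0.21, 0.18, 0.20]
avg = average_error(errors)
std = statistics.stdev(errors)   # sample standard deviation over the runs
```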
The feature selection results of the suggested and compared algorithms, based on 20 runs of 80 iterations with ten agents as listed in Table 3, are provided in Table 6. The minimum average error of (0.1969) and standard deviation of (0.0824) demonstrate the performance of the presented bGSDTO algorithm. The second best algorithm in terms of minimum average error is bGWO with (0.2141), followed by bBBO with (0.2161) and then bGA with (0.2277). The worst algorithm is bBA, with an average error of (0.2575). The features selected by the proposed algorithm from the full feature set of the tested dataset are (H2), (CH4), (C2H6), (C2H4), (C2H2), and (ACT).
A statistical analysis, using one-way analysis of variance (ANOVA) and Wilcoxon Signed-Rank tests, is conducted to assess the performance of the proposed binary GSDTO algorithm based on the average error. Wilcoxon's test provides the p-values comparing the suggested algorithm to the other algorithms; a p-value < 0.05 indicates a significant difference between the results of the proposed algorithm and those of the others. An ANOVA test was also conducted to determine whether there is a statistically significant difference between the suggested algorithm and the other analyzed algorithms. The ANOVA test results for the suggested versus compared algorithms are shown in Table 7, while Table 8 compares the proposed and compared algorithms using the Wilcoxon Signed-Rank test. The statistical analysis is based on 20 runs of each algorithm for fair comparison.
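The one-way ANOVA F statistic underlying Table 7 can be computed as below. This is a minimal pure-Python version for illustration; in practice a statistics package such as scipy.stats.f_oneway would be used, and the three groups here are toy data rather than the paper's run results:

```python
def one_way_anova_F(groups):
    """F statistic for a one-way ANOVA over k groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group means against the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: observations against their group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

F = one_way_anova_F([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The F value would then be compared against the F distribution with (k − 1, n − k) degrees of freedom to obtain the p-value reported in the table.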
Figure 4 compares the provided and comparable algorithms in feature selection using their convergence curves. During population initialization of the GSDTO algorithm and the compared algorithms, the number of agents is set to 10 and the number of iterations to 80, with the agents distributed randomly for fair comparison, as shown in Figure 4. The proposed GSDTO algorithm converges faster than the other algorithms. Figure 5 depicts the average error versus the objective function for the provided and compared algorithms; according to this figure, the proposed GSDTO algorithm achieves better results than the other algorithms in terms of average error.
The residual, QQ (quantile-quantile), and homoscedasticity plots and the heat map of the presented and comparative binary methods are shown in Figure 6. Potential issues can be seen in the residual values and plots, as some datasets are poor candidates for categorization. A residual plot places the independent variable on the horizontal axis and the residual values on the vertical axis; the ideal scenario is realized when the residual values are evenly and randomly distributed across the horizontal axis. Each residual is computed as (actual value − predicted value), and ideally the mean and the sum of the residuals equal zero. The residual plot is displayed in Figure 6; its patterns can be used to identify whether a linear or a nonlinear model is most appropriate. The homogeneity of variance, or its absence (heteroscedasticity), is visually inspected together with the predicted scores for the dependent variable. Homoscedasticity occurs when the error term, also known as noise or random disturbance in the relationship between the dependent and independent variables, is constant across all values of the independent variables. The heteroscedasticity plot, depicted in Figure 6, strengthens the research findings, as any violation can be identified immediately.
The QQ plot is also displayed in Figure 6. It is a probability plot used mainly to compare two probability distributions by plotting their quantiles against one another. The point distributions in the QQ plot roughly fit on the line, so the relationship between the actual and predicted residuals is linear, supporting the effectiveness of the suggested approach. The heat maps of the provided and compared algorithms are displayed in Figure 6 as a data visualization tool: the intensity of a two-dimensional color scale indicates each algorithm's performance, and the color variation provides clear visual evidence of how the proposed method outperforms the comparable algorithms. These figures confirm the quality of the bGSDTO algorithm in feature selection, as outlined in Figure 3.

4.2. Classification Scenario

The second scenario of the experiments discusses the classification results of the presented GSDTO algorithm based on the LSTM classifier, using 20 runs of 80 iterations with ten agents as listed in Table 3. The basic classifiers (NN, k-NN, and RF) and the proposed GSDTO algorithm-based LSTM method are applied to the features selected from the tested dataset in phase 1, as shown in Figure 3. The classification results of the proposed algorithm are compared with the WOA+LSTM, GWO+LSTM, GA+LSTM, and PSO+LSTM-based models and with the basic models to demonstrate the performance of the presented algorithm. Four LSTM training hyperparameters are fed to the proposed GSDTO algorithm: the size of the attention weights set N_a, the encoding length of each attention weight L_e, the size of the champion attention weights subset W_a, and the number of epochs T_e.
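One plausible way to hand these hyperparameters to a continuous optimizer is to decode each agent position into integer values within fixed bounds. The bounds and the [0, 1] encoding below are illustrative assumptions, not values taken from the paper:

```python
# Hypothetical search bounds for the four LSTM hyperparameters
# (N_a, L_e, W_a, T_e); these are illustrative, not the paper's values.
BOUNDS = {"N_a": (8, 64), "L_e": (4, 32), "W_a": (2, 16), "T_e": (10, 200)}

def decode(position):
    """Map an agent position (four values in [0, 1]) to integer
    hyperparameter values inside their bounds."""
    params = {}
    for x, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        params[name] = lo + round(x * (hi - lo))
    return params

hp = decode([0.5, 0.0, 1.0, 0.25])  # one candidate LSTM configuration
```

The optimizer then evaluates each decoded configuration by training the LSTM and measuring its validation error, which becomes the agent's fitness.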
The performance metrics used in this part are the Mean Square Error (MSE) and the Area Under the ROC Curve (AUC). Table 9 presents the configuration parameters of the basic classification models (NN, k-NN, and RF) employed in this scenario, and Table 10 shows their AUC and MSE results. The best results, with the maximum AUC of (0.797) and the minimum MSE of (0.04887), are achieved by the RF model, followed by the NN model and then the k-NN model.
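Both metrics can be computed without a library: MSE as the mean squared deviation, and AUC via the pairwise-ranking (Mann-Whitney) formulation. The labels and scores below are toy data for illustration:

```python
def mse(y_true, y_pred):
    """Mean square error between true labels and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive is scored above a random
    negative (ties count 1/2); equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
a = auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
m = mse(labels, scores)
```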
The classification results of the proposed and compared algorithms, based on optimizing the parameters of the LSTM model, are shown in Table 11. The best results, with the maximum AUC of (0.9826) and the minimum MSE of (0.00001413), are achieved by the GSDTO+LSTM model. These results show the model's superiority over the best basic classifier, the RF model in Table 10, and over the other LSTM-based models in Table 11. The WOA+LSTM-based model achieves the second best classification results, with an AUC of (0.957) and an MSE of (0.000319), followed by the PSO+LSTM-based and GWO+LSTM-based models, while the GA+LSTM-based model achieves the worst results.
The statistical descriptions of the presented GSDTO+LSTM and the compared classifiers and the Wilcoxon Signed-Rank test results are shown in Table 12 and Table 13, respectively, based on 20 runs of 80 iterations with 10 agents as listed in Table 3 for fair comparison. This statistical test shows a significant difference, with a p-value < 0.05, between the results of the proposed GSDTO+LSTM algorithm and those of the other algorithms.
Figure 7 compares the convergence curves of the presented and compared algorithms based on the LSTM model; the proposed GSDTO+LSTM algorithm converges faster than the other algorithms. The box plot in Figure 8 shows the MSE of the presented and compared LSTM-based algorithms and indicates that the GSDTO+LSTM algorithm achieves the minimum MSE. Figure 9 shows the histogram of MSE for the presented and compared algorithms using the LSTM model, based on the number of values within the bin center range (0.0–0.00072), which confirms the stability of the proposed algorithm. Figure 8 and Figure 9 visualize the impact of changing the agents on the tested dataset for each run. Figure 10 shows the ROC curves of the LSTM models based on the presented GSDTO and the PSO and WOA algorithms.
The residual, QQ, and homoscedasticity plots and the heat map of the presented and compared algorithms based on the LSTM model are shown in Figure 11. The residual plot places the independent variable on the horizontal axis and the residual values on the vertical axis. The homogeneity of variance, or heteroscedasticity, is visually inspected together with the predicted scores for the dependent variable, so any violation can be identified; the heteroscedasticity plot strengthens the research findings. The QQ plot in Figure 11 compares two probability distributions by plotting their quantiles against one another; the point distributions roughly fit on the line, so the relationship between the actual and predicted residuals is almost linear, supporting the effectiveness of the suggested approach. The heat maps of the provided and compared algorithms are also displayed in Figure 11, where the intensity of a two-dimensional color scale indicates each algorithm's performance. These figures confirm the quality of the GSDTO+LSTM model, as outlined in Figure 3.

4.3. Validation and Discussion

Table 14 compares the diagnostic accuracy of the suggested classification algorithm to that of various DGA algorithms in the literature. As shown in Table 14, the proposed classification algorithm has an overall diagnostic accuracy of 98.26 percent, which is higher than that of the other DGA techniques. The adaptive and NPR methods, which provide 94.6 and 90.54 percent accuracy, respectively, come closest to the suggested algorithm. In contrast, the classic DGA procedures have low overall diagnostic accuracies: the Rogers Ratio method (45.95 percent), IEC 60599 (50 percent), and the Duval triangle method (66.27 percent). These diagnostic accuracy results indicate that the suggested classification method has a high capacity for correctly diagnosing transformer defects.

5. Sensitivity Analysis of the GSDTO Parameters

This section explores the sensitivity analysis of the GSDTO's parameters. The GSDTO has five parameters: the R-Parameter, the exploration percentage, the population size, the number of iterations, and the C-Parameter. These settings determine the performance of the algorithm when solving the optimization problem evaluated in this work, and any change to a parameter can influence the optimization process. As a result, a sensitivity analysis of these parameters is conducted.

5.1. One-at-a-Time Sensitivity Analysis

The sensitivity analysis has been conducted using the One-at-a-Time (OAT) sensitivity measure [54], one of the simplest strategies for sensitivity analysis. OAT examines the performance of an algorithm by varying a single parameter while leaving the others unchanged. Table 15 and Table 16 detail the observed variations in the GSDTO's time and fitness values as the parameter values are adjusted. As shown in these tables, 20 distinct values were selected within each parameter's interval by stepping through it in increments of 5% of its length. For each value, the algorithm was run ten times, and the averages of time and fitness are displayed in the tables.
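The OAT procedure can be sketched as a simple sweep: vary one parameter over its interval in 5% increments while holding the others at their defaults. The objective, defaults, and intervals below are toy stand-ins for an actual GSDTO run:

```python
def toy_objective(params):
    """Toy stand-in for the average fitness of a GSDTO run."""
    return (params["R"] - 0.3) ** 2 + (params["C"] - 1.0) ** 2

defaults = {"R": 0.5, "C": 1.0}
intervals = {"R": (0.0, 1.0), "C": (0.0, 2.0)}

def oat_sweep(name, steps=20):
    """Vary one parameter across its interval (5% increments),
    keeping all other parameters at their default values."""
    lo, hi = intervals[name]
    results = []
    for i in range(steps + 1):
        p = dict(defaults)
        p[name] = lo + (hi - lo) * i / steps
        results.append((p[name], toy_objective(p)))
    return results

sweep_R = oat_sweep("R")
best_R = min(sweep_R, key=lambda t: t[1])[0]  # value of R with lowest objective
```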

5.2. Regression Analysis

Regression analysis was used to determine how well the algorithm's parameters explain variations in its performance. Regression analysis is appropriate when predicting the value of a dependent variable (algorithm performance) from the value of an independent variable (a parameter). The results of the regression analysis for the GSDTO parameters against convergence time and fitness are shown in Table 17. The R Square value indicates the proportion of the total variance in time or fitness that can be explained by the parameter values. According to Table 17, the number of iterations and the population size have the highest R Square values for convergence time, indicating that they explain variation in convergence time exceptionally well; in contrast, the exploration percentage and the R-Parameter account for 60.3% and 78.5% of the variance in convergence time, respectively. Significance F values less than 0.05 in Table 17 indicate that the regression model significantly predicts the algorithm's performance. The convergence time and minimum fitness of the GSDTO's parameters versus the objective function are shown in Figure 12.
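The R Square values in Table 17 correspond to the coefficient of determination of a simple linear regression, which can be computed as follows (the data points are illustrative):

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Toy data: parameter values vs. observed convergence times (near-linear).
r2 = r_squared([10, 20, 40, 80], [12.1, 21.9, 41.8, 80.2])
```

An R² near 1 means the parameter alone explains almost all of the observed variation, matching the interpretation given for the number of iterations and the population size above.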

5.3. Statistical Significance

To determine whether there is a statistically significant difference between the means of the observations in Table 15 and Table 16, an ANOVA was conducted on the convergence time and fitness values obtained while adjusting the GSDTO's parameters. The results of the ANOVA test are shown in Table 18; all p-values are less than 0.05, so there is a statistically significant difference between the means of the five groups of convergence times and the five groups of minimum fitness values observed by adjusting the parameter values. A one-tailed t-test was also run at a significance level of 0.05. Table 19 provides the t-test results for each observed pair of convergence time and minimum fitness values for the GSDTO's parameters; p-values less than 0.05 indicate a statistically significant difference between groups.
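The t statistic for a paired comparison of two observation sets, as used for Table 19, can be computed as below; the observations are toy data, and the p-value would then be read from the t distribution with n − 1 degrees of freedom:

```python
import math

def paired_t(a, b):
    """t statistic for a paired-sample t-test between two observation sets."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Toy paired observations (e.g., convergence times under two settings).
t = paired_t([5.0, 6.0, 7.0, 8.0], [4.0, 4.0, 4.0, 4.0])
```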

6. Conclusions and Future Work

This research proposed a novel meta-heuristic technique, based on a dataset from real technical systems, to classify dissolved gas analysis (DGA) results for diagnosing faults in transformers, one of the most vital elements of the electrical power system. First, the suggested binary GSDTO technique is employed to choose features from the evaluated dataset; the binary GSDTO (bGSDTO) method is evaluated against PSO, GWO, WOA, BBO, FA, GA, and BA. A classifier based on the proposed GSDTO algorithm and the LSTM approach is then applied to the tested dataset, and its classification results are compared with those of the WOA+LSTM, GWO+LSTM, GA+LSTM, and PSO+LSTM models. The diagnostic accuracy of the GSDTO+LSTM algorithm is also examined using randomly selected data. The robustness of the built model was examined through statistical investigation, and the results demonstrated that the developed model increased the diagnostic accuracy for all test cases to 98.26%. The sensitivity analysis of the GSDTO's parameters (R-Parameter, exploration percentage, population size, number of iterations, and C-Parameter) confirms the performance of the algorithm. In the future, the binary GSDTO method and the GSDTO+LSTM-based classification algorithm can be generalized and evaluated on various datasets, and additional experiments will evaluate the scalability, runtime, and memory usage of the presented GSDTO+LSTM algorithm.

Author Contributions

Conceptualization, E.-S.M.E.-k., A.I. and A.A.A.; methodology, A.I.; software, E.-S.M.E.-k.; validation, S.S.M.G., N.B. and A.A.A.; formal analysis, S.S.M.G.; investigation, M.M.E.; resources, M.M.E.; data curation, S.S.M.G.; writing—original draft preparation, A.A.A. and A.I.; writing—review and editing, S.A.W., S.S.M.G. and N.B.; visualization, E.-S.M.E.-k.; supervision, S.A.W.; project administration, F.A.; funding acquisition, F.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Taif University Researchers Supporting Project TURSP-2020/97, Taif University, Taif, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors appreciate the Taif University Researchers Supporting Project TURSP-2020/97, Taif University, Taif, Saudi Arabia, for funding this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Badawi, M.; Ibrahim, S.A.; Mansour, D.E.A.; El-Faraskoury, A.A.; Ward, S.A.; Mahmoud, K.; Lehtonen, M.; Darwish, M.M.F. Reliable Estimation for Health Index of Transformer Oil Based on Novel Combined Predictive Maintenance Techniques. IEEE Access 2022, 10, 25954–25972. [Google Scholar] [CrossRef]
  2. Ghoneim, S.S.M.; Farrag, T.A.; Rashed, A.A.; El-Kenawy, E.S.M.; Ibrahim, A. Adaptive Dynamic Meta-Heuristics for Feature Selection and Classification in Diagnostic Accuracy of Transformer Faults. IEEE Access 2021, 9, 78324–78340. [Google Scholar] [CrossRef]
  3. Gouda, O.E.; El-Hoshy, S.H.; El-Tamaly, H.H. Proposed heptagon graph for DGA interpretation of oil transformers. IET Gener. Transm. Distrib. 2018, 12, 490–498. [Google Scholar] [CrossRef]
  4. Ward, S.A.; El-Faraskoury, A.; Badawi, M.; Ibrahim, S.A.; Mahmoud, K.; Lehtonen, M.; Darwish, M.M.F. Towards Precise Interpretation of Oil Transformers via Novel Combined Techniques Based on DGA and Partial Discharge Sensors. Sensors 2021, 21, 2223. [Google Scholar] [CrossRef]
  5. Benmahamed, Y.; Kherif, O.; Teguar, M.; Boubakeur, A.; Ghoneim, S.S.M. Accuracy Improvement of Transformer Faults Diagnostic Based on DGA Data Using SVM-BA Classifier. Energies 2021, 14, 2970. [Google Scholar] [CrossRef]
  6. Std C57.104-2008; IEEE Guide for the Interpretation of Gases Generated in Oil-Immersed Transformers. IEEE: Piscataway, NJ, USA, 2009; pp. 1–36. [CrossRef]
  7. IEC 60599: Mineral Oil-Filled Electrical Equipment in Service—Guidance on the Interpretation of Dissolved and Free Gases Analysis, Edition 2.1; IEC: Geneva, Switzerland, 2007.
  8. Duval, M. A review of faults detectable by gas-in-oil analysis in transformers. IEEE Electr. Insul. Mag. 2002, 18, 8–17. [Google Scholar] [CrossRef]
  9. Duval, M.; dePabla, A. Interpretation of gas-in-oil analysis using new IEC publication 60599 and IEC TC 10 databases. IEEE Electr. Insul. Mag. 2001, 17, 31–41. [Google Scholar] [CrossRef]
  10. Gouda, O.E.; El-Hoshy, S.H.; Ghoneim, S.S.M. Enhancing the Diagnostic Accuracy of DGA Techniques Based on IEC-TC10 and Related Databases. IEEE Access 2021, 9, 118031–118041. [Google Scholar] [CrossRef]
  11. Faiz, J.; Soleimani, M. Assessment of computational intelligence and conventional dissolved gas analysis methods for transformer fault diagnosis. IEEE Trans. Dielectr. Electr. Insul. 2018, 25, 1798–1806. [Google Scholar] [CrossRef]
  12. Hoballah, A.; Mansour, D.E.A.; Taha, I.B.M. Hybrid Grey Wolf Optimizer for Transformer Fault Diagnosis Using Dissolved Gases Considering Uncertainty in Measurements. IEEE Access 2020, 8, 139176–139187. [Google Scholar] [CrossRef]
  13. Li, X.; Wu, H.; Wu, D. DGA Interpretation Scheme Derived From Case Study. IEEE Trans. Power Deliv. 2011, 26, 1292–1293. [Google Scholar] [CrossRef]
  14. Duval, M.; Lamarre, L. The Duval Pentagon-A New Complementary Tool for the Interpretation of Dissolved Gas Analysis in Transformers. IEEE Electr. Insul. Mag. 2014, 30, 9–12. [CrossRef]
  15. Cheim, L.; Duval, M.; Haider, S. Combined Duval Pentagons: A Simplified Approach. Energies 2020, 13, 2859. [Google Scholar] [CrossRef]
  16. Mansour, D.E.A. Development of a new graphical technique for dissolved gas analysis in power transformers based on the five combustible gases. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 2507–2512. [Google Scholar] [CrossRef]
  17. Miranda, V.; Castro, A. Improving the IEC Table for Transformer Failure Diagnosis with Knowledge Extraction from Neural Networks. IEEE Trans. Power Deliv. 2005, 20, 2509–2516. [Google Scholar] [CrossRef]
  18. Souahlia, S.; Bacha, K.; Chaari, A. MLP neural network-based decision for power transformers fault diagnosis using an improved combination of Rogers and Doernenburg ratios DGA. Int. J. Electr. Power Energy Syst. 2012, 43, 1346–1353. [Google Scholar] [CrossRef]
  19. Guardado, J.; Naredo, J.; Moreno, P.; Fuerte, C. A comparative study of neural network efficiency in power transformers diagnosis using dissolved gas analysis. IEEE Trans. Power Deliv. 2001, 16, 643–647. [Google Scholar] [CrossRef]
  20. Equbal, M.D.; Khan, S.A.; Islam, T. Transformer incipient fault diagnosis on the basis of energy-weighted DGA using an artificial neural network. Turk. J. Electr. Eng. Comput. Sci. 2018, 26, 77–88. [Google Scholar] [CrossRef]
  21. Ou, M.; Wei, H.; Zhang, Y.; Tan, J. A Dynamic Adam Based Deep Neural Network for Fault Diagnosis of Oil-Immersed Power Transformers. Energies 2019, 12, 995. [Google Scholar] [CrossRef]
  22. Abu-Siada, A.; Hmood, S.; Islam, S. A new fuzzy logic approach for consistent interpretation of dissolved gas-in-oil analysis. IEEE Trans. Dielectr. Electr. Insul. 2013, 20, 2343–2349. [Google Scholar] [CrossRef]
  23. Noori, M.; Effatnejad, R.; Hajihosseini, P. Using dissolved gas analysis results to detect and isolate the internal faults of power transformers by applying a fuzzy logic method. IET Gener. Transm. Distrib. 2017, 11, 2721–2729. [Google Scholar] [CrossRef]
  24. Mulyodinoto, K.U.; Prasojo, R.A.; Abu-Siada, A. Applications of ANFIS to Estimate the Degree of Polymerization Using Transformer Dissolve Gas Analysis and Oil Characteristics. Polym. Sci. 2018, 14, 1–9. [Google Scholar]
  25. Bacha, K.; Souahlia, S.; Gossa, M. Power transformer fault diagnosis based on dissolved gas analysis by support vector machine. Electr. Power Syst. Res. 2012, 83, 73–79. [Google Scholar] [CrossRef]
  26. Zhang, Y.; Li, X.; Zheng, H.; Yao, H.; Liu, J.; Zhang, C.; Peng, H.; Jiao, J. A Fault Diagnosis Model of Power Transformers Based on Dissolved Gas Analysis Features Selection and Improved Krill Herd Algorithm Optimized Support Vector Machine. IEEE Access 2019, 7, 102803–102811. [Google Scholar] [CrossRef]
  27. Taha, I.B.M.; Hoballah, A.; Ghoneim, S.S.M. Optimal ratio limits of rogers’ four-ratios and IEC 60599 code methods using particle swarm optimization fuzzy-logic approach. IEEE Trans. Dielectr. Electr. Insul. 2020, 27, 222–230. [Google Scholar] [CrossRef]
  28. Illias, H.A.; Chai, X.R.; Bakar, A.H.A.; Mokhlis, H. Transformer Incipient Fault Prediction Using Combined Artificial Neural Network and Various Particle Swarm Optimisation Techniques. PLoS ONE 2015, 10, e0129363. [Google Scholar] [CrossRef]
  29. Illias, H.A.; Chai, X.R.; Bakar, A.H.A. Hybrid modified evolutionary particle swarm optimisation-time varying acceleration coefficient-artificial neural network for power transformer fault diagnosis. Measurement 2016, 90, 94–102. [Google Scholar] [CrossRef]
  30. Ghoneim, S.S.M.; Mahmoud, K.; Lehtonen, M.; Darwish, M.M.F. Enhancing Diagnostic Accuracy of Transformer Faults Using Teaching-Learning-Based Optimization. IEEE Access 2021, 9, 30817–30832. [Google Scholar] [CrossRef]
  31. Bello, R.; Gomez, Y.; Nowe, A.; Garcia, M.M. Two-Step Particle Swarm Optimization to Solve the Feature Selection Problem. In Proceedings of the Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007), Rio de Janeiro, Brazil, 20–24 October 2007; pp. 691–696. [Google Scholar] [CrossRef]
  32. El-Kenawy, E.S.M.; Eid, M.M.; Saber, M.; Ibrahim, A. MbGWO-SFS: Modified Binary Grey Wolf Optimizer Based on Stochastic Fractal Search for Feature Selection. IEEE Access 2020, 8, 107635–107649. [Google Scholar] [CrossRef]
  33. Eid, M.M.; El-kenawy, E.S.M.; Ibrahim, A. A binary Sine Cosine-Modified Whale Optimization Algorithm for Feature Selection. In Proceedings of the 2021 National Computing Colleges Conference (NCCC), Taif, Saudi Arabia, 27–28 March 2021. [Google Scholar] [CrossRef]
  34. Simon, D. Biogeography-Based Optimization. IEEE Trans. Evol. Comput. 2008, 12, 702–713. [Google Scholar] [CrossRef]
  35. Fister, I.; Yang, X.S.; Fister, I.; Brest, J. Memetic Firefly Algorithm for Combinatorial Optimization. arXiv 2012, arXiv:1204.5165. [Google Scholar]
  36. Kabir, M.M.; Shahjahan, M.; Murase, K. A new local search based hybrid genetic algorithm for feature selection. Neurocomputing 2011, 74, 2914–2928. [Google Scholar] [CrossRef]
  37. Karakonstantis, I.; Vlachos, A. Bat algorithm applied to continuous constrained optimization problems. J. Inf. Optim. Sci. 2021, 42, 57–75. [Google Scholar] [CrossRef]
  38. Egyptian Electricity Holding Company (EEHC) Reports. Available online: http://www.moee.gov.eg/english_new/report.aspx (accessed on 18 April 2022).
  39. Agrawal, S.; Chandel, A.K. Transformer incipient fault diagnosis based on probabilistic neural network. In Proceedings of the 2012 Students Conference on Engineering and Systems, Allahabad, India, 16–18 March 2012; IEEE: Piscataway, NJ, USA, 2012. [Google Scholar] [CrossRef]
  40. Wang, M.H. A novel extension method for transformer fault diagnosis. IEEE Trans. Power Deliv. 2003, 18, 164–169. [Google Scholar] [CrossRef]
  41. Zhu, Y.; Wang, F.; Geng, L. Transformer Fault Diagnosis Based on Naive Bayesian Classifier and SVR. In Proceedings of the TENCON 2006–2006 IEEE Region 10 Conference, Hong Kong, China, 14–17 November 2006; IEEE: Piscataway, NJ, USA, 2006. [Google Scholar] [CrossRef]
  42. Siva Sarma, D.V.S.S.; Kalyani, G.N.S. ANN approach for condition monitoring of power transformers using DGA. In Proceedings of the 2004 IEEE Region 10 Conference TENCON 2004, Chiang Mai, Thailand, 24 November 2004; Volume 3, pp. 444–447. [Google Scholar] [CrossRef]
  43. Hu, J.; Zhou, L.; Song, M. Transformer Fault Diagnosis Method of Gas Hromatographic Analysis Using Computer Image Analysis. In Proceedings of the 2012 Second International Conference on Intelligent System Design and Engineering Application, Sanya, China, 6–7 January 2012; pp. 1169–1172. [Google Scholar] [CrossRef]
  44. Seifeddine, S.; Khmais, B.; Abdelkader, C. Power transformer fault diagnosis based on dissolved gas analysis by artificial neural network. In Proceedings of the 2012 First International Conference on Renewable Energies and Vehicular Technology, Nabeul, Tunisia, 26–28 March 2012; pp. 230–236. [Google Scholar] [CrossRef]
  45. Rajabimendi, M.; Dadios, E.P. A hybrid algorithm based on neural-fuzzy system for interpretation of dissolved gas analysis in power transformers. In Proceedings of the TENCON 2012 IEEE Region 10 Conference, Cebu, Philippines, 19–22 November 2012; pp. 1–6. [Google Scholar] [CrossRef]
  46. Zhang, G.; Yasuoka, K.; Ishii, S.; Yang, L.; Yan, Z. Application of fuzzy equivalent matrix for fault diagnosis of oil-immersed insulation. In Proceedings of the 1999 IEEE 13th International Conference on Dielectric Liquids (ICDL’99) (Cat. No. 99CH36213), Nara, Japan, 25 July 1999; pp. 400–403. [CrossRef]
  47. Gouda, O.E.; Saleh, S.M.; El-hoshy, S.H. Power Transformer Incipient Faults Diagnosis Based on Dissolved Gas Analysis. Indones. J. Electr. Eng. Comput. Sci. 2016, 17, 10–16. [Google Scholar] [CrossRef]
  48. El-Kenawy, E.S.M.; Mirjalili, S.; Ibrahim, A.; Alrahmawy, M.; El-Said, M.; Zaki, R.M.; Eid, M.M. Advanced Meta-Heuristics, Convolutional Neural Networks, and Feature Selectors for Efficient COVID-19 X-Ray Chest Image Classification. IEEE Access 2021, 9, 36019–36037. [Google Scholar] [CrossRef]
  49. Ibrahim, A.; Mirjalili, S.; El-Said, M.; Ghoneim, S.S.M.; Al-Harthi, M.M.; Ibrahim, T.F.; El-Kenawy, E.S.M. Wind Speed Ensemble Forecasting Based on Deep Learning Using Adaptive Dynamic Optimization Algorithm. IEEE Access 2021, 9, 125787–125804. [Google Scholar] [CrossRef]
  50. Takieldeen, A.E.; El-kenawy, E.S.M.; Hadwan, M.; Zaki, R.M. Dipper Throated Optimization Algorithm for Unconstrained Function and Feature Selection. Comput. Mater. Contin. 2022, 72, 1465–1481. [Google Scholar] [CrossRef]
  51. El-kenawy, E.S.M.; Mirjalili, S.; Ghoneim, S.S.M.; Eid, M.M.; El-Said, M.; Khan, Z.S.; Ibrahim, A. Advanced Ensemble Model for Solar Radiation Forecasting using Sine Cosine Algorithm and Newton’s Laws. IEEE Access 2021, 9, 115750–115765. [Google Scholar] [CrossRef]
  52. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381. [Google Scholar] [CrossRef]
  53. Taha, I.B.; Dessouky, S.S.; Ghoneim, S.S. Transformer fault types and severity class prediction based on neural pattern-recognition techniques. Electr. Power Syst. Res. 2021, 191, 106899. [Google Scholar] [CrossRef]
  54. Confalonieri, R.; Bellocchi, G.; Bregaglio, S.; Donatelli, M.; Acutis, M. Comparison of sensitivity analysis techniques: A case study with the rice model WARM. Ecol. Model. 2010, 221, 1897–1906. [Google Scholar] [CrossRef]
Figure 1. Architecture of LSTM Neural Network.
Figure 2. Sigmoid function to scale the output solutions to binary [0, 1].
Figure 3. Proposed feature selection and classification model based on the proposed GSDTO algorithm.
Figure 4. Feature selection comparison of the presented and compared algorithms based on convergence curve.
Figure 5. The bGSDTO and compared algorithms’ average error versus objective function.
Figure 6. Residual, QQ, homoscedasticity plots, and heat map of the bGSDTO and compared algorithms.
Figure 7. Comparison of the convergence curves of the presented and compared algorithms based on the LSTM model.
Figure 8. MSE for the presented and compared algorithms based on the LSTM model.
Figure 9. Histogram for the presented and compared algorithms based on the LSTM model.
Figure 10. ROC curves for the presented GSDTO, PSO, and WOA algorithms based on the LSTM model.
Figure 11. Residual, QQ, homoscedasticity plots, and heat map of the presented and compared algorithms based on the LSTM model.
Figure 12. Convergence time and fitness for the GSDTO's parameters versus the objective function.
Table 1. Distribution of the 386 training samples according to the fault types and the references.

Ref.     PD    D1    D2    T1    T2    T3    Total
[8]       2     0     0     3     0     0      5
[9]       9    24    48     0     0    18     99
[25]      0     2     1     1     3     1      8
[38]     27    42    55    70    18    28    240
[39]      1     0     5     2     0     1      9
[40]      3     0     4     4     3     5     19
[41]      1     1     2     1     0     1      6
Total    43    69   115    81    24    54    386
Table 2. The distribution of the 74 testing samples according to the fault types and the references.

Ref.     PD    D1    D2    T1    T2    T3    Total
[9]       1     6     8     1     -     1     17
[38]      6     6    11    11     3     2     39
[40]      -     -     2     1     -     -      3
[41]      -     -     -     -     -     1      1
[42]      -     -     -     -     -     2      2
[43]      -     -     1     -     -     -      1
[44]      -     1     1     1     -     -      3
[45]      -     -     -     -     -     1      1
[46]      -     -     -     2     1     3      6
[47]      -     -     1     -     -     -      1
Total     7    13    24    16     4    10     74
Table 3. Configuration parameters of the GSDTO algorithm.

Parameter        Value
# Agents         10
# Iterations     80
# Repetitions    20
Dimension        # features
C                [0, 2]
R                [0, 1]
α of Fn          0.99
β of Fn          0.01
Table 4. Configuration parameters of the compared algorithms.

Algorithm   Parameter(s)                        Value(s)
PSO         Wmax, Wmin                          [0.9, 0.6]
            C1, C2                              [2, 2]
GWO         a                                   2 to 0
WOA         a                                   2 to 0
            r                                   [0, 1]
BBO         Habitat modification probability    1.0
            Mutation probability                0.05
            Immigration probability             [0, 1]
            Migration rate                      1.0
            Max immigration                     1.0
            Step size                           1.0
BA          Pulse rate                          0.5
            Loudness                            0.5
            Frequency                           [0, 1]
GA          Crossover                           0.9
            Mutation ratio                      0.1
            Mechanism of selection              Roulette wheel
FA          # Fireflies                         10
Table 5. Feature selection performance metrics.

Metric                Value
Average Error         $1-\frac{1}{M}\sum_{j=1}^{M}\frac{1}{N}\sum_{i=1}^{N}\mathrm{Match}(C_i,L_i)$
Average Select Size   $\frac{1}{M}\sum_{j=1}^{M}\frac{\mathrm{size}(g_j^{*})}{D}$
Average Fitness       $\frac{1}{M}\sum_{j=1}^{M}g_j^{*}$
Best Fitness          $\min_{j=1}^{M}g_j^{*}$
Worst Fitness         $\max_{j=1}^{M}g_j^{*}$
Standard Deviation    $\sqrt{\frac{1}{M-1}\sum_{j=1}^{M}\left(g_j^{*}-\mathrm{Mean}\right)^{2}}$
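Using the definitions in Table 5 — with $g_j^{*}$ the best fitness of run $j$, $M$ the number of runs, and $D$ the total feature count — the summary statistics can be computed as below. The per-run fitness values and selected-feature counts are hypothetical, for illustration only.

```python
# Hypothetical best-fitness values g* over M independent runs (minimized).
fitness = [0.26, 0.19, 0.31, 0.22, 0.27]
M = len(fitness)

mean = sum(fitness) / M                                   # Average Fitness
best = min(fitness)                                       # Best Fitness
worst = max(fitness)                                      # Worst Fitness
std = (sum((g - mean) ** 2 for g in fitness) / (M - 1)) ** 0.5  # Std. Deviation

# Average Select Size: features chosen per run over D total features.
selected_sizes = [7, 6, 9, 8, 7]   # hypothetical counts per run
D = 50
avg_select = sum(s / D for s in selected_sizes) / M

print(round(mean, 4), round(std, 4), round(avg_select, 4))
```

The same pattern, applied to per-run classification errors, yields the Average Error row of Table 6.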
Table 6. Presented bGSDTO and compared algorithms feature selection results.

                        bGSDTO   bGWO     bPSO     bBA      bWOA     bBBO     bFA      bGA
Average error           0.1969   0.2141   0.2479   0.2575   0.2477   0.2161   0.2463   0.2277
Average select size     0.1497   0.3497   0.3497   0.4891   0.5131   0.5135   0.3842   0.2921
Average fitness         0.2601   0.2763   0.2747   0.2976   0.2825   0.2804   0.3266   0.2877
Best fitness            0.1619   0.1966   0.2550   0.1873   0.2466   0.2701   0.2453   0.1910
Worst fitness           0.2604   0.2635   0.3227   0.2889   0.3227   0.3566   0.3429   0.3061
Std. deviation fitness  0.0824   0.0871   0.0865   0.0964   0.0887   0.1314   0.1233   0.0887
Table 7. Presented bGSDTO versus compared algorithms ANOVA test results.

Source                       SS        DF    MS          F (DFn, DFd)        p Value
Treatment (between columns)  0.06262     7   0.008946    F (7, 152) = 173.9  p < 0.0001
Residual (within columns)    0.007819  152   0.00005144  -                   -
Total                        0.07044   159   -           -                   -
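The quantities in Table 7 (SS, DF, MS, and the F ratio) follow from the standard one-way ANOVA decomposition: between-group and within-group sums of squares, each divided by its degrees of freedom. A small self-contained sketch with made-up groups of per-run errors (not the paper's data):

```python
# Toy one-way ANOVA over k groups (e.g. per-run errors of three algorithms).
groups = [
    [0.20, 0.19, 0.21, 0.20],   # hypothetical algorithm A
    [0.25, 0.24, 0.26, 0.25],   # hypothetical algorithm B
    [0.22, 0.23, 0.22, 0.21],   # hypothetical algorithm C
]
k = len(groups)
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n

# Between-group (treatment) and within-group (residual) sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between, df_within = k - 1, n - k            # DFn, DFd
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 2))
```

A large F with a small p value, as in Table 7, indicates that the mean errors of the algorithms differ significantly.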
Table 8. Presented bGSDTO and compared algorithms Wilcoxon Signed-Rank test results.

                          bGSDTO   bGWO     bPSO     bBA      bWOA     bBBO     bFA      bGA
Theoretical median        0        0        0        0        0        0        0        0
Actual median             0.1969   0.2141   0.2479   0.2575   0.2477   0.2161   0.2463   0.2277
Number of values          20       20       20       20       20       20       20       20
Wilcoxon Signed-Rank Test
Sum of signed ranks       210      210      210      210      210      210      210      210
Sum of positive ranks     210      210      210      210      210      210      210      210
Sum of negative ranks     0        0        0        0        0        0        0        0
p value (two-tailed)      <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  <0.0001
Exact or estimate?        Exact    Exact    Exact    Exact    Exact    Exact    Exact    Exact
Significant (α = 0.05)?   Yes      Yes      Yes      Yes      Yes      Yes      Yes      Yes
Discrepancy               0.1969   0.2141   0.2479   0.2575   0.2477   0.2161   0.2463   0.2277
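The rank sums in Table 8 come from the one-sample Wilcoxon procedure: differences from the theoretical median are ranked by absolute value, and the ranks are summed by sign. With 20 strictly positive errors, every rank is positive, which is why each column reports a signed-rank sum of 20 × 21 / 2 = 210. A pure-Python sketch of the bookkeeping on hypothetical values:

```python
# Signed-rank bookkeeping for a one-sample Wilcoxon test against median 0.
errors = [0.18, 0.21, 0.19, 0.25, 0.22, 0.20]     # hypothetical per-run errors
diffs = [e - 0.0 for e in errors]                 # vs. theoretical median 0

# Rank the absolute differences (1-based; these values have no ties).
ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = {i: r + 1 for r, i in enumerate(ranked)}

w_pos = sum(ranks[i] for i, d in enumerate(diffs) if d > 0)
w_neg = sum(ranks[i] for i, d in enumerate(diffs) if d < 0)
print(w_pos, w_neg)   # all diffs positive: 21 and 0 for n = 6
```

The test statistic is then compared against the exact Wilcoxon distribution to obtain the p value.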
Table 9. Parameters of basic classification models.

Classifier  Parameter                  Value
NN          beta_1                     0.6
            beta_2                     0.899
            epsilon                    1 × 10⁻⁶
            validation_fraction        0.1
            learning_rate_init         0.007
            hidden_layer_sizes         20
k-NN        leaf_size                  20
            p                          2
            n_neighbors                3
RF          min_weight_fraction_leaf   0.0
            min_samples_leaf           1
            n_estimators               40
            min_samples_split          2
Table 10. Classification results of the three basic models for the tested dataset.

Model   AUC     MSE
NN      0.744   0.080226
k-NN    0.7387  0.098999
RF      0.797   0.04887
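The AUC and MSE columns above can be reproduced from raw labels and scores; AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A toy sketch (the labels and scores below are illustrative, not the paper's data):

```python
# MSE and a rank-based AUC for binary scores.
labels = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.8, 0.7, 0.9, 0.3]

# Mean squared error between labels and predicted scores.
mse = sum((l - s) ** 2 for l, s in zip(labels, scores)) / len(labels)

# AUC = P(score of a random positive > score of a random negative),
# counting ties as half a win.
pos = [s for l, s in zip(labels, scores) if l == 1]
neg = [s for l, s in zip(labels, scores) if l == 0]
wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
auc = wins / (len(pos) * len(neg))
print(round(mse, 4), auc)
```

Here every positive outscores every negative, so the AUC is 1.0; real classifiers, as in Tables 10 and 11, fall between 0.5 and 1.0.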
Table 11. Proposed and compared algorithms classification results based on LSTM model.

      GSDTO+LSTM  WOA+LSTM   GWO+LSTM   GA+LSTM    PSO+LSTM
AUC   0.9826      0.957      0.943      0.934      0.949
MSE   0.00001413  0.000319   0.0002158  0.0007382  0.0004196
Table 12. Proposed and compared classifiers description.

                          GSDTO+LSTM  WOA+LSTM      GWO+LSTM      GA+LSTM       PSO+LSTM
Number of values          20          20            20            20            20
Minimum                   0.0000133   0.000124      0.000115      0.000641      0.000323
25% percentile            0.0000133   0.000324      0.000215      0.000741      0.000423
Median                    0.0000133   0.000324      0.000215      0.000741      0.000423
75% percentile            0.0000133   0.000324      0.000215      0.000741      0.000423
Maximum                   0.0000133   0.000424      0.0003315     0.0007854     0.000454
Range                     0           0.0003        0.0002165     0.0001444     0.000131
10% percentile            0.0000133   0.000324      0.000215      0.000741      0.000423
90% percentile            0.0000133   0.000324      0.000215      0.000741      0.000423
Mean                      0.0000133   0.000319      0.0002158     0.0007382     0.00042
Std. deviation            0           0.00005104    0.00003521    2.494 × 10⁻⁵  2.38 × 10⁻⁵
Std. error of mean        0           0.00001141    7.874 × 10⁻⁶  5.576 × 10⁻⁶  5.32 × 10⁻⁶
Coefficient of variation  0.000%      16.00%        16.32%        3.378%        5.666%
Geometric mean            0.0000133   0.000313      0.0002129     0.0007378     0.000419
Geometric SD factor       1           1.254         1.19          1.036         1.065
Harmonic mean             0.0000133   0.0003031     0.0002096     0.0007373     0.000418
Quadratic mean            0.0000133   0.0003229     0.0002185     0.0007386     0.00042
Skewness                  -           −2.751        0.7003        −3.067        −3.734
Kurtosis                  -           13.14         9.729         14.07         16.45
Sum                       0.000266    0.00638       0.004317      0.01476       0.008391
Table 13. Proposed and compared classifiers Wilcoxon Signed-Rank Test results.

                          GSDTO+LSTM  WOA+LSTM  GWO+LSTM  GA+LSTM   PSO+LSTM
Theoretical median        0           0         0         0         0
Actual median             0.0000133   0.000324  0.000215  0.000741  0.000423
Number of values          20          20        20        20        20
Wilcoxon Signed-Rank Test
Sum of signed ranks (W)   210         210       210       210       210
Sum of positive ranks     210         210       210       210       210
Sum of negative ranks     0           0         0         0         0
p value (two-tailed)      <0.0001     <0.0001   <0.0001   <0.0001   <0.0001
Exact or estimate?        Exact       Exact     Exact     Exact     Exact
Significant (α = 0.05)?   Yes         Yes       Yes       Yes       Yes
How big is the discrepancy?
Discrepancy               0.0000133   0.000324  0.000215  0.000741  0.000423
Table 14. The 74 testing samples diagnostic accuracy of the suggested algorithm compared to other techniques.

Fault Type  Samples  GSDTO+LSTM  Adaptive [2]  IEC-60599 [7]  IEC 60599 Modified [27]  Rog. Modified [27]
PD           7       100         100           28.57          100                      100
D1          13       100          92.31        30.77           61.54                    61.54
D2          24        95.83       91.67        41.67           87.5                     79.17
T1          16        93.75       93.95        68.75          100                      100
T2           4       100         100           75             100                      100
T3          10       100         100           70             100                      100
Overall     74        98.26       94.6         50              89.19                    86.49

Fault Type  Samples  Duval [8,9]  NPR [53]  SVM [53]  Rog. 4 Ratios [6]
PD           7        42.86       100        85.71     14.29
D1          13        69.23        76.92     76.92      0
D2          24        75           87.5      91.67     50
T1          16        56.25       100       100       100
T2           4         0           75        25        25
T3          10       100          100       100        40
Overall     74        66.27        90.54     87.84     45.95
Table 15. Convergence time results for different values of the GSDTO's parameters.

R-Parameter      Exploration Percentage  Population Size   Iterations Count  C-Parameter
Value   Time     Value   Time            Value   Time      Value   Time      Value   Time
0.05    3.353     5      3.257            10      0.5       10      0.662    0.1     3.371
0.1     3.145    10      3.717            20      1.523     20      0.244    0.2     3.328
0.15    3.018    15      3.442            30      3.095     30      1.146    0.3     3.262
0.2     3.503    20      3.249            40      4.933     40      2.038    0.4     3.246
0.25    3.251    25      3.048            50      6.28      50      2.935    0.5     3.302
0.3     3.114    30      3.049            60      7.704     60      3.848    0.6     3.259
0.35    3.071    35      3.044            70      9.253     70      4.786    0.7     3.289
0.4     3.06     40      3.08             80     10.817     80      6.649    0.8     3.22
0.45    3.037    45      3.039            90     12.324     90      7.463    0.9     3.285
0.5     3.041    50      3.039           100     13.881    100     10.226    1       3.304
0.55    3.028    55      3.059           110     15.439    110     11.123    1.1     3.259
0.6     3.037    60      3.067           120     17.962    120     11.995    1.2     3.208
0.65    3.129    65      3.058           130     19.292    130     13.011    1.3     3.278
0.7     6.341    70      3.034           140     20.21     140     13.798    1.4     3.277
0.75    6.272    75      3.041           150     21.582    150     15.892    1.5     3.226
0.8     6.082    80      3.048           160     23.173    160     16.864    1.6     3.213
0.85    4.279    85      3.042           170     25.767    170     17.083    1.7     3.218
0.9     4.57     90      3.06            180     30.937    180     17.872    1.8     3.211
0.95    3.34     95      3.058           190     35.502    190     19.415    1.9     3.195
1       3.794    95      3.042           200     35.077    200     20.267    2       3.186
Table 16. Minimum Fitness results for different values of the GSDTO's parameters.

R-Parameter        Exploration Percentage  Population Size     Iterations Count    C-Parameter
Value   Fitness    Value   Fitness         Value   Fitness     Value   Fitness     Value   Fitness
0.05    −11.2816    5      −9.1386          10     −9.6506       50    −7.5286     0.1     −8.0656
0.1     −11.2816   10      −9.1376          20     −12.3506     100    −6.9926     0.2     −8.0656
0.15    −11.8186   15      −11.2846         30     −11.2826     150    −8.6006     0.3     −6.9926
0.2     −11.8206   20      −10.7486         40     −11.2856     200    −8.6026     0.4     −8.0626
0.25    −11.2836   25      −11.2806         50     −10.2126     250    −8.0656     0.5     −8.0656
0.3     −11.2846   30      −10.7486         60     −12.3586     300    −8.0656     0.6     −6.9926
0.35    −10.7476   35      −10.7486         70     −12.3586     350    −9.6756     0.7     −8.0656
0.4     −11.2856   40      −11.2856         80     −12.3586     450    −8.0656     0.8     −8.0616
0.45    −10.7486   45      −11.2846         90     −11.8226     500    −9.1386     0.9     −6.9926
0.5     −11.2846   50      −10.2126        100     −12.3586     650    −10.2126    1       −8.0656
0.55    −11.2846   55      −11.2856        110     −12.3586     700    −8.6026     1.1     −8.0656
0.6     −12.3556   60      −11.8206        120     −12.3586     750    −10.2126    1.2     −8.0656
0.65    −11.2846   65      −11.2836        130     −12.3586     800    −10.7486    1.3     −9.1386
0.7     −10.7476   70      −11.2846        140     −12.3586     850    −9.1386     1.4     −10.2116
0.75    −10.7466   75      −11.2856        150     −12.3586     900    −9.6756     1.5     −11.2836
0.8     −12.3546   80      −10.7476        160     −12.3586     950    −10.7486    1.6     −9.1386
0.85    −11.2366   85      −12.3556        170     −12.3586    1000    −10.2126    1.7     −10.2116
0.9     −11.8096   90      −11.2716        180     −12.3586    1050    −10.7486    1.8     −12.3576
0.95    −12.3196   95      −11.2746        190     −12.3586    1150    −9.6756     1.9     −11.2846
1       −12.3306   95      −11.8096        200     −12.3586    1200    −10.2126    2       −12.3586
Table 17. Results of regression analysis for the GSDTO's parameters.

                         Convergence Time                  Minimum Fitness
Parameters               R Square      Significance F      R Square      Significance F
R-Parameter              7.85 × 10⁻¹   1.44 × 10⁻³         8.85 × 10⁻¹   9.16 × 10⁻⁴
Exploration Percentage   6.03 × 10⁻¹   1.67 × 10⁻⁵         6.40 × 10⁻¹   3.91 × 10⁻³
Population Size          8.06 × 10⁻¹   1.70 × 10⁻⁹         9.80 × 10⁻¹   4.59 × 10⁻³
Iterations Count         8.23 × 10⁻¹   1.70 × 10⁻¹¹        7.44 × 10⁻¹   1.44 × 10⁻⁵
C-Parameter              4.06 × 10⁻¹   1.24 × 10⁻⁶         9.57 × 10⁻¹   9.16 × 10⁻⁴
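The R Square values in Table 17 come from regressing each swept parameter against the measured response (convergence time or minimum fitness); for simple linear regression, R² equals the squared Pearson correlation. A sketch with hypothetical (x, y) pairs, not the paper's measurements:

```python
# Simple linear regression R² between a swept parameter and a response.
x = [10, 20, 30, 40, 50]            # e.g. population size (hypothetical)
y = [0.5, 1.5, 3.1, 4.9, 6.3]       # e.g. convergence time (hypothetical)
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Centered cross- and auto-sums of squares.
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r2 = sxy ** 2 / (sxx * syy)         # squared Pearson correlation
print(round(r2, 3))
```

An R² near 1, as for the population size and iterations count in Table 17, means the parameter almost fully explains the variation in the response.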
Table 18. ANOVA test results for the GSDTO's parameters of Convergence time and Fitness.

Source                       SS       DF    MS          F (DFn, DFd)        p Value
Treatment (between columns)  11.54      9   1.282       F (9, 190) = 2483   p < 0.0001
Residual (within columns)    0.0981   190   0.0005163   -                   -
Total                        11.64    199   -           -                   -
Table 19. T-test with one tail was run at a significance level of 0.05 for different values of the GSDTO's parameters.

Convergence Time
                                 R-Parameter         Exploration Percentage  Population Size     Iterations Count    C-Parameter
Theoretical mean                 0                   0                       0                   0                   0
Actual mean                      0.7775              0.593                   0.8025              0.828               0.421
Number of values                 20                  20                      20                  20                  20
One sample t test
t, df                            t = 93.03, df = 19  t = 48.00, df = 19      t = 72.62, df = 19  t = 93.97, df = 19  t = 38.47, df = 19
p value (two tailed)             <0.0001             <0.0001                 <0.0001             <0.0001             <0.0001
p value summary                  ****                ****                    ****                ****                ****
Significant (alpha = 0.05)?      Yes                 Yes                     Yes                 Yes                 Yes
How big is the discrepancy?
Discrepancy                      0.7775              0.593                   0.8025              0.828               0.421
SD of discrepancy                0.03738             0.05525                 0.04942             0.0394              0.04894
SEM of discrepancy               0.008357            0.01235                 0.01105             0.008811            0.01094
95% confidence interval          0.7600 to 0.7949    0.5671 to 0.6189        0.7794 to 0.8256    0.8096 to 0.8464    0.3981 to 0.4439
R squared (partial eta squared)  0.9978              0.9918                  0.9964              0.9979              0.9873

Minimum Fitness
                                 R-Parameter         Exploration Percentage  Population Size     Iterations Count    C-Parameter
Theoretical mean                 0                   0                       0                   0                   0
Actual mean                      0.885               0.6412                  0.9764              0.759               0.937
Number of values                 20                  20                      20                  20                  20
One sample t test
t, df                            t = 122.0, df = 19  t = 87.20, df = 19      t = 188.8, df = 19  t = 57.81, df = 19  t = 80.10, df = 19
p value (two tailed)             <0.0001             <0.0001                 <0.0001             <0.0001             <0.0001
p value summary                  ****                ****                    ****                ****                ****
Significant (alpha = 0.05)?      Yes                 Yes                     Yes                 Yes                 Yes
How big is the discrepancy?
Discrepancy                      0.885               0.6412                  0.9764              0.759               0.937
SD of discrepancy                0.03244             0.03289                 0.02313             0.05871             0.05231
SEM of discrepancy               0.007255            0.007353                0.005172            0.01313             0.0117
95% confidence interval          0.8698 to 0.9002    0.6258 to 0.6566        0.9656 to 0.9872    0.7315 to 0.7865    0.9125 to 0.9615
R squared (partial eta squared)  0.9987              0.9975                  0.9995              0.9943              0.997
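The t statistics in Table 19 follow the one-sample formula: the sample mean divided by its standard error, with n − 1 degrees of freedom. A minimal pure-Python sketch on hypothetical values:

```python
# One-sample t statistic against a theoretical mean of 0.
vals = [0.78, 0.80, 0.77, 0.79, 0.81]   # hypothetical measurements
n = len(vals)
mean = sum(vals) / n

# Sample standard deviation (n - 1 denominator) and standard error.
sd = (sum((v - mean) ** 2 for v in vals) / (n - 1)) ** 0.5
t = mean / (sd / n ** 0.5)
df = n - 1
print(round(t, 1), df)
```

Because the measured means sit far from zero relative to their spread, the resulting t values are large and the p values fall below 0.0001, as the table reports.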
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

El-kenawy, E.-S.M.; Albalawi, F.; Ward, S.A.; Ghoneim, S.S.M.; Eid, M.M.; Abdelhamid, A.A.; Bailek, N.; Ibrahim, A. Feature Selection and Classification of Transformer Faults Based on Novel Meta-Heuristic Algorithm. Mathematics 2022, 10, 3144. https://doi.org/10.3390/math10173144

