Pattern Recognition Based Auto-Reclosing Scheme Using Bi-Directional Long Short-Term Memory Network

Precise knowledge of the secondary arc extinction instant and the fault nature (temporary or permanent) is necessary for auto-reclosing after a single line-to-ground fault. Existing intelligent reclosing schemes rely on the extraction of appropriate features using a signal processing module (SPM) during online data monitoring. The feature values vary greatly under different operating scenarios, and the SPM adds a considerable computational burden; both effects significantly degrade the performance of the auto-reclosing scheme. Hence, in this study a bi-directional long short-term memory (Bi-LSTM) network is designed which integrates the feature extraction and classification processes. Thus, the proposed scheme is applied directly to the incoming voltage data without using any SPM or filtering technique. The open-source test system provided by the developers of Hydro-Quebec, Canada is used for training and testing. Around 4860 different signals are collected by varying power system parameters and secondary arc conditions to develop dataset A. The Bi-LSTM model is tested under no noise, low noise of SNR 30, and high noise of SNR 10. To ensure the efficacy of the proposed scheme, uni-directional long short-term memory (U-LSTM), gated recurrent unit (GRU), and machine learning models are also trained on the same dataset. Later, for validation, a second dataset B is developed by varying surge impedance loading, frequency-dependent transmission lines, and arc resistance. Then the efficiency of the pre-trained artificial neural networks (ANNs) is validated on this unseen dataset. The testing and validation on both datasets confirm the superior efficiency of Bi-LSTM in comparison to U-LSTM, GRU, and other models.


I. INTRODUCTION
Statistics show that approximately 80% of the faults on high-voltage transmission lines are single line-to-ground (SLG) faults of a transient nature, which occur owing to atmospheric discharges [1], [2]. The occurrence of such faults is followed by the opening of the circuit breakers (CBs) of the compromised phase. This phase is reclosed only after the fault removal and secondary arc extinction. Traditional relays have a fixed dead time after the opening of the CBs, and the reclosing attempt is performed after this time [3], [4]. However, the extinction instant of the secondary arc is highly variable and depends on meteorological factors such as wind, snow, and humidity, resulting in a high probability of unsuccessful reclosing owing to the fixed dead time. These unsuccessful reclosing attempts severely disturb the transient stability of power system equipment [5]. Hence, it is important to recognize the fault type (temporary or permanent) and the precise interval of arc extinction before taking the reclosing decision [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Arturo Conde.
Over the past few years, several researchers have presented single-phase auto-reclosing (SPAR) schemes to minimize power outages and ensure system reliability. In [7], the total harmonic distortion (THD) of the faulty phase voltage is compared with a threshold after a pre-set time interval to recognize the fault nature. It is observed that for a temporary fault the amplitude of the THD is higher than for a permanent fault, which assists in taking the decision. In [8], the wavelet packet transform (WPT) algorithm is used in the signal processing module (SPM) during online data monitoring (ODM). The SPM extracts harmonics from the compromised phase voltage signal. Energy coefficients are then developed from the extracted harmonics and compared with an adaptive threshold to recognize the fault nature. The authors of [9] used an unscented Kalman filter (UKF) in the SPM to estimate voltage harmonics. These harmonics are used to develop two indices to monitor the extinction time of the secondary arc and the fault type. The synchro-squeezing wavelet transform (SWT) technique is used in [10] to filter the sub-synchronous components from the faulty phase voltage. These components have a very low frequency, ranging from 5 Hz to 55 Hz, and appear after the secondary arc extinction. In [11], harmonics present in the uncompromised terminal voltage are initially magnified using the time-time transform (TT-transform). These harmonics have a high amplitude during the secondary arc interval and are used to estimate the secondary arc extinction. Although the abovementioned harmonics-based schemes provide satisfactory results under the discussed scenarios, the amplitude of the harmonics varies greatly with the surge impedance loading (SIL) of the transmission line, which increases the risk of false decisions. Furthermore, during the ODM process, the SPM imposes a high computational burden during harmonics estimation, which results in delayed reclosing.
A few researchers have used voltage/current phasor information from one or both ends of the transmission line for taking reclosing decisions [12]. The authors of [13] recognized the fault type using the single-ended voltage magnitude derivative of the compromised phase. Further, the combination of voltage and angle derivatives is used to determine the quenching time of the secondary arc. The analysis of voltage phasors in the modal domain using the Clarke matrix is presented in [14]. The modal domain conversion of the faulty terminal phasor is used to develop criteria which are eventually used to take reclosing decisions. In [15], a phasor measurement unit-based algorithm is developed. The voltage and current phasors are collected from both terminals of the transmission line through a communication channel. Then the data before and after secondary arc extinction are compared to develop a reclosing criterion. In [16], the authors developed a mathematical model of the transmission line using the voltage magnitude and angle values of the healthy phases. The developed model is then compared with the actual voltage phasors collected from both ends of the transmission line. The observations confirmed that the estimated values are close to the actual values after secondary arc extinction, which assists in taking the reclosing decision. Although voltage/current model-based techniques provide suitable results in the absence of noise, these models are developed without considering noise; hence high noise (low SNR) can lower the model accuracy. Furthermore, the communication-based schemes require a fast communication channel to take appropriate action without delay.
Researchers have also applied intelligent artificial neural networks (ANNs) to solve the issue of reclosing owing to their high accuracy and reliability. The authors of [17] presented a particle filter and convolutional neural network-based reclosing technique. In this research, a nonlinear state space model is first designed to estimate the harmonics of the compromised phase through the particle filter. A dataset is then developed by simulating various scenarios and used as the input to the convolutional neural network for training and testing. This work reported a detection accuracy of 100% under noise-free conditions and 95.9% under high noise. In [18], the authors used the discrete wavelet transform (DWT) along with long short-term memory (LSTM) to predict the fault nature and the instant of secondary arc extinction. DWT is used as the SPM to extract useful features, which are then used by the LSTM to classify the fault. Furthermore, one more LSTM is used to predict the secondary arc extinction. This technique reported a precision of 99.20% in recognizing the fault. Similarly, a least square error based digital filter is used in [19] to extract features from the faulty phase signal. These features are input to a support vector machine (SVM) classifier for recognizing faults. This research did not report any performance accuracy. In [20], FFT and Prony analysis-based techniques are used to extract useful components from the faulty phase. These features are used by the Levenberg-Marquardt (LM) and error backpropagation (EBP) algorithms to train the network. Further, the Taguchi method is used for optimization in these algorithms. An accuracy of 99% is reported for LM and 94.79% for EBP. Even though these techniques provide high accuracy on the datasets used for training, their accuracy still needs to be established on unseen datasets. Moreover, these techniques rely on an SPM to extract features from the voltage/current signals. These SPMs impose a high computational burden during online data monitoring and are responsible for reclosing delays.
As the literature review shows, although several efforts have been directed toward reliable single-pole auto-reclosing, some challenges still exist, including low accuracy under unseen circumstances, delayed detection under noisy and noise-free conditions owing to the filtering process, high computational complexity, and imprecise recognition of secondary arc extinction for unusually short or long time intervals. To bridge these research gaps, this paper introduces a novel bi-directional long short-term memory (Bi-LSTM) network. The following are the key features of the proposed scheme.
1. A variant of LSTM is designed that architecturally introduces forward and backward LSTM layers. This bi-directional LSTM (Bi-LSTM) solves complex non-linear classification problems on unseen data with superior accuracy.
2. The proposed Bi-LSTM scheme provides superior performance under no noise, low noise of SNR 30, and high noise of SNR 10. Also, to compare the performance with other techniques, uni-directional long short-term memory (U-LSTM), gated recurrent unit (GRU), support vector machine (SVM), and decision tree (DT) models are trained on the same dataset and then validated on another dataset. The performance metrics show the outstanding performance of Bi-LSTM on both datasets.
3. In contrast to previously existing ANN schemes, the proposed algorithm, once trained, does not require any feature extraction module before the ANN module for online grid data monitoring. Thus, the computational cost and time delay involved in detection are significantly reduced.
4. The proposed scheme does not require any local/global thresholds, linear/non-linear filters, or communication modules for taking the decision.

II. PRELIMINARIES OF THE MATHEMATICAL MODEL FOR TRANSIENT AND PERMANENT FAULT
A. PATTERNS OF POST-ARC INSTANTANEOUS TERMINAL VOLTAGE FOR TRANSIENT FAULT
The general architecture of the extra-high-voltage transmission line with shunt and neutral reactors is depicted in Figure 1. During the normal state, the instantaneous current and voltage of phase P are sinusoidal, where $I_P$ and $V_P$ denote the peak values for phase P, $\varphi$ is the phase angle, and $\omega$ is the angular frequency. After the secondary arc is extinguished, the terminal voltage consists of the electrostatic coupling voltage from the two healthy phases; the electromagnetic coupling voltage is so small that it is ignored. The relationship is given in [21]. The compromised-phase terminal voltage also contains oscillations at several frequencies whose amplitude is close to that of the power frequency component and whose phase is opposite to it. Eventually, the instantaneous terminal voltage after secondary arc extinction combines these components. Thus, specific patterns are observed after the secondary arc extinction, as depicted in Figure 2.

B. INSTANTANEOUS TERMINAL VOLTAGE FOR PERMANENT FAULT
When a permanent fault occurs, the capacitance discharges immediately and only electromagnetic coupling exists, which is given as [21]

$V_m = (I_Q + I_R) Z_{mut} L$ (8)

where $I_Q$ and $I_R$ are the currents of the corresponding healthy phases acting on phase P, $Z_{mut}$ indicates the mutual inductance, and $L$ is the distance from the measurement point to the fault point. Eventually, the faulty phase voltage for a permanent fault is determined by this electromagnetic coupling alone. In contrast to the temporary fault, only a single frequency component exists in the permanent fault, as seen in Figure 3. These patterns are a strong candidate for distinguishing a temporary fault with arc extinction from a permanent fault.
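The contrast between the two patterns can be illustrated with a toy example. The sketch below is not the paper's test system: it assumes a 60 Hz power frequency, a single hypothetical 40 Hz oscillation of amplitude 0.9 in phase opposition, and a one-second window at the 3840 Hz sampling rate. The frequency scan is used here only to make the difference visible; the proposed scheme itself feeds the raw samples to the network without any such step.

```python
import math

FS = 3840      # sampling rate stated in the paper, Hz
F0 = 60        # assumed power frequency, Hz
F_OSC = 40     # hypothetical post-extinction oscillation frequency, Hz


def make_signal(kind, n_samples=FS):
    """Synthetic terminal voltage: 'temporary' adds an opposite-phase oscillation."""
    samples = []
    for k in range(n_samples):
        t = k / FS
        v = math.sin(2 * math.pi * F0 * t)
        if kind == "temporary":
            v -= 0.9 * math.sin(2 * math.pi * F_OSC * t)
        samples.append(v)
    return samples


def amplitude_at(signal, freq_hz):
    """Amplitude of one frequency component from a single-bin DFT."""
    n = len(signal)
    re = sum(v * math.cos(2 * math.pi * freq_hz * k / FS) for k, v in enumerate(signal))
    im = sum(v * math.sin(2 * math.pi * freq_hz * k / FS) for k, v in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def dominant_frequencies(signal, threshold=0.5, f_max=100):
    """Integer frequencies (Hz) whose amplitude exceeds the threshold."""
    return [f for f in range(1, f_max + 1) if amplitude_at(signal, f) > threshold]


print("temporary:", dominant_frequencies(make_signal("temporary")))
print("permanent:", dominant_frequencies(make_signal("permanent")))
```

With these assumed values the temporary-fault signal shows two dominant components and the permanent-fault signal only one, which is the kind of pattern difference a sequence classifier can learn.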

III. BI-LSTM BASED RECLOSING MECHANISM
A. THEORETICAL BACKGROUND OF BI-LSTM
Recurrent neural networks (RNNs) are capable of learning the implicit non-linear relationships present in datasets by recurrently processing the input sequence [22]. The traditional architecture of an RNN with $K$ layers is depicted in Figure 4. The relationship between the data input $x_n$ at time step $n$ and the output $O_n$ is expressed through a layer-wise recurrence, where $g_n^K$ denotes the activation function of the $l$-th layer at $n$, with $l = 1, 2, \ldots, K$; $b_y$ and $b_x$ represent the bias terms; $V_K$, $W_l$, and $U_n$ are the weight matrices; and $h_n^l$ denotes the shared state vector of the $l$-th layer. At every iteration, the RNN parameters are updated to minimize the loss function $L(O_n, y_n)$, where $y_n$ indicates the desired output. Although an RNN learns temporal features through the sharing of its state vector, it cannot capture long-term dependencies owing to the vanishing gradient problem during backpropagation in model training.
Keeping in view the limitations of RNN, an improved architecture known as LSTM was introduced. LSTM overcomes the vanishing gradient problem by introducing a memory cell in its structure. This cell can add or delete information. Three controlling gates are responsible for the cell operation by controlling the data flow [23]. The input gate takes the previous outputs and the new inputs, and the forget gate removes useless memories from the state vector. Lastly, the output gate determines the new output of the corresponding unit. Although LSTM addresses the limitations associated with RNN, it captures only forward dependencies. In addition, various researchers have shown that the output is not only the product of preceding inputs but the result of complex correlations. In contrast to U-LSTM, Bi-LSTM incorporates two-way sequence learning, in both the forward and the backward direction, to capture the irregularities and hidden features present in the input sequence [24]. The forward LSTM takes the information of past data and the backward LSTM captures future dependencies, so that information from time n − 1 and time n + 1 is used at instant n.
Let the input data sequence be $x = \{x_1, x_2, \ldots, x_n\}$ and the hidden layer data sequence be $h = \{h_1, h_2, \ldots, h_n\}$. At time $n$, the memory block computes the block input $g_n$, the input, forget, and output gates $i_n$, $f_n$, and $o_n$, the cell state $s_n$, and the hidden output $h_n$. In contrast, the Bi-LSTM output layer depends on the output at times $n+1, n+2, n+3, \ldots, n+N$ along with the input at time $n$ and the output at times $n-1, n-2, n-3, \ldots, n-N$. In Figure 5, $w_1$ denotes the weight matrices between the input and the forward layer ($w_{xg}$, $w_{xi}$, $w_{xf}$, and $w_{xo}$); $w_2$ consists of the weight matrices between the input and the backward layer ($w^{bk}_{xg}$, $w^{bk}_{xi}$, $w^{bk}_{xf}$, and $w^{bk}_{xo}$); $w_3$ consists of the weight matrices in the forward layer ($w_{hg}$, $w_{hi}$, $w_{hf}$, and $w_{ho}$); $w_4$ includes the weight matrices in the backward layer ($w^{bk}_{hg}$, $w^{bk}_{hi}$, $w^{bk}_{hf}$, and $w^{bk}_{ho}$); $w_5$ denotes the weight matrices between the forward layer and the output; and $w_6$ denotes the weight matrices between the backward layer and the output. $w^{bk}$ is the backward layer weight with the same meaning as $w$ in the forward layer. Similarly, the equations for the backward direction are given as:

$s^{bk}_n = s^{bk}_{n+1} \circ f^{bk}_n + i^{bk}_n \circ g^{bk}_n$ (21)

$o^{bk}_n = \sigma(w^{bk}_{xo} x_n + w^{bk}_{ho} h_{n+1} + b^{bk}_o)$

The output of the memory block contains $h^{bk}_n$ and $h_n$, which are also known as the Bi-LSTM layers. This is the basic architecture of Bi-LSTM, which is further optimized for the proposed auto-reclosing strategy as discussed in the next subsection.
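The two passes can be sketched in a few lines. The example below is a minimal illustration, assuming the standard LSTM cell with a scalar input and a single hidden unit per direction; the weight names follow the subscripts used in the text, the numeric weight values are hypothetical, and the backward cell is assumed to use its own hidden state from step n + 1. It is not the trained network of the paper.

```python
import math


def sigmoid(c):
    return 1.0 / (1.0 + math.exp(-c))


def lstm_step(x, h_prev, s_prev, w):
    """One memory-block update; h_prev/s_prev come from the neighbouring time step."""
    g = math.tanh(w["xg"] * x + w["hg"] * h_prev + w["bg"])   # block input
    i = sigmoid(w["xi"] * x + w["hi"] * h_prev + w["bi"])     # input gate
    f = sigmoid(w["xf"] * x + w["hf"] * h_prev + w["bf"])     # forget gate
    o = sigmoid(w["xo"] * x + w["ho"] * h_prev + w["bo"])     # output gate
    s = s_prev * f + i * g                                    # cell state
    h = o * math.tanh(s)                                      # hidden output
    return h, s


def run_direction(xs, w, backward=False):
    """Forward pass visits n = 1..N; backward pass visits n = N..1."""
    order = range(len(xs) - 1, -1, -1) if backward else range(len(xs))
    hs = [0.0] * len(xs)
    h, s = 0.0, 0.0
    for n in order:
        h, s = lstm_step(xs[n], h, s, w)
        hs[n] = h
    return hs


def bilstm(xs, w_fwd, w_bk):
    """Pair the forward and backward hidden outputs at every time step."""
    return list(zip(run_direction(xs, w_fwd), run_direction(xs, w_bk, backward=True)))


def example_weights(scale):
    """Hypothetical weights, for illustration only."""
    names = ["xg", "hg", "bg", "xi", "hi", "bi", "xf", "hf", "bf", "xo", "ho", "bo"]
    return {name: scale * (k + 1) / len(names) for k, name in enumerate(names)}


voltage = [0.1, -0.4, 0.7, 0.2]
for n, (h_fwd, h_bk) in enumerate(bilstm(voltage, example_weights(0.8), example_weights(-0.5))):
    print(n, round(h_fwd, 4), round(h_bk, 4))
```

Each output pair carries information from the samples before the current step (forward output) and after it (backward output), which is the property the text attributes to Bi-LSTM.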

B. PROPOSED BI-LSTM ARCHITECTURE FOR AUTO-RECLOSING SCHEME
The framework of the auto-reclosing scheme is depicted in Figure 6. The first step consists of voltage data acquisition for training, testing, and validation. Dataset A is composed of $L$ signals of the form $\{x_1, \ldots, x_p\}$, sampled at 3840 Hz, where $L = 4860$ is the total number of signals and $x_p$ represents the $p$-th sample, with $p = 4609$ in the current research. Dataset B is likewise composed of $J$ signals $\{x_1, \ldots, x_p\}$, where $J = 486$ is the total number of signals in the validation set. Both datasets are labeled with 0 for a permanent fault and 1 for a temporary fault with arc extinguished. Further, dataset A is partitioned to train and test the proposed Bi-LSTM architecture.
The first layer is an input sequence layer in which the length of the signals is provided. Following the input sequence layer is the Bi-LSTM layer with 850 neurons in the hidden layer. The cell and hidden states are updated by the hyperbolic tangent 'tanh' function assigned at the state activation level. Further, the sigmoid function $\sigma(c) = (1 + e^{-c})^{-1}$ is applied as the gate activation function. Instead of random weight initialization, the Glorot weight initializer is used to enhance robustness and reduce the risk of exploding or vanishing gradients. This initializer, also known as the Xavier initializer, samples the weights of a layer in a way that preserves the input variance, so that it remains constant as information flows through the network. Next, a fully connected layer is linked, which multiplies the input matrix with the weight matrix and adds a bias vector. A rectified linear unit (ReLU) activation layer follows, which assists in handling non-linear and interaction effects. After that, a batch normalization layer is connected, which permits layers to learn independently of other layers, thus reducing the sensitivity to network initialization and speeding up the training process. Afterward, the SoftMax activation function is applied, which converts a set of numbers into a set of probabilities that sum to 1. Finally, the classification layer is used for recognizing temporary faults with arc extinction and permanent faults. Dropout layers are added to the architecture where necessary.
U-LSTM, GRU, and other models are also trained on the same dataset, and their performance evaluation is discussed in the next sections.
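The activation and initialization functions named above are short enough to write out. The following is a hedged sketch in plain Python: the sigmoid follows the expression given in the text, the SoftMax is the usual normalized exponential, and the Glorot initializer is shown in its common uniform form with bound sqrt(6 / (fan_in + fan_out)), which is an assumption since the text does not state which variant was used. The layer sizes in the usage lines are hypothetical.

```python
import math
import random


def sigmoid(c):
    """Gate activation: sigma(c) = (1 + e^(-c))^(-1)."""
    return 1.0 / (1.0 + math.exp(-c))


def relu(c):
    """Rectified linear unit."""
    return max(0.0, c)


def softmax(scores):
    """Convert scores to probabilities that sum to 1 (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def glorot_limit(fan_in, fan_out):
    """Bound of the uniform Glorot (Xavier) distribution."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform(fan_in, fan_out, rng):
    """Weight matrix with fan_out rows and fan_in columns."""
    limit = glorot_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)
weights = glorot_uniform(fan_in=4, fan_out=2, rng=rng)   # hypothetical layer sizes
probabilities = softmax([1.2, -0.3])                     # hypothetical class scores
print("P(permanent) =", round(probabilities[0], 3), "P(temporary) =", round(probabilities[1], 3))
```

The class with the larger probability is the one the classification layer would report; label 0 (permanent fault) and label 1 (temporary fault with arc extinguished) follow the labeling described for the datasets.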

A. TEST SYSTEM UNDER STUDY
The test system is composed of generators with a rating of 4200 MVA and a double-circuit transmission line with a length of 200 km, carrying two shunt compensators per line with an individual capacity of 200 MVAR. In Figure 7, $X_{Gen}$ is the generator, $SC_1$, $SC_2$, etc. indicate the shunt compensators, and $EQ_{net}$ signifies the equivalent network with a short-circuit capacity of 20 GVA. The specifications of the test system for training, testing, and validation are detailed in Table 1. Further, the arc model designed by the developers of Hydro-Quebec and available online on the MATLAB website [25] is used to ensure the efficacy of the proposed technique.

B. DATASET FOR OFFLINE TRAINING
In practical scenarios, SLG faults can occur at any location along the TL and under any system condition. Keeping this in view, dataset A is developed by changing the fault location between 10% and 90%, the shunt compensation between 50% and 90%, and the secondary arc from 0.1 to 100, as shown in Table 2.
In set A, the frequency-independent transmission line model, which applies Bergeron's traveling wave method, is simulated to collect data for temporary faults with arc extinguished (TFAE) and permanent faults (PF). The data collected in set A is also corrupted with low and high noise of SNR 30 and SNR 10, respectively, to test the accuracy of the proposed scheme. The dataset is split into 90% training and 10% testing. The offline training of Bi-LSTM, U-LSTM, GRU, and the other machine learning models is performed in MATLAB using the machine learning and deep learning toolboxes. The processor used is an Intel(R) Core i5-3570 CPU @ 3.40 GHz 3.80 GHz with 16 GB of installed RAM.
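The noise corruption and the 90/10 split described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' MATLAB procedure: the SNR values are taken to be in decibels, the noise is assumed to be additive white Gaussian, the split is assumed to be a random shuffle, and the 60 Hz test sinusoid is hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power matches snr_db."""
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def snr_db(clean, noisy):
    """Measured SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(v * v for v in clean)
    noise_power = sum((b - a) ** 2 for a, b in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


def split_indices(n_signals, train_fraction, rng):
    """Shuffle signal indices and cut them into training and testing parts."""
    indices = list(range(n_signals))
    rng.shuffle(indices)
    cut = round(n_signals * train_fraction)
    return indices[:cut], indices[cut:]


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 60 * k / 3840) for k in range(4609)]  # one signal length
for target in (30, 10):
    noisy = add_awgn(clean, target, rng)
    print("target", target, "dB, measured", round(snr_db(clean, noisy), 2), "dB")

train, test = split_indices(4860, 0.9, rng)
print(len(train), "training signals,", len(test), "testing signals")
```

With 4860 signals, the 90/10 split gives 4374 training and 486 testing signals, the same size as validation dataset B.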

C. TESTING ACCURACY OF THE MODEL
The testing accuracy of the Bi-LSTM model is depicted in Figure 8. In the absence of noise, the proposed model correctly identifies all the labeled signals, both temporary faults with arc extinguished and permanent faults, with 100% accuracy. When the model is subjected to the SNR 30 dataset, the accuracy for TFAE remains 100% and the accuracy for PF is 98.76%; thus 1.23% of the PF signals are misclassified. Under high noise of SNR 10, TFAE yields an accuracy of 98.76% and PF an accuracy of 95.06%. To elaborate further, performance metrics are computed and given in Figure 9.
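The per-class figures above can be related to an overall accuracy with a short calculation. The sketch assumes that the 486 testing signals are split evenly into 243 TFAE and 243 PF signals; this is not stated in the text but is consistent with the reported percentages (for example, 3 misclassified signals out of 243 is about 1.23%).

```python
CLASS_SIZE = 243  # assumption: 486 testing signals split evenly between TFAE and PF


def overall_accuracy(correct_per_class, total_per_class):
    """Fraction of all signals classified correctly."""
    return sum(correct_per_class) / sum(total_per_class)


# Correctly classified (TFAE, PF) counts implied by the reported per-class accuracies.
cases = {
    "no noise": (243, 243),
    "SNR 30": (243, 240),
    "SNR 10": (240, 231),
}

for name, (tfae_correct, pf_correct) in cases.items():
    acc = overall_accuracy([tfae_correct, pf_correct], [CLASS_SIZE, CLASS_SIZE])
    print(f"{name}: {100 * acc:.2f}%")
# prints:
# no noise: 100.00%
# SNR 30: 99.38%
# SNR 10: 96.91%
```

Under this assumption the results are 100%, 99.38%, and 96.91%, matching the overall accuracies quoted in the conclusion.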
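Accuracy, precision, recall, and F1-score are computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) using the standard definitions: Accuracy = (TP + TN)/(TP + TN + FP + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1-score = 2 × Precision × Recall/(Precision + Recall).
For the frequency-dependent transmission line (FDTL) model, the line data in Table 1 is taken into consideration. In FDTLs, line parameters are converted to a rational model using the vector fitting algorithm. This model represents asymmetric aerial as well as submarine lines with high accuracy. A built-in Simulink block is used to implement the model.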

2) SURGE IMPEDANCE LOADING (SIL)
Surge impedance loading (SIL) is defined as the power that a TL carries when the magnetic field energy owing to the current equals the electric field energy owing to the voltage. It is a useful benchmark for analyzing the load-carrying capability of TLs. In this research, dataset B includes varying SIL conditions to validate the proposed reclosing strategy.
The SIL of line 2 is calculated from the positive-sequence capacitance and inductance of the frequency-dependent transmission line (provided in Table 1) and the line-to-line voltage, using $SIL = V_{LL}^2 / Z_c$ with surge impedance $Z_c = \sqrt{L_1 / C_1}$, which gives 2122 MW. Line 1 also has an SIL of 2122 MW. Hence, the theoretical maximum power transferred over both transmission lines is 4244 MW. The active power of the generator is therefore adjusted so that, after running the load flow tool in powergui, the power transferred at the receiving end corresponds to 15%, 30%, 45%, 60%, 75%, and 90% SIL, as given in Table 3.
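The SIL arithmetic can be sketched as below. The function uses the standard relation between SIL, line-to-line voltage, and surge impedance; the 500 kV voltage and the per-kilometre inductance and capacitance in the first usage line are hypothetical values chosen for illustration, since Table 1 is not reproduced here. The loading targets use the 2122 MW per-line figure reported in the text.

```python
import math


def surge_impedance_ohm(l_per_km_h, c_per_km_f):
    """Surge impedance Z_c = sqrt(L1 / C1) from positive-sequence parameters."""
    return math.sqrt(l_per_km_h / c_per_km_f)


def sil_mw(v_ll_kv, l_per_km_h, c_per_km_f):
    """SIL in MW: (line-to-line voltage in kV)^2 divided by Z_c in ohms."""
    return v_ll_kv ** 2 / surge_impedance_ohm(l_per_km_h, c_per_km_f)


def loading_targets_mw(sil_per_line_mw, n_lines, fractions):
    """Receiving-end power for each fraction of the combined SIL."""
    total = sil_per_line_mw * n_lines
    return [fraction * total for fraction in fractions]


# Hypothetical line data, for illustration only.
print(round(sil_mw(v_ll_kv=500.0, l_per_km_h=1.0e-3, c_per_km_f=10.0e-9), 1), "MW")

# Values reported in the text: 2122 MW per line, two lines, six loading levels.
fractions = [0.15, 0.30, 0.45, 0.60, 0.75, 0.90]
for fraction, power in zip(fractions, loading_targets_mw(2122.0, 2, fractions)):
    print(f"{int(round(100 * fraction))}% SIL -> {power:.1f} MW")
```

For the combined 4244 MW, the six loading levels range from 636.6 MW at 15% SIL to 3819.6 MW at 90% SIL.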
The cases taken in the validation dataset are as follows. In total, 480 cases are simulated and 6 cases are replicated so that the number of validation cases equals the number of testing cases, as given in Table 4.

E. VALIDATION ACCURACY OF THE MODELS
The proposed Bi-LSTM model along with U-LSTM, GRU, SVM, and DT, all pre-trained on set A, are evaluated on dataset B through accuracy, precision, recall, and F1-score. It is observed from Figure 10 that Bi-LSTM achieves the highest accuracy of 95.27%, followed by GRU with 90.33% and U-LSTM with 85.19%. SVM and DT, however, yield only 50% and 49.3% accuracy respectively on the validation dataset. For the precision metric, the Bi-LSTM method achieves the best outcome with a rate of 91.3%, followed by GRU and U-LSTM with rates of 83.7% and 77.14% respectively. SVM and DT fare worse, with precision rates of 50% and 49.38% respectively.
For the recall metric, Bi-LSTM, GRU, U-LSTM, and SVM attain a rate of 100%, whereas DT attains 49.3%. Finally, for the F1-score, Bi-LSTM outperforms all other models and yields 95.4%, compared with 91.1% for GRU and 87.09% for U-LSTM. Again, SVM and DT show significantly poorer performance and yield only 66.6% and 49.38% respectively. The confusion matrix for individual accuracy is provided in Figure 11. These results show that the performance of SVM and DT degrades significantly under unseen circumstances. All models are expected to extract features automatically from the raw data signal, and these two models cannot capture the temporal and hidden dependencies in the data. U-LSTM and GRU perform better than SVM and DT, but their scores remain unsatisfactory. Compared with all of them, Bi-LSTM provides better performance owing to its ability to capture long-term temporal dependencies in the forward as well as the backward direction.
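The reported validation metrics can be cross-checked against each other with the standard confusion-matrix definitions. In the sketch below, TFAE (label 1) is taken as the positive class, and the Bi-LSTM counts are an assumption chosen to reproduce the reported rates on 486 balanced signals; they are not read from Figure 11.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from_rates(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Assumed Bi-LSTM counts on dataset B: all 243 TFAE signals found, 23 PF signals mislabeled.
accuracy, precision, recall, f1 = metrics_from_counts(tp=243, fp=23, fn=0, tn=220)
print(f"accuracy {100 * accuracy:.2f}%, precision {100 * precision:.2f}%, "
      f"recall {100 * recall:.2f}%, F1 {100 * f1:.2f}%")

# Harmonic-mean check using the precision and recall rates quoted in the text.
reported = {"Bi-LSTM": (0.913, 1.0), "GRU": (0.837, 1.0), "U-LSTM": (0.7714, 1.0), "SVM": (0.50, 1.0)}
for model, (p, r) in reported.items():
    print(f"{model}: F1 = {100 * f1_from_rates(p, r):.2f}%")
```

Under these assumptions the computed values agree with the reported accuracy, precision, and F1-scores to within rounding, which suggests the quoted metrics are mutually consistent.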

F. DISCUSSION AND COMPARISON
The proposed technique is compared with several techniques in Table 5. It is observed that, unlike all the other techniques listed, the proposed strategy has no SPM computational burden. This burden arises during online data monitoring and severely affects the performance of a reclosing technique. Further, to the best of the authors' knowledge, no scheme in the literature has validated its classifiers on an unseen dataset different from the one used for training. The validation indicates that the proposed model can handle a large variety of transmission lines with different characteristics without needing training for each transmission system. In addition, 100% testing accuracy and F1-score are obtained in the absence of noise. The proposed model does not require any communication channel or threshold, which makes it a promising and reliable option for modern power systems.

V. CONCLUSION
In this paper, a bi-directional long short-term memory network is proposed for precise detection of secondary arc extinction and recognition of the fault type. The results show that Bi-LSTM achieves 100% overall accuracy under no noise, 99.38% under SNR 30, and 96.91% under SNR 10. Further, to ensure the efficacy of the proposed scheme, Bi-LSTM, U-LSTM, GRU, SVM, and DT were trained on dataset A and evaluated on dataset B with different system characteristics. The results show that Bi-LSTM achieved an overall accuracy of 95.27%, compared with 90.33% for GRU, 85.19% for U-LSTM, and 50% and 49.4% for SVM and DT respectively. Further, the advantages of the proposed scheme over other schemes are as follows:
• No need for a signal decomposition/feature extraction technique during online monitoring of the signal.
• Independence from setting the thresholds.
• High performance under both high-noise and low-noise conditions. Furthermore, no linear/nonlinear filter is required for signal filtering under noisy conditions.
• Low computational complexity enhances the speed of fault detection.
• Performs well under unseen data of a wide range of scenarios i.e., surge impedance loading, arc resistances, and transmission line models.
• The scheme uses data from the single end of the transmission line, thus no communication channel is required.