Bayesian Deep Neural Network to Compensate for Current Transformer Saturation

Current transformer (CT) saturation degrades the operation of intelligent electronic devices (IEDs) and can cause them to malfunction. Here, we present a technique to compensate for saturated waveforms using a Bayesian deep neural network (BDNN) that combines a deep neural network (DNN) with Bayesian optimization (BO). The DNN structure is built with a stacked denoising autoencoder (SDAE) and backpropagation (BP). Unlike conventional neural networks, which are shallow or start from randomly initialized weights, the SDAE computes suitable initial weights for each hidden layer, and BP then fine-tunes them, which yields high performance for CT saturation compensation. To replace the empirical search for training hyperparameters, BO is adopted to select hyperparameters such as batch size, learning rate, and number of neurons. Finally, the performance of the proposed approach was evaluated on an overhead transmission line modeled in PSCAD/EMTDC under different fault inception angles, remnant fluxes, and system voltage levels. Numerous saturation cases were evaluated to demonstrate the performance of the proposed algorithm, and a comparative analysis shows that the proposed BDNN outperforms an artificial neural network (ANN) and a least-error square (LES) technique.


NOMENCLATURE
The main parameters used throughout this paper are summarized here.

I. INTRODUCTION
Current transformers (CTs) convert a large primary current into a proportionally smaller secondary current that can be easily measured. They also isolate the secondary (measurement) circuit from the primary system. Reference [1] presents a detailed discussion of CT functionality and provides many useful examples. As interconnected power systems expand, fault current levels, system complexity, and system size may increase. CT saturation therefore becomes increasingly likely because of the significant increase in fault current. CT saturation is a significant problem for power systems because it delivers an incorrect current magnitude to the protective relay. High fault currents, high X/R ratios, and remnant flux are the main contributing factors to CT saturation, which in turn may cause protective relays to malfunction [10]. A robust CT saturation compensation scheme is therefore required so that faults are cleared rapidly and reliably, avoiding system disturbances and equipment damage. This problem has been addressed by many approaches in recent years [2]-[14]; one such approach is the conventional neural network. In [2], an artificial neural network (ANN) reproduced the undistorted waveform from the saturated one; however, this approach focused only on saturation compensation and did not consider remnant flux. Another ANN-based method for CT saturation compensation, presented in [3], included the remnant flux in the core and other parameters that influence CT saturation. However, the network structures in [2], [3] were determined empirically, which is very time-consuming. A modified adaptive neuro-fuzzy inference system (ANFIS) using the least-squares method and gradient descent was applied to compensate the saturated portion of the waveform [4]. However, this algorithm suffers from a heavy computational burden when the input dimension is large.
In [5], CT saturation was successfully removed by estimating the magnetizing current from the negative value of the second-difference function and adding it to the measured secondary current. However, this approach depends on the magnetization curve of a specific CT. A hybrid algorithm combining the partial sum (PS) and multistage least-squares (MLS) methods was proposed to address DC offset and CT saturation; it solved both problems with a faster response [6]. Alternatively, two-level compensation filters can compensate for saturation effects and reproduce an undistorted waveform from fault and inrush currents with low error [7]. An integrated CT saturation detection and compensation scheme was proposed in [8]. In that scheme, unsaturated samples are identified and extracted using a Kalman filter, and the waveform is reconstructed from wave-shape properties and fault-current characteristics. Another compensation approach used a least-error square (LES) filter to estimate the phasor parameters of the CT secondary current and the CT burden [9]. In [10], the unsaturated portions are extracted from the distorted secondary current, and least-squares curve fitting is then used to estimate the parameters for compensating CT saturation. However, because of the linearization by Taylor series expansion, this method produces some error when the time constant is small. The application of the wavelet transform to compensate the saturated signal was proposed in [11]. An extended Kalman filter was used to detect and compensate for CT saturation by estimating an appropriate model from the current samples in the unsaturated regions; the signal was then reconstructed from this model [12]. The LES filter and a lookup table (LUT) were jointly applied to the DC offset and CT saturation problems, giving good results in the presence of noise and harmonics [13].
Reference [14] proposed an algorithm to detect and compensate for CT saturation based on the derivative of the secondary current and Newton's backward difference. However, this algorithm relies heavily on low-pass filter characteristics to detect saturation.
Deep learning has recently been widely used in power system and energy applications because of its robustness, speed, and powerful learning capacity. In [18], DC offset waveforms were generated with random noise and several harmonics. Different scenarios, such as voltage level and inaccurate time constant, were considered to evaluate the proposed method, and it outperformed conventional filters in terms of speed and accuracy. Reference [19] used unsupervised feature learning to classify the severity of CT saturation and reported superior results compared with conventional regression methods. Reference [20] combined unsupervised feature learning with a convolutional neural network (CNN) to detect and classify faults from three-phase voltage and current signals. Discrimination between inrush current and magnetizing current in power transformers using CNN, fast GRNN, and CLGNN was proposed in [21]-[24]. Although the method in [24] is useful for compensating CT saturation, it requires detecting the start and end of CT saturation before compensation.
To the best of our knowledge, SDAE has rarely been applied to energy and power systems. One related work is an SDAE model for CT saturation compensation with empirically selected training hyperparameters [17]. This paper develops a BDNN-based method that uses a deep neural network (DNN) and Bayesian optimization (BO) to extend the capability of CT saturation compensation. Specifically, the DNN structure is constructed from an SDAE trained in an unsupervised manner. The SDAE extracts features of the input during pre-training and initializes appropriate weights for the structure. The unsaturated waveform is then reconstructed by applying backpropagation to the pre-trained SDAE model in a supervised manner with the given labels. Moreover, the training hyperparameters are optimized via BO to avoid their difficult empirical selection [29]. Lastly, we validate BDNN on data simulated in PSCAD/EMTDC with the real power system parameters of a typical Korean transmission line. We compare it with conventional methods (ANN and LES), study its performance under various CT saturation conditions, and compare it with other intelligent and conventional methods from [10]. To summarize, this paper makes the following contributions:
• We propose, for the first time, a BDNN framework for CT saturation compensation that uses an SDAE to extract features, accurately correcting the distorted waveform and handling noise without a low-pass filter.
• We adopt Bayesian optimization to search for the training hyperparameters.
• We compensate for CT saturation without first detecting it.
• We provide consistent and highly accurate compensation across different system voltage levels.
The remainder of this paper is organized as follows. Section II formulates the CT saturation problem and describes how the training dataset is generated. Section III describes the proposed compensation method, including pre-training, fine-tuning, data preparation, and hyperparameter tuning. Section IV presents the simulation results and the performance evaluation for many saturation cases, together with a comparative study against other methods. Section V presents concluding remarks.

II. PROBLEM STATEMENT
This section first formulates CT saturation and then describes how the CT saturation dataset for training is generated.

Figure 1. Simplified equivalent circuit of a CT
A simplified equivalent circuit of a CT with a purely resistive burden is given in Fig. 1, and Fig. 2 shows an example of CT saturation with the primary, secondary, and magnetizing currents. Under normal operation, the magnetizing current is negligible because the exciting voltage is below the knee point. When the exciting voltage exceeds the knee point, the CT begins to saturate, and the magnetizing current increases sharply according to the B-H hysteresis curve of the core. The secondary current is then no longer proportional to the primary current because of the induced magnetizing current. The saturation severity is determined by the magnitude of the excitation current, and the saturation duration is a function of the X/R ratio. In this paper, we use simplified CT saturation equations from [6]. The fault inception angle determines whether the saturation is positive or negative. The primary current is the sum of iAC and iDC, as expressed in (1). With a resistive burden, the flux linkage λ(t) is proportional to the time integral of iS, as expressed in (2). Using the magnetizing curve, (2) can be rewritten as (3). Before the fault, saturation has not yet begun and iM(t0) is approximately zero, as illustrated in Fig. 2. The value of iM(t) at the first saturation instant is then given by (4). Finally, iS is the difference between iP and iM, as given in (5).
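Equations (1)-(5) are not reproduced in this text. As a reading aid, a common textbook form of these relations for a CT with a purely resistive secondary loop is sketched below, with all currents referred to the secondary side. Several symbols are assumptions, and the exact expressions in [6] may differ:

- I_m: peak fault current
- θ: fault inception angle
- τ = (X/R)/ω: DC time constant
- R_S: total secondary resistance
- f(·): the magnetizing-curve function

```latex
\begin{aligned}
i_P(t) &= i_{AC}(t) + i_{DC}(t) = I_m\left[\cos(\omega t + \theta) - \cos\theta\, e^{-t/\tau}\right] \\
\lambda(t) &= \lambda(t_0) + R_S \int_{t_0}^{t} i_S(u)\,du \\
i_M(t) &= f\big(\lambda(t)\big) \\
i_S(t) &= i_P(t) - i_M(t)
\end{aligned}
```

Under this convention, θ = 0 gives a fully offset fault current. The remnant flux enters through λ(t0).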

B. CT SATURATION DATASET FOR TRAINING PROCEDURE
The BDNN approach is data-driven and requires a large training dataset to obtain the best results. Saturated and unsaturated data are generated with the well-documented CT saturation spreadsheet from the IEEE Power System Relaying Committee (PSRC) [26], which computes the core flux and magnetizing current in detail. Different saturation cases were obtained by modifying key parameters: primary fault current, load resistance, X/R ratio, DC offset, and remnant flux. Saturation becomes severe at a fault inception angle of 0° and a remnant flux of 60%, and it lasts longer when X/R is 18. The training parameters in Table 1 are used to generate the training input samples, which cover saturation ranging from light to severe. The training data consist of 15,680 saturation cases, amounting to approximately 11,038,720 samples. The saturated current with random noise is used as the input, and the unsaturated current is used as the network label. The noise Anoise added to the input signal Asignal has a magnitude set by the signal-to-noise ratio (SNR), as expressed in Equation (6). Harmonics from the 2nd to the 5th order are also included in the training dataset, with the magnitudes shown in Table 1.
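Equation (6) is not reproduced here. The minimal sketch below shows how a noisy training input could be synthesized under two assumptions. First, it uses the common amplitude-ratio definition SNR = 20 log10(Asignal/Anoise), with both amplitudes taken as RMS values. Second, it uses zero-mean Gaussian noise. The harmonic ratios and the helper names (`synthesize`, `add_noise_snr`) are hypothetical; they are not the values or routines behind Table 1.

```python
import math
import random

def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def synthesize(n, f0=60.0, fs=3840.0, a1=1.0, harmonics=None):
    """Fundamental of peak a1 plus harmonics given as {order: ratio to a1}."""
    harmonics = harmonics or {}
    out = []
    for k in range(n):
        t = k / fs
        v = a1 * math.sin(2 * math.pi * f0 * t)
        for order, ratio in harmonics.items():
            v += ratio * a1 * math.sin(2 * math.pi * order * f0 * t)
        out.append(v)
    return out

def add_noise_snr(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that 20*log10(A_signal/A_noise) = snr_db,
    with A_signal and A_noise taken as RMS values (assumed reading of Eq. (6))."""
    a_signal = rms(signal)
    a_noise = a_signal / (10 ** (snr_db / 20))
    return [v + rng.gauss(0.0, a_noise) for v in signal]

rng = random.Random(0)
# Hypothetical 2nd/3rd harmonic ratios; Table 1 holds the values actually used.
clean = synthesize(64 * 10, harmonics={2: 0.05, 3: 0.03})
noisy = add_noise_snr(clean, snr_db=30.0, rng=rng)
noise = [a - b for a, b in zip(noisy, clean)]
measured_snr = 20 * math.log10(rms(clean) / rms(noise))   # close to 30 dB
```

In the dataset described above, `noisy` would play the role of the network input, and the clean unsaturated current would serve as its label.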

A. THE FRAMEWORK OF DENOISING AUTOENCODER
A denoising autoencoder is an unsupervised ANN that uses nonlinear feature extraction to reconstruct clean inputs from noisy ones, efficiently compressing and decoding the data [27]. In a simple autoencoder, an input x ∈ R^n, x = (x1, x2, …, xn), is taken from the training dataset. The input is encoded into a lower-dimensional representation and then restored to its original dimension in the decoding part. Training uses BP to minimize the reconstruction error between input and output until the desired number of epochs is reached. After training converges, the encoder model and the extracted features (f1, f2, …, fn) are saved and used to train subsequent autoencoders. Given the input x, the output vector x̂ of the autoencoder is expressed mathematically as follows.
where fen is the Leaky ReLU activation function of the encoding layer, and fde is the linear activation function of the decoding layer, used for the regression task. Training is carried out with the Adam optimizer, and the root mean square error (RMSE) between the input and output is minimized.
To train the autoencoder, the cost function L is minimized by iteratively updating the weights and biases through backpropagation [25]. Training ends when L converges.
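The encoder/decoder pair described above can be sketched in plain Python as follows. This is a minimal forward pass under stated assumptions: the Leaky ReLU slope (0.01), the layer sizes, and the random weights are illustrative, and training (Adam with backpropagation) is omitted. The reconstruction is scored against the clean input rather than the corrupted one, which is what makes the autoencoder denoising.

```python
import math
import random

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU used as the encoder activation f_en (slope alpha is assumed)."""
    return z if z > 0 else alpha * z

def dense(W, b, x):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def dae_forward(x_noisy, W_en, b_en, W_de, b_de):
    """Encode with Leaky ReLU, then decode with a linear (identity) activation f_de."""
    h = [leaky_relu(z) for z in dense(W_en, b_en, x_noisy)]
    x_hat = dense(W_de, b_de, h)
    return h, x_hat

def rmse(target, estimate):
    """Root mean square error between two equal-length vectors."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(target, estimate)) / len(target))

# Illustrative sizes: n = 8 inputs compressed to k = 3 features.
rng = random.Random(0)
n, k = 8, 3
W_en = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(k)]
W_de = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n)]
b_en, b_de = [0.0] * k, [0.0] * n

x_clean = [math.sin(2 * math.pi * i / n) for i in range(n)]
x_noisy = [v + rng.gauss(0.0, 0.05) for v in x_clean]   # corrupted input
h, x_hat = dae_forward(x_noisy, W_en, b_en, W_de, b_de)
loss = rmse(x_clean, x_hat)                               # compared with the clean input
```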

B. ESTABLISHMENT OF PROPOSED BDNN
Traditionally, training deep neural networks was very challenging because the error gradients propagated backward tend to vanish as the structure becomes deeper (the vanishing gradient problem). To strengthen conventional neural networks, the authors of [15], [16] introduced unsupervised pre-training instead of manually selecting the network parameters. Their algorithm employs layer-by-layer unsupervised learning based on the deep belief network (DBN), in which greedy layer-wise unsupervised training provides a good initialization for deeper structures. All layer parameters initialized during pre-training are then fine-tuned in the final stage to achieve better results. Instead of a DBN, an SDAE can be used to construct deep networks, and it yields improved results [27]. Unlike conventional deep networks, an SDAE reduces the complexity of error estimation by forming one hidden layer at a time. In a deep structure, training efficiency depends on the initial weights, and adopting an SDAE alleviates this problem. The main idea is to train one layer at a time by minimizing the reconstruction error, using the features of the i-th hidden layer as the input to the (i+1)-th hidden layer. The first autoencoder is trained in a bottleneck fashion with initial weights and bias (w1, b1). The input x with random noise is transformed into a lower dimension by the encoding function and restored to its original dimension in the decoding layer. The optimum is reached when the error function L, given in (7), is minimized. After convergence, the hidden layer, whose outputs are the so-called abstract features, is stored and used as the input for the second autoencoder.
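The greedy layer-wise procedure can be illustrated with a small, self-contained sketch. Each denoising autoencoder is trained on the features produced by the previous encoder. Its decoder is then discarded, and its encoder weights are kept as the initialization for later fine-tuning. For brevity, the sketch swaps Adam with RMSE for plain SGD on a squared-error loss. The layer sizes, learning rate, noise level, and toy sine-wave data are assumptions, and the supervised fine-tuning stage is not shown.

```python
import math
import random

def lrelu(z, a=0.01):
    return z if z > 0 else a * z

def lrelu_grad(z, a=0.01):
    return 1.0 if z > 0 else a

def affine(W, x, b):
    """W x + b with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def init_layer(n_out, n_in, rng):
    s = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def train_dae(data, n_hidden, rng, epochs=100, lr=0.02, noise=0.05):
    """Train one denoising autoencoder (Leaky ReLU encoder, linear decoder) with SGD.

    The loss is 0.5 * ||x_hat - x||^2, where x_hat is computed from a corrupted
    copy of x. Only the encoder (W1, b1) is returned; the decoder is discarded.
    """
    n_in = len(data[0])
    W1, b1 = init_layer(n_hidden, n_in, rng)
    W2, b2 = init_layer(n_in, n_hidden, rng)
    for _ in range(epochs):
        for x in data:
            xn = [v + rng.gauss(0.0, noise) for v in x]      # corrupt the input
            z = affine(W1, xn, b1)
            h = [lrelu(v) for v in z]
            x_hat = affine(W2, h, b2)
            e = [p - t for p, t in zip(x_hat, x)]            # error w.r.t. clean x
            # Backpropagate to the hidden layer before W2 is updated.
            dz = [sum(W2[i][j] * e[i] for i in range(n_in)) * lrelu_grad(z[j])
                  for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    W2[i][j] -= lr * e[i] * h[j]
                b2[i] -= lr * e[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    W1[j][k] -= lr * dz[j] * xn[k]
                b1[j] -= lr * dz[j]
    return W1, b1

def encode(W, b, data):
    return [[lrelu(v) for v in affine(W, x, b)] for x in data]

def greedy_pretrain(data, layer_sizes, rng):
    """Greedy layer-wise pre-training: the features of layer i train layer i + 1."""
    encoders, features = [], data
    for n_hidden in layer_sizes:
        W, b = train_dae(features, n_hidden, rng)
        encoders.append((W, b))
        features = encode(W, b, features)
    return encoders, features

rng = random.Random(1)
# Toy data: 16 windows of 8 samples taken from a sine wave.
data = [[math.sin(2 * math.pi * (k + s) / 16) for k in range(8)] for s in range(16)]
encoders, top_features = greedy_pretrain(data, [6, 4, 3], rng)
```

The returned `encoders` list corresponds to the stored weights and biases (wi, bi) that the text says are later fine-tuned by backpropagation.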
After the decoding layer x̂ of the first autoencoder is removed, a new hidden layer h2 and output ĥ1 are stacked onto the first autoencoder, as shown in Fig. 3. By repeating this process, many autoencoders are stacked successively to form a deeper network. This stacking procedure is commonly referred to as pre-training because it resembles the layer-wise training of Restricted Boltzmann Machines (RBMs). Finally, the network is trained with the given labels at the output layer to reconstruct the unsaturated signal. All SDAE weights and biases (wi, bi, i = 1, 2, …, n) obtained during pre-training are fine-tuned by the backpropagation algorithm. Fig. 3 presents the proposed methodology in detail. The saturated waveforms generated from Table 1 are one-dimensional. Because the input magnitudes vary widely, BDNN might produce large errors and inconsistent outputs. The inputs are therefore normalized to the same magnitude range to accelerate BDNN training and improve the generalization of the network [31]. The normalization formula is given as follows:

C. PRE-PROCESS OF INPUT DATASETS
where xi is the i-th sample of the input data. After normalization, a moving-window algorithm with window length m and step s is applied to the normalized input to form the BDNN input training matrix. The choice of m and s is vital for reconstructing the unsaturated waveform. In this study, m is set to 64, equal to the number of samples per cycle, and s is set to 1.
Consider an input x = {x1, x2, …, xi}, where i is the sample index. Using x, the input training matrix is formed as follows, where N is the number of signal samples.
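The normalization formula and the training-matrix expression are not reproduced here. The sketch below assumes max-absolute-value normalization, which is one common choice; the paper's exact formula may differ. It builds the moving-window matrix with window length m = 64 and step s = 1, so a signal of N samples yields N - m + 1 rows. The function names are hypothetical.

```python
def normalize_max_abs(x):
    """Scale a waveform into [-1, 1] by its largest absolute sample (assumed formula)."""
    peak = max(abs(v) for v in x)
    return [v / peak for v in x] if peak > 0 else list(x)

def moving_window_matrix(x, m=64, s=1):
    """Return rows x[j:j+m] for j = 0, s, 2s, ...; N samples give (N - m)//s + 1 rows."""
    if len(x) < m:
        return []
    return [x[j:j + m] for j in range(0, len(x) - m + 1, s)]

signal = [float(v) for v in range(-100, 100)]   # 200 illustrative samples
x_norm = normalize_max_abs(signal)
X = moving_window_matrix(x_norm, m=64, s=1)     # 137 rows of 64 samples
```

With s = 1, consecutive rows overlap by m - 1 samples. This gives one input window per new sample, at the cost of a large training matrix.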

D. DEEP LEARNING HYPERPARAMETERS TUNING
Optimal training hyperparameters are often found by trying sets of hyperparameters chosen by hand or at random, which is very time-consuming. In recent years, grid search and random search have been the most common approaches to hyperparameter tuning in machine learning and deep learning. However, the cost of grid search grows exponentially with the number of tuned hyperparameters. Random search was proposed to address this problem by evaluating random combinations of hyperparameters, but for a huge dataset it is still time-consuming to find the optimal hyperparameters. Bayesian optimization (BO) therefore serves as an efficient tool for tuning machine learning and deep learning hyperparameters [29], because it chooses each new hyperparameter set based on the results of previous evaluations. For this reason, BO is chosen to tune the hyperparameters of the SDAEs during pre-training. BO is an effective global optimization method for black-box functions that relies on a probabilistic model (a Gaussian process) of the objective function over the search space. The Bayesian surrogate model represents the underlying objective function, and the acquisition function selects the next evaluation point based on prior knowledge. Our goal with BO is to identify the combination of hyperparameters x* that maximizes an objective function f(x) over a given search space X. The search space is three-dimensional: learning rate, batch size, and number of neurons. A surrogate model is first built from an initial hyperparameter set and is then used to estimate promising hyperparameter sets for the SDAE.
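To make the surrogate idea concrete, the sketch below computes a Gaussian-process posterior mean and standard deviation in one dimension. It uses a squared-exponential kernel, a zero prior mean, and a Cholesky solve. It is an illustration only: the paper's search space is three-dimensional, and the kernel, length scale, noise term, and toy observations are assumptions. The toy objective is the negative validation error as a function of log10 of the learning rate.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel (assumed choice of covariance function)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y for a lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, x_star, noise=1e-6, length=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_star."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))        # (K + noise*I)^-1 y
    k_star = [rbf(x_star, a, length) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_star, x_star, length) - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(var)

# Hypothetical observations: log10(learning rate) -> negative validation RMSE.
X_obs = [-4.0, -3.0, -2.0]
y_obs = [-0.30, -0.12, -0.20]
mean_at_obs, std_at_obs = gp_posterior(X_obs, y_obs, -3.0)   # near an observation
mean_far, std_far = gp_posterior(X_obs, y_obs, 2.0)          # far from all data
```

The posterior is confident near evaluated points and uncertain far from them. The acquisition function exploits exactly this contrast.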
We then choose expected improvement (EI) as the acquisition function, which determines where in the search space the next sample is evaluated. EI selects as the next point xn the location in X with the largest expected improvement over the best objective value found so far, that is, the point expected to reduce the training error the most. After several iterations, BO returns the best training hyperparameters.
where f* is the maximum value of f(x) observed so far during the optimization process. The flowchart of the proposed BDNN is given in Fig. 4.
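The EI expression referred to above is not reproduced in this text. For a Gaussian posterior with mean μ(x) and standard deviation σ(x), the standard closed form for maximization is EI(x) = (μ - f* - ξ)Φ(z) + σφ(z), with z = (μ - f* - ξ)/σ. Here Φ and φ are the standard normal CDF and PDF. The sketch below implements that form; the exploration margin ξ and the candidate values are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI for maximization: E[max(f(x) - f_best - xi, 0)] under N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - f_best - xi, 0.0)
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm_cdf(z) + sigma * norm_pdf(z)

f_best = -0.12                                             # best objective so far
candidates = {"a": (-0.15, 0.01), "b": (-0.14, 0.08)}      # name -> (mu, sigma)
scores = {name: expected_improvement(mu, s, f_best) for name, (mu, s) in candidates.items()}
next_point = max(scores, key=scores.get)
```

Candidate b has a slightly worse predicted mean than the current best but a much larger uncertainty, so EI prefers it over the confidently worse candidate a. This is how EI trades off exploration against exploitation.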

A. DATA GENERATION FOR TESTING PROCEDURE
To evaluate BDNN, a typical three-phase overhead transmission line, shown in Fig. 5, is modeled in PSCAD/EMTDC to generate the test cases. The CT model in [30] was used in PSCAD/EMTDC to generate saturation data. The CT has a ratio of 2000:5 (C400, R2 = 0.61 Ω) and a resistive burden of 3.42 Ω. The phase A current at the relay point is collected for evaluation under various fault inception angles and remnant fluxes. The sampling frequency is set to 3840 Hz, i.e., 64 samples per cycle in a 60-Hz system. The imported signal is then passed through the moving-window technique in (10) to create the BDNN test datasets. Once pre-processing is configured, training is conducted on a graphics processing unit (NVIDIA GeForce RTX 2080 Ti) using the Python version of TensorFlow (Google LLC) [25].
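The test-system numbers above can be checked with a few lines of arithmetic. The sketch computes the samples per cycle and the total secondary loop resistance (R2 plus the burden). It also computes the ideal secondary current and the steady-state loop voltage for a hypothetical 20-kA primary fault current, which is not a value from the paper. The (1 + X/R) factor follows the familiar IEEE C37.110 saturation-avoidance criterion for a fully offset fault current.

```python
F_NOMINAL = 60.0      # Hz, system frequency
FS = 3840.0           # Hz, sampling frequency from the text
CT_RATIO = 2000 / 5   # 2000:5 CT -> 400
R2 = 0.61             # ohm, CT secondary winding resistance
R_BURDEN = 3.42       # ohm, resistive burden

samples_per_cycle = FS / F_NOMINAL      # 64.0
loop_resistance = R2 + R_BURDEN         # total secondary loop resistance, 4.03 ohm

def secondary_rms(primary_rms_a):
    """Ideal (unsaturated) secondary RMS current for a primary RMS current."""
    return primary_rms_a / CT_RATIO

def steady_state_voltage(primary_rms_a):
    """RMS voltage across the secondary loop for a symmetrical (no DC offset) fault."""
    return secondary_rms(primary_rms_a) * loop_resistance

def voltage_to_avoid_saturation(primary_rms_a, x_over_r):
    """(1 + X/R) * If * Zb, the C37.110-style criterion for a fully offset fault."""
    return (1.0 + x_over_r) * steady_state_voltage(primary_rms_a)

i_fault = 20_000.0    # A, hypothetical primary fault current
v_sym = steady_state_voltage(i_fault)                 # about 201.5 V
v_offset = voltage_to_avoid_saturation(i_fault, 18)   # about 3828.5 V with X/R = 18
```

For this hypothetical case, the voltage needed to avoid saturation with X/R = 18 is several times the 400-V terminal-voltage rating of a C400 CT. This illustrates why fully offset faults with high X/R ratios drive the CT deep into saturation.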

B. TRAINING HYPERPARAMETERS DETERMINATION
We varied the number of hidden layers to determine the optimal deep learning structure for saturation compensation, stacking five hidden layers during the pre-training described in Section III. Fig. 6 shows that compensation was best with three or four hidden layers. Considering computational burden and time efficiency, three hidden layers were selected. The other optimal training hyperparameters found by BO are given in Table 3.

C. IMPACT OF BDNN FOR CT SATURATION
In this subsection, we analyze the performance of BDNN on the 345-kV overhead transmission system when the maximum DC offset occurs on the line, considering scenarios with and without remnant flux. The results without and with remnant flux are illustrated in Figs. 7 and 8, respectively. The outputs estimated by the BDNN-, ANN-, and LES-based techniques are denoted iBDNN, iANN, and iLES, respectively. Fig. 7 shows that BDNN fully compensates for CT saturation throughout the whole cycle, despite a slight undershoot during the second fault cycle. Small oscillations are evident around the fault, reflecting the use of the moving-window algorithm and the sudden change in fault current magnitude. However, this small error does not severely compromise the accuracy of the proposed BDNN, and the slight undershoot in the second fault cycle is considered acceptable in this study. The error (iError) of BDNN is largest between 0.2284 s and 0.2472 s, where it reaches approximately 0.21 with 0% remnant flux. These observations show that BDNN performs well, with low sensitivity to the above-mentioned issues and rapid convergence. To further evaluate performance under heavy saturation, the remnant flux is increased to 60%, which produces extreme saturation at a fault angle of 0°, as shown in Fig. 8. In Fig. 8, the CT saturates in the first fault cycle, at approximately 0.2219 s. Again, BDNN estimates the correct magnitude of the unsaturated waveform, and, as discussed above, the effect of the moving-window technique on phasor estimation is insignificant. The error swings widely while the current is distorted, from the first half of the first fault cycle to the end of the second fault cycle, and decays to nearly zero after the second fault cycle.
Therefore, BDNN produces accurate outputs even when heavy saturation occurs in the system. Next, we investigate the performance of BDNN under variations in saturation, since the proposed method must estimate accurate results for any saturation condition that occurs in the power system. Saturation severity decreases as the DC offset magnitude decreases. Fig. 9 shows compensation results at a fault angle of 45° with 60% remnant flux. In this case, the moving-window algorithm causes more oscillation before the fault. BDNN again estimates the magnitude with slight error after the fault. As before, the most noticeable error appears in the second fault cycle as an apparent undershoot in the 345-kV system. It is also important that BDNN can compensate for CT saturation in systems with different voltage levels. Fig. 10 presents the compensation result with severe saturation for a different voltage level, the 154-kV test system. As shown in Fig. 10, BDNN estimates the correct magnitude in every cycle even under heavy saturation. The proposed BDNN therefore performs well regardless of the system voltage level because of dataset normalization, which plays a significant role in the proposed algorithm.

D. COMPARATIVE STUDY
A comparative study is conducted to compare BDNN with ANN and the LES filter. Fig. 11 shows the results of ANN, LES, and BDNN for the most severe saturation, with a fault angle of 0° and 60% remnant flux. As Fig. 11 shows, ANN oscillates slightly more than BDNN and LES before the fault. After the fault occurs, ANN tends to produce less error than LES. As displayed in Fig. 10, LES yields a few noticeable oscillations from the first to the third fault cycle. This suggests that LES produces errors when saturation is severe. LES converges 3 cycles after the fault, whereas ANN requires 2 cycles. Although ANN yields an output similar to that of BDNN, it produces a noticeable oscillation before the fault and an undershoot in the second fault cycle. ANN and LES therefore do not cope well with CT saturation compensation, as they produce ripples and overshoot in some cases, whereas BDNN suppresses the effects of CT saturation better. The comparison for other saturation cases is given in Table 4. To evaluate the accuracy of each algorithm, the mean estimation error (μ) and its standard deviation (σ) are computed as given in (13) and (14), respectively, where xi is the reconstruction error between the estimated and actual waveforms. Table 4 summarizes the mean error and standard deviation of BDNN, ANN, and LES for several remnant flux values and fault angles in systems with different voltage levels. ANN produced mean errors between 0.15 and 0.94, while LES returned mean errors from 0.11 to 0.81. According to Table 4, the largest mean error of BDNN, 0.39, occurs in the 345-kV system for a 0° fault angle with 60% remnant flux. ANN produced its largest standard deviation, 0.95, in the same case.
BDNN achieved the smallest standard deviations, between 0.12 and 0.59. Because of the sudden change in magnitude at fault occurrence and the use of the moving-window technique, BDNN and ANN estimate an incorrect magnitude one cycle before the fault. However, BDNN is less sensitive to the sudden increase after the fault and converges faster than ANN, as depicted in Figs. 6-10. ANN tends to produce a large swing one cycle before fault occurrence because of the moving-window effect. Thus, BDNN compensates for the effect of CT saturation quickly and with less output ripple, even for various fault magnitudes. BDNN and ANN can also be applied to other CTs with similar characteristics, as shown in studies that apply intelligent methods to protective relays [3], [4].
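Equations (13) and (14) are not reproduced here. The sketch below assumes the usual definitions: μ is the arithmetic mean of the per-sample errors, and σ is their population standard deviation. It also takes xi as the absolute difference between the estimated and actual samples. Whether the paper uses absolute or signed errors, and N or N - 1 in σ, is an assumption.

```python
import math

def error_stats(estimated, actual):
    """Mean (mu) and population standard deviation (sigma) of |estimated - actual|."""
    errors = [abs(e - a) for e, a in zip(estimated, actual)]
    n = len(errors)
    mu = sum(errors) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in errors) / n)
    return mu, sigma

mu, sigma = error_stats([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])   # errors 0, 1, 2, 3
```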

V. CONCLUSION
This paper presents a current transformer saturation compensation method that combines BO and a DNN. The DNN structure is established by stacking denoising autoencoders and fine-tuning them with backpropagation. With this method, the network obtains appropriate initial weights for each hidden layer. This reduces the computational complexity of the deep structure and provides an easy way to build deep networks. Moreover, the SDAE enables the model to suppress noise in real applications without a low-pass filter. Bayesian optimization effectively optimizes the training hyperparameters and takes less time than other available hyperparameter optimization methods for deep learning. The performance of BDNN was evaluated on data simulated in PSCAD/EMTDC for various saturation conditions, including different fault angles, remnant fluxes, and system voltage levels. The results show that BDNN can compensate for CT saturation in various scenarios regardless of fault type and fault current magnitude. Compared with ANN and LES, BDNN achieves the smallest reconstruction error, and its performance is relatively stable across different system voltage levels. Although the moving-window algorithm causes a noticeable slight error before a fault, this effect has little influence on phasor estimation. A limitation of the proposed BDNN is that it generalizes only to specific current transformers; we intend to develop a BDNN compensation model that works for all kinds of current transformers. Our future work is to implement the proposed BDNN in real time. The CPU of the AM572x, the hardware platform considered for our future implementation, provides 40 GMAC/s per core (80 GFLOPS per core). The proposed BDNN requires 1,201,104 floating-point operations, which is well within the computing capability of such real-time devices.
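The real-time claim can be checked with simple arithmetic. The sketch makes two assumptions. First, the network is evaluated once per new sample, since the moving window advances by s = 1. Second, the core runs at the quoted peak throughput of 80 GFLOPS. Under these assumptions, the sketch compares the compute time with the 3840-Hz sampling period. Peak throughput is rarely reached in practice, so the resulting margin is optimistic.

```python
FLOPS_PER_INFERENCE = 1_201_104   # floating-point operations per BDNN evaluation (from the text)
CORE_FLOPS = 80e9                 # peak FLOP/s per AM572x core, as quoted in the text
FS = 3840.0                       # Hz, 64 samples per 60-Hz cycle

compute_time_s = FLOPS_PER_INFERENCE / CORE_FLOPS   # about 15 microseconds at peak rate
sample_period_s = 1.0 / FS                           # about 260 microseconds
budget_fraction = compute_time_s / sample_period_s   # share of one sample period used
```

At peak rate, one evaluation uses under 6% of a sampling interval. This leaves headroom even if the sustained throughput is several times lower than the peak.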