Wavelet-Like Transform to Optimize the Order of an Autoregressive Neural Network Model to Predict the Dissolved Gas Concentration in Power Transformer Oil from Sensor Data

Dissolved gas analysis (DGA) is one of the most important methods to analyze faults in power transformers. In general, DGA is applied in monitoring systems based upon an autoregressive model; the current value of a time series is regressed on past values of the same series, as well as present and past values of some exogenous variables. The main difficulty is to decide the order of the autoregressive model, that is, the number of past values to be used. This study proposes a wavelet-like transform to optimize the order of the variables in a nonlinear autoregressive neural network to predict the in-oil dissolved gas concentration (DGC) from sensor data. Daubechies wavelets of different lengths are used to create representations with different time delays of ten DGC, which are then subjected to a procedure based on principal components analysis (PCA) and Pearson's correlation to find the order of an autoregressive model. The representations with optimal time delays for each DGC are applied as input to a multi-layer perceptron (MLP) network trained with the backpropagation algorithm to predict the gas concentration at present and future times. This approach produces better results than the usual practice of choosing the same time delay for all inputs. The forecasts reached an average mean absolute percentage error (MAPE) of 5.763%, 1.525%, 1.831%, 2.869%, and 5.069% for C2H2, C2H6, C2H4, CH4, and H2, respectively.


Introduction
The transformer is one of the most important devices in the electricity distribution process, and reliable power distribution depends largely on the failure-free operation of this equipment. The failure of a transformer during operation can bring a significant loss of revenue to the utility, possible environmental damage, explosion and fire risks, and high costs of repair or replacement [1,2]. When these devices fail, operational life expectancy and reliability may change over the years and the electricity supply to consumers may be interrupted. Therefore, condition analysis and maintenance of the transformer are extremely important to ensure a reliable electricity supply [1,3–5].
When the power transformer is in normal operation, the insulating oil and solid insulating material gradually deteriorate and decompose into a small amount of gas, mainly hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon monoxide (CO), and carbon dioxide (CO2). When an internal transformer failure occurs, the generation of these gases accelerates [6]. Therefore, one of the most important tools for power transformer condition monitoring and internal fault diagnosis is the transformer oil gas chromatography test, known as dissolved gas analysis (DGA) [7–10].
Several studies have addressed the creation of power transformer condition monitoring systems based on DGA. Many techniques for predicting the concentration of gases have been proposed, such as wavelet least squares, support vector regression, neural network, deep learning, fuzzy model, and long short-term memory (LSTM), just to name a few.
In general, artificial intelligence techniques have been widely used to develop more accurate diagnostic tools based on DGA data [5,9–14]. In [9], for example, a new approach for diagnosing transformer failure was created based on gas ratios and the support vector machine (SVM). First, on the basis of the International Electrotechnical Commission Technical Committee (IEC-TC) 10 database, optimal dissolved gas ratios are obtained by a genetic algorithm designed for simultaneous DGA ratio selection and SVM parameter optimization. For comparison, three traditional methods were used: DGA with SVM, DGA with a backpropagation neural network (BPNN), and the IEC criteria; the IEC three-key gas ratios with SVM and BPNN were also employed to compare accuracy. The SVM technique also served as a basis for the approaches in [11,13,15]. The authors in [13] have used the least squares support vector machine (LS-SVM) for forecasting dissolved gases (H2, CH4, C2H2, C2H4, and C2H6) and assessing incipient faults of transformer polymer insulation. Meanwhile, in [15], a new approach has been proposed that combines a wavelet technique with LS-SVM regression for the prediction of dissolved gases in oil-immersed power transformers. In [10], the authors have used a fuzzy inference system (FIS) to determine absolute concentrations of free and dissolved gases in transformer oil, total dissolved combustible gases, total combustible gases, proportions of some gases with each other, and gas increase rates to detect the decomposition of transformer insulation paper. A similar approach has been proposed in [5], in which an adaptive neuro-fuzzy inference system (ANFIS) was employed to estimate the transformer insulation degradation rate with the input variables H2 (hydrogen), CH4 (methane), N2 (nitrogen), O2 (oxygen), CO (carbon monoxide), CO2 (carbon dioxide), C2H6 (ethane), C2H4 (ethylene), C2H2 (acetylene), and TDCG (total dissolved combustible gas).
In general, these numerous studies have used artificial intelligence techniques as regression models to predict gas concentrations or faults in power transformers. More specifically, the use of prediction models in connection with the wavelet transform has been addressed in some recent works to improve the forecast [13–15]. Despite satisfactory results, those approaches may not be the most efficient in predicting future values of the variable of interest, especially for a multi-step ahead forecast. Several empirical studies show that learning long-term time dependencies can be difficult for gradient-descent algorithms, which are more effective, converge faster, and generalize better in nonlinear autoregressive neural network models than in other neural networks [14,16–20]. Autoregressive models based upon neural networks specify that the output variable depends, in a non-linear way, on its own past values and on a stochastic, imperfectly predictable term. Thus, future values of the output variable can be predicted from its past and present values. Additionally, the prediction model can also consider present and past values of one or more auxiliary external variables, resulting in a nonlinear autoregressive model with exogenous variables.
In this sense, the authors in [14] proposed a combination of a nonlinear autoregressive neural network model with the discrete wavelet transform, resulting in a high-accuracy multi-step ahead forecast of in-oil gas concentrations. The authors investigated the use of different wavelet functions and different time delays in the autoregression model, but they did not assess how different delays in external series can influence the values of the output series.
In fact, the definition of the optimal input and output delays is one of the main limitations of an autoregressive model. In general, in multidimensional models with $n$ external variables, the same delay is adopted for all variables. This means that the prediction of the output value at time $t + 1$, $y(t + 1)$, is performed using the past outputs $y(t), y(t-1), \ldots, y(t-d_y)$ and the past observations $u_i(t), u_i(t-1), \ldots, u_i(t-d_u)$ of the external variables $u_i$, $i = 1, \ldots, n$, as inputs. In addition, the adoption of many inputs can increase the complexity of the forecasting model and reduce its accuracy. Thus, some difficulties and limitations remain despite the advances, motivating research into new models.
The use of different time delays in the external series that influence the output does not seem to have received the necessary attention, especially considering that there is a strong correlation between the concentrations of different gases and failures in transformers. This work seeks to help overcome this limitation by proposing a wavelet-like transform to optimize the order of the factors in an autoregressive neural network model, with some exogenous variables, to predict the dissolved gas concentration in power transformer oil.
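To make the cost of a common delay concrete, the following is a minimal sketch of how the regressors of such a model can be assembled. The function name `build_lagged_inputs` and the toy series are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def build_lagged_inputs(y, U, d_y, d_u):
    """Return inputs [y(t..t-d_y), u_i(t..t-d_u)] and targets y(t+1).

    y: output series, shape (T,). U: external series, shape (T, n).
    The same delay d_u is used for every external variable.
    """
    y = np.asarray(y, dtype=float)
    U = np.asarray(U, dtype=float)
    start = max(d_y, d_u)                      # first t with a complete history
    rows, targets = [], []
    for t in range(start, len(y) - 1):
        past_y = y[t - d_y:t + 1][::-1]        # y(t), y(t-1), ..., y(t-d_y)
        # For each u_i in turn: u_i(t), u_i(t-1), ..., u_i(t-d_u)
        past_u = U[t - d_u:t + 1][::-1].T.ravel()
        rows.append(np.concatenate([past_y, past_u]))
        targets.append(y[t + 1])
    return np.array(rows), np.array(targets)

# Toy example with two external variables and delays d_y = d_u = 2.
y = np.arange(10.0)
U = np.column_stack([10 * np.arange(10.0), 100 * np.arange(10.0)])
X, target = build_lagged_inputs(y, U, d_y=2, d_u=2)
print(X.shape)  # number of inputs is (d_y + 1) + n * (d_u + 1)
```

With a common delay, the number of inputs grows as $(d_y + 1) + n(d_u + 1)$, which illustrates why large delays quickly inflate the model.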
The main objective of this work is to determine the optimal delay for each input and for the output to create an autoregressive model with a reduced number of inputs and with competitive precision in relation to the literature. The hypothesis is that wavelet-like approximations of the external variables and the output variable incorporate the temporal memory of the autoregressive model. In addition, the selection of the best approximation for each variable determines the ideal delay for each input while reducing the size of the model, as each sample of the approximation is calculated considering a time window of the series.
Consequently, the contributions of the proposed approach can be stated as follows:
1. Development of an approach based on a wavelet-like transform that determines the optimal delay for each external variable and for the output variable in an autoregressive prediction model;
2. A prediction model with high precision, as it focuses on the trend of the input signals from the noise-free approximations calculated by the wavelet transform;
3. Expansion of knowledge of the temporal relationship between gases underlying the degradation process of the insulating oil and solid insulating material;
4. Reduction of the number of input variables in the autoregression model when using the approximations resulting from transformations with wavelets of different lengths, which already consider the time delay determined for each variable.
The remainder of this paper is organized as follows. The related theory is discussed in Section 2: dissolved gas-in-oil analysis in Section 2.1, discrete wavelet transform in Section 2.2, and nonlinear autoregressive exogenous model in Section 2.3. In Section 3, materials and methods are presented, followed by the results in Section 4, discussion in Section 5, and finally the conclusion in Section 6.

Dissolved Gas-In-Oil Analysis
Power transformers are one of the most important devices for the electrical system, and more than 90% of transformers are immersed in oil [21]. Dissolved gas-in-oil (DGO) is a simple parameter used to monitor energized power transformers and assess the condition of power transformers (PTE) [22][23][24].
Because of the thermal and electrical stresses experienced by the insulation of operating transformers, paper and oil decomposition occurs, generating gases that dissolve in the oil and reduce its dielectric strength. Thus, the concentrations of the various gases dissolved in the transformer oil result from the decomposition of the oil and paper insulation [22].
There are several DGA-based interpretive methods for detecting power transformer failure that use a relation between two gases to determine a possible problem. These methods include the key gas method; IEC ratios; graphical representations (IEC 60599, Duval); and the Doernenburg and Rogers ratios, among others [4,11,21]. As an example, in the following, we show three of these methods and their respective gas concentration levels. Table 1 shows the fault description for gas concentration, problem description, and normal and abnormal values for each gas concentration. Table 2 shows fault diagnosis by the Doernenburg ratio method (R1, R2, R3, R4) and fault type, while Table 3 shows fault classification using IEC ratio codes (C2H2/C2H4, CH4/H2, C2H4/C2H6) and fault type [2,24].

Discrete Wavelet Transform
Wavelet transform (WT) is a widespread signal processing technique. In the last decades, several algorithms of compactly supported wavelet have been created by mathematical analysis and signal processing communities. In fact, several works attempt to motivate and explain the basic ideas behind wavelets, what makes them so successful in many applications in different areas, as well as some limitations [25][26][27][28].
Mostly, WT is used for trend analysis, correlation and coherence between two time series, cross-spectral analysis, and space-based verification wavelets, while there are limited applications related to forecasting [29].
Discrete wavelet transform (DWT) is any wavelet transformation for which the wavelets are discretely sampled, meaning that a discrete set of wavelet scales and translations is used [29]. The basic idea of this technique is to exploit the correlation present in most real-life signals to build a sparse approximation [14]. Thus, DWT possesses many favorable properties that are useful for researchers in the time series data mining field [1].
The discrete one-dimensional wavelet transform is presented in Equations (1), where $H(\omega)$ is the transfer function of the high-pass filter and $G(\omega)$ is the transfer function of the low-pass filter; they separate the high- and low-frequency parts of the signal while keeping the important information of the original signal. $h_n$ and $g_n$ are the coefficients, which depend on the mother wavelet; $A_{m,k}$ is the approximation profile; $D_{j,k}$ is the detail profile; $j$ and $m$ are the decomposition levels of the wavelet transform; $\psi_{j,k}(t)$ is the mother wavelet; and $\phi_{m,k}(t)$ is the scaling function.
Daubechies wavelets (DW), dbN, offer a family of orthogonal transformations, where N refers to the number of vanishing moments and generally varies from db2 to db22. They can accurately approximate constant and linear functions and have a relatively simple form [1,27].

Nonlinear Autoregressive Exogenous Model
A time series is any set of observations organized in time; usually measurements are made at evenly spaced times, for example, daily pollution values, monthly temperature values, daily values of electricity consumption, and daily stock exchange indices, among others [30].
An exogenous nonlinear autoregressive model (NARX) in time series modelling is a nonlinear model that has exogenous inputs, in which the model relates past values of the same series and current and past values of the driving (exogenous) series, that is, externally determined series that influence the series of interest, as defined in Equation (4) for $l$ exogenous variables:

$$y(t) = F\big(y(t-1), \ldots, y(t-d_y),\; u_1(t), \ldots, u_1(t-d_{u_1}),\; \ldots,\; u_l(t), \ldots, u_l(t-d_{u_l})\big) + \varepsilon(t) \qquad (4)$$

where $y$ is the output variable; $u_i$, $i = 1, \ldots, l$, are externally determined variables; $d_y$ is the order of the output variable; $d_{u_i}$ is the order of variable $u_i$; $\varepsilon$ is the error term; and $F$ is some non-linear function, such as a polynomial. There is a trend toward the adoption of computational techniques, and many effective attempts have been developed. For example, the authors of [14] have used NARX to predict gas concentration in power transformer oil. Meanwhile, the authors of [5] and [10] have created autoregressive models using ANFIS for detection and isolation, as well as for estimating the expected life of transformer paper.
As can be seen, several autoregressive models have been used to assist in the prediction of DGA and the health status of the power transformer [7]. However, in all the researched papers, the order of the factors is always the same, that is, $d_y = d_{u_1} = \cdots = d_{u_l}$ in Equation (4).

Materials and Methods
The proposed approach relies on a wavelet-like transform to optimize the order of the factors (gas concentrations) in a nonlinear autoregressive model with exogenous variables. This means defining the optimal orders $d_y, d_{u_1}, \ldots, d_{u_l}$ for each gas concentration.
Thus, the proposed approach has the following steps: step 1, gas concentration acquisition and data normalization; step 2, Kaiser-Meyer-Olkin (KMO) and Bartlett tests; step 3, wavelet-like decomposition of gas concentration; step 4, Pearson's correlation; step 5, standardized regression coefficients; step 6, a model using principal components analysis (PCA) to select the principal components; step 7, calculation of the contribution rate for each wavelet decomposition level; and, finally, step 8, prediction using the best time delay as input in a multi-layer perceptron (MLP) network. All these steps are illustrated in Figure 1 and described in detail as follows.

Step 1:
Usually, interpretation techniques such as the Duval triangle are applied to the information on the concentration of gases in the transformer oil, which is collected using equipment such as Morgan Calisto, Luman Sense Smart DGA, General Electric (GE) Transfix, Qualitrol DGA 150, or others [8].
Initially, this work collected a set of 190 historical oil-dissolved gas samples from a transformer equipped with a GE Kelman-Transfix (GE-General Electric, Sao Paulo, Brazil) and a GE Intellix BMT 330 (GE-General Electric, Sao Paulo, Brazil). In this stage, the variables pointed out by [10–13] are C2H2, C2H4, C2H6, CO, CO2, CH4, O2, and H2. In addition, H2O and combined gas concentrations were added as inputs, resulting in ten variables. Before the next step, all the data were normalized between 0 and 1.
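The normalization to the interval [0, 1] can be sketched as a column-wise min-max scaling. The source does not state which scaling formula was used, so min-max scaling is an assumption here, and the function name `min_max_normalize` is hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (one gas concentration per column) to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    # A constant column has zero range; dividing by 1 maps it to 0 instead of NaN.
    span = np.where(span == 0, 1.0, span)
    return (X - lo) / span

# Three samples of three hypothetical variables; the last one is constant.
Z = min_max_normalize([[1, 10, 5], [2, 20, 5], [3, 30, 5]])
print(Z)
```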

Step 2:
The KMO test is applied to verify the measure of sampling adequacy for each variable in the model [31], and the Bartlett test is applied to test the hypothesis that the correlation matrix is an identity matrix, which would indicate that the variables are unrelated and thus unsuitable for structure detection [32]. KMO (1977) is a criterion for identifying whether a factor analysis model is adequately fitted to the data, testing the overall consistency of the data [31]. Meanwhile, Bartlett's sphericity test is a technique created by Maurice Stevenson Bartlett in 1937, which indicates the strength of the relationship between variables.
Step 3: At this stage, DWT is used in two forms. In the first one, each gas concentration is decomposed keeping the decomposition level at 1 while varying the wavelet from db2, db4, . . . , to db20, in order to create smooth approximations of the original gas concentration using the low-frequency filters. Additionally, the wavelet transform is applied to the gas concentrations in reverse chronological order so that each sample of the approximation is created from past values of the original signal.
Considering $m$ samples from a time series in reverse chronological order, that is, with the most recent samples at the beginning, $S = (s_t, s_{t-1}, \ldots, s_{t-k}, \ldots, s_{t-m+k}, \ldots, s_{t-m+2}, s_{t-m+1})$, and a low-pass wavelet filter $H$ of length $k$, $H = (h_0, h_1, \ldots, h_{k-1})$, $k \ll m$, Equation (5) defines the application of the transform to the signal $S$ to create an approximation $S_{dbk} = (s_t, s_{t-1}, \ldots, s_{t-(m/2)+1})$ with time delay $k - 1$, as proposed in this work. Approximations $S_{dbk}$, $k = 2, 4, \ldots, 20$, with half the length of the original signal, $m/2$, are created for each Daubechies wavelet from db2 to db20, resulting in 10 approximations for each time series $S$. Here, we have 190 samples of each gas concentration.
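Both statistics can be computed directly from the correlation matrix. The sketch below uses the standard textbook definitions (overall KMO from correlations and partial correlations, and Bartlett's chi-square statistic from the determinant of the correlation matrix); the function name and the synthetic data are assumptions, and the p-value of the Bartlett statistic, which requires a chi-square distribution function, is intentionally left out.

```python
import numpy as np

def kmo_and_bartlett(X):
    """Return (overall KMO, Bartlett chi-square statistic, degrees of freedom).

    X has shape (n_samples, n_variables).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)

    # Partial correlations are obtained from the inverse correlation matrix.
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)

    off_diag = ~np.eye(p, dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    kmo = r2 / (r2 + q2)

    # Bartlett's test of sphericity: H0 is that R is the identity matrix.
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return kmo, chi2, dof

# Synthetic, correlated data standing in for the gas concentrations.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
X = common + 0.5 * rng.normal(size=(200, 3))
kmo, chi2, dof = kmo_and_bartlett(X)
print(kmo, chi2, dof)
```

A KMO value closer to 1 and a large Bartlett statistic relative to its degrees of freedom both point toward data that are suitable for structure detection.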
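One plausible reading of this step is sketched below: the series is reversed, filtered with a low-pass filter, and downsampled by two, so that every approximation sample depends only on the current and the $k - 1$ preceding observations. This is not a reproduction of Equation (5). The boundary handling (repeating the oldest sample), the function name, and the use of the 2-tap Haar and 4-tap Daubechies low-pass filters as examples are all assumptions.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
SQRT3 = np.sqrt(3.0)
# Orthonormal low-pass filters: Haar (2 taps) and Daubechies with 4 taps.
HAAR_LOWPASS = np.array([1.0, 1.0]) / SQRT2
DAUB4_LOWPASS = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * SQRT2)

def wavelet_like_approximation(series, lowpass):
    """Low-pass approximation of a chronological series using only past values.

    Output sample i combines s(t-2i), s(t-2i-1), ..., s(t-2i-k+1),
    where t is the most recent time and k is the filter length.
    """
    s = np.asarray(series, dtype=float)[::-1]   # most recent sample first
    h = np.asarray(lowpass, dtype=float)
    k = len(h)
    # Assumed boundary rule: repeat the oldest sample so every window is full.
    padded = np.concatenate([s, np.full(k - 1, s[-1])])
    # Step of 2 halves the length, as in a one-level decomposition.
    return np.array([np.dot(h, padded[i:i + k]) for i in range(0, len(s), 2)])

x = np.arange(190, dtype=float)                 # stand-in for one gas series
approx = wavelet_like_approximation(x, DAUB4_LOWPASS)
print(len(approx))
```

Because the window only reaches backwards in time, a change in the most recent observation affects only the first approximation sample.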

Step 4:
Unlike the authors of [33], who have used Pearson's correlation coefficient between the constant characteristic parameter and the candidate of the variable characteristic parameters to verify the concentration of gas that presents the best correlation to electrical faults, this work uses the Pearson's correlation to calculate a relationship between the various approximations created for gas concentrations with different time delays (wavelets of different lengths). Thus, this step results in a matrix X with 110 columns and 190 rows, such that the 110 columns represent the time t, t − 2, through t − 20 of each gas concentration, which generates 110 input variables.
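As an illustration of how the correlation step can rank delays, the sketch below computes Pearson's coefficient between each delayed approximation (one column per delay) and a target series, and returns the delay with the largest absolute correlation. The helper names and the synthetic data are assumptions; the selection rule actually used in the study also involves the PCA-based contribution rate.

```python
import numpy as np

def pearson_with_target(X, y):
    """Pearson correlation between every column of X and the vector y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    return (Xc.T @ yc) / denom

def best_delay(X, y, delays):
    """Return the delay whose column has the largest absolute correlation."""
    r = pearson_with_target(X, y)
    return delays[int(np.argmax(np.abs(r)))]

# Synthetic example: the second column is an exact linear function of y.
rng = np.random.default_rng(0)
y = rng.normal(size=50)
X = rng.normal(size=(50, 3))
X[:, 1] = -2.0 * y + 1.0
print(best_delay(X, y, delays=[2, 4, 6]))
```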

Steps 5 and 6:
In these steps, we apply PCA to the matrix $A$ created from the relation between the inputs $X_j$ (gas concentrations delayed at times t − 2 to t − 20 according to the wavelet-like transform) and the output $Y_i$ (a gas concentration at time instant t). The values of $A$ are calculated as standardized regression coefficients $a_{ij}$ (Equation (6)) for each input and output, describing the relationship between the concentration of a given gas and the approximations created for all other gases at the different time delays generated by the wavelet transform. Therefore, a square matrix is created for each gas concentration, to which PCA is applied to select the principal components that represent at least 99% of the original data variation, generating a supervised PCA (SPCA), according to [34,35].
Step 7: The contribution of each time delay is calculated as $C_i(j) = \sum_{i=1}^{p} A'^{T}_{j} A'_{j} \lambda_i$, in which $A$ represents the input data, $\lambda_i$ are the corresponding eigenvalues, $A'$ is the representation of $A$ in the principal component space, and $p$ is the number of most important principal components [35].
Step 8: An MLP neural network with one input layer, one hidden layer, and one output layer is trained with the Levenberg-Marquardt backpropagation algorithm for 100 epochs. The number of neurons in the hidden layer was chosen following two approaches: the first follows [36], which proposes $N_h = 2^n - 1$, and the second follows [37], which proposes $N_h = (4n^2 + 3)/(n^2 - 8)$, where $N_h$ is the best number of hidden neurons and $n$ is the number of input parameters.
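The two building blocks of these steps can be sketched generically as follows. Standardized regression coefficients are obtained here by least squares on z-scored variables, and the principal components are kept until 99% of the variance is explained. The helper names are hypothetical, and how the coefficients are arranged into the square matrix $A$ follows Equation (6) and [34,35], which this sketch does not attempt to reproduce.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Regression coefficients of z-scored y on z-scored columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

def components_for_variance(A, threshold=0.99):
    """Eigenvalues and eigenvectors of the leading components that together
    explain at least `threshold` of the total variance of A."""
    A = np.asarray(A, dtype=float)
    cov = np.cov(A - A.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    p = int(np.searchsorted(explained, threshold) + 1)
    return eigvals[:p], eigvecs[:, :p]

# Synthetic check: y depends linearly on two inputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 2.0 * X[:, 0] - X[:, 1]
beta = standardized_coefficients(X, y)

# A rank-one matrix needs a single component to explain its variance.
A = np.outer(np.arange(20.0), [1.0, 1.0, 1.0])
vals, vecs = components_for_variance(A, threshold=0.99)
print(beta, len(vals))
```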
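As a worked example of the two sizing rules, the sketch below evaluates them for four inputs. Reading the first rule as $2^n - 1$ (rather than $2n - 1$) is an assumption about the typeset formula; it is supported by the fact that, for four inputs, the two rules give 15 and approximately 8 neurons, which matches the hidden-layer sizes reported in the Results. The function names are hypothetical.

```python
def hidden_neurons_rule_a(n):
    """First rule, read as N_h = 2^n - 1 (assumed interpretation)."""
    return 2 ** n - 1

def hidden_neurons_rule_b(n):
    """Second rule, N_h = (4 n^2 + 3) / (n^2 - 8); valid for n^2 > 8."""
    return (4 * n ** 2 + 3) / (n ** 2 - 8)

n_inputs = 4  # four exogenous gas approximations per predicted gas
size_a = hidden_neurons_rule_a(n_inputs)
size_b = round(hidden_neurons_rule_b(n_inputs))
print(size_a, size_b)
```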
Unlike [3], we normalize the input data between -1 and 1 for applying a population-based metaheuristic algorithm to optimize the structure of the MLP neural network with back propagation algorithm. We propose using the optimal time delays made with the wavelet as input in an MLP with a backpropagation algorithm.
In order to test the temporal relationship between gases underlying the degradation process of the insulating oil and solid insulating material, five gas concentrations were chosen, namely C2H2, C2H4, C2H6, CH4, and H2, because these are the gases basically used by the main methods (the IEC ratios and the Rogers and Doernenburg ratios) to identify possible power transformer faults. Figure 2 shows an example of the neural network architecture to predict the C2H4 concentration, where the optimal time delays for C2H2, C2H6, CH4, and O2 are selected according to Pearson's correlation and PCA.
Regarding the output, 95 samples related to odd days were selected to create the matrix A, as well as to train and test the forecasting model, as the input and output data must have the same length. Similar experiments were carried out for the other gases: when the output is C2H2, the inputs are the approximations of C2H4, C2H6, CH4, and O2 with their respective optimal delays defined by the proposed approach; when the output is C2H6, the inputs are the wavelet approximations of C2H2, C2H4, CH4, and O2; and so on.

Results
Firstly, we evaluated the results of the KMO and Bartlett tests. In Table 4, the KMO test indicated 0.743, while the Bartlett test indicated 0; for this reason, these data are suitable for structure detection, and the Bartlett test indicates that a factor analysis may be useful with these data. The next stage shows the resulting selection and contribution rate of the decomposition of each variable. Table 5 shows the contribution rate per gas concentration (normalized).
Each variable presents a different importance rate and order: C2H6 is most influenced by wavelet db20, which corresponds to a delay at time instant t − 20, while db8 has more impact on CH4, db18 on O2, and so on (see Table 5). The level of correlation of the time delays for each gas concentration is shown in Table 6, wherein the values marked in bold and italics are the time delays that have the highest correlation with C2H2, C2H6, C2H4, H2, and CH4.
The following shows the results of the neural network prediction for two gases concentrations using 8 and 15 neurons in the hidden layer, as the methods DGA IEC ratios, as well as the Rogers and Dornenburg ratios, basically use the following to analyze the potential problems in power transformers: CH 4 gas, H 2 , C 2 H 2 , C 2 H 4 , and C 2 H 6 . The results presented in Table 7 show us an average MAPE for two days of 1.525% for C 2 H 6 and 1.831% for C 2 H 4 . Meanwhile, Figure 3 compares the predicted values with the real values for the five gas concentrations. As can be seen, the selection of the optimal time delay in gas concentration can improve prediction accuracy, when comparing predictions with the input variables at the same time t − 2 and t − 4 (Table 7).
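For reference, the MAPE used to report these results can be computed as in the minimal sketch below. The function name and the numbers in the example are illustrative and are not values from the study.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

# Errors of 10% and 5% average to a MAPE of 7.5%.
score = mape([100.0, 200.0], [110.0, 190.0])
print(score)
```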

Discussion
This study aimed to determine the optimal time delay of each gas concentration impacting the gases H2, CH4, C2H2, C2H4, and C2H6 (Tables 6 and 7), so that a DGA technique can subsequently be used to detect the defect in the power transformer.
The approach using a wavelet-like transform and SPCA shows the contribution rate of different time delays of each gas concentration, which differs from the proposals of recent works such as [13,14]. In [14], for example, despite testing different wavelet functions and different delays, all models adopted the same time delay for external variables. Here, the approach shows the rate and order of importance and the wavelet-like order for ten gas concentrations (Table 5), indicating that db20 (t − 20), db8 (t − 8), and db18 (t − 18) are the most important time delays for the gas concentrations C2H6, CH4, and O2, respectively. This result shows that the effect that a given gas suffers from other gases varies differently over time for each gas.
We have used Pearson's correlation to assess the impact of using different time delays, t − 2 to t − 20, for each gas concentration, showing, for example, that to predict the concentration of C2H2, the best time delays for the other gas concentrations are as follows: t − 12 for C2H4, t − 6 for C2H6, t − 10 for CH4, and t − 8 for H2. It is important to highlight that a traditional autoregressive model that adopts the same delay for all variables would not have identified this relationship. In addition, this is a very important result for calibrating monitoring systems, as it indicates that any variation in C2H4, for example, will take about 12 units of time to be reflected in the concentration of C2H2. A similar analysis applies to other gases.
A similar kind of relationship between different gases has been studied in [38] and [33]. In [38], the authors have studied the correlation between the five gas concentrations by applying the grey relational grade to reveal the relationships between gas features. Those authors show that grey relational analysis is efficient in selecting and removing redundant features from the set of input variables. However, it does not consider any time delay in sampling the input series of gas concentrations. On the other hand, the authors in [33] have used correlation coefficients with the gas concentration of CO as a constant characteristic parameter for the correlation of time series analysis and H2, CH4, C2H2, C2H4, and C2H6 as characteristic variable parameters to distinguish electrical faults from thermal faults.
However, approaches based on autoregressive models apply the same order for all input variables and do not take into account the time delay relationship between gas concentrations. Notwithstanding, we have seen that the optimal selection of the time delay for each concentration of gas affects the output.
Regarding forecast accuracy, this approach shows some better predictions than [33,38,39] (see Table 8). It is important to highlight the low computational cost of the proposed model, which takes a matter of seconds to run. In the example above regarding the prediction of the C2H2 concentration, instead of inputting 12, 6, 10, and 8 past values of the gases C2H4, C2H6, CH4, and H2, respectively, according to Equation (4), we simply use the corresponding four approximations created by the wavelet-like transform for each exogenous gas.

Conclusions
This work presented an approach based on the wavelet transform to determine the ideal time delay for each gas concentration and use it as a regression model in an MLP network. The application of a wavelet-like transform creates sparse approximations of gas concentrations with different time delays, and enables us to define an accurate and computationally efficient prediction model. The prediction model has high precision as it focuses on the trend of the input signals from the noise-free approximations calculated by the wavelet transform. Moreover, it reduces the number of input variables in the autoregression model when using the approximations resulting from transformations with wavelets of different lengths, which already consider the time delay determined for each variable.
In addition to simply assessing the relationship between different gases, the proposal makes it possible to identify the relationship between a specific gas and delayed approximations of all other gases. This is a fundamental result for monitoring systems, as it indicates that variations in different gases take different times to be reflected in the concentrations of other gases under observation. As an example, it was observed that variations in ethane affect acetylene concentrations more quickly than variations in ethylene. Thus, the approach extends the discussion of the temporal relation between gas concentrations, expanding the knowledge of the temporal relationship between gases underlying the degradation process of the insulating oil and solid insulating material.
Thus, wavelet-like approximations of the external variables were an effective tool to incorporate temporal memory of the autoregressive model and to determine the ideal delay for each input. As future research, we suggest investigating other feature selection techniques to evaluate the results of Pearson's correlation and PCA.