Inversion Method for Transformer Winding Hot Spot Temperature Based on Gated Recurrent Unit and Self-Attention and Temperature Lag

The hot spot temperature of transformer windings is an important indicator of insulation performance, and its accurate inversion is crucial for timely and accurate transformer fault prediction. However, most existing studies feed experimental or operational data directly into networks to build data-driven models without considering the lag between temperatures, which may limit the accuracy of the inversion model. In this paper, a method for inverting the hot spot temperature of transformer windings based on the GRU-SA model is proposed. Firstly, temperature rise experiments are designed to collect the temperatures over the entire side and top of the transformer tank, the top oil temperature, the ambient temperature, the cooler inlet and outlet temperatures, and the winding hot spot temperature. Secondly, the experimental data are integrated, taking the lag of the data into account, to obtain candidate input feature parameters. Then, a feature selection algorithm based on mutual information (MI) is used to analyze the correlation of the data and construct the optimal feature subset that maximizes the information gain. Finally, Self-Attention (SA) is applied to optimize the Gated Recurrent Unit (GRU) network, establishing the GRU-SA model to learn the latent relationships between the input and output feature parameters and achieve precise inversion of the winding hot spot temperature. The experimental results show that considering the lag of the data allows the winding hot spot temperature to be inverted more accurately. The proposed method reduces redundant input features, lowers model complexity, accurately tracks the changing trend of the hot spot temperature, and achieves higher inversion accuracy than other classical models.


Introduction
Electricity plays an important role in the national economy [1][2][3]. As the core components of power transmission and distribution, transformers must operate safely and stably to ensure the normal operation of the power system. The thermal condition of power transformers is a vital indicator for assessing insulation performance. If the winding hotspot temperature is too high, it degrades the voltage withstand capability and mechanical strength of the equipment and can lead to breakdown accidents [4,5]. Therefore, inverting the transformer winding hotspot temperature is crucial for effectively predicting equipment operating conditions and enabling predictive maintenance, so that operation and maintenance can be carried out precisely.
Sensors 2024, 24, 4734
The existing methods for obtaining the transformer hotspot temperature mainly include direct measurement, thermal circuit models, numerical simulation, and empirical formulas. However, these methods have limitations in practice owing to issues of reliability, efficiency, and accuracy.
Owing to its remarkable ability to extract features from complex data, machine learning has been widely applied in the study of transformer winding hotspot temperature, and many scholars have studied the prediction of winding hotspot temperature [6][7][8][9]. Deng et al. [10] predicted the hotspot temperature of a 10 kV transformer using support vector regression (SVR) based on primary feature temperature points obtained from thermal field calculations on the transformer casing. Sun et al. [11] combined actual operating temperature, load, cooling method, and ambient temperature data from monitored 35 kV dry-type transformers and optimized SVR models using particle filtering to predict the winding hotspot temperature. Wei et al. [12] considered factors such as sunlight and external wind speed that affect oil-immersed transformers, modeled the winding hotspot temperature using experimental data and neural networks, and studied the optimization of the neural network structure and algorithms. A comparison of the measured values with those computed using the algorithm recommended in IEC 60076 and with the authors' neural network algorithm showed that the neural network results were closer to the measured values.
However, most of the aforementioned studies focused on model construction. The data they used mostly described the transformer at the current moment only, with little attention paid to the lag (hysteresis) between the surface temperature of the transformer tank and the winding hotspot temperature (e.g., the studies by Deng et al. [10], Sun et al. [11], and Wei et al. [12]). Neglecting this lag may leave the temperature feature information incomplete, thereby reducing the accuracy of model inversion. Moreover, the interactions between different factors were ignored, and little research was conducted on how different feature quantities influence model performance (e.g., the studies by Sun et al. [11] and Wei et al. [12]). Feeding all monitoring data into the prediction model may increase its complexity and degrade its performance. To overcome this issue, a correlation analysis should be conducted before the data are input into the model, so that the parameters most informative about the prediction target are selected, redundancy is reduced, and accuracy is improved.
To address the aforementioned issues, this paper proposes a method for the inverse estimation of winding hotspot temperature in oil-immersed power transformers. Firstly, temperature rise experiments are conducted to collect raw data for the inversion study, including the temperatures at the inlet and outlet of the radiator, the ambient temperature, the top oil temperature, the winding hotspot temperature, and the surface temperature of the transformer tank. To address the issue of insufficient temperature feature information, the surface temperature of the transformer tank is collected in the form of temperature images, and the lag of the data is fully considered. Secondly, the mutual information algorithm is employed to analyze the correlation between each input feature quantity and the output feature quantity, yielding the optimal feature set and eliminating feature redundancy. Finally, an inversion network is constructed based on the Gated Recurrent Unit (GRU). To overcome the relative weakness of the GRU in handling long-range dependencies, Self-Attention (SA) is introduced, and a GRU-SA network is constructed for high-precision inverse estimation of the transformer winding hotspot temperature.

Analysis of Influencing Factors on Winding Hotspot Temperature
The inversion of the winding hotspot temperature essentially involves establishing a functional relationship between the input feature vector and the output feature vector. The factors influencing the winding hotspot temperature directly determine the accuracy of the constructed model, which makes their selection crucial. A thermal circuit model for power transformers has been proposed in the literature [13], as illustrated in Figure 1.
In Figure 1, $P_{1wdn}$, $P_{2wdn}$, and $P_{3wdn}$ represent the winding losses of phases A, B, and C, respectively; $P_{mp}$, $P_{fe}$, and $P_{tank}$ represent the losses of the clamps, iron core, and tank, respectively; $C_{1wdn}$, $C_{2wdn}$, and $C_{3wdn}$ represent the thermal capacities of the phase A, B, and C windings, respectively; $C_{mp}$, $C_{fe}$, and $C_{tank}$ represent the thermal capacities of the clamps, iron core, and tank, respectively; $R_{1wdn-oil}$, $R_{2wdn-oil}$, and $R_{3wdn-oil}$ represent the thermal resistances of the phase A, B, and C windings to oil, respectively; $R_{mp-oil}$, $R_{fe-oil}$, and $R_{tank-oil}$ represent the thermal resistances of the clamps, iron core, and tank to oil, respectively; $R_{tank-air}$ and $R_{oil-radiator-air}$ represent the thermal resistances of the tank and oil cooler to air, respectively; $\theta_{1wdn-hs}$, $\theta_{2wdn-hs}$, and $\theta_{3wdn-hs}$ represent the winding hotspot temperatures of phases A, B, and C, respectively; and $\theta_{top-oil}$, $\theta_{mp}$, $\theta_{fe}$, and $\theta_{tank}$ represent the top oil temperature and the temperatures of the clamps, iron core, and tank, respectively.
The thermal resistances of the A, B, and C phase windings to oil ($R_{1wdn-oil}$, $R_{2wdn-oil}$, and $R_{3wdn-oil}$) are represented by a single thermal resistance $R_{wdn-oil}$; their heat capacities ($C_{1wdn}$, $C_{2wdn}$, and $C_{3wdn}$) are represented by a single heat capacity $C_{wdn}$; and their winding losses ($P_{1wdn}$, $P_{2wdn}$, and $P_{3wdn}$) are represented by a single winding loss $P_{wdn}$. The three phase branches on the left side of the thermal circuit model can then be merged into one branch, resulting in the simplified thermal circuit model shown in Figure 2.
From Figure 2, two differential equations, Equations (1) and (2), can be derived. In Equations (3) and (4), T represents the sampling interval (i.e., the sampling period), and k denotes the index of the discrete data; with these, the differential equations are discretized as Equations (5) and (6). Simplifying Equations (5) and (6) yields Equations (7) and (8), which can be rewritten as Equations (9) and (10), where the coefficients $K_1$ to $K_7$ can be estimated using regression algorithms. Substituting Equation (10) into Equation (9) yields Equation (11). Equations (9) and (11) show that the transformer winding hotspot temperature is related to the top oil temperature, the ambient temperature, the tank temperature, and the tank temperature at the previous time step. Equation (11) thus demonstrates the lag (hysteresis) effect of the tank temperature on the winding hotspot temperature.
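The following Python sketch illustrates how coefficients like $K_1$ to $K_7$ could be estimated by regression once lagged tank temperatures are available. It rests on a simplifying assumption: a hypothetical linear relation in which the hotspot temperature at step k depends on the top oil temperature, the ambient temperature, the tank temperature at step k, and the tank temperature at step k-1, plus an intercept. It is not the exact form of Equation (11). The function names and the synthetic data are illustrative only.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b] so row operations act on both sides at once.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_lagged_linear_model(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Each row is [theta_top_oil(k), theta_amb(k), theta_tank(k), theta_tank(k-1)];
    a constant 1.0 is appended so the last coefficient is an intercept.
    """
    x = [list(r) + [1.0] for r in rows]
    p = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from known (hypothetical) coefficients.
random.seed(0)
true_w = [0.8, 0.1, 0.6, -0.3, 2.0]
tank = [40 + random.uniform(-5, 5) for _ in range(101)]
rows, hotspot = [], []
for k in range(1, len(tank)):
    r = [60 + random.uniform(-5, 5), 25 + random.uniform(-3, 3), tank[k], tank[k - 1]]
    rows.append(r)
    hotspot.append(sum(w * v for w, v in zip(true_w, r + [1.0])))

coef = fit_lagged_linear_model(rows, hotspot)
print([round(c, 3) for c in coef])  # should match true_w after rounding
```

The lagged tank temperature enters the model simply as one more column. This is the practical meaning of the lag effect: the regression sees both $\theta_{tank}(k)$ and $\theta_{tank}(k-1)$.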
To reduce the top oil temperature, oil is typically circulated into coolers for cooling.It is evident that the cooler is the most crucial device for dissipating internal heat in the transformer.Therefore, the variations in temperatures at the outlet and inlet of the cooler are closely related to the changes in the transformer winding hotspot temperature.
Hence, this paper considers the top oil temperature, ambient temperature, temperatures at the outlet and inlet of the cooler, tank temperature, and the temperature of the tank at the previous time step as feature parameters for the inversion of transformer winding hotspot temperature.

Inversion Method
In this section, building on the advantages of the mutual information (MI), Self-Attention (SA), and Gated Recurrent Unit (GRU) algorithms [14][15][16], we propose a transformer winding hotspot temperature inversion method. By combining these algorithms, we aim to enrich the information content of the input features while eliminating feature redundancy, thereby ensuring good inversion results.
The data collection, training, and inversion processes are illustrated in Figure 3.
Step 1: Design temperature rise experiments and collect the raw temperature data using fiber optic sensors, thermocouples, and infrared thermal imagers.
Step 2: Clean the raw dataset by handling outliers and missing values. Extract the maximum, minimum, and average temperatures of the tank side and top from the infrared images. Together with the top oil temperature, the ambient temperature, and the cooler inlet and outlet temperatures, these serve as the input features for inversion; the winding hotspot temperature serves as the output feature.
Step 3: Use the SA module to optimize the GRU network and construct the GRU-SA network for inversion.
Step 4: Employ the MI method to select input feature parameters, reducing the dimensionality of input features and obtaining the optimal feature subset for inversion.
Step 5: Divide the dimensionality-reduced dataset into two sequence sets, a training set and a test set, and input them into the GRU-SA network for training and testing.
Step 6: Analyze the inversion results and compare them with the measured results to evaluate the model performance.

Data Collection for Temperature Rise Experiment
In this study, temperature rise tests were conducted on site on an SFZ-40000/110 oil-immersed 110 kV transformer, whose main parameters are listed in Table 1. Fiber optic sensors placed at relevant internal positions measure the winding hotspot temperature and the top oil temperature, as depicted in Figure 4. The fiber optic probes for measuring the top oil temperature are placed on the wire clamp at the fixed neutral point. Since the hotspot temperature of the low-voltage winding is generally higher than that of the high-voltage winding, this study takes the hotspot temperature of the low-voltage winding as the hotspot temperature of the entire winding. According to the literature, the winding hotspot is located at approximately 90% of the winding height; therefore, fiber optic probes are placed at 90% of the height of the low-voltage winding to measure the winding hotspot temperature [17,18]. The ambient temperature is obtained from thermocouples placed at the four corners of the transformer, and the temperatures at the inlet and outlet of the cooler are measured by thermocouples placed near the inlet and outlet.
To increase the information content of the tank temperature, infrared thermal imagers were used to capture temperature images of the top and side of the tank, which contain temperature information for the entire surface. These thermal images carry richer temperature information than point measurements, which facilitates analysis of the latent correlation between the tank temperature and the winding hotspot temperature. Two infrared thermal imagers were used to collect the temperature images of the top and side of the tank; their layout is shown in Figure 5.
According to the specifications of the temperature rise test in IEC 60076-2 [19], three load levels of 50%, 75%, and 100% were set. Experimental data were collected every 5 min.

Data Preprocessing
The raw experimental data were cleaned to remove outliers and missing values. To standardize the format of the input feature quantities, the temperature image data were converted into temperature sequence data.
Initially, the temperature images and the position coordinates of the tank side and top were input into the Segment Anything Model (SAM) to segment the tank side and top regions [20]. Subsequently, the temperature matrix corresponding to each segmented temperature image was obtained, giving the temperature values of all points on the tank side and top. Finally, the maximum, minimum, and average of these point temperatures on the side and on the top were extracted as their temperature features. This gives the maximum, minimum, and average temperatures of the tank side and top at each sampling time.
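Once a segmentation mask for the tank side or top is available (for example, from SAM), extracting the three features reduces to a masked maximum, minimum, and mean over the temperature matrix. The sketch below assumes the mask is already given as a boolean matrix aligned pixel-for-pixel with the temperature matrix. The SAM call itself is not shown, and the function name `region_stats` and the toy values are hypothetical.

```python
def region_stats(temps, mask):
    """Return (max, min, mean) of the temperatures selected by a boolean mask.

    temps and mask are equally sized 2-D lists; mask[i][j] is True for pixels
    that belong to the segmented region (tank side or tank top).
    """
    values = [
        t
        for t_row, m_row in zip(temps, mask)
        for t, m in zip(t_row, m_row)
        if m
    ]
    if not values:
        raise ValueError("mask does not select any pixel")
    return max(values), min(values), sum(values) / len(values)


# Toy 3 x 4 temperature matrix (deg C) and a mask covering one tank region.
temps = [
    [25.0, 48.0, 50.0, 26.0],
    [25.5, 46.0, 52.0, 26.5],
    [24.8, 44.0, 45.0, 25.9],
]
mask = [
    [False, True, True, False],
    [False, True, True, False],
    [False, True, True, False],
]
side_max, side_min, side_mean = region_stats(temps, mask)
```

Background pixels (the columns near 25 deg C here) are excluded by the mask. Without the mask, they would pull the minimum and average far below the actual tank temperatures.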
Since four thermocouples were used to measure the ambient temperature during the experiment, there were four ambient temperature values at each sampling time.The average value of these four values was taken as the ambient temperature at the current time.
Together with the measured temperatures at the inlet and outlet of the cooler and the top oil temperature, a total of 16 candidate input feature quantities were obtained:

- the maximum, minimum, and average temperatures of the tank side and top;
- the maximum, minimum, and average temperatures of the tank side and top at the previous time step;
- the ambient temperature;
- the temperatures at the inlet and outlet of the cooler;
- the top oil temperature.

The winding hotspot temperature was taken as the output feature quantity. For ease of representation, each feature was given a name, as shown in Table 2, and some sample data are presented in Figure 6. The entire data processing took one week.
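Assembling the 16 candidates, including the four-thermocouple ambient average and the previous-step tank features, can be sketched as follows. The column order here is hypothetical and does not follow the naming in Table 2. The sketch also assumes that the first sampling time is dropped because it has no previous step.

```python
def build_candidate_features(side, top, ambient_tc, cooler_in, cooler_out, top_oil):
    """Assemble 16 candidate input features per sampling time k >= 1.

    side, top: per-time (max, min, mean) tuples for the tank side and top.
    ambient_tc: per-time lists of the four corner thermocouple readings.
    cooler_in, cooler_out, top_oil: per-time scalar temperatures.
    Time k = 0 is skipped because it has no previous step (an assumption).
    """
    rows = []
    for k in range(1, len(side)):
        ambient = sum(ambient_tc[k]) / len(ambient_tc[k])  # mean of the four corners
        row = (
            list(side[k]) + list(top[k])              # current tank side/top (6)
            + list(side[k - 1]) + list(top[k - 1])    # previous time step (6)
            + [ambient, cooler_in[k], cooler_out[k], top_oil[k]]  # remaining 4
        )
        rows.append(row)
    return rows


side = [(50.0, 40.0, 45.0), (51.0, 41.0, 46.0), (52.0, 42.0, 47.0)]
top = [(55.0, 45.0, 50.0), (56.0, 46.0, 51.0), (57.0, 47.0, 52.0)]
ambient_tc = [[20.0, 21.0, 22.0, 23.0]] * 3
rows = build_candidate_features(side, top, ambient_tc, [60.0] * 3, [50.0] * 3, [65.0] * 3)
```

Each row pairs the tank statistics at step k with those at step k-1. This is how the lag between the tank surface and the winding hotspot is exposed to the downstream model.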

Feature Selection
Using all feature parameters for inversion may lead to feature redundancy and increase the complexity of the inversion model. Conducting a correlation analysis between the input and output feature quantities before inversion, and selecting the feature quantities most strongly correlated with the output feature, can effectively improve the performance of the inversion model.
The mutual information (MI) algorithm is a commonly used feature selection method for evaluating the correlation between features and the target variable. It measures their degree of dependence by computing the mutual information between each feature and the target variable, thereby identifying the most relevant features.
The basic idea of the MI algorithm is to quantify the correlation between a feature and the target variable by measuring how far their joint probability distribution differs from the product of their marginal probability distributions. High mutual information between a feature and the target variable indicates a high degree of dependence between them, so the feature is likely to be useful for predicting the target variable.
The mutual information between two discrete random variables X and Y can be defined as follows [21]:

$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \tag{12}$$

In Equation (12), p(x,y) represents the joint probability distribution of X and Y, while p(x) and p(y) are the marginal probability distribution functions of X and Y, respectively.
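Equation (12) is defined for discrete variables, whereas the temperatures are continuous. The following sketch therefore shows one simple way to apply it: a plug-in estimate after equal-width binning. The binning scheme, the bin count of 10, and the natural logarithm (MI in nats) are assumptions, because the estimator used in the study is not stated. Library estimators such as scikit-learn's `mutual_info_regression` use a nearest-neighbour method instead.

```python
import math
from collections import Counter


def discretize(values, n_bins=10):
    """Map continuous values to equal-width bin indices 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of Equation (12) for paired discrete samples, in nats."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xi, yi), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[xi] / n) * (py[yi] / n)))
    return mi


# Toy series: top oil tracks the hotspot exactly; the other feature is unrelated.
hotspot = [50 + 0.1 * k for k in range(200)]
top_oil = [h - 8 for h in hotspot]
unrelated = [(k * 37) % 11 for k in range(200)]

y_bins = discretize(hotspot)
mi_top_oil = mutual_information(discretize(top_oil), y_bins)
mi_unrelated = mutual_information(discretize(unrelated), y_bins)
```

A feature that determines the target exactly attains the maximum possible MI, the entropy of the binned target (log 10 here). An unrelated feature scores much lower.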
In this study, the MI algorithm was used to select features from the sixteen candidate input features, yielding the optimal feature subset that maximizes inversion accuracy. The feature selection process is illustrated in Figure 7.
The specific steps are as follows:
(1) Data preprocessing: determine sixteen parameters as the candidate input feature parameters, including the ambient temperature, the temperatures of the radiator outlet and inlet, and the maximum temperature of the tank side surface.
(2) Calculate the MI value between each candidate input feature parameter and the winding hotspot temperature.
(3) Arrange the candidate input feature parameters in descending order of MI value.
(4) Following this order, select the top 1, 2, ..., 16 features, input each subset into the inversion network for training, and record the inversion error for each number of input features.
(5) Select the feature combination with the minimum inversion error as the optimal feature set for the inversion target.
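Steps (3) to (5) amount to a forward search over MI-ranked prefixes. The sketch below separates the ranking logic from model training by taking a caller-supplied `evaluate` function. In the study, that function would train and validate the inversion network on each subset. Here it is a hypothetical stand-in, and the feature names and MI scores are invented for illustration.

```python
def select_features(mi_scores, evaluate):
    """Rank features by MI and keep the top-n subset with the lowest error.

    mi_scores: dict mapping feature name -> MI with the hotspot temperature.
    evaluate: callable taking a list of feature names and returning the
        inversion error of a model trained on those features.
    """
    ranked = sorted(mi_scores, key=mi_scores.get, reverse=True)  # step (3)
    errors = {}
    best_subset, best_error = None, float("inf")
    for n in range(1, len(ranked) + 1):  # step (4): top-1, top-2, ...
        subset = ranked[:n]
        errors[n] = evaluate(subset)
        if errors[n] < best_error:  # step (5): keep the minimum-error subset
            best_subset, best_error = subset, errors[n]
    return best_subset, best_error, errors


def toy_error(subset):
    """Hypothetical stand-in for training and validating the network."""
    return (len(subset) - 2) ** 2 + 1.0


mi_scores = {"feat_a": 0.9, "feat_b": 0.7, "feat_c": 0.4, "feat_d": 0.1}
best, best_err, errors = select_features(mi_scores, toy_error)
```

Only 16 nested subsets are evaluated rather than all 2^16 combinations. This keeps the search cheap, but features with low individual MI that are useful only in combination may be missed.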

GRU-SA Inversion Network
The GRU (Gated Recurrent Unit), proposed by Cho et al. in 2014 [22], is a variant of recurrent neural networks (RNNs) that can be used to process sequential data such as text, speech, and time series. Compared with traditional RNN models, the GRU has stronger modeling capability and better gradient propagation properties.

The GRU passes its hidden state from one time step to the next, which allows information to persist and gives the network long-term memory. Figure 8 illustrates the structure of the GRU [23]. As the figure shows, the GRU has two gates, the reset gate $r_t$ and the update gate $z_t$, both of which filter information. First, the GRU computes the reset gate and the update gate from the current input and the previous hidden state; it then updates its memory based on these gate signals.

The reset gate $r_t$ controls how much of the previous hidden state is forgotten when forming the candidate hidden state. If it is close to 0, the previous information is ignored and only the current input is used; if it is close to 1, the previous information is retained and combined with the current input.

The update gate $z_t$ determines how much of the candidate hidden state $\hat{h}_t$ and of the previous state $h_{t-1}$ is preserved, and thus how much past state information is retained in the current state $h_t$. If it is close to 0, the previous state is kept as the new memory; if it is close to 1, the previous state is replaced by the candidate computed from the current input. Finally, the GRU outputs the hidden state of the current moment.

The entire process is formulated as follows [24]:

(1) Update gate:
$$z_t = f\left(w_z \cdot [h_{t-1}, x_t] + b_z\right) \tag{13}$$
(2) Reset gate:
$$r_t = f\left(w_r \cdot [h_{t-1}, x_t] + b_r\right) \tag{14}$$
(3) Filtering and memory of previous information and current input to obtain the candidate hidden state:
$$\hat{h}_t = \tanh\left(w_h \cdot [r_t \odot h_{t-1}, x_t] + b_h\right) \tag{15}$$
(4) Memory update to obtain the current state:
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t \tag{16}$$

where $w$ and $b$ represent the weights and biases of the network, $f(\cdot)$ (typically the sigmoid function) and $\tanh$ denote activation functions, $x_t$ is the input at time $t$, $[\cdot,\cdot]$ denotes vector concatenation, and $\odot$ denotes the Hadamard product.
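The four GRU relations described above (update gate, reset gate, candidate state, and state update) can be traced in plain Python. This is a minimal sketch, assuming the sigmoid for $f(\cdot)$, small random weights, zero biases, and a hypothetical hidden size of 4. It is not the trained network from this study, and the names `gru_step` and `init_gru_params` are illustrative.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def init_gru_params(input_size, hidden_size, seed=0):
    """Small random weights of shape hidden x (hidden + input); zero biases."""
    rng = random.Random(seed)

    def weights():
        return [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
                for _ in range(hidden_size)]

    zeros = [0.0] * hidden_size
    return {"w_z": weights(), "w_r": weights(), "w_h": weights(),
            "b_z": list(zeros), "b_r": list(zeros), "b_h": list(zeros)}


def gru_step(x_t, h_prev, p):
    """One GRU time step: update gate, reset gate, candidate, state update."""
    hx = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    z = [sigmoid(a + b) for a, b in zip(matvec(p["w_z"], hx), p["b_z"])]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(p["w_r"], hx), p["b_r"])]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)] + x_t  # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(a + b) for a, b in zip(matvec(p["w_h"], reset_h), p["b_h"])]
    # h_t = (1 - z_t) * h_{t-1} + z_t * candidate, element-wise
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


# Run a 3-step sequence of 2 (normalized, hypothetical) input features.
params = init_gru_params(input_size=2, hidden_size=4)
h = [0.0] * 4
for x_t in [[0.2, 0.5], [0.3, 0.4], [0.1, 0.6]]:
    h = gru_step(x_t, h, params)
```

Framework implementations differ in convention. For example, PyTorch's `torch.nn.GRU` applies the reset gate to the hidden-state term after its matrix product, and it uses the opposite convention for $z_t$: values near 1 keep the previous state.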
Although the GRU can process sequential data and capture dependencies, it is relatively weak at modeling long-range dependencies because it lacks a self-attention mechanism. In addition, its expressive capacity is relatively limited, which can be a drawback in complex tasks. To address these issues, this paper introduces SA to optimize the GRU.
SA weights each element of the input sequence and uses the weighted results as feature representations, capturing the correlations between different positions in the sequence. The Self-Attention mechanism is computed as follows [25][26][27]:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{17}$$

where Q, K, and V represent the query, key, and value vectors, respectively; softmax normalizes the attention weights; and $d_k$ is the dimension of the key vectors. The network structure after introducing SA is shown in Figure 9 and is referred to as the GRU-SA network.
The input sequence first passes through multiple GRU layers for feature extraction, and the SA layer then models the global context. Finally, a linear layer performs the output transformation; its input dimension equals the hidden dimension of the GRU module, and its output dimension is 1. This structure is designed to capture the temporal information, dependencies, and global semantics of the sequence to produce the final output.
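The SA layer and the linear output layer described above can be sketched on a sequence of GRU hidden states as follows. Several details are assumptions: identity projection matrices stand in for the learned query, key, and value weights, the hidden size of 2 and the GRU outputs are hypothetical, and the context vector at the last position is fed to the linear head. How the study reduces the attended sequence to a single output is not specified.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def self_attention(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence h of shape T x d."""
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def linear_head(vec, w_out, b_out):
    """Linear layer with output dimension 1: hidden vector -> temperature."""
    return sum(wi * vi for wi, vi in zip(w_out, vec)) + b_out


# Hypothetical GRU outputs for T = 3 steps with hidden size 2.
gru_out = [[0.1, 0.4], [0.3, -0.2], [0.5, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, attn = self_attention(gru_out, identity, identity, identity)
# Assumption: the last position's context vector feeds the linear head.
y_hat = linear_head(context[-1], w_out=[0.8, -0.5], b_out=0.1)
```

Each row of `attn` is a probability distribution over all time steps. This is what lets the output at one position draw on distant steps directly instead of only through the GRU's recurrent state.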

Evaluation Metrics for Inversion Model
The mean squared error (MSE) is used to reflect the dispersion of errors, and the mean absolute error (MAE) measures the average error. The coefficient of determination (R²) and the mean absolute percentage error (MAPE) are used to assess how well the model fits. This study uses all four metrics to evaluate the model comprehensively. MSE, MAE, and MAPE should be minimized, while R² should be maximized. The computational formulas are given in Equations (18)-(21) [28,29]:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x(i) - \hat{x}(i)\right)^{2} \tag{18}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x(i) - \hat{x}(i)\right| \tag{19}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(x(i) - \hat{x}(i)\right)^{2}}{\sum_{i=1}^{n}\left(x(i) - \bar{x}\right)^{2}} \tag{20}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{x(i) - \hat{x}(i)}{x(i)}\right| \tag{21}$$

Here, $n$ is the number of samples, $x(i)$ and $\hat{x}(i)$ are the actual and predicted values of the $i$-th sample, and $\bar{x}$ is the mean of the actual values.
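Equations (18)-(21) translate directly into code. The sketch below is a minimal pure-Python version. It assumes the actual values are nonzero, which holds for temperatures in °C in this setting, because MAPE divides by $x(i)$. The function name `evaluate` is illustrative.

```python
def evaluate(actual, predicted):
    """Return MSE, MAE, R^2 and MAPE (in %) for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n                      # Eq. (18)
    mae = sum(abs(e) for e in errors) / n                     # Eq. (19)
    mean = sum(actual) / n
    ss_tot = sum((a - mean) ** 2 for a in actual)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot            # Eq. (20)
    mape = 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual))  # Eq. (21)
    return {"MSE": mse, "MAE": mae, "R2": r2, "MAPE": mape}


metrics = evaluate([50.0, 60.0, 70.0, 80.0], [52.0, 58.0, 71.0, 79.0])
```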

Experimental Validation
Following the method in Section 3.1, one set of data was collected every 5 min. After preprocessing with the method described in Section 3.2, a total of 471 sets of data were obtained. In every group of 10 consecutive sets (i.e., every 50 min), the first set was assigned to the test set. This divides all data into a training set and a test set at a ratio of 9:1. Within the training set, 20% of the data were randomly selected as the validation set. To improve training efficiency, all data were normalized.
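The split described above can be sketched as follows. Several details are assumptions rather than settings stated in the paper: the validation set is drawn with a seeded shuffle, and min-max scaling stands in for the unspecified normalization method. The helper names `interleaved_split`, `min_max_fit`, and `min_max_apply` are hypothetical.

```python
import random


def interleaved_split(samples, group=10, val_fraction=0.2, seed=0):
    """Send the first sample of every `group` consecutive samples to the test set.

    The remaining samples form the training pool, from which `val_fraction`
    is drawn at random as the validation set.
    """
    test = samples[::group]
    train_pool = [s for i, s in enumerate(samples) if i % group != 0]
    shuffled = train_pool[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(val_fraction * len(shuffled)))
    return shuffled[n_val:], shuffled[:n_val], test            # train, validation, test


def min_max_fit(rows):
    """Per-column minimum and maximum of a list of feature rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_apply(rows, lo, hi):
    """Scale each column to [0, 1] using previously fitted bounds."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


train, val, test = interleaved_split(list(range(471)))
```

With 471 samples this yields 48 test samples and 423 training-pool samples. About 85 of the 423 become validation samples. In the sketch, the scaling bounds would be fitted on the training rows only so that test statistics do not leak into training. The paper states only that all data are normalized.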
The experiments were run on an Intel Core i5-7400 CPU under the Windows 10 operating system. The program was written in Python 3.8 using the PyTorch deep learning framework (https://pytorch.org/, accessed on 17 July 2024).

Consideration of Lagged Data for Inversion Effectiveness
To show how the lag of the oil tank temperature affects the inversion of the transformer winding hotspot temperature, this section runs inversion experiments with two sets of input feature parameters. The first set ignores the tank temperature at the previous time step and has ten input features. The second set includes it and has sixteen input features in total. Table 3 presents the error analysis results for the two experiments. From Table 3, it can be observed that considering the lagged effect of the oil tank surface temperature brings the inversion curve closer to the actual curve and improves every error metric: MSE, MAE, R², and MAPE improve by 47.83%, 0.76%, 1.84%, and 3.44%, respectively. This improvement is attributed to the richer information carried by the input features once lagged temperature data are included, since these features have potential associations with the inversion target. With this information among its inputs, the model can exploit such associations and learn more accurate and effective feature representations. It can then better capture both local and global features of the input data and perform the inversion task more effectively.
Thus, incorporating the lagged effect of the oil tank surface temperature into the input features improved the performance of the inversion model for transformer winding hotspot temperature, yielding more accurate and reliable inversion results.
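Building the lagged (previous-time-step) inputs amounts to shifting selected columns by one sample. The sketch below assumes each sample is a dict keyed by feature abbreviation and that samples are in time order at the 5-min interval. It also assumes that the `-P` suffix marks a previous-step value, as in the names T-AVE-P and T-MAX-P of the optimal feature set. The key `HST` for the hot spot temperature and the sample values are hypothetical.

```python
def add_lagged_features(rows, lag_keys, suffix="-P"):
    """Append each key's value from the previous time step as key + suffix.

    rows: list of dicts in time order (one dict per sample).
    The first sample has no predecessor, so it is dropped from the output.
    """
    out = []
    for prev, cur in zip(rows, rows[1:]):
        row = dict(cur)
        for key in lag_keys:
            row[key + suffix] = prev[key]
        out.append(row)
    return out


samples = [  # hypothetical values, for illustration only
    {"T-MAX": 40.0, "HST": 60.0},
    {"T-MAX": 41.0, "HST": 61.5},
    {"T-MAX": 43.0, "HST": 63.0},
]
lagged = add_lagged_features(samples, ["T-MAX"])
```

Applying this to six tank-surface features would extend ten current-step inputs to the sixteen used in the second experiment.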

Feature Dimensionality Reduction
To determine which input features have greater correlation with the winding hotspot temperature, the mutual information (MI) method is employed to calculate the correlation between the input features and the output feature.The MI values of the 16 input features with the winding hotspot temperature are illustrated in Figure 10.
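The MI ranking can be sketched with a simple histogram estimator. This section does not state which MI estimator the paper uses, so the equal-width binning and the default of 10 bins below are assumptions. The helper names are hypothetical.

```python
import math
from collections import Counter


def _bin(values, n_bins):
    """Equal-width discretisation of a continuous sequence into bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=10):
    """Histogram estimate of I(X; Y) in nats: sum p(x,y) log[p(x,y) / (p(x) p(y))]."""
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in pxy.items())


def rank_features(features, target, n_bins=10):
    """Return feature names sorted by decreasing MI with the target, plus the scores."""
    scores = {name: mutual_information(col, target, n_bins) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores
```

A feature that is a deterministic linear function of the target receives the maximum score for the binning, log(n_bins). A constant feature scores zero.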

Each feature contributes differently to model performance. To reduce the redundancy of the input features and the complexity of model training while preserving model accuracy, the GRU-SA network is used as the inversion network, and input features are added in the order given by the MI ranking. The inversion errors of models with different numbers of input features are computed. Figure 11 illustrates the relationship between the inversion error and the number of input features. From Figure 11, it can be observed that selecting the first 13 input features yields the best results for the winding hotspot temperature inversion task. Therefore, the optimal feature set for winding hotspot temperature is as follows:

RO, TOT, RI, T-AVE, T-MAX, T-AVE-P, T-MAX-P, AT, B-MAX, T-MIN, T-MIN-P, B-AVE, B-MAX-P
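The rank-then-truncate procedure that produced this feature set can be sketched as a loop over k. Here `train_and_score` is a hypothetical callback that would train GRU-SA on the given feature subset and return a validation error. Ties are resolved in favour of the smaller k, which matches the goal of reducing redundant inputs.

```python
def select_top_k(ranked_features, train_and_score, max_k=None):
    """Evaluate models on the top-k MI-ranked features for k = 1..max_k.

    train_and_score(feature_subset) -> validation error (lower is better).
    Returns (best_k, best_subset, errors_by_k); on ties the smallest k wins.
    """
    max_k = max_k or len(ranked_features)
    errors = {}
    for k in range(1, max_k + 1):
        errors[k] = train_and_score(ranked_features[:k])
    best_k = min(errors, key=errors.get)                      # first minimum = fewest features
    return best_k, ranked_features[:best_k], errors


# Usage with a cheap stand-in scorer (hypothetical), whose error is lowest at k = 3:
best_k, subset, errs = select_top_k(["a", "b", "c", "d", "e"],
                                    lambda s: abs(len(s) - 3) + 1.0)
```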
To validate the effectiveness and superiority of the MI method for feature dimensionality reduction, the MI method is compared with no feature selection and with the Spearman method [30]. The inversion errors for the different methods are presented in Table 4. From Table 4, compared with no feature selection, the MI method improves the RMSE, MAE, R², and MAPE metrics by 12.5%, 4.9%, 0.25%, and 6.01%, respectively. Compared with the Spearman method, the corresponding improvements are 11.27%, 3.4%, 0.22%, and 3.81%.
The Spearman coefficient measures only monotonic relationships. MI, in contrast, captures general nonlinear dependence between features, so it can detect complex associations between the input features and the target variable. Therefore, after the MI method is used for feature selection, the model performs the inversion more accurately.
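A small synthetic example shows why MI can succeed where Spearman's coefficient fails. For a non-monotonic dependence such as y = x² on a symmetric range of x, the rank correlation is zero even though y is fully determined by x, while an MI estimate is clearly positive. The data are synthetic. The histogram MI estimator with 8 bins is an assumed choice, not the paper's.

```python
import math
from collections import Counter


def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def mutual_information(x, y, n_bins=8):
    """Equal-width histogram estimate of I(X; Y) in nats."""
    def to_bins(v):
        lo, hi = min(v), max(v)
        if hi == lo:
            return [0] * len(v)
        w = (hi - lo) / n_bins
        return [min(int((e - lo) / w), n_bins - 1) for e in v]
    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in pxy.items())


x = [i / 10 for i in range(-20, 21)]   # symmetric input on [-2, 2]
y = [v * v for v in x]                  # non-monotonic dependence on x
print(round(spearman(x, y), 6), round(mutual_information(x, y), 3))
```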
The inversion results obtained with the optimal feature set are shown in Figure 12, where the trend of the inverted values closely follows that of the actual values. Hence, the MI method can eliminate redundancy in the input features and further enhance model performance.

Parameter Optimization of the Model
Multiple experiments showed that the parameters hidden_size and num_layers have a significant impact on model performance. The parameter hidden_size sets the size of the hidden layer, which determines the model's representational and learning abilities. A larger hidden_size increases the network's expressive power but also increases the model's complexity and computational cost. The parameter num_layers sets the number of stacked GRU layers. More layers enhance the model's expressive power, enabling it to better capture and represent complex sequence patterns and features, but they also increase the model's complexity and training time.
To find the optimal parameter combination, the particle swarm optimization (PSO) algorithm [31] was used to optimize the parameters of the GRU-SA model, with minimum MSE as the optimization objective. The search ranges were set to [32, 128] for hidden_size and [1, 4] for num_layers. The optimization results are shown in Figure 13.
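The hyperparameter search can be sketched as a basic global-best PSO over integer parameters. The inertia and acceleration coefficients (`w`, `c1`, `c2`), the swarm size, the iteration count, and the rounding of continuous particle positions to integers are illustrative assumptions, not settings reported in this paper. In practice, `objective` would be a hypothetical function that trains GRU-SA with the given hidden_size and num_layers and returns the validation MSE. Here a cheap synthetic function stands in for it, with its minimum placed at (64, 2) by construction purely for illustration.

```python
import random


def pso_integer(objective, bounds, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise objective(*params) over integer params with a global-best PSO.

    bounds: list of (low, high) integer pairs, e.g. [(32, 128), (1, 4)].
    Particles move in continuous space; positions are rounded and clipped
    to the bounds before every evaluation.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def to_params(pos):
        return tuple(min(max(round(p), lo), hi) for p, (lo, hi) in zip(pos, bounds))

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(*to_params(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])     # pull to personal best
                             + c2 * r2 * (gbest[d] - pos[i][d]))       # pull to swarm best
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            cost = objective(*to_params(pos[i]))
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return to_params(gbest), gbest_cost


def synthetic_objective(hidden_size, num_layers):
    # Hypothetical stand-in for "train GRU-SA and return validation MSE".
    return (hidden_size - 64) ** 2 / 100 + (num_layers - 2) ** 2


best_params, best_cost = pso_integer(synthetic_objective, [(32, 128), (1, 4)])
```

Each objective call with a real model is a full training run, so the swarm size and iteration count set the search cost directly.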
From Figure 13, it can be observed that model performance reaches its optimum when hidden_size is set to 64 and num_layers to 2.
To further demonstrate the superiority of the proposed method, comparative experiments were conducted against existing typical methods, namely the RNN, the LSTM, and the original GRU. To ensure a fair comparison, every model used the optimal feature set obtained by the MI algorithm as its input, and the particle swarm algorithm optimized the parameters of each model. The inversion results are shown in Figure 14, and the inversion errors of each model are shown in Table 5.
Sensors 2024, 24, 4734
Compared with the other typical methods, the inversion results of the proposed method show the smallest deviation from the actual values. Figure 14 shows that the inverted values of the proposed method follow the trend of the actual values most closely, which indicates that the proposed method can more accurately invert the trend of winding hot spot temperature changes. Compared with the RNN, the proposed method improves MSE, MAE, R², and MAPE by 79.32%, 35.51%, 5.12%, and 32.27%, respectively. Compared with the LSTM, the corresponding improvements are 84.2%, 41.95%, 7.31%, and 3.95%. Compared with the GRU, they are 61.04%, 24.37%, 2.05%, and 19.38%.
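The section does not give the formula behind these improvement percentages. The sketch below assumes they are relative changes with respect to the baseline model, with the sign flipped for the error metrics (MSE, MAE, MAPE) so that a decrease counts as an improvement, while an increase in R² counts as an improvement. The metric values in the usage line are hypothetical.

```python
def relative_improvement(baseline, proposed):
    """Percentage improvement of `proposed` over `baseline`, per metric.

    Error metrics improve when they decrease; R2 improves when it increases.
    """
    out = {}
    for name, base in baseline.items():
        new = proposed[name]
        if name == "R2":
            out[name] = (new - base) / base * 100.0
        else:
            out[name] = (base - new) / base * 100.0
    return out


# Hypothetical metric values, for illustration only:
gains = relative_improvement({"MSE": 2.0, "MAE": 1.0, "R2": 0.9, "MAPE": 4.0},
                             {"MSE": 1.0, "MAE": 0.75, "R2": 0.99, "MAPE": 3.0})
```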
To compare the training and evaluation times of the different models, each model, with its parameters optimized as above, was trained and evaluated on the optimal feature set. The time required by each model is shown in Table 6. Every model completed training and evaluation within 10 s, which is relatively fast. The proposed GRU-SA took less time than the LSTM and GRU while achieving higher accuracy. Although the GRU-SA took slightly longer than the RNN, it substantially improved inversion accuracy at a small cost in speed.
Compared with the traditional models, the proposed method showed stronger capabilities for understanding and modeling the input sequences in this sequence modeling task. It handled long-term dependencies, nonlinear relationships, and information loss better, resulting in higher performance. Therefore, the proposed GRU-SA-based transformer winding hotspot temperature inversion method achieves better inversion results while reducing model redundancy and complexity.
To compare the trends of the actual and inverted hot spot temperatures, this study calculated the temperature difference over each time interval for both series, as shown in Figure 15. From Figure 15, it can be seen that the trends of the actual and inverted values are basically consistent. This indicates that the proposed method can accurately invert both the trend and the values of the winding hot spot temperature.
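The per-interval temperature differences can be computed as first differences of each series. The helper `trend_agreement` is a hypothetical summary added only for illustration, not a metric used in the paper. It reports the fraction of intervals in which the actual and inverted changes have the same sign, which is one way to put a number on "basically consistent" trends.

```python
def first_differences(series):
    """Temperature change between consecutive sampling instants: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def trend_agreement(actual, inverted):
    """Fraction of intervals where both series change in the same direction (or both stay flat)."""
    da, di = first_differences(actual), first_differences(inverted)
    same = sum(1 for a, b in zip(da, di) if (a > 0) == (b > 0) and (a < 0) == (b < 0))
    return same / len(da)


# Hypothetical temperatures in degrees C, for illustration only:
score = trend_agreement([60.0, 61.0, 63.0, 62.0], [60.2, 61.1, 62.8, 62.5])
```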

Conclusions
This paper proposed a method for transformer winding hotspot temperature inversion, from which the following four conclusions are drawn:
(1) Considering the lagged data of the oil tank temperature enriches the information content of the data and allows the transformer winding hotspot temperature to be inverted more accurately.
(2) Using the MI algorithm for feature selection and dimensionality reduction of the input features reduces model redundancy and complexity, thereby improving the inversion accuracy of the model.
(3) Introducing SA into the GRU enables the proposed GRU-SA inversion network to better capture the correlations between different positions in the sequence data, and the GRU-SA outperforms traditional networks in the inversion task.
(4) The effectiveness of the proposed method was validated with field temperature rise experimental data from a 110 kV transformer.
In future research, more environmental factors can be considered to study the hot spot temperature inversion method for transformers with more complex operating environments.

Figure 1. Thermal circuit model of transformer.

Figure 3. Flowchart of the inversion method.

Figure 4. Layout of fiber optics: (a) fiber for measuring top oil temperature; (b) fiber for measuring winding hotspot temperature.

Figure 5. Layout diagram of infrared thermal imagers: (a) top of the tank; (b) side of the tank.

Figure 6. Sample of experimental data.

Figure 7. Flowchart of feature selection process.

Figure 9. Architecture of the GRU-SA network.

Figure 10. MI calculation results between input features and winding hotspot temperature.

Figure 12. Inversion results with optimal feature set.

Figure 13. MAE values of the inversion model under different parameter settings.

Figure 14. Comparison of results from different inversion methods.

Figure 15. Trend of changes in actual and inversion values of hot spot temperatures.

Table 3. Model errors with different input features.

Table 4. Comparison of errors with different optimization methods.

Table 5. Performance comparison of different inversion methods for hotspot temperature inversion.

Table 6. Training and evaluation time for each model.