Application of machine learning in determining and resolving state estimation anomalies in power systems

State estimation (SE) is one of the most important and efficient tools for monitoring and controlling the power grid. However, anomalies in the power grid, such as false data injection (FDI), can significantly reduce the accuracy of the SE results and potentially put the power system in a critical situation. This paper addresses the limitations of simple residual-based algorithms in detecting FDI and proposes a detection scheme that uses machine learning (ML) models, namely the auto-encoder (AE), the long short-term memory auto-encoder (LSTM AE), and the 1-D convolutional neural network auto-encoder (1-D CNN AE). The proposed method aims to offer a more effective FDI detection approach for the power grid, with higher detection accuracy. The paper also introduces the LSTM variational auto-encoder (LSTM VAE) for reconstructing anomalous data. With the LSTM VAE, anomalous data can be transformed to closely resemble the original data with an acceptable level of accuracy, so that the power system can maintain normal operation in critical situations involving FDI. Finally, the performance of the presented methods is evaluated on the IEEE 14-bus, 30-bus, and 118-bus test networks, and the results are discussed to demonstrate the effectiveness of the proposed method.


Introduction
Unlike conventional power grids, smart power grids with new communication infrastructure and information technology could increase the power production capacity of renewable energy resources and the network reliability. However, stable and reliable electricity remains necessary in the new power grid structure [1], [2]. Generally, a smart grid is an electrical grid that is integrated with a data communication network (i.e., a cyber-physical system) in which the data needed to operate the power grid is collected and analyzed in real time [3], [4]. At the same time, meeting the increasing demand for reliable and economical electricity services requires real-time management and tracking of electrical grid operations [5].
One approach to effectively control and monitor the power grid is the state estimation process (SEP) [6][7][8]. The SEP estimates key system state variables such as voltage amplitudes and phase angles. These variables play a crucial role in the operation of the energy management system (EMS) [9], which in turn affects functions such as economic dispatch and contingency analysis. To gather operating data, smart meters and remote terminal units (RTUs) are installed on buses and power grid lines. The supervisory control and data acquisition (SCADA) system acts as the communication channel between the RTUs and the control center (CC). Its primary function is to enable real-time monitoring, management, and tracking of the electrical grid. Once the necessary data is collected through SCADA, the SEP can be run within the CC, which allows for efficient control and monitoring of the power grid.
A communication network that transmits data between the CC and smart meters increases the exposure of the power grid to cyber threats [10], [11]. There are several cyber threats against the power grid, such as load alteration [12], topology [13], and line disconnection [14] attacks, as well as the false data injection (FDI) attack. Because FDI has a detrimental impact on electrical network performance, considerable effort has been devoted in the literature to detecting this kind of cyber threat [15]. Liang et al. are pioneers in introducing the FDI [16]. The purpose of an FDI is to modify the state estimation (SE) variables by manipulating the data collected in RTUs [17], [18]. If the FDI is successfully executed, it could result in power outages and system blackouts [19]. In addition to degraded grid performance, such as inadequate prices in the electricity market, there is also the issue of system instability [15,17,18]. Furthermore, as shown in [16], the conventional bad data detection (BDD) mechanism can be bypassed by an unobservable FDI. According to [20], this issue can lead to power transmission overload. Implementing an FDI requires power grid information, and in this study we assume that all network information is available. Other works, such as [21], have proposed self-attention generative adversarial models to bypass the need for network information in FDI detection.
Various solutions can be implemented to counter FDI on a power system. One effective approach is to enhance the security of smart meters. Protecting the information related to smart meters and network topology is also crucial, because implementing an FDI on the power grid requires this information. If these measures prove inadequate in preventing FDIs, it becomes necessary to explore methods for identifying them. In previous research, several defensive approaches have been studied extensively, focusing on the detection of FDIs using the linear DC formulation of the SEP. Researchers in [22] proposed the use of recurrent neural networks for FDI detection in the DC SEP. Researchers in [23] introduced the Kalman filter to protect the DC SEP against FDIs. Another study [24] evaluated the effectiveness of a deep learning (DL) model for FDI detection in power grids. Furthermore, [25] and [26] proposed FDI detection mechanisms for the DC SEP based on an auto-encoder (AE) neural network and on a robust state estimator constructed using a variational auto-encoder (VAE), respectively. However, practical power grids typically employ the AC SEP to manage and monitor electrical grids. Consequently, the aforementioned FDI detection approaches may not yield satisfactory performance in such scenarios.
For the detection of FDIs in power networks, [27] employs the Kullback-Leibler distance (KLD). The KLD measures the dissimilarity between two probability distributions. A higher KLD value indicates that the probability distribution of the anomalous data differs significantly from that of the historical data, thus indicating the presence of FDIs. However, the accuracy of this method relies on choosing an appropriate threshold value to separate the probability distributions. Alternatively, [28] introduces a transformation-based strategy for detecting FDIs in power grids. This approach uses image processing techniques that are designed to enhance image quality. The method presented in [28] demonstrates superior FDI detection accuracy compared to the approach proposed in [27].
The use of machine learning (ML) and DL algorithms to detect anomalies such as FDI within the electrical grid has become more widespread. This can be attributed to advances in ML and DL algorithms, as well as the increased availability of smart meters and operational data. In [29], a support vector machine (SVM) is explored for FDI detection, while [30] investigates its use for anomaly detection within the network. Additionally, [31] conducts a comparative study of FDI detection approaches based on SVM, artificial neural network (ANN), and extreme learning machines. However, these methods are supervised or semi-supervised learning approaches, which require labeled data for their application in power networks. Obtaining labeled data for training ML models in practical power systems can be prohibitively expensive [32]. Furthermore, normal data in the system significantly outweighs labeled anomalous data. Therefore, methods that rely on labeled anomalous data for training may not perform as effectively as techniques that are based on normal data. Generating anomalous data is costly, and it is improbable that the anomalous data points in the training set cover all possible states resulting from FDI. As a result, supervised models may struggle to distinguish between different intensities and impacts of FDIs on the system.
In [33], an unsupervised learning method known as AE is introduced for detecting FDIs in power networks. One of the key advantages of unsupervised learning techniques is that they rely solely on normal data for training, eliminating the need for labeled anomalous data. These methods can detect various anomalies without requiring specifically tagged data during the training process. Another notable approach in anomaly detection is the long short-term memory auto-encoder (LSTM AE), as demonstrated in [34] for multivariate time series analysis. By contrast, supervised or semi-supervised learning methods require labeled data for both normal and anomalous instances, and acquiring such labeled data can be costly. Furthermore, [35] presents a comprehensive summary of cyber-physical threat identification and mitigation schemes within the smart grid domain, including an in-depth review of cutting-edge research on the subject.
Consider the scenarios where FDI is implemented on the power system or where the existing data, particularly the state variables, deviate from their original values due to various circumstances. In such situations, the utility needs an effective tool to ensure the continuous and reliable operation of the electrical grid. Most studies that propose methods to detect FDI suggest replacing the anomalous data with available historical data after the FDI is detected. However, replacing anomalous data with historical data in critical situations can be challenging, particularly when the precise time of FDI implementation is unknown. In other words, once FDI is implemented, the power system enters a critical situation in which there is no evidence to determine the appropriate historical time interval for replacing the anomalous data. Consequently, using exact data from a specific historical time can create operational challenges for the power system. In this regard, [36] proposes reconstructing anomalous measurement data using a deep denoising AE. Additionally, [37] proposes a new operating state reconstruction scheme to automatically filter out possible cyber threats in the smart grid. To address this issue, we employ the long short-term memory variational auto-encoder (LSTM VAE) in this study. It will be demonstrated that, by implementing the LSTM VAE, anomalous data can be transformed to closely resemble the original data with a satisfactory level of accuracy. The transformed data, referred to as operating data, can then be used for system management. In [38], the AE is employed to reconstruct data that cannot be transmitted to the CC due to measurement device failures in the power grid. The key distinction between AE and VAE is that AE learns a latent space representation of the data, while VAE learns a distribution of the latent space representation. As a result, the LSTM VAE can generate output data of higher quality: by generating new data samples from the probability distribution in the decoder, data that closely resembles the input data can be produced. Furthermore, LSTM can capture long-term dependencies between time steps of the data.
The major contributions of this paper are to:
It is noteworthy that several case studies are simulated on the IEEE 14-bus, 30-bus, and 118-bus test systems to evaluate the performance of the proposed methods for FDI detection and anomalous data reconstruction.
In the first step, we simulated the AC SEP model in a normal state to collect normal data. Then, we implemented FDI on the AC SEP model and collected the anomalous data. In the next step, we trained the models (AE, LSTM AE, 1-D CNN AE, one-class SVM) using the normal data. After the training process, we evaluated the ability of the proposed methodology to detect FDI using the anomalous data. Additionally, we trained the LSTM VAE with the normal data and used it to reconstruct the anomalous data.
The remainder of this paper is structured as follows. Section II presents the SEP model in AC power systems and FDI modeling. Section III provides details on the methods and algorithms used in this study. In Section IV, the results of FDI implementation on the power system are presented, along with the proposed method for FDI detection. Section V presents the results of the proposed methods for detecting FDI. Finally, Sections VI and VII include the scenario introduced for reconstructing anomalous data and the corresponding results, respectively.

SEP with AC power flow model
Energy management functions such as economic dispatch, load forecasting, and optimal power flow depend on information obtained from the power system SEP [39]. In a power system with $N_{bus}$ buses and $m$ smart meters, the nonlinear relationship between the SCADA smart meter data vector and the system state variables is given by the following equation [6]:
$$z^{SE} = h(x^{SE}) + e^{SE} \qquad (1)$$
where $z^{SE} = [z_1, z_2, \ldots, z_m]^T$ is the measurement vector obtained from the SCADA system, which contains the active/reactive power injections at network buses and the power flows in transmission lines. Moreover, $x^{SE} = [x_1, x_2, \ldots, x_n]^T$ is the system state vector, including $N_{bus}$ voltage magnitudes ($V$) and $N_{bus} - 1$ phase angles ($\theta$), so that $n = 2N_{bus} - 1$. Note that $\theta$ on the slack bus is always zero. Also, $e^{SE}$ is the vector of measurement errors of all smart meters. Each error can be modeled by a Gaussian distribution with zero mean and a standard deviation equal to one-third of a given percentage of maximum error about the mean value (99.7% coverage factor). $R$ is a diagonal matrix consisting of the variances of the smart meter errors. Furthermore, $h(x^{SE})$ is a nonlinear function that models the dependency between the state variables and the measured data [40]. The weighted least squares (WLS) method is used to obtain the power grid state variables. In this regard, WLS minimizes the objective function in (2) [28,41]:
$$J(x^{SE}) = \sum_{i=1}^{m} \frac{\left(z_i^{SE} - h_i(x^{SE})\right)^2}{\sigma_i^2} \qquad (2)$$
where $\sigma_i^2$ is the variance of the $i$th smart meter error. The state variables are obtained as the minimizer of this objective [42]:
$$x^{SE} = \arg\min_{x} J(x)$$
After the SEP is finished, the BDD process is implemented by the Chi-square test [6], [28], according to the following condition:
$$J(x^{SE}) > \tau$$
That is, a bad data state is detected when the value of $J(x^{SE})$ is greater than $\tau$, a threshold value that models the limits of system operation under the normal state.
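To make the WLS estimation and the Chi-square check concrete, the following minimal sketch applies them to a toy problem with a single state variable. The measurement functions, the noise level, and the use of Gauss-Newton steps as the solver are illustrative assumptions and are not taken from the paper; the threshold 5.991 is the 95% Chi-square quantile for two degrees of freedom (three measurements minus one state).

```python
import math

def h(x):
    """Hypothetical nonlinear measurement functions of a single state x."""
    return [x, x * x, math.sin(x)]

def jacobian(x):
    """Derivatives dh_i/dx of the toy measurement functions above."""
    return [1.0, 2.0 * x, math.cos(x)]

def objective(x, z, sigma):
    """WLS objective J(x): squared residuals weighted by 1 / sigma_i^2."""
    return sum(((zi - hi) / si) ** 2 for zi, hi, si in zip(z, h(x), sigma))

def wls_estimate(z, sigma, x0=1.0, max_iter=50, tol=1e-12):
    """Minimise J(x) with Gauss-Newton steps (assumed solver, scalar state)."""
    x = x0
    for _ in range(max_iter):
        hx, jac = h(x), jacobian(x)
        gain = sum(ji * ji / si ** 2 for ji, si in zip(jac, sigma))
        rhs = sum(ji * (zi - hi) / si ** 2
                  for ji, zi, hi, si in zip(jac, z, hx, sigma))
        step = rhs / gain
        x += step
        if abs(step) < tol:
            break
    return x

# Chi-square threshold for m - n = 3 - 1 = 2 degrees of freedom at 95 %.
TAU = 5.991

sigma = [0.01, 0.01, 0.01]
z_clean = h(0.8)                      # noise-free measurements of x = 0.8
x_clean = wls_estimate(z_clean, sigma)
j_clean = objective(x_clean, z_clean, sigma)

z_gross = list(z_clean)
z_gross[0] += 0.2                     # one grossly wrong meter reading
x_gross = wls_estimate(z_gross, sigma)
j_gross = objective(x_gross, z_gross, sigma)

print("clean data flagged:", j_clean > TAU)
print("gross error flagged:", j_gross > TAU)
```

In a real AC SEP the state is a vector and the scalar gain becomes a gain matrix, but the structure (estimate, evaluate the objective, compare with the threshold) is the same.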

FDI and BDD
BDD techniques are used to detect faulty measurements and bad data in the AC SEP system [43]. These techniques are based on the residual value obtained from the following equation:
$$r = \lVert z^{SE} - h(x^{SE}) \rVert_2$$
In a normal state of system operation, the estimated variable values and the actual values are almost equal. Therefore, the values resulting from the function $h(x^{SE})$ must be close to the measured values. On the other hand, when $r$ is greater than the threshold value, the measured values differ from the value of the function $h(x^{SE})$ and, as a result, there are bad data or faulty measurements in the electrical grid.
To perform FDI, it is assumed that the attacker has complete information on the power system historical data and transmission line parameters such as system admittances and topology information [44], [45]. If FDI happens in the system, the corresponding residual is:
$$r_{bad} = \lVert z^{SE}_{bad} - h(x^{SE}_{bad}) \rVert_2$$
where $z^{SE}_{bad} = z^{SE} + a$ and $x^{SE}_{bad} = x^{SE} + c$ denote the manipulated measurements and state variables. The FDI can pass BDD if $a = h(x^{SE}_{bad}) - h(x^{SE})$ [46], because in that case $r_{bad} = \lVert z^{SE} - h(x^{SE}) \rVert_2 = r$. For example, if we consider an FDI that aims to alter the active/reactive power of bus 5 in the IEEE 14-bus test system depicted in Fig. 1, the attacker must know the state variables of buses 1, 2, 4, and 6, which are connected to bus 5. Under these circumstances, BDD will no longer detect anomalies, and other methods are needed to diagnose them.
Fig. 1. IEEE 14-bus test system.
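The bypass condition can be checked numerically. The sketch below uses a hypothetical two-state measurement model (not the AC power flow equations) and fixed, made-up measurement errors, and compares the residual of a stealthy attack built with a = h(x_bad) − h(x) against a naive attack that changes a single measurement. For simplicity the residual of the naive attack is evaluated at the unchanged state rather than by re-running the estimator.

```python
import math

def h(x):
    """Hypothetical nonlinear measurement model with two state variables."""
    theta, v = x
    return [v * math.sin(theta), v * math.cos(theta), v * v, theta]

def residual_norm(z, x):
    """Euclidean norm of z - h(x), the quantity checked by BDD."""
    return math.sqrt(sum((zi - hi) ** 2 for zi, hi in zip(z, h(x))))

x_est = [0.10, 1.02]                          # assumed estimated state
noise = [0.002, -0.001, 0.003, -0.002]        # fixed measurement errors
z = [hi + ei for hi, ei in zip(h(x_est), noise)]

c = [0.10 * x_est[0], 0.0]                    # shift the phase angle by 10 %
x_bad = [xi + ci for xi, ci in zip(x_est, c)]

# Stealthy attack vector: a = h(x_bad) - h(x_est).
a = [hb - hn for hb, hn in zip(h(x_bad), h(x_est))]
z_bad = [zi + ai for zi, ai in zip(z, a)]

# Naive attack: tamper with a single measurement only.
z_naive = list(z)
z_naive[3] += c[0]

r_normal = residual_norm(z, x_est)
r_stealthy = residual_norm(z_bad, x_bad)
r_naive = residual_norm(z_naive, x_est)

print(r_normal, r_stealthy, r_naive)
```

The stealthy residual matches the normal one up to rounding, which is why a residual test alone cannot reveal this kind of attack.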

Proposed detection methodology
In this section, the FDI detection scenario is introduced.

New methods for threshold value determination and FDI detection
To detect FDI using an unsupervised learning algorithm, it is necessary to define a threshold value so that anomalous data can be detected based on the detection error between the input and output data. References [33] and [47] provide methods for determining this threshold value.
Therefore, in this study, we introduce a new method through which the threshold value can be determined for the detection of normal and anomalous data.To determine the threshold value, we utilized the mean absolute error (MAE) during the training process.
To detect FDI, as the first step, one of the unsupervised learning algorithms (AE, LSTM AE, or 1-D CNN AE) is trained using all normal input data until the cost reaches its lowest value. Once the training process is complete, the MAE between the input and output is calculated for all normal data, and the largest MAE value is set as the threshold value. When the algorithm receives a new input, the mean error obtained during the first 50 epochs of the training process is subtracted from the MAE between the input and output of the algorithm. The result is then divided by the standard deviation (SD) of the error during those first 50 epochs. Finally, FDI is detected if the resulting value exceeds the threshold.
According to Algorithm 1, three factors determine whether the data is normal or abnormal. The first factor is the threshold value, which is determined from the MAE between the algorithm's inputs and outputs. The second and third factors are the average and the SD of the losses over the first 50 epochs of the training process. The following sections validate the defined method through FDI detection using AE, LSTM AE, and 1-D CNN AE in the proposed Algorithm 1.
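The decision rule described above can be sketched in a few lines. The function names, the synthetic loss curve, and the example MAE values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which SD estimator is used.

```python
import statistics

def fit_detector(train_maes, epoch_losses, n_epochs=50):
    """Return (threshold, mean, SD) as described for Algorithm 1.

    threshold : largest MAE observed on the normal training data
    mean, SD  : statistics of the training loss over the first n_epochs
    """
    first = epoch_losses[:n_epochs]
    return max(train_maes), statistics.mean(first), statistics.pstdev(first)

def detection_score(mae, detector):
    """Standardise the reconstruction MAE of a new sample."""
    _, mean_loss, sd_loss = detector
    return (mae - mean_loss) / sd_loss

def is_fdi(mae, detector):
    """Flag the sample when its standardised error exceeds the threshold."""
    threshold = detector[0]
    return detection_score(mae, detector) > threshold

# Made-up training history: a loss curve that decays over 100 epochs.
epoch_losses = [0.5 / (epoch + 1) + 0.02 for epoch in range(100)]
train_maes = [0.010, 0.015, 0.020]            # per-sample MAE on normal data

detector = fit_detector(train_maes, epoch_losses)
print(is_fdi(0.015, detector), is_fdi(0.30, detector))
```

With these numbers a typical normal error yields a negative score, while a large reconstruction error is pushed well above the threshold, which is the separation effect the method relies on.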

Experimental settings
This section describes the basic concepts of the methods used in this study to detect FDI, the FDI implementation, FDI detection using the proposed detection methodology, the evaluation metrics, and the architecture details of the models.

AE
AE is an unsupervised learning algorithm that consists of two main parts: an encoder that maps the input data into the latent space and reduces the dimensionality of the input data, and a decoder that maps the data from the latent space to the output.
Encoder: The function $g_\varphi(x^{input})$ maps the input layer to a hidden layer (latent space) as follows:
$$z = g_\varphi(x^{input}) = \phi(w\, x^{input} + b)$$
In this formulation, $w$ and $b$ represent the weight matrix and bias vector, respectively. Note that $\varphi = \{w, b\}$, $w \in R^{n \times m}$, and $b \in R^{n \times 1}$, where $m$ and $n$ represent the input dimension and the number of hidden units, respectively. Moreover, $x_k^{input} \in R^{m \times 1}$ is the $k$th vector of the input samples $x^{input} \in R^{m \times N_s}$, with $k \in \{1, 2, \ldots, N_s\}$, where $N_s$ is the number of input samples. Finally, $\phi(\cdot)$ is an activation function.
Decoder: The function $f_\theta(z)$ maps the hidden latent space $z$ back to a reconstruction $x^{recon}$ in the input space, i.e., $x^{recon} = f_\theta(z)$. This mapping can be written as:
$$x^{recon} = f_\theta(z) = \phi(w' z + b')$$
The optimal parameters of the AE, $\varphi = \{w, b\}$ and $\theta = \{w', b'\}$, are obtained by minimizing the mean square error between $x^{input}$ and $x^{recon}$ using the backpropagation process:
$$\min_{\varphi, \theta} \; \frac{1}{N_s} \sum_{k=1}^{N_s} \lVert x_k^{input} - x_k^{recon} \rVert_2^2$$
where $N_s$ is the number of input samples, and $x^{input}$ and $x^{recon}$ denote the input samples and reconstructed samples, respectively. Finally, Fig. 2 shows the AE algorithm.
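As an illustration of the encoder and decoder mappings and of the mean square error objective, here is a minimal single-hidden-layer forward pass. The weights are hand-picked rather than trained, the layer sizes are arbitrary, and the choice of ELU in the hidden layer with a linear output layer follows the activation functions reported later for the AE; everything else is an assumption for the sake of the example.

```python
import math

def elu(v):
    """Exponential linear unit, used here for the hidden layer."""
    return v if v > 0 else math.exp(v) - 1.0

def linear(v):
    """Identity activation, used here for the output layer."""
    return v

def dense(x, w, b, act):
    """One layer: act(w x + b) with w given as a list of rows."""
    return [act(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

def autoencode(x, w, b, w_dec, b_dec):
    """Encoder z = g(x) followed by decoder x_recon = f(z)."""
    z = dense(x, w, b, elu)
    return dense(z, w_dec, b_dec, linear)

def mse_loss(inputs, recons):
    """Mean over samples of the squared reconstruction error."""
    total = sum(sum((a - r) ** 2 for a, r in zip(x, xr))
                for x, xr in zip(inputs, recons))
    return total / len(inputs)

# Hypothetical parameters: m = 3 inputs compressed to n = 2 hidden units.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
w_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b_dec = [0.0, 0.0, 0.0]

samples = [[0.5, 0.2, 0.3]]
recons = [autoencode(x, w, b, w_dec, b_dec) for x in samples]
print(recons, mse_loss(samples, recons))
```

Because the bottleneck here simply drops the third input, the reconstruction error equals the square of that component; training would instead adjust the weights to minimize this loss over all normal samples.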

VAE
VAE is an unsupervised learning algorithm. The difference between AE and VAE is that AE learns a latent space representation of the data, while VAE learns a distribution of the latent space representation. Therefore, the loss function in VAE consists of two parts: the mean squared error to minimize the reconstruction error, and a second part modeled with the KLD to ensure that the compressed latent variable follows a Gaussian distribution [48]. KLD is a method used to compare distributions. Previously, VAE, like AE, was used for various applications such as natural language processing, image forecasting, and speech recognition [49] [50] [51].
In the VAE model, the encoder maps high-dimensional inputs $x$ to the low-dimensional latent space layer $z$, and the decoder is a generative model that produces $M$ new data samples ($x^{recon}$). These $M$ new samples are similar to the original $M$ samples ($x$) and follow the same distribution as the original input data, $p(x) \approx p(x^{recon})$. Note that $x^{recon}$ is generated by the decoder from a continuous random latent variable $z$, which represents the structure behind the data. The marginal likelihood $p_\varphi(x)$ can be calculated by:
$$p_\varphi(x) = \int p_\varphi(x \mid z)\, p(z)\, dz$$
where $\varphi$ represents the parameters of the generative model and $p(z)$ is the distribution associated with $z$. In this paper, $p(z)$ is modeled with the standard normal distribution, i.e., $p(z) \sim N(0, 1)$. The goal is to minimize the error between $x$ and $x^{recon}$, which is equivalent to maximizing the probability distribution $p_\varphi(x)$.
Fig. 2. Architecture of an AE.
Using Bayes' rule [44]:
$$p_\varphi(z \mid x) = \frac{p_\varphi(x \mid z)\, p(z)}{p_\varphi(x)}$$
Since the continuous random latent variable $z$ and the parameter $\varphi$ are unknown, $p_\varphi(z \mid x)$ cannot be determined analytically. In VAE, the encoder is therefore modeled with $q_\theta(z \mid x)$, which is a neural network with parameters $\theta$.
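The marginal likelihood can be written as below [52]:
$$\log p_\varphi(x^{(k)}) = D_{KL}\left(q_\theta(z \mid x^{(k)}) \,\|\, p_\varphi(z \mid x^{(k)})\right) + L(\varphi, \theta; x^{(k)})$$
The first term, $D_{KL}(q_\theta(z \mid x^{(k)}) \,\|\, p_\varphi(z \mid x^{(k)}))$, is always non-negative. Hence the second term $L(\varphi, \theta; x^{(k)})$ is called the "lower bound" on the marginal likelihood of data point $k$ and can be written as [52]:
$$\log p_\varphi(x^{(k)}) \ge L(\varphi, \theta; x^{(k)}) = E_{z \sim q_\theta(z \mid x^{(k)})}\left[-\log q_\theta(z \mid x^{(k)}) + \log p_\varphi(x^{(k)}, z)\right]$$
As a result, the total loss minimized during training, which is the negative of the lower bound averaged over the samples, can be written as:
$$L(\varphi, \theta; x) = \frac{1}{N_s} \sum_{k=1}^{N_s} \left[ D_{KL}\left(q_\theta(z \mid x^{(k)}) \,\|\, p_\varphi(z)\right) - E_{z \sim q_\theta(z \mid x^{(k)})}\left[\log p_\varphi(x^{(k)} \mid z)\right] \right]$$
where $N_s$ is the number of input samples, and $x$ and $x^{recon}$ denote the input data samples and the reconstructed samples generated by the decoder, respectively. The term $D_{KL}(q_\theta(z \mid x) \,\|\, p_\varphi(z))$ is the KLD between the approximated posterior distribution, $q_\theta(z \mid x)$, and the prior distribution of the latent variable $z$, which is modeled by the Gaussian distribution $p_\varphi(z) \sim N(0, 1)$. Minimizing this term brings $q_\theta(z \mid x)$ closer to $p_\varphi(z) \sim N(0, 1)$, which has the effect of regularization. The negative sign before the term $E_{z \sim q_\theta(z \mid x)}[\log p_\varphi(x \mid z)]$ means that when the total loss $L(\varphi, \theta; x)$ is minimized, this term is maximized, which is equivalent to minimizing the reconstruction error between $x$ and $x^{recon}$. Finally, Fig. 3 shows the VAE algorithm.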
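For a Gaussian encoder with diagonal covariance and a standard normal prior, the KLD term has a well-known closed form, which makes the two-part loss easy to evaluate. The sketch below assumes the encoder outputs a mean and a log-variance per latent dimension, uses the mean squared error as the reconstruction term, and adds the two parts with unit weight; the function names and the weighting are illustrative assumptions rather than details given in the paper.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """Closed-form KLD between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng):
    """Reparameterisation: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def vae_loss(x, x_recon, mu, log_var):
    """Reconstruction MSE plus KLD regulariser (unit weighting assumed)."""
    recon = sum((a - r) ** 2 for a, r in zip(x, x_recon)) / len(x)
    return recon + kl_to_standard_normal(mu, log_var)

rng = random.Random(0)
z = sample_latent([0.0, 0.5], [0.0, -1.0], rng)   # one latent sample
print(len(z), vae_loss([1.0, 2.0], [0.9, 2.1], [0.0, 0.5], [0.0, -1.0]))
```

The KLD is zero only when the encoder output coincides with the prior, so this term penalizes latent codes that drift away from N(0, 1) while the MSE term keeps the reconstruction close to the input.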

LSTM AE
LSTM is suitable for time series data because it allows the network to retain both long-term and short-term dependencies between the data at a given time and the data from many time steps before. Therefore, LSTM learns the order of data points and the dependencies between time steps. Each LSTM is composed of a set of units called LSTM cells. The main idea behind LSTM is the cell state, which retains valuable information across the entire temporal space. It has the form of a chain of repeated neural network modules, where each module includes four control gates: the forget gate, the input gate, the tanh layer, and the output gate. These gates control the flow of data into and out of each module. The forget gate determines what information should be discarded from the cell state. The input gate determines which information should be added to the cell state in the current time step. The output of the input gate and a vector of new candidates created by the tanh layer are combined and added to the cell state. Finally, the output gate controls the cell state and generates the output of the LSTM cell. The concept of the LSTM cell state and its four gates allows for the retention of long-term information. A visual representation of the LSTM cell can be seen in Fig. 4.
When the LSTM cell receives the input $x_t$ at time step $t$, it uses the following sigmoid function to decide which of the old information should be forgotten:
$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$
where $h_{t-1}$ is the output at time step $t - 1$, and $W_f$ and $b_f$ are the weight matrix and the bias of the forget gate. Then, $x_t$ is processed before being stored in the cell state. The value $i_t$ is determined in the input gate, along with a vector of candidate values $\tilde{c}_t$ generated by a tanh layer, and both are used to update the new cell state $c_t$:
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
where $(W_i, b_i)$ and $(W_c, b_c)$ are the weight matrices and biases of the input gate and the memory cell state, respectively. Finally, the equations of the output gate are as follows:
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$
where $W_o$ and $b_o$ are the weight matrix and the bias of the output gate.
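The gate computations of a single LSTM cell can be traced step by step with a small sketch. It follows the standard LSTM formulation with the previous output and the current input concatenated; the dictionary layout of the parameters and the all-zero weights in the usage example are hypothetical choices made so that the result is easy to check by hand.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(w, b, vec):
    """Compute w @ vec + b with w given as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, vec)) + bi
            for row, bi in zip(w, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p holds W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o."""
    concat = h_prev + x_t                      # [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(p["W_f"], p["b_f"], concat)]
    i_t = [sigmoid(v) for v in affine(p["W_i"], p["b_i"], concat)]
    c_hat = [math.tanh(v) for v in affine(p["W_c"], p["b_c"], concat)]
    # New cell state: keep part of the old state, add gated candidates.
    c_t = [f * c + i * ch for f, c, i, ch in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(v) for v in affine(p["W_o"], p["b_o"], concat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hypothetical cell: 1 hidden unit, 2 input features, all parameters zero.
zero_w, zero_b = [[0.0, 0.0, 0.0]], [0.0]
params = {"W_f": zero_w, "b_f": zero_b, "W_i": zero_w, "b_i": zero_b,
          "W_c": zero_w, "b_c": zero_b, "W_o": zero_w, "b_o": zero_b}

h_t, c_t = lstm_step([0.3, -0.2], [0.0], [1.0], params)
print(h_t, c_t)
```

With zero parameters every gate evaluates to 0.5 and the candidate vector to 0, so the cell simply halves its previous state; trained weights make these gates depend on the input sequence.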
In LSTM AE, both the encoder and the decoder are LSTM networks. LSTM's ability to learn patterns in long sequences makes it suitable for tasks such as time series forecasting and anomaly detection. LSTM AE is trained using only normal data. When the input data contains anomalies, the algorithm generates output data that deviates from the input. Consequently, the error is higher than in the normal state, indicating the presence of anomalies. In this study, the algorithm is employed specifically for anomaly detection. Its structure is illustrated in Fig. 5.

1-D CNN AE
A convolutional neural network (CNN) is a specialized ANN structure that is widely used in image and video processing. It consists of several convolutional and pooling layers, followed by one or more fully-connected layers. The convolutional layers are used to detect meaningful and complex features. The pooling layers reduce the dimensionality of the data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer, while the fully-connected layers connect every neuron in one layer to every neuron in another layer. In this study, we used a 1-D CNN, which applies sliding convolutional filters to 1-D input. The main difference between 1-D CNN and 2-D CNN is that the former uses 1-D arrays instead of 2-D matrices for both kernels and feature maps. The structure of the 1-D CNN AE is illustrated in Fig. 6.
According to Fig. 6, the 1-D CNN AE employed in this study comprises solely 1-D convolutional and fully-connected layers. Like the LSTM AE, the 1-D CNN AE is trained exclusively using normal data. After training, when anomalous data is fed into this algorithm, its output will diverge from the input. Consequently, the error between the input and output will be substantial in such cases.
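The sliding-filter operation of a 1-D convolutional layer can be written out directly. The sketch below implements a "valid" 1-D convolution in the form used by deep learning frameworks (a cross-correlation, without flipping the kernel) followed by ReLU; the signal and kernel values are made up for illustration.

```python
def conv1d(signal, kernel, stride=1):
    """Slide a 1-D kernel over a 1-D signal without padding."""
    k = len(kernel)
    return [sum(signal[start + j] * kernel[j] for j in range(k))
            for start in range(0, len(signal) - k + 1, stride)]

def relu(v):
    """Rectified linear unit."""
    return max(0.0, v)

signal = [1, 2, 3, 4]          # e.g. consecutive samples of one measurement
kernel = [1, 0, -1]            # hypothetical filter weights
feature_map = conv1d(signal, kernel)
activated = [relu(v) for v in feature_map]
print(feature_map, activated)
```

Both the kernel and the resulting feature map are 1-D arrays, which is the distinction from the 2-D case noted above.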

FDI implementation
Assume that we want to implement FDI on bus 5 of the IEEE 14-bus test system and that we possess complete information about the test system, including all the power injection and power flow data. In this scenario, since we have access to the necessary data, the state variables $x^{SE}$ can be estimated.
According to the structure of the state vector $x^{SE}$, $c$ should therefore be determined as follows:
$$c = [(0, \ldots, (T \times \theta_5), \ldots, 0), (0, 0, \ldots, 0)]^T \qquad (21)$$
$h(x^{SE}_{bad})$ is determined from the AC power injection and power flow equations, in which $P_i$ and $Q_i$ represent the active and reactive power injection at bus $i$, respectively; $P_{i,j}$ and $Q_{i,j}$ represent the active and reactive power flow from bus $i$ to bus $j$, respectively; $V_i$ and $V_j$ are the voltage magnitudes at buses $i$ and $j$, respectively; $\theta_{ij}$ is the difference between the voltage phase angles at buses $i$ and $j$; $Y_{ij}$ is the admittance between buses $i$ and $j$; $g_{si}$ is the shunt conductance at bus $i$; $g_{ij}$ is the conductance of the transmission line between buses $i$ and $j$; $b_{ij}$ is the susceptance of the transmission line between buses $i$ and $j$; and $b_{si}$ is the shunt susceptance at bus $i$.
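For reference, the symbols defined above appear in the widely used textbook form of the AC measurement functions, given here as an assumption about the shape of $h(\cdot)$ and not as a recovery of the paper's own equations. In this form the bus admittance is written as $Y_{ij} = G_{ij} + jB_{ij}$ (the symbols $G_{ij}$ and $B_{ij}$ are introduced only for this illustration), and $N_i$ denotes the set of buses connected to bus $i$, including bus $i$ itself.

```latex
\begin{align}
P_i &= V_i \sum_{j \in N_i} V_j \left( G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij} \right) \\
Q_i &= V_i \sum_{j \in N_i} V_j \left( G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij} \right) \\
P_{i,j} &= V_i^2 \left( g_{si} + g_{ij} \right) - V_i V_j \left( g_{ij}\cos\theta_{ij} + b_{ij}\sin\theta_{ij} \right) \\
Q_{i,j} &= -V_i^2 \left( b_{si} + b_{ij} \right) - V_i V_j \left( g_{ij}\sin\theta_{ij} - b_{ij}\cos\theta_{ij} \right)
\end{align}
```

Evaluating these expressions at $x^{SE}_{bad} = x^{SE} + c$ for bus 5 and its neighbouring buses gives the entries of $h(x^{SE}_{bad})$ that differ from $h(x^{SE})$.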
As mentioned in Section II B, the FDI can pass BDD if $a = h(x^{SE}_{bad}) - h(x^{SE})$ is satisfied, and $z^{SE}_{bad}$ can then be determined using $z^{SE}_{bad} = z^{SE} + a$.
In the next step, it is necessary to access all measurements related to bus 5 and to the buses connected to it. The FDI implementation is completed by replacing the measured values of bus 5 and of the connected buses with the corresponding $z^{SE}_{bad}$ values.
After FDI is implemented on bus 5 in the IEEE 14-bus test system, the manipulated data is transmitted to the CC. Subsequently, after the SEP is performed, the state variables such as phase angle and voltage amplitude are estimated. Figs. 7-9 illustrate the variations in the phase angle, voltage amplitude, and power flow under normal and anomalous conditions.

FDI detection using proposed detection methodology
In this section, the impact of the proposed method for separating normal and anomalous data is analyzed.
Considering the AE, in the first step, we trained the AE using eleven months of normal data with a resolution of one hour. The data from the last month of the year was held out as the test case. After the training process with normal data, we fed the test data into the AE and obtained the detection error (i.e., MAE) of the test data. Fig. 10 illustrates the detection error for each normal and anomalous sample in the test data. According to Fig. 10, the detection error of anomalous samples is higher than that of normal samples. The detection error for normal samples is small but lies close to the threshold value, and the algorithm misclassifies some normal data as anomalous. For a clearer representation, Fig. 11 displays only the detection error of normal test data samples.
Previously, the detection error of the implemented AE was studied. In the following, the detection error of the AE is compared with the detection framework proposed in Algorithm 1. Figs. 12 and 13 display the detection error of the normal and anomalous test data samples, and the detection error of only the normal test data samples, respectively, after implementing Algorithm 1. Based on Figs. 12 and 13, the detection error of the normal test data samples, which originally had values close to zero, has now become negative (less than zero). At the same time, the detection error of the anomalous test data samples has increased, while the threshold value remains constant. In other words, implementing Algorithm 1 increases the detection error for the anomalous samples in the test data and decreases it for the normal samples. This approach enhances the distinction between the detection errors of normal and anomalous data samples, enabling the separation of normal and anomalous data using the threshold value.
Fig. 8. Voltage amplitude of buses in the IEEE 14-bus test system, before and after the implementation of FDI on bus 5.

Evaluation metrics
The F1-score (F1-s) and accuracy metrics are used to evaluate the developed models. F1-s is the harmonic mean of Precision and Recall, where Precision (P) = (True Positive)/(True Positive + False Positive) and Recall (R) = (True Positive)/(True Positive + False Negative). The F1-s metric is obtained using the following equation:
$$F1\text{-}s = \frac{2 \times P \times R}{P + R}$$
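These metrics are straightforward to compute from the confusion counts. In the sketch below, the accuracy formula is the usual definition (correct decisions over all decisions), which the text does not spell out, and the counts in the usage example are made up.

```python
def precision(tp, fp):
    """Share of samples flagged as FDI that really are FDI."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of true FDI samples that were flagged."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def accuracy(tp, tn, fp, fn):
    """Share of all samples classified correctly (standard definition)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts for a one-month test set.
tp, tn, fp, fn = 8, 88, 2, 2
print(f1_score(tp, fp, fn), accuracy(tp, tn, fp, fn))
```

Because normal samples usually far outnumber attacked ones, accuracy alone can look high even when many attacks are missed, which is why F1-s is reported alongside it.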

Architecture details and training process information of models used for FDI detection
All the algorithms (AE, LSTM AE, and 1-D CNN AE) were trained using eleven months of normal data with a one-hour resolution. A summary of the models trained with the IEEE 14-bus test system is provided in Tables 1-3. In the AE, the exponential linear unit (ELU) and linear activation functions are used in all hidden layers and the last layer, respectively. In the 1-D CNN AE, the rectified linear unit (ReLU) and sigmoid activation functions are employed in all hidden layers and the last layer, respectively. In the LSTM AE, the ReLU activation function is used in all layers except the last layer, which does not have an activation function. According to Tables 1-3, all models were trained with a learning rate of 0.001, and the MAE was used as the loss function. Each of the three algorithms was trained for 100 epochs, and the data was normalized using the min-max scaler. Fig. 14 illustrates the losses of the models throughout the training process. Adam [53] was used as the optimizer to determine the optimal learning parameters. For training and testing, we considered the IEEE 14-bus, 30-bus, and 118-bus test systems. The number of neurons in the input and output layers is determined by the size and number of power grid parameters, resulting in different values for the IEEE 14-bus, 30-bus, and 118-bus test systems. In general, the inputs to the algorithms, including AE, LSTM AE, 1-D CNN AE, and LSTM VAE, include parameters such as injection power at buses, power flow between buses, voltage magnitude, and phase angle of the buses, and the overall output determines the normal or FDI state of the input data.
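Min-max normalization with a train/test split can be sketched as follows. Fitting the scaling range on the training months only and reusing it for the test month is an assumption about the workflow (the text only states that a min-max scaler was used), and the tiny data set is made up.

```python
def fit_min_max(rows):
    """Per-feature minimum and maximum over the training rows."""
    columns = list(zip(*rows))
    return [min(col) for col in columns], [max(col) for col in columns]

def transform(rows, lows, highs):
    """Scale each feature to [0, 1] using the fitted training range."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0   # guard constant features
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled

# Hypothetical data: rows are hourly samples, columns are measurements.
train = [[2.0, 10.0], [4.0, 20.0], [6.0, 30.0]]
test = [[8.0, 5.0]]

lows, highs = fit_min_max(train)
train_scaled = transform(train, lows, highs)
test_scaled = transform(test, lows, highs)
print(train_scaled, test_scaled)
```

Test values can fall outside [0, 1] when they lie outside the training range, which is itself a useful hint that a sample differs from the normal data seen during training.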

Architecture details and training process information of models used for anomalous data reconstruction
The number of neurons in the input and output layers of the LSTM VAE is equal and depends on the amount of data per input, which is determined by the size of the test system. The ReLU activation function is used in all hidden layers, and the hyperbolic tangent (tanh) activation function is used in the last layer. The LSTM VAE model is trained for 100 epochs using the Adam optimizer with a learning rate of 0.0001.
Eleven months of data are used for training, and the data from the last month of the year is used for testing. The input data is normalized using a min-max scaler. A summary of the LSTM VAE trained with the IEEE 30-bus test system is provided in Tables 7 and 8. A schematic of the LSTM VAE is shown in Fig. 16.

FDI diagnosis results
The proposed Algorithm 1 for FDI recognition was applied to the IEEE 14-bus, 30-bus, and 118-bus test systems. Reference [54] gives the operational characteristics of these test systems, including where the measurement devices are installed.
To account for measurement uncertainty, a deviation following a normal distribution is added to the active/reactive power values obtained from the power flow results. The FDIs alter certain data points in the collected power vector so that the obtained state variable of the target bus deviates by 10% from the SEP, as depicted in Fig. 7. Note that the 10% deviation in the state variable (phase angle) of the target bus is calculated relative to the estimated value of the state variable. All FDIs are implemented in a manner that ensures they will not be detected by BDD techniques [34]. In this study, the power grid measurement devices sample once per hour; eleven months of data are used for training and one month of data is used for testing. The fluctuation in load demands is adapted from [55]-[56].
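For readers unfamiliar with why an FDI can pass residual-based BDD, the standard argument for a linearized measurement model is sketched below. The symbols $z$ (measurements), $H$ (measurement Jacobian), $x$ (states), $e$ (noise), $W$ (weight matrix), $c$ (injected state deviation), and the linear simplification itself are assumptions introduced here for illustration. The construction actually used in this study is the one referenced in [34] and may differ.

```latex
\begin{aligned}
z &= Hx + e, \qquad
\hat{x} = (H^{\top} W H)^{-1} H^{\top} W z, \qquad
r = z - H\hat{x}, \\
z_a &= z + a, \quad a = Hc
\;\Rightarrow\;
\hat{x}_a = (H^{\top} W H)^{-1} H^{\top} W (z + Hc) = \hat{x} + c, \\
r_a &= z_a - H\hat{x}_a = z + Hc - H\hat{x} - Hc = r .
\end{aligned}
```

Under these assumptions the residual is unchanged by the attack, so any test on $\lVert r \rVert$ behaves exactly as it would for clean data even though the estimated state has shifted by $c$.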

FDI detection accuracy using the introduced method
In this part, the accuracy of FDI detection is examined by applying the AE, LSTM AE, and 1-D CNN AE in the proposed Algorithm 1.
Table 4 displays the results of FDI detection on the IEEE 14-bus test system using the AE, LSTM AE, and 1-D CNN AE in the proposed Algorithm 1. The results indicate high accuracy in detecting FDI on bus 5 and bus 2. However, the accuracy of FDI detection on bus 8 is lower than on bus 2 and bus 5. The reason is that bus 8 is connected to only one bus, whereas bus 5 and bus 2 are each connected to four buses. In simpler terms, since bus 8 has only one connection, FDI implementation changes fewer measurements. As a result, the detection error is smaller than in the other two cases. Nevertheless, the proposed method was still able to detect FDI on bus 8 with high accuracy. The results indicate that the proposed Algorithm 1 detects FDI with acceptable accuracy when applying the AE, LSTM AE, and 1-D CNN AE. Specifically, the accuracy of FDI detection using the AE and LSTM AE is slightly higher than that of the 1-D CNN AE in this scenario.

Table 5 displays the results of FDI detection on the IEEE 30-bus test system. In this scenario, bus 15, bus 23, and bus 13 are connected to 4, 2, and 1 buses, respectively. Table 6 shows the results of FDI detection on the IEEE 118-bus test system. In this scenario, bus 103, bus 8, bus 4, bus 27, bus 48, and bus 49 are connected to 4, 3, 2, 4, 2, and 12 buses, respectively. Fig. 15 illustrates the average accuracy of FDI detection across the test systems. These findings demonstrate that the introduced method detects FDI with high accuracy.
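The connectivity argument can be checked directly from the network topology. The sketch below counts how many branches terminate at each bus. The branch list is the commonly published IEEE 14-bus topology and is an assumption here, as are the names `BRANCHES_14` and `bus_degrees`; the data source used in the study is not stated in this section.

```python
from collections import Counter

# Branch list of the IEEE 14-bus test system as commonly published (assumed here).
BRANCHES_14 = [
    (1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5),
    (4, 7), (4, 9), (5, 6), (6, 11), (6, 12), (6, 13),
    (7, 8), (7, 9), (9, 10), (9, 14), (10, 11), (12, 13), (13, 14),
]


def bus_degrees(branches):
    """Count the branches incident to each bus."""
    degrees = Counter()
    for a, b in branches:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


degrees = bus_degrees(BRANCHES_14)
for bus in (2, 5, 8):
    print(f"bus {bus}: connected to {degrees[bus]} bus(es)")
```

With this topology, buses 2 and 5 each have four connections and bus 8 has one, which matches the counts quoted in the text. An FDI on a poorly connected bus touches fewer power flow measurements, so the reconstruction error it induces is smaller.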
Comparing the modeled algorithms shows that the LSTM AE detects FDI with higher accuracy than the other two algorithms, although the difference in detection accuracy among the AE, LSTM AE, and 1-D CNN AE is very small. The results also show that applying the AE, LSTM AE, and 1-D CNN AE in the proposed Algorithm 1 yields higher FDI detection accuracy than the methods presented in [33] and [47].
The methods put forth in this study for FDI detection are unsupervised deep learning approaches. To assess and highlight their efficacy, the performance of the proposed models is compared with a conventional unsupervised method, the one-class SVM machine learning algorithm.
The AE, LSTM AE, 1-D CNN AE, and one-class SVM techniques are all unsupervised, whereas Gaussian naive Bayes (GNB) is a supervised method. In other words, to perform FDI detection using GNB on bus 5 in Table 4, FDI must first be implemented for eleven months. The model is then trained using both normal and anomalous data from those eleven months, with FDI implemented at bus 5. Finally, the outcomes on the test data from the last month, in which FDI was applied to bus 5, are presented in Table 4. As mentioned earlier, supervised methods have limitations. For instance, when this model is trained using all normal and anomalous data from those eleven months, with FDI implemented at bus 5, its accuracy diminishes when it is applied to FDI detection at bus 2. In this specific example, the accuracy would be 0.46%. This means that a supervised model trained using data of FDI implemented at a specific bus cannot be used to detect FDI implemented at other buses. On the other hand, by using unsupervised methods, we can detect specific FDIs implemented at specific buses. Furthermore, when using unsupervised methods, FDIs implemented at different buses can be detected with high accuracy.

Reconstruction of anomalous data
Generating new data samples from the probability distribution in the decoder of the LSTM VAE leads to data that closely resembles the input data. Additionally, the LSTM can capture long-term dependencies between time steps of the data. It is important to note that when FDI is implemented on a power system, only a portion of the data deviates from its original values, while the remaining data at any given time differ only slightly from their original values. Therefore, when the LSTM VAE is trained with regular data, it learns a specific normal probability distribution for each of these data points in the latent space. When a new input containing anomalous data is fed into the LSTM VAE, only a portion of the input consists of anomalies, while the rest is regular. The LSTM VAE maps this input in the latent space to a normal probability distribution that closely resembles the distribution of regular data for the same input. As a result, by utilizing the LSTM VAE, anomalous data can be transformed to closely resemble the original data.
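The latent-space sampling described above is usually implemented with the reparameterization trick and a KL regularizer toward a standard normal prior. The sketch below shows both for a diagonal Gaussian. This is the textbook VAE formulation and is assumed rather than confirmed for the LSTM VAE used in this study; the names `sample_latent` and `kl_to_standard_normal` and the numeric encoder outputs are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


rng = random.Random(0)                   # fixed seed for repeatability
mu, log_var = [0.2, -0.1], [-2.0, -2.0]  # hypothetical encoder outputs
z = sample_latent(mu, log_var, rng)      # latent vector passed to the decoder
kl = kl_to_standard_normal(mu, log_var)
print(z, kl)
```

Because the KL term pulls every encoded input toward the same prior, a partly corrupted input tends to land near the region occupied by regular data, and the decoder then produces an output closer to the regular pattern.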

Results of anomalous data reconstruction using LSTM VAE
As previously mentioned, the measurement devices are assumed to sample once every hour. If FDI continues to be implemented on the power system after it has been detected using the proposed method, the system may, in the worst case, spend several hours in a critical situation. Hence, a model that can effectively convert manipulated or anomalous data to regular data, or at least bring it closer, would help take the power system out of the critical situation. In this regard, we first train an LSTM VAE using the SE results and collected power data under normal conditions. After training, 500 samples were used to test the model. Each sample contains state variables and collected anomalous power data. When a sample containing anomalous data was fed into the LSTM VAE, the model generated output data related to the same input that was closer to the original data. Fig. 17 displays the average of the encoded normal training data, as well as the normal and anomalous test data, in the latent space. Based on Fig. 17, the LSTM VAE has brought the distribution of the normal test data closer to that of the normal training data, and the distribution of the anomalous test data also approaches the distribution of the training data. It is important to note that the model uses the latent space for reconstructing the data in the decoder section.
To evaluate the LSTM VAE performance, the mean value of the MAE over the 500 samples is used. As the first step, in each sample, the MAE value is calculated for the reconstructed anomalous and regular data. Moreover, the MAE value is calculated for the anomalous and regular data.

According to Figs. 18 and 19, the MAE values before reconstructing the anomalous data in bus 8 of the IEEE 14-bus test system and bus 13 of the IEEE 30-bus test system are lower than those of the other buses. As explained in Section V.A., this can be attributed to the fact that these buses have fewer connections to other buses, resulting in a smaller error associated with FDI implementation compared to the other buses.
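The evaluation procedure can be sketched as follows: compute the MAE of each sample against the regular data, once for the anomalous data and once for the reconstructed data, then average over all samples. The function names and the toy values are hypothetical, and two samples stand in for the 500 used in the study.

```python
def mae(reference, values):
    """Mean absolute error between one regular sample and a candidate sample."""
    return sum(abs(a - b) for a, b in zip(reference, values)) / len(reference)


def mean_mae(references, candidates):
    """Average the per-sample MAE over all test samples."""
    errors = [mae(ref, cand) for ref, cand in zip(references, candidates)]
    return sum(errors) / len(errors)


# Toy values (hypothetical): two samples of three normalized quantities each.
normal = [[0.50, 0.40, 0.30], [0.60, 0.45, 0.35]]
anomalous = [[0.50, 0.55, 0.30], [0.60, 0.60, 0.35]]      # FDI alters one quantity
reconstructed = [[0.51, 0.43, 0.31], [0.59, 0.48, 0.36]]  # model output

before = mean_mae(normal, anomalous)      # error before reconstruction
after = mean_mae(normal, reconstructed)   # error after reconstruction
print(before, after)
```

A reconstruction is useful only when `after` is smaller than `before`; note that the model may slightly perturb the unaltered quantities while correcting the altered one, so both effects are captured by the same average.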

FDI detection and cleaning
Based on the preceding discussion, a question arises as to whether the LSTM VAE can be employed in Algorithm 1 to detect FDI. Conversely, it can be asked whether the AE, LSTM AE, and 1-D CNN AE can be used to reconstruct anomalous data.

As mentioned earlier, the detection error is used to assess the applicability of various methods in detecting FDI through the introduced method and algorithms. Fig. 23 illustrates the detection error of the LSTM VAE algorithm when applied to FDI on bus 15 in the IEEE 30-bus test system. The detection errors for anomalous and normal data fall within the same numerical range. Consequently, it can be inferred that the LSTM VAE algorithm is incapable of detecting FDI using the introduced Algorithm 1. In simpler terms, the detection error range for anomalous data overlaps with that of normal data, so the LSTM VAE algorithm cannot differentiate between the two based on differences in detection error.
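The overlap argument can be made concrete with a small check. The sketch below tests whether the detection errors of normal and anomalous samples share a numeric range and applies a simple threshold rule. Using the largest normal error as the threshold is an assumption made for illustration and is not claimed to be the rule in Algorithm 1; all error values are hypothetical.

```python
def ranges_overlap(normal_errors, anomalous_errors):
    """True if the two sets of detection errors share part of their numeric range."""
    return (max(normal_errors) >= min(anomalous_errors)
            and max(anomalous_errors) >= min(normal_errors))


def detect(errors, threshold):
    """Flag a sample as FDI when its detection error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical errors from a model whose two classes separate cleanly.
normal_sep, anomalous_sep = [0.010, 0.012, 0.015], [0.030, 0.034]
# Hypothetical errors from a model whose two classes overlap.
normal_ov, anomalous_ov = [0.020, 0.025, 0.031], [0.022, 0.029]

print(ranges_overlap(normal_sep, anomalous_sep), detect(anomalous_sep, max(normal_sep)))
print(ranges_overlap(normal_ov, anomalous_ov), detect(anomalous_ov, max(normal_ov)))
```

In the overlapping case, no single threshold flags the anomalous samples without also flagging normal ones, which is the situation reported for the LSTM VAE in Fig. 23.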
The study also investigates the utilization of AE, LSTM AE, and 1-D CNN AE for reconstructing anomalous data in this section.Specifically, focusing on FDI on bus 5 of the IEEE 14 bus test system, Fig. 24 represents the average MAE for both the anomalous data and the reconstructed data using AE, LSTM AE, and 1-D CNN AE.Upon reviewing the outcomes, it is evident that the 1-D CNN AE algorithm excels at reconstructing the anomalous data.Hence, we will select 1-D CNN AE for further analysis and compare its results with those of LSTM VAE.
According to Figs. 25 and 26, comparing the reconstruction results of anomalous data using the LSTM VAE and 1-D CNN AE shows that the overall performance of the LSTM VAE is superior to that of the 1-D CNN AE in reconstructing the anomalous data.
Note that, for the proposed FDI detection and anomalous data reconstruction models to be applicable, operation of the smart grid should be based on SCADA, and power system operation data should be collected. Additionally, appropriate hardware is needed to train the machine learning models with big data. Furthermore, similar to [26], [31], [33], the provided model is specifically designed for a scenario where the power network structure remains constant.

Conclusion and future work
This study introduces a novel FDI detection method that utilizes the AE, LSTM AE, and 1-D CNN AE within the proposed Algorithm 1. The main feature of the proposed method is its use of an unsupervised learning approach, which removes the reliance on anomalous data during the training process. In contrast, supervised learning algorithms require both normal and anomalous data for training. This poses limitations, as it can be challenging to generate a sufficient amount of anomalous data for training, potentially hindering the detection of changes in anomalous data through supervised methods. The results demonstrate that the integration of the AE, LSTM AE, and 1-D CNN AE in the proposed Algorithm 1 enables the accurate detection of FDI anomalies.

Additionally, this study proposes a reconstruction method that utilizes the LSTM VAE. The objective of this method is to reconstruct operating data that closely resembles the original data, based on the available anomalous data. In critical situations where FDI is implemented, the power system can maintain its normal operation by using the proposed method to reconstruct anomalous data. To assess the efficacy of the proposed methods, experiments were conducted on the IEEE 14-bus, 30-bus, and 118-bus test systems, with positive outcomes aligned with their intended objectives. Future work will concentrate on broadening the scope to encompass the detection of additional cyber threats that could affect the power system. Note that, for the proposed FDI detection and anomalous data reconstruction models to be applicable, the operation of the smart grid should be based on SCADA, and power system operation data should be collected.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 7. Voltage angle of buses in the IEEE 14-bus test system, before and after the implementation of FDI on bus 5.

Fig. 9. Power flows in the IEEE 14-bus test system, before and after the implementation of FDI on bus 5.

Fig. 10. Detection error for normal and anomalous test data samples.

Fig. 12. Detection error for normal and anomalous test data samples after implementing the proposed Algorithm 1.

Fig. 13. Detection error for normal test data samples after implementing the proposed Algorithm 1.
According to Tables 1-3, the AE model consists of Dense layers and has 41,048 trainable parameters. The LSTM AE model is composed of LSTM and Time-Distributed layers and has 150,860 trainable parameters, and the 1-D CNN AE model is composed of Dense, Conv1D, and Flatten layers and has 124,584 trainable parameters.

Fig. 14. Training and validation loss during the training process for each model.
Figs. 18, 19, and 20 demonstrate the capability of the LSTM VAE in reconstructing anomalous data resulting from FDI implementation, leading to a reduction in MAE of approximately 5%. Additionally, Figs. 21 and 22 show the normal, anomalous, and LSTM VAE reconstructed data at various buses of the IEEE 14-bus and 30-bus test systems, respectively. The state estimation data (i.e., voltage and angle of buses) as well as the power flow data, in per-unit format, are presented for the three different states in Figs. 21 and 22.

Fig. 17. Mean values of encoded input in the LSTM VAE latent space.

Fig. 18. Mean of MAE for 500 samples in both pre- and post-data reconstruction modes by LSTM VAE in the IEEE 14-bus test system.

Fig. 19. Mean of MAE for 500 samples in both pre- and post-data reconstruction modes by LSTM VAE in the IEEE 30-bus test system.

Fig. 20. Mean of MAE for 500 samples in both pre- and post-data reconstruction modes by LSTM VAE in the IEEE 118-bus test system.

Fig. 21. Voltage amplitude, phase angle, and collected power data in the normal, anomalous, and reconstruction modes for bus 5 in the IEEE 14-bus test system.

Fig. 22. Voltage amplitude, phase angle, and collected power data in the normal, anomalous, and reconstruction modes for bus 13 in the IEEE 30-bus test system.

Fig. 23. Detection error for normal and anomalous data by LSTM VAE in bus 15 of the IEEE 30-bus test system.

Fig. 24. Mean of MAE for 500 samples in pre- and post-data reconstruction modes by AE, LSTM VAE, and 1-D CNN AE in the IEEE 14-bus test system (FDI on bus 5).

Fig. 25. Mean of MAE for 500 samples in both pre- and post-data reconstruction modes by LSTM VAE and 1-D CNN AE in the IEEE 30-bus test system.

Fig. 26. Mean of MAE for 500 samples in pre- and post-data reconstruction modes by LSTM VAE and 1-D CNN AE in the IEEE 118-bus test system.
Total params: 71,465. Trainable params: 71,465. Non-trainable params: 0.

Fig. 15. Structure of an LSTM VAE.

Fig. 16. Average accuracy for FDI detection.

The data in each sample include the state variables and the collected power data. Accordingly, the mean values of the results for the IEEE 14-bus, 30-bus, and 118-bus test systems are shown in Figs. 18, 19, and 20, respectively. As can be seen, the proposed model appropriately reconstructs the anomalous data.