Short-Term Power Prediction for Renewable Energy Using Hybrid Graph Convolutional Network and Long Short-Term Memory Approach

Accurate short-term solar and wind power predictions play an important role in the planning and operation of power systems. However, the short-term power prediction of renewable energy has always been considered a complex regression problem, owing to the fluctuation and intermittency of output power and to its dynamic variation over time under local weather conditions, i.e., spatio-temporal correlation. To capture the spatio-temporal features simultaneously, this paper proposes a new graph neural network-based short-term power forecasting approach, which combines the graph convolutional network (GCN) and long short-term memory (LSTM). Specifically, the GCN is employed to learn complex spatial correlations between adjacent renewable energies, and the LSTM is used to learn dynamic changes of power generation curves. The simulation results show that the proposed hybrid approach can model the spatio-temporal correlation of renewable energies and that it outperforms popular baselines on real-world datasets.


I. INTRODUCTION
With the increase of fossil energy consumption and environmental pollution, the effective use of renewable energy has become a hot topic. Wind farms and photovoltaic (PV) power plants are widely used and considered very promising renewable energies whose penetration in power systems is gradually increasing [1], [2]. Although these renewable energies can bring positive environmental and economic benefits, their intermittent and fluctuating nature makes it difficult to accurately forecast PV and wind power, which poses challenges to the safe operation of power systems [3]. Therefore, it is necessary to develop accurate forecasting methods for PV and wind power generation to assist the safe operation and economic dispatch of power systems.
With respect to horizons, power prediction can be divided into several groups: long-term power prediction with year scales, medium-term power prediction with month scales, short-term power prediction with hour scales, and very short-term power prediction with minute scales. Generally, existing methods of short-term power prediction can be divided into three main categories:
1) Physical methods. They are usually developed on the basis of the lower atmosphere and sophisticated meteorological features, such as humidity, pressure, wind speed, and temperature [4]. Taking PV power prediction as an example, physical methods mainly include [5]: sky imagery methods, satellite imaging methods, and numerical weather prediction methods. Although these methods achieve outstanding performance, they require high computation costs, which seriously limits their applicability.
2) Statistical methods. They mainly include [6]: autoregressive moving average (ARMA), autoregressive (AR), and autoregressive integrated moving average (ARIMA), which extract features from lagged time series curves and meteorological factors, and then quantify the nonlinear dynamic relationship between features and powers to obtain forecasts. Compared with physical methods, statistical methods are relatively cost-saving, because they do not require any expensive simulations beyond historical PV and wind powers after being trained offline. However, the forecasting performance of statistical methods usually drops as the time horizon increases [7].
3) Artificial intelligence (AI)-based methods. The traditional AI-based methods mainly include support vector machine (SVM) and multi-layer perceptron (MLP) [8], which ignore the spatio-temporal correlations, so that the modeled change of meteorological features is not constrained by local weather conditions and the power generation curves cannot be predicted accurately. More recently, AI-based approaches have been proposed to capture strong correlations between renewable energies located in the vicinity, such as long short-term memory (LSTM) [9], convolutional neural network (CNN) [10], and hybrid models [11], which improve the forecasting accuracy of the target site by feeding feature information collected from neighboring sites to the models.
Further, the above-mentioned approaches are commonly used for datasets recorded from Euclidean domains (e.g., images and time series), while the input data of short-term power prediction considering the spatio-temporal correlation of renewable energies should be graph-structured data, which includes a correlation matrix between multiple renewable energies and their historical power generation curves. Existing methods have difficulty dealing with graph-structured data, so they simplify it into Euclidean data by ignoring correlation matrices, which limits the forecasting accuracy [12]. Recently, various graph neural networks defined in graph domains have shown convincing performance in handling complex graph-structured data in different fields [13], such as traffic flow forecasting, social recommendation, and drug discovery. Since the input data of short-term power prediction considering the spatio-temporal correlation of renewable energies is graph-structured, graph neural networks have the potential to improve short-term power prediction.

This paper was accepted at the 22nd Power Systems Computation Conference (PSCC 2022).
To improve forecasting accuracy, this paper proposes a new graph neural network-based short-term power forecasting approach, which combines the graph convolutional network (GCN) and LSTM to capture the spatio-temporal correlation simultaneously. The key contributions are as follows:
1) Multiple neighboring renewable energies are modeled as a graph, in which the adjacency matrix of nodes represents spatial dependencies.
2) A novel graph neural network-based hybrid approach is proposed for short-term power prediction for renewable energies. Specifically, the GCN is used to capture the spatial dependence between multiple neighboring wind farms or PV plants, and the LSTM is employed to learn temporal features from the time series curves.
3) The influence of key parameters (e.g., the number of hidden layers, the number of training epochs, and the choice of the optimizer) on the performance is analyzed, and constructive suggestions on how to select these parameters in the proposed model are given.
The rest of the paper is organized as follows. Section II formulates the proposed method, and Section III presents the process of the proposed method. Numerical experiments are performed and analyzed in Section IV. Finally, Section V summarizes the paper.

II. METHODOLOGY
A. Problem Definition
For short-term power prediction for renewable energies, the goal is to forecast the future power generation curves in a certain period of time given the historical data, such as historical power or meteorological features. Without loss of generality, the power generation curves of multiple wind farms and PV plants are used as an example of historical data in the experiment section.
Definition 1: Graph-structured data $G$. The multiple renewable energies (e.g., wind farms or PV plants) can be represented as an undirected graph $G=(V,E)$, where each renewable energy is treated as a node $v_i$. Specifically, $V=(v_1, v_2, \ldots, v_N)$ is a group of renewable energies, $N$ is the number of renewable energies, and $E$ is a set of edges between these renewable energies. Normally, a matrix $A \in \mathbb{R}^{N \times N}$ is utilized to represent the connection relationship between nodes. For traffic flow forecasting and social recommendation, the adjacency matrix only contains binary variables: an element equals 1 if there is a link between two nodes and 0 if there is no link [14]. By analogy, the adjacency matrix can be represented by the correlation matrix between multiple renewable energies. Specifically, this paper employs the absolute value of the Pearson correlation coefficient between nodes to represent the spatial correlation of neighboring wind farms or PV plants [2], so each element in the adjacency matrix is a real number between 0 and 1.
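As a minimal sketch of the adjacency construction in Definition 1, the snippet below takes the absolute Pearson correlation between site-level power curves. The function name `correlation_adjacency`, the (sites × time steps) array layout, and the synthetic data are illustrative assumptions, not part of the original work.

```python
import numpy as np

def correlation_adjacency(power):
    """Adjacency matrix from absolute Pearson correlations.

    power: array of shape (N, T), one row per site (assumed layout).
    Returns an (N, N) symmetric matrix with entries in [0, 1].
    Note: a site with constant output has undefined correlation (NaN).
    """
    power = np.asarray(power, dtype=float)
    corr = np.corrcoef(power)  # each row is treated as one variable
    return np.abs(corr)

# Synthetic example: two sites follow the same pattern, one mirrors it.
rng = np.random.default_rng(0)
base = rng.random(200)
power = np.stack([base, base + 0.1 * rng.random(200), -base])
A = correlation_adjacency(power)
```

Because the absolute value is taken, a perfectly anti-correlated pair of sites receives the same edge weight as a perfectly correlated pair.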
Definition 2: Feature matrix $X \in \mathbb{R}^{N \times F}$. The historical data of renewable energies is considered as the attribute feature represented by $X \in \mathbb{R}^{N \times F}$, where $F$ denotes the length of the historical time series. Again, attribute features of each node can be historical power generation curves or meteorological features, and the power generation curves of multiple wind farms and PV plants are used as an example in this paper.
In general, the problem of short-term power prediction for renewable energies can be regarded as learning a complicated neural network $f$, which projects a feature matrix and an adjacency matrix to the future power generation curves in a certain period of time:

$$[\hat{X}_{t+1}, \ldots, \hat{X}_{t+k}] = f\left(A;\ (X^{all}_{t-h}, \ldots, X^{all}_{t-1}, X^{all}_{t})\right)$$

where $h$ is the length of historical power generation curves; $X^{all}_{t}$ is the set of power generation curves from multiple renewable energies at time $t$; $\hat{X}_{t+1}$ is the predicted power of the targeted renewable energy at time $t+1$; and $k$ is the length of future power generation curves to be predicted. When $k$ is equal to 1, it is a one-step prediction, and when $k$ is greater than 1, it is a multi-step prediction.
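The mapping above can be turned into supervised training samples with a sliding window. The sketch below is one possible reading: each input covers times t-h to t (that is, h+1 columns, which is an assumption, since the text also calls h a length), and each label covers t+1 to t+k for a single target site. The names `make_windows` and `target` are hypothetical.

```python
import numpy as np

def make_windows(power, target, h, k):
    """Slice (N, T) power curves into (input, label) pairs.

    Inputs hold all N sites over times t-h..t; labels hold the target
    site over times t+1..t+k.
    """
    power = np.asarray(power, dtype=float)
    n_steps = power.shape[1]
    inputs, labels = [], []
    for t in range(h, n_steps - k):
        inputs.append(power[:, t - h:t + 1])           # shape (N, h + 1)
        labels.append(power[target, t + 1:t + k + 1])  # shape (k,)
    return np.stack(inputs), np.stack(labels)

# Toy data: 2 sites, 20 time steps, values 0..39 for easy inspection.
power = np.arange(40, dtype=float).reshape(2, 20)
X, Y = make_windows(power, target=0, h=3, k=2)
```

Setting `k=1` gives one-step samples, while `k>1` gives multi-step samples without changing the inputs.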
The following section will explain how to use the proposed hybrid model to realize the short-term power prediction task. Specifically, the hybrid model includes two parts: a GCN and an LSTM. As shown in Fig. 1, an adjacency matrix $A$ and historical power generation curves collected from past time $t-h$ to current time $t$ are input to the GCN, so as to obtain spatial features of multiple neighboring wind farms or PV plants. Then, the obtained spatial features are used as the input data to the LSTM, so as to capture temporal features by information transmission between renewable energies. Finally, the future power generation curves from time $t+1$ to $t+k$ are predicted through a dense layer with $k$ units. The number of units in the dense layer decides whether a one-step or a multi-step prediction is made.

B. Modeling Spatial Correlation with GCN
Modeling the complicated spatial features is a key problem for the short-term power prediction of renewable energies. As shown in Fig. 2(a), although the traditional CNN can obtain local spatial features of pixel values of the red node along with its neighbors, it can only be used for data defined in Euclidean domains, i.e., neighbors of each node are ordered and have a fixed size. To consider spatio-temporal correlations, the input data of short-term power prediction includes a correlation matrix between multiple renewable energies and their historical power generation curves, which belong to a graph rather than a 2-dimensional matrix. Different from the data in Euclidean domains, neighbors of each node are unordered and variable in size for graph-structured data, as shown in Fig. 2(b). Therefore, the traditional CNN cannot make good use of the correlation matrix between multiple renewable energies and cannot accurately capture spatial features. Recently, the traditional CNN in Euclidean domains has been generalized into the GCN in graph domains, which has shown outstanding performance in many fields [15], including text classification, fault diagnosis, and graph generation. To this end, the GCN is employed to model spatial features in this section.
The existing GCNs mainly fall into two categories: spectral-based GCN and spatial-based GCN. Specifically, the former employs the Fourier transform to project the graph-structured data into the Fourier domain, and then the data is projected back to the graph domain after performing convolutional operations. In contrast, the latter directly defines convolutional operations on the graph domain by operating on spatially neighboring nodes. Both spectral-based and spatial-based GCNs are constantly developing and improving, so it is difficult to say which one is better. Without loss of generality, a popular spectral-based GCN is used as an example to model the spatial features.
As shown in Fig. 3, the GCN can obtain the spatial correlation between the central renewable energy and its surrounding power generation units by encoding the adjacency matrix and the feature matrix. The mathematical formula of each graph convolutional operation can be expressed as:

$$X^{(i+1)} = \sigma_{GCN}\left(D^{-\frac{1}{2}} \hat{A} D^{-\frac{1}{2}} X^{(i)} W^{(i)}\right), \quad i = 1, 2, \ldots, M$$

$$\hat{A} = A + I$$

where $X^{(i)}$ is the feature matrix of the $i$th graph convolutional layer (the initial feature matrix includes $N$ historical power generation curves from past time $t-h$ to current time $t$); $M$ is the number of graph convolutional layers; $W^{(i)}$ is the weight matrix of the $i$th graph convolutional layer; $\sigma_{GCN}(\cdot)$ is the activation function of graph convolutional layers; $D$ is the diagonal node degree matrix of the adjacency matrix $A$; $I$ is an identity matrix; and $\hat{A}$ is a new form of the adjacency matrix with self-connection structure. Note that the output features of the last graph convolutional layer are used as the input features of the first LSTM layer.
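The graph convolutional operation described above can be sketched in NumPy as a normalized, self-connected adjacency matrix multiplied by the node features and a weight matrix. Several choices here are assumptions made for illustration: the degree matrix is computed from $A$ as worded in the text (a common variant computes it from $A + I$ instead), ReLU is used as the activation, and the weights are random rather than trained.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, with degrees taken from A.

    Assumes every node has a positive degree in A, which holds for a
    Pearson-based adjacency because its diagonal equals 1.
    """
    A = np.asarray(A, dtype=float)
    A_hat = A + np.eye(A.shape[0])        # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, X, W):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(A_norm @ X @ W, 0.0)

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.5],
              [0.2, 0.5, 1.0]])           # toy correlation-based adjacency
X = rng.random((3, 6))                    # N = 3 sites, 6 past time steps
W1 = rng.normal(size=(6, 4))              # untrained illustrative weights
W2 = rng.normal(size=(4, 2))

A_norm = normalized_adjacency(A)
H1 = gcn_layer(A_norm, X, W1)             # first graph convolutional layer
H2 = gcn_layer(A_norm, H1, W2)            # second layer reuses A_norm
```

Each layer mixes a node's features with those of its neighbors in proportion to the edge weights, so stacking M layers lets information travel across M hops of the graph.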

C. Modeling Temporal Correlation with LSTM
Modeling the complex temporal features is another key problem for short-term power prediction of renewable energies. So far, the recurrent neural network is one of the most widely used methods for short-term power prediction of time series. Nevertheless, the traditional recurrent neural network has gradient vanishing and exploding problems [16], which seriously limit its ability to learn long-term temporal correlations. To address these problems, the LSTM architecture was first proposed to memorize long-term dependence as much as possible in [17], and then further improved by adding an extra forget gate in [18]. At present, the LSTM has been the most popular recurrent neural network architecture and has shown convincing performance in many sequential tasks. Therefore, the LSTM is employed to model temporal features in this section.
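As shown in Fig. 4, each LSTM unit has three input features: the hidden state vector $H_{t-1}$ and the cell state vector $C_{t-1}$ from time $t-1$, and the feature information $X_t$ at time $t$. In this way, the LSTM retains the dynamic trend of historical power generation curves and shows the ability to capture temporal correlation. The output vectors of the LSTM unit are obtained through non-linear transformations and logical operations:

$$F_t = \sigma_s(W_F X_t + U_F H_{t-1} + B_F)$$

$$I_t = \sigma_s(W_I X_t + U_I H_{t-1} + B_I)$$

$$O_t = \sigma_s(W_O X_t + U_O H_{t-1} + B_O)$$

$$\tilde{C}_t = \sigma_g(W_C X_t + U_C H_{t-1} + B_C)$$

$$C_t = F_t \circ C_{t-1} + I_t \circ \tilde{C}_t$$

$$H_t = O_t \circ \sigma_g(C_t)$$

where $F_t$ is the activation vector of the forget gate; $I_t$ is the activation vector of the update gate; $O_t$ is the activation vector of the output gate; $\tilde{C}_t$ is the cell input activation vector; $\sigma_s$ is the sigmoid function; $\sigma_g$ is the hyperbolic tangent function; $W$ and $U$ are weight matrices and $B$ is a bias vector, each subscripted by the gate it belongs to (e.g., $B_O$ is the bias vector of the output gate and $B_C$ is the bias vector of the cell state); and $\circ$ is the Hadamard product. Note that the output features of the last LSTM layer are used as the input features of the dense layer.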
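A single LSTM step with forget, update, and output gates can be sketched in NumPy as follows. This uses the standard LSTM formulation; the parameter layout (dictionaries keyed by gate letter), the small random initialization, and the sizes are illustrative assumptions rather than the configuration used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; returns the new hidden state and cell state."""
    W, U, B = params["W"], params["U"], params["B"]
    f_t = sigmoid(W["F"] @ x_t + U["F"] @ h_prev + B["F"])      # forget gate
    i_t = sigmoid(W["I"] @ x_t + U["I"] @ h_prev + B["I"])      # update gate
    o_t = sigmoid(W["O"] @ x_t + U["O"] @ h_prev + B["O"])      # output gate
    c_tilde = np.tanh(W["C"] @ x_t + U["C"] @ h_prev + B["C"])  # cell input
    c_t = f_t * c_prev + i_t * c_tilde    # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases for the four gates."""
    gates = "FIOC"
    return {
        "W": {g: rng.normal(scale=0.1, size=(n_hidden, n_in)) for g in gates},
        "U": {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for g in gates},
        "B": {g: np.zeros(n_hidden) for g in gates},
    }

rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hidden=5, rng=rng)
h_t, c_t = np.zeros(5), np.zeros(5)
for x_t in rng.random((7, 3)):            # 7 time steps of 3 features each
    h_t, c_t = lstm_step(x_t, h_t, c_t, params)
```

The cell state is the path along which information from earlier time steps survives: the forget gate scales what is kept and the update gate scales what is added.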

D. Short-term Power Prediction with Hybrid Form
Normally, the outputs of the last LSTM layer are fed to a dense layer which projects the intermediate LSTM outputs to future power generation curves from time $t+1$ to $t+k$. The mathematical formula of a dense layer can be expressed as:

$$X^{(i+1)}_{Dense} = \sigma_{Dense}\left(W^{(i)}_{Dense} X^{(i)}_{Dense} + B^{(i)}_{Dense}\right)$$
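A dense layer of this kind is commonly an affine map followed by an activation. The sketch below projects a final LSTM output vector to a $k$-step forecast; the identity activation, the vector sizes, and the random weights are assumptions for illustration only.

```python
import numpy as np

def dense(x, W, b, activation=lambda z: z):
    """Affine map followed by an activation (identity by default)."""
    return activation(W @ x + b)

rng = np.random.default_rng(0)
k = 6                                # number of future steps to predict
h_last = rng.random(10)              # stand-in for the last LSTM output
W = rng.normal(size=(k, 10))         # untrained illustrative weights
b = np.zeros(k)
forecast = dense(h_last, W, b)       # one value per future time step
```

Changing `k` is the only modification needed to switch between one-step and multi-step prediction.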

III. PROCESS OF THE PROPOSED METHOD
The process of short-term power prediction for renewable energies based on the proposed method is shown in Fig. 5, and the specific steps are as follows:
1) Import and preprocess datasets. For short-term power prediction for renewable energies, the goal is to forecast the future power generation curves from time $t+1$ to $t+k$ given the historical data, such as historical power or meteorological features. Without loss of generality, the power generation curves of multiple neighboring wind farms or PV plants are used as an example of historical data in the experiment section. Then, the min-max normalization method is utilized to project the historical data into values that vary from 0 to 1. To account for the spatial correlation, the absolute value of the Pearson correlation coefficient between multiple neighboring renewable energies is employed to form an adjacency matrix $A$ for the GCN. Next, part of the samples is selected for the training set and validation set to fit the parameters of the neural networks. The remaining samples are used to evaluate the performance of the trained model.
2) Initialize parameters and train the model. To improve the performance of the proposed model, a suitable structure and parameters need to be explored before training the model. The parameters of the proposed model mainly include the numbers of middle layers (e.g., graph convolutional layers and LSTM layers), the number of training epochs, and the selection of the optimizer and its learning rate (LR). Generally, the control variable method is utilized to adjust these parameters [15]. After initializing the parameters, the back-propagation algorithm is used to update the weights of the model by optimizing the loss function, such as the mean absolute error (MAE). When the iteration ends, the trained model is used to forecast future power generation curves.
3) Evaluate the performance of models. To evaluate the prediction performance of the proposed model and baselines on the test set, the MAE and the root mean square error (RMSE) are used to measure the difference between the real power $Y_t$ at time $t$ and the forecasted power $\hat{Y}_t$ at time $t$. The definitions of these two metrics are:

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|Y_t - \hat{Y}_t\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(Y_t - \hat{Y}_t\right)^2}$$

where $n$ is the number of forecasted points in the test set. For both the RMSE and the MAE, the smaller the value, the better the performance of the model.
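The normalization in step 1 and the metrics in step 3 can be sketched as follows. The helper names are hypothetical, and taking the minimum and maximum from a chosen reference array (for instance the training data) is an assumption, since the text does not say which samples the bounds come from.

```python
import numpy as np

def min_max_fit(x):
    """Return the (min, max) bounds used for scaling."""
    return float(np.min(x)), float(np.max(x))

def min_max_scale(x, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

y_true = np.array([0.0, 5.0, 10.0])
y_pred = np.array([1.0, 5.0, 8.0])
lo, hi = min_max_fit(y_true)
scaled = min_max_scale(y_true, lo, hi)
error_mae = mae(y_true, y_pred)
error_rmse = rmse(y_true, y_pred)
```

The RMSE weights large deviations more heavily than the MAE, so reporting both shows whether errors are spread evenly or concentrated in a few points.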

A. Data Description and Software Platform
To demonstrate the forecasting superiority of the proposed hybrid model based on GCN and LSTM, two datasets from the National Renewable Energy Laboratory (NREL) in the United States are employed [19], [20].The first dataset includes 2190 wind power generation curves of 16 neighboring wind farms from January 1, 2007 to December 31, 2012, and the second dataset includes 1460 PV power generation curves of 9 neighboring plants from January 1, 2007 to December 31, 2010.The time resolutions of these power generation curves in two datasets are 10 minutes.The samples are divided into the training set, validation set, and test set according to seasons.In each season, the first 80% of the data is treated as the training set, followed by 10% of the data as the validation set, and the rest of the data as the test set.
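The season-wise 80%/10%/10% split described above can be sketched as an index split that keeps chronological order within each season. How samples are assigned to seasons is not specified in the text, so the integer `season_ids` labels here are a hypothetical input.

```python
import numpy as np

def split_by_season(season_ids, train_frac=0.8, val_frac=0.1):
    """Return train/validation/test indices, split within each season."""
    season_ids = np.asarray(season_ids)
    train, val, test = [], [], []
    for season in np.unique(season_ids):
        idx = np.flatnonzero(season_ids == season)  # chronological order
        n_train = int(train_frac * len(idx))
        n_val = int(val_frac * len(idx))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])          # remainder of the season
    return np.array(train), np.array(val), np.array(test)

# Toy example: 4 seasons with 10 consecutive samples each.
season_ids = np.repeat([0, 1, 2, 3], 10)
train_idx, val_idx, test_idx = split_by_season(season_ids)
```

Splitting inside each season means every subset contains samples from all seasons, which keeps the test set representative of the whole year.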
The programming language is Python. The different models for short-term power prediction are implemented in Spyder 4.1.5 with deep learning frameworks (e.g., Keras 2.3.1 and TensorFlow 2.1.0). The parameters of the computer are as follows: an Intel(R) Core(TM) i5-10210U processor with a base frequency of 1.60 GHz and 8 GB of laptop memory.

B. Parameters Discussion
The hyper-parameters of the proposed hybrid model mainly include: past time length $h$, the numbers of middle layers, the optimizer and its LR, and the number of training epochs. In this paper, the control variable method is utilized to adjust these parameters through many experiments [15]. When one of the parameters is explored, the default values are used for the other parameters: the middle layer consists of 2 GCN layers and 2 LSTM layers, the optimizer is the Adam algorithm, the LR is 0.001, and the number of training epochs is 500.
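The control variable method described above amounts to a one-at-a-time search: each hyper-parameter is varied over its candidates while the others keep their default values. In the sketch below, `evaluate` stands in for training the model and returning a validation error. The toy error function, the candidate lists, and the default value of `h` are hypothetical; the remaining defaults follow the text.

```python
def one_at_a_time_search(defaults, candidates, evaluate):
    """Pick the best value of each hyper-parameter independently."""
    best = {}
    for name, values in candidates.items():
        scores = {}
        for value in values:
            config = dict(defaults, **{name: value})  # override one entry
            scores[value] = evaluate(config)
        best[name] = min(scores, key=scores.get)      # lowest error wins
    return best

defaults = {"h": 6, "gcn_layers": 2, "lstm_layers": 2, "lr": 0.001, "epochs": 500}
candidates = {"h": [3, 6, 12], "lr": [0.1, 0.01, 0.001]}

def toy_validation_error(config):
    # Hypothetical stand-in for "train the model, return validation MAE".
    return abs(config["h"] - 6) + abs(config["lr"] - 0.01)

best = one_at_a_time_search(defaults, candidates, toy_validation_error)
```

This search needs far fewer training runs than a full grid, at the cost of ignoring interactions between hyper-parameters.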
As an example of predicting the wind powers for the next 1 hour, Fig. 6 shows the MAE of models with different past time length h.
Normally, one would expect to see a smooth U-shape, but the result suggests that the random initialization of the parameters in the proposed model also strongly influences the choice of this hyper-parameter. Generally, a larger past time length $h$ is not necessarily better. When $h$ is 6, the model has the smallest forecasting error.
Further, Table I shows the optimal past time length $h$ corresponding to different forecasting time lengths $k$. Normally, the optimal past time length $h$ varies from 6 to 12 for short-term wind power prediction. For short-term PV power prediction, the optimal past time length $h$ ranges from 8 to 118, and 114 can be considered a good starting point for PV forecasts for the next 2 to 5 hours. Higher or lower values may be fine for other PV power datasets. Note that the optimal past time length of the PV power is much larger than that of the wind power, which may be attributed to the strong diurnal trend of the PV power.
In order to explore the appropriate number of middle layers, Table II and Table III show the test set errors of models with different structures for short-term PV and wind power prediction of the next 1 hour. For short-term PV and wind power prediction, more middle layers in the hybrid model are not necessarily better: when the capacity of the hybrid model is much larger than what is needed for short-term prediction, the error on the test set becomes very high, i.e., the over-fitting problem occurs. Specifically, 1 GCN layer and 1 LSTM layer are suitable to form the middle layer of the hybrid model for the wind power dataset, and 3 GCN layers and 4 LSTM layers are suitable for the PV dataset. For other datasets, the number of middle layers can be adjusted according to forecasting errors on the validation set.
After initializing the structure of the hybrid model, an appropriate optimizer needs to be selected to optimize the loss function. Mainstream optimizers include [21]: adaptive moment estimation (Adam), stochastic gradient descent (SGD), root mean square propagation (RMSProp), adaptive gradient descent algorithm (Adagrad), adaptive delta (Adadelta), adaptive moment estimation extension based on infinity norm (Adamax), and Nesterov-accelerated adaptive moment estimation (Nadam). To find a suitable LR, the Adam algorithm is taken as an example. The models with different LRs are each trained for 500 epochs, and their loss functions on the training set are visualized, as shown in Fig. 7.
When the LR is greater than 0.1, the loss function of the hybrid model oscillates or even fails to decrease. Conversely, too small an LR requires more training epochs (e.g., LR=$1\times10^{-5}$) and may never converge (e.g., LR=$1\times10^{-6}$). Generally, the LR should be neither too large nor too small, and a good starting point ranges from $1\times10^{-4}$ to $1\times10^{-2}$. After setting a suitable LR, the number of training epochs can be initialized to 100, which is enough to ensure that the hybrid model has converged.
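The qualitative effect of the LR can be reproduced on a toy problem. The sketch below is not the experiment behind Fig. 7: it uses plain gradient descent (not Adam) on a one-dimensional quadratic loss with an assumed curvature of 25, so the numeric thresholds differ from those reported for the hybrid model. Only the pattern is comparable.

```python
def gradient_descent_losses(lr, steps=100, curvature=25.0, w0=1.0):
    """Loss history of gradient descent on 0.5 * curvature * w**2."""
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w * w)
        w -= lr * curvature * w          # gradient of the loss is curvature * w
    return losses

# Too large, moderate, and too small learning rates on the same toy loss.
losses = {lr: gradient_descent_losses(lr) for lr in (1e-1, 1e-2, 1e-6)}
```

With these settings, the largest LR overshoots the minimum on every step and the loss grows, the moderate LR drives the loss toward zero, and the smallest LR barely changes the loss within 100 steps.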
Further, the hybrid models with different optimizers are each trained 30 times, and the average loss functions on the training set are shown in Fig. 8. From Fig. 8, it can be seen that the proposed hybrid model obtains good performance when the Adam, RMSprop, Adamax, and Nadam algorithms are used as optimizers. Specifically, the Nadam algorithm is more suitable for short-term wind power prediction than the other algorithms, while Adam is the best optimizer for short-term PV prediction. In addition, the loss functions of SGD, Adagrad, and Adadelta are significantly larger than those of the other algorithms, which indicates that these three optimizers are not suitable for short-term power prediction based on hybrid models.

C. Comparison and Analysis with Popular Baselines
To illustrate the effectiveness of the proposed method, the hybrid model is compared with popular baselines, such as MLP, LSTM, CNN, GCN, and the hybrid model of CNN and LSTM. Similarly, the control variable method is utilized to select suitable hyper-parameters through many experiments, as follows:
1) For MLP, the middle layer includes three dense layers, and the numbers of neurons are 30, 25, and 20, respectively [8].
2) For LSTM, the middle layer includes 3 LSTM layers [9], and the numbers of units are 10, 15, and 10, respectively.
3) For CNN, the middle layer includes two 1-D convolutional (Conv1D) layers, two 1-D maximal pooling (MaxPooling1D) layers, a flatten layer, and a dense layer [10]. Specifically, the numbers of filters in the Conv1D layers are 16 and 1, respectively. The kernel size in the Conv1D layers is 1, and the pooling size in the MaxPooling1D layers is 2. Besides, the dense layer has 1 unit.
4) For GCN, 3 graph convolutional layers are used as the middle layer [15], and the numbers of filters are 20, 20, and 15, respectively. The output layer is a dense layer with 1 unit.
5) The hybrid model of CNN and LSTM has a similar structure to the CNN [11]. Specifically, the hybrid model inserts an LSTM layer between the flatten layer and the dense layer of the CNN. The number of units in the LSTM layer is 10.
Besides, the training epochs, optimizers, and loss function of the baselines are the same as those of the proposed hybrid model. The following conclusions can be drawn from Table IV:
1) Neural networks that focus on modeling the temporal features of power generation curves, such as the proposed hybrid model and the LSTM, generally show better forecasting performance than other baselines, such as the MLP. For example, for the 2-hour wind power prediction, the MAE of the proposed hybrid model and the LSTM is reduced by approximately 36.27% and 25.39%, respectively, compared with the MLP, and the RMSE is approximately 33.71% and 26.13% lower than that of the MLP. This is because the traditional MLP has difficulty in handling non-stationary and complex time series curves. In addition, the forecasting precision of the GCN and CNN is not the highest, since they only account for the spatial features and ignore temporal features of power generation curves.
2) The MAE and RMSE of the proposed hybrid model are smaller than those of a single model (e.g., LSTM or GCN), which indicates that the proposed hybrid model has the ability to accurately capture spatio-temporal features from power generation curves. For example, for the 1-hour PV power prediction, the MAE of the proposed hybrid model is reduced by approximately 13.64% compared with the GCN, which only considers spatial features, and the RMSE is reduced by 13.26%. Compared with the LSTM, which only considers temporal features, the MAE and RMSE of the proposed hybrid model are decreased by approximately 12.64% and 9.25% for the 1-hour PV power prediction.
3) Note that the hybrid model of the CNN and LSTM performs worse than the proposed hybrid model of the GCN and LSTM, because the traditional CNN simplifies the graph-structured data (i.e., the input data of short-term power prediction) into Euclidean data by ignoring correlation matrices, which limits the forecasting accuracy.
4) In general, Table IV shows the forecasting results of the proposed hybrid model and popular baselines for 1 hour, 2 hours, 3 hours, 4 hours, and 5 hours on the wind power dataset and PV power dataset. It can be seen that the proposed hybrid model obtains the best forecasting performance under all evaluation indicators for all forecasting time horizons, which demonstrates the effectiveness of the hybrid model in spatio-temporal short-term power prediction of renewable energies.
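The percentage reductions quoted above are presumably relative to the baseline error. Under that assumption, with $e_{\text{baseline}}$ and $e_{\text{proposed}}$ denoting the MAE (or RMSE) of a baseline and of the proposed hybrid model on the same test set (symbols introduced here for illustration), the reduction would be computed as:

```latex
\text{Reduction} = \frac{e_{\text{baseline}} - e_{\text{proposed}}}{e_{\text{baseline}}} \times 100\%
```

A positive value means that the proposed model has the lower error.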
V. CONCLUSION
To improve the forecasting precision of short-term power predictions, a novel graph neural network-based hybrid approach is presented in this paper. After the simulation analysis on two real-world datasets, the following conclusions are obtained:
1) The optimal past time length $h$ varies from 6 to 12 for short-term wind power forecasts. For short-term PV power prediction, the optimal past time length $h$ ranges from 8 to 118, and 114 can be considered a good starting point for PV forecasts for the next 2 to 5 hours.
2) More middle layers in the hybrid model are not necessarily better: when the capacity of the hybrid model is much larger than what is needed for short-term prediction, the over-fitting problem occurs. The proposed hybrid model obtains good performance when the Adam, RMSprop, Adamax, and Nadam algorithms are used as optimizers. Besides, the LR should be neither too large nor too small, and a good starting point ranges from $1\times10^{-4}$ to $1\times10^{-2}$.
3) For the short-term PV and wind power prediction, the proposed hybrid model outperforms popular baselines (e.g., MLP, CNN, LSTM, GCN, and the hybrid model of CNN and LSTM) under different forecasting time horizons.
As a part of the graph-structured data, the adjacency matrix of the proposed hybrid model is a fixed correlation matrix, which may be extended to dynamic graph-structured data through spatial-temporal graph neural networks in future work. Also, the inputs to the proposed model do not include the time of day or numerical weather prediction information, but these can easily be incorporated in extensions of this work.

Figure 1. The framework of the proposed method.

Figure 3. Graph convolutional operation on the graph-structured data.

Figure 4. The framework of the LSTM unit.

In the LSTM equations, $O_t$ is the activation vector of the output gate; $\tilde{C}_t$ is the cell input activation vector; $\sigma_s$ is the sigmoid function; $\sigma_g$ is the hyperbolic tangent function; $W_F$ and $U_F$ are weight matrices of the forget gate; $B_F$ is the bias vector of the forget gate; and $B_I$ is the bias vector of the update gate. In the dense-layer equation, $X^{(i)}_{Dense}$ is the feature matrix of the $i$th dense layer; $W^{(i)}_{Dense}$ is the weight matrix of the $i$th dense layer; $B^{(i)}_{Dense}$ is the bias vector of the $i$th dense layer; and $\sigma_{Dense}(\cdot)$ is the activation function of the dense layer.

Figure 5. Process of the proposed method.

Figure 6. The MAE of models with different past time lengths.

Figure 7. Loss functions of the hybrid model with different learning rates. (a) Wind farms. (b) PV plants.
While modeling the feature information $X_t$ at the current time $t$, the LSTM still keeps the dynamic trend carried by its states from time $t-1$. Note that $H_t$ is considered the output of the LSTM layer.

TABLE I. THE OPTIMAL PAST TIME LENGTHS OF DIFFERENT TIME LENGTHS

TABLE III. THE MAE OF THE TEST SET FOR PV POWER PREDICTION

TABLE IV. THE PREDICTION RESULTS OF THE PROPOSED HYBRID MODEL AND OTHER BASELINES