Short Term Prediction of Wind Speed Based on Long-Short Term Memory Networks

Power utilities, developers, and investors are pushing towards larger penetrations of wind and solar power generation in their existing energy mix. This study looks specifically at wind power deployment in Saudi Arabia. For profitable deployment of wind power, accurate knowledge of wind speed in both the spatial and time domains is critical. Wind speed is the most fluctuating and intermittent of all the meteorological variables, and this uncertainty makes wind power difficult to predict ahead of time. Wind speed depends on meteorological factors such as pressure, temperature, and relative humidity, and can be predicted using these parameters. Forecasting wind speed is critical for grid management, cost of energy, and quality of power supply. This study proposes a short-term, multi-dimensional prediction of wind speed based on Long-Short Term Memory (LSTM) networks. Five models are developed by training the networks with measured hourly mean wind speed values from 1980 to 2019, including exogenous inputs (temperature and pressure). The study found that LSTM is a powerful tool for short-term prediction of wind speed. However, the accuracy of LSTM may be compromised by the inclusion of exogenous features in the training sets and by the duration of the prediction ahead.


INTRODUCTION
Wind, amongst other renewable sources, is becoming more popular both for large grid-connected applications and for isolated grids serving small loads. Grid connectivity and power grid management and control are becoming more advanced with time. Saudi Arabia, for example, is targeting a new wind power capacity of 2.0GW by 2030 [1] and is planning to achieve 20% wind power penetration in its total capacity by the end of 2030 [2]. However, the challenge associated with wind power is its uncertain nature [3,4]. Wind power is mainly affected by wind speed and weather factors such as wind direction, temperature, atmospheric pressure, and relative humidity [5]. The geographical and topographical conditions at wind farm sites also influence the wind energy output [6,7]. Accurate information on the availability of the wind speed that drives the wind turbines is crucial for microgrid siting and, later, for profitable and proper operation and maintenance of the system [8,9]. The integration of wind power in microgrids, where wind power fluctuations significantly affect microgrid operation and other distributed generation, was studied in [7][8][9][10].
In micro-grids, the integration of large-scale wind power may result in the fluctuations of the output power and can cause disturbances to the power system and may lead to grid failures [11,12]. Consequently, power quality, voltage, and frequency may be seriously affected [5,11].
There are physical parameters of wind turbines, such as pitch, rotor diameter, blade length, and wind farm layout design, that greatly affect the power output from the turbine. A study of wind farm layout design is presented in [13], where the particle swarm optimization (PSO) algorithm is used to solve the wind farm optimization problem. Besides, the authors in [14] studied the optimal location of wind turbines and their performance in Iraq, considering the costs and maximum possible capacities of the wind turbines at different sites. Furthermore, the steady-state deformation and stress that occur in wind turbine blades were investigated in [15]. A review of studies involving the design, optimization, and techniques of different wind turbine blades was reported in [16].
The ability to accurately forecast wind speed can reduce the adverse effects of wind power fluctuations on the power system. Reliable prediction of wind speed ahead of time gives the power management system enough time to manage the fluctuations in power through other controls. It is reported that wind speed, and ultimately wind power, forecasting methods are divided into statistical learning methods, physical models, modern machine learning techniques, and hybrid methods. The statistical approaches are simple and use historical data to predict wind speed in the future time domain [2]. These are based on statistical time series or machine learning algorithms [17], which may include artificial neural network (ANN), support vector machine, and continuous association rule mining algorithm (CARMA) methods [18][19][20].
The physical approaches use a physical description of the wind (solving differential equations of mass, energy, and velocity) to model the onsite conditions and are good for long-term prediction [10]. However, the physical methods are complex and use many inputs, resulting in a high computational cost [5]. The hybrid or combination methods, on the other hand, combine the predictive ability of various methods to improve the accuracy of the model. Owing to their memory-retaining capability and their performance on sequence modeling, especially in natural language processing (NLP), recurrent neural networks (RNN) and their variants have become the de facto models in renewable energy forecasting. Long-Short Term Memory (LSTM) networks in particular have shown tremendous performance in wind speed and energy prediction, and there is no shortage of work in this area. However, the vast majority of these applications are specific to certain environments. The major reason is that the environment plays a vital role in how renewable energy systems behave, which in turn affects their prediction. As such, each environment is unique and often requires a unique prediction strategy. Hence the need for our research.
Given an extensive dataset, a hybrid of Ensemble Empirical Mode Decomposition (EEMD), Genetic Algorithm (GA), and LSTM was proposed in [21] as a way to achieve short-term wind speed prediction. The EEMD was used to decompose the wind speed sequence data for feature extraction, and the GA-LSTM algorithm was applied to the extracted features to predict future wind speed. The feature extraction step yields a 38.48% increase in prediction accuracy compared with the non-EEMD model. The data was extracted from the wind speed records of various sites across the United States. A combination of variational mode decomposition (VMD), singular spectrum analysis (SSA), LSTM, and extreme learning machines (ELM) was adopted in [22] for wind speed forecasting. The SSA step was used as a form of feature extraction to boost the prediction accuracy. Several experiments were performed in which the multi-step algorithm was compared with several other algorithms and achieved better accuracy. The data consists of samples obtained from one observation site of a wind farm in China between May and December 2015. Again, ELM and LSTM were combined with a differential evolution algorithm for wind speed forecasting in [23], with data gathered from a wind farm in Inner Mongolia, China. Two forecasting models were developed, one for short-term prediction 10 minutes ahead and the other for slightly longer prediction one hour ahead. Autoencoders with LSTM [24] and two-stage LSTM [25] are other methods that have been employed in wind speed forecasting. A comprehensive review of the various LSTM and support vector machine (SVM) models used for wind speed forecasting is presented in [26].
In this study, a short-term prediction of wind speed based on LSTM is proposed. The study presents five different Cases to investigate the predictive ability of LSTM in the short-term prediction of wind speed. Besides, four Scenarios of time step ahead predictions are further proposed to investigate the performance of LSTM network in the presence of exogenous inputs. The strategy adopted in this paper unveils unique characteristics of LSTM in short term prediction of wind speed when the training features are selected in different combination.

DATA PREPARATION
Prediction of wind speed requires the use of weather parameters such as temperature, atmospheric pressure, and relative humidity. Although a highly correlated parameter has the strongest impact on the wind data, it is usually not good practice to include such data in an ANN forecast in the presence of other parameters [27]. In this study, wind direction data is not considered because of its extremely high fluctuating nature; furthermore, it does not add to the wind speed intensity. The hourly mean data is obtained from a meteorological station located in Dhahran, Eastern Province, Saudi Arabia, and covers a period of 40 years from 1980 to 2019, resulting in 342,624 data points, of which 100,000 have been used for training. The erratic nature of the weather parameters, in terms of their hourly variation with time during January and June of 2018, is shown in Figs. 1, 2 and 3. Higher wind speed values are observed in June compared to January (Fig. 1). This indicates a seasonal variability of the wind speed which must be addressed by the models. Higher values of atmospheric pressure are observed during winter (January) compared to summertime (June), as shown in Fig. 2. The effect of height (2m and 10m) and of the season on ambient temperature values is clearly visible in Figs. 3 (a & b). Two temperature measurements are considered, at heights of 10m and 2m, because the vertical temperature difference causes the movement of air masses [28,29].

Data Processing
The original data include wind speed, temperature at two heights, wind direction, and pressure on an hourly averaged basis. All data columns are scaled to the range [0, 1] to obtain the best network performance. Since the input to the network is a past time series, the previous 60 hours of measured values are used to obtain better prediction accuracy. Based on the input data and the models, future predictions are made under four Scenarios, namely 1, 2, 5, and 7 hours ahead, for all the Cases presented here.
For instance, Case 2 uses wind speed data measured at 50m, and temperatures at 10m and 2m, to predict the future wind speed. To demonstrate the accuracy and predictability of the models, several batches, each of 60 input data points, are selected randomly from the total of 100,000 data points, with the next-hour value as the output.
Other numbers of input data points, such as 30, 45, and 75, were also tried, but 60 gives the best trade-off between accuracy and training time. This gives an input shape of (99940, 60, 3). The output is the corresponding future wind speed, of shape (99940, 1). From this data, 5000 randomly selected data points are used as the testing data and the remaining ones as the training set.
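The scaling and windowing described above can be illustrated with a minimal sketch. It assumes plain Python lists and hypothetical helper names (`min_max_scale`, `make_windows`, `split_train_test`); the source does not state which library or code was used, so this is only one way the described shapes could arise. With 100,000 hourly rows, a window of 60, and a one-hour horizon, the same logic yields 100,000 - 60 = 99,940 samples, matching the shape quoted above.

```python
import random


def min_max_scale(column):
    """Scale one data column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column is mapped to 0.0 to avoid dividing by zero.
    return [(v - lo) / span if span else 0.0 for v in column]


def make_windows(features, target, window=60, horizon=1):
    """Pair each run of `window` past rows with the target `horizon` hours later."""
    inputs, outputs = [], []
    n_samples = len(features) - window - horizon + 1
    for start in range(n_samples):
        end = start + window
        inputs.append(features[start:end])         # rows start .. end-1
        outputs.append(target[end + horizon - 1])  # value `horizon` hours after the window
    return inputs, outputs


def split_train_test(inputs, outputs, n_test, seed=0):
    """Randomly hold out `n_test` samples for testing; the rest are for training."""
    order = list(range(len(inputs)))
    random.Random(seed).shuffle(order)
    test_ids, train_ids = order[:n_test], order[n_test:]
    x_train = [inputs[i] for i in train_ids]
    y_train = [outputs[i] for i in train_ids]
    x_test = [inputs[i] for i in test_ids]
    y_test = [outputs[i] for i in test_ids]
    return x_train, y_train, x_test, y_test


# Synthetic stand-in for 100 hours of (wind speed, temperature, pressure) readings.
rows = [[t % 24, 20 + (t % 7), 1000 - t] for t in range(100)]
scaled_cols = [min_max_scale([r[j] for r in rows]) for j in range(3)]
scaled_rows = [list(r) for r in zip(*scaled_cols)]
wind = scaled_cols[0]

X, y = make_windows(scaled_rows, wind, window=60, horizon=1)
X_train, y_train, X_test, y_test = split_train_test(X, y, n_test=5)
print(len(X), len(X[0]), len(X[0][0]))
```

The synthetic data and the fixed random seed are illustrative choices only; in practice the scaling limits would normally be taken from the training portion of the data.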

Network Training
The network was trained using backpropagation through time (BPTT) with the Adam optimizer. The mean squared error (MSE) is used as the loss function, so the goal of the optimizer is to minimize the MSE between the predicted and the measured values. Furthermore, the network is retrained several times if the mean absolute error (MAE) exceeds a predefined threshold value (0.5 m/s). If the threshold is not met after several trials, the best of the trained results is retained. During training, a batch size of 128 data points is used and the network is trained for 40 epochs. For all the Cases, training took an average of 28 minutes on an Intel Core i7 8th Generation CPU with 16 GB of RAM running Windows 10.
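The retraining rule described above (retrain while the MAE exceeds 0.5 m/s, otherwise keep the best result) can be sketched as a small loop. The function and argument names below (`train_with_retries`, `train_once`, `evaluate_mae`, `max_trials`) are hypothetical, and the number of trials is an assumption, since the source only says "several".

```python
def train_with_retries(train_once, evaluate_mae, threshold=0.5, max_trials=5):
    """Retrain until the MAE is at or below `threshold`, keeping the best model seen.

    train_once(trial)   -> a freshly trained model (any object)
    evaluate_mae(model) -> MAE of that model in m/s
    """
    best_model, best_mae = None, float("inf")
    for trial in range(max_trials):
        model = train_once(trial)
        mae = evaluate_mae(model)
        if mae < best_mae:
            best_model, best_mae = model, mae
        if mae <= threshold:
            break  # threshold met, no further retraining needed
    return best_model, best_mae


# Stand-in "training": each trial simply returns its index, with a preset MAE.
fake_mae = [0.9, 0.7, 0.4, 0.3]
model, mae = train_with_retries(lambda trial: trial, lambda m: fake_mae[m])
print(model, mae)
```

In a real run, `train_once` would wrap the 40-epoch training with Adam and a batch size of 128, and `evaluate_mae` would score the model on held-out data.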

METHODOLOGY
Each parameter (wind speed, temperature, and pressure) of the training sample is modeled as a time series used as input to the models. The time series samples are made of n observations $[x_1, x_2, \ldots, x_n]$, which are used to predict the wind speed in the future time domain. Five models are developed; each model is trained for one-hour-ahead prediction and thereafter tested 1 hour, 2 hours, 5 hours, and 7 hours ahead. This implies that the input time series is a function of the past historical hourly mean values.
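The text does not state how a model trained for one-hour-ahead prediction is applied to the 2, 5, and 7 hour horizons. One common option, assumed here purely for illustration, is recursive forecasting: each one-hour prediction is appended to the input window and the model is applied again. The sketch below covers only the single-feature situation (wind speed alone) and uses a hypothetical `predict_next` callable in place of the trained network.

```python
def recursive_forecast(predict_next, history, steps, window=60):
    """Apply a one-step-ahead model repeatedly to reach `steps` hours ahead.

    predict_next(window_values) -> next-hour prediction (assumed interface)
    history                     -> past wind speed values, oldest first
    """
    buffer = list(history[-window:])  # copy, so the caller's data is not modified
    forecasts = []
    for _ in range(steps):
        nxt = predict_next(buffer)
        forecasts.append(nxt)
        buffer = buffer[1:] + [nxt]   # slide the window forward by one hour
    return forecasts


# Stand-in model: the mean of the three most recent values.
def moving_average_model(window_values):
    return sum(window_values[-3:]) / 3.0


history = [4.0, 5.0, 6.0, 5.0, 4.0, 3.0]
forecast_7h = recursive_forecast(moving_average_model, history, steps=7, window=6)
print(forecast_7h)
```

With exogenous inputs, this approach would additionally need future temperature and pressure values, which is one reason an alternative reading (shifting the target by the horizon at test time) is also plausible.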
The training input data for each sample is modeled as a function F(.), as represented in Equation (1): where i is the number of input parameters being considered, and t is the number of sample observations.

Network Training
An ANN has a network architecture consisting of neurons, connection strengths, node properties, and updating rules [30,31]. The neurons have a natural ability to store and work out experimental knowledge, which can be used to validate future occurrences [32]. Some of the unique attributes of the ANN are its capability to process information at very high speed, mapping, fault tolerance, robustness, and generalization. Thus, an ANN performs excellently when it comes to system identification, system modeling, optimization, and prediction [33]. The ANN has been used to solve complex nonlinear real-world engineering problems [32][33][34][35]. The ANN model is made of parametric components such as weights ($w_{ij}$), connecting synapses or links, bias ($b_j$), and an activation function $f(\cdot)$. These parameters relate the input $x_i$ to the output $y_j$ as shown in Fig. 4 and expressed in the following equations. Each input $x_i$ is multiplied by the weight $w_{ij}$ and the products are summed to give the summation output $s_j$, Equation (2a). The summation output is applied to the activation function to give the final output signal $y_j$, Equation (2b), which is limited to the desired range of amplitude. Examples of activation functions listed in [33] are linear, sigmoid, Gaussian, and Gaussian complement, chosen based on the specific problem.
$$s_j = \sum_{i=1}^{k} w_{ij}\, x_i + b_j \qquad \text{(2a)}$$
$$y_j = f(s_j) \qquad \text{(2b)}$$
where j and k are the numbers of neurons and synapses, respectively.
As suitable for this problem, the type of neural network proposed in this study is a recurrent neural network called the long-short term memory (LSTM) network. A typical structure of an LSTM is shown in Fig. 5.
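As a small worked illustration of the neuron relation described above (a weighted sum of the inputs plus a bias, passed through an activation function), the following sketch computes the output of a single neuron. The function names and the numeric values are illustrative assumptions, not taken from the source.

```python
import math


def sigmoid(s):
    """Logistic activation; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(s)


# Two inputs whose weighted sum cancels out, so the sigmoid returns its midpoint.
y_sigmoid = neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0)
# The same neuron structure with a linear activation simply returns the sum.
y_linear = neuron_output([1.0, 2.0], [1.0, 1.0], bias=1.0, activation=lambda s: s)
print(y_sigmoid, y_linear)
```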

Long-short Term Memory Networks
The LSTM NN was initially proposed in [36] and it is a special type of recurrent network. The LSTM is based on the principle that the status of a current cell can be affected by the status of the previous cells. This is in fact, a recurrent neural network. This description is depicted in Fig. 5. An LSTM cell has an output gate, input gate, and a forget gate layer also called the sigmoid layer. The function of the input gate is to control the amount of data that goes into a cell. The forget gate regulates the amount of the values left in a cell while the output gate together with values in the cell defines the output of an LSTM.
The LSTM can extract relevant information from streams of data, remember it, and use it to predict future values [37]. The tanh layer ensures that values stay between -1 and 1, thus regulating the NN output. The cell state acts as a transport pathway: it can carry relevant information through the sequence chain and is essentially the memory of the network. Information from earlier time steps is carried to later time steps, which helps to reduce the effects of short-term memory [38].
The gates are NN layers that determine which information is allowed into the cell state; during training they learn which information should be kept and which forgotten. The activation function in the gates is the sigmoid, which is similar to the tanh function except that it squeezes values between 0 and 1. This helps to decide what data to remember or forget: a value multiplied by 0 is forgotten, and a value multiplied by 1 is remembered. In this way, the gates regulate the information flow within a cell.
The forget gate (Fig. 6(a)) decides which information is kept or discarded. Information from the previous hidden state, $h_{t-1}$, is combined with the current input and then fed into the forget gate. The output of the forget gate is passed into the sigmoid (3) and the information in the hidden state is passed on for further processing. Information is further processed in the next stage, which is made of a sigmoid layer, known as the input gate, and a tanh layer. The input gate and the tanh layer, shown in Fig. 6(b), determine what information is stored in the cell state. While the tanh layer creates a vector of new candidate values to be added to the state, the sigmoid layer decides which values are updated. The sigmoid output decides which information is kept from the tanh activation output. Next, the cell state is calculated using the information from the gates and the hidden state using Equation (11). The hidden state carries information about the previous inputs, which helps in prediction. The tanh output is multiplied by the sigmoid output (Fig. 7) to determine the information in the hidden state, which is now the output. The new cell state and the hidden state are then carried over to the next time step.
The steps involved in calculating the input, forget, and output gates are described as follows: Step 1: We calculate the inputs of the three gates and the candidate cell using Equations (3) to (6).
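$$
\begin{aligned}
net_{\tilde{C}}^{t} &= W_{\tilde{C}}\,[h_{t-1}, x_t] + b_{\tilde{C}} \\
net_{f}^{t} &= W_{f}\,[h_{t-1}, x_t] + b_{f} \\
net_{i}^{t} &= W_{i}\,[h_{t-1}, x_t] + b_{i} \\
net_{O}^{t} &= W_{O}\,[h_{t-1}, x_t] + b_{O}
\end{aligned}
\qquad \text{(3)-(6)}
$$
where $net_{\tilde{C}}^{t}$, $net_{f}^{t}$, $net_{i}^{t}$, and $net_{O}^{t}$ are the net inputs of the candidate cell, forget gate, input gate, and output gate respectively, $[h_{t-1}, x_t]$ is the previous hidden state combined with the current input, and $W$ and $b$ are the corresponding weights and biases.
Step 2: We calculate the outputs of the candidate cell and the gates, and then the new cell state.
$$
\tilde{C}_t = \tanh\!\left(net_{\tilde{C}}^{t}\right), \quad
f_t = \sigma\!\left(net_{f}^{t}\right), \quad
i_t = \sigma\!\left(net_{i}^{t}\right), \quad
O_t = \sigma\!\left(net_{O}^{t}\right)
\qquad \text{(7)-(10)}
$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad \text{(11)}$$
where $C_t$ is the new cell state, $f_t$ is the forget gate, $i_t$ is the input gate, $O_t$ is the output gate, and $\tilde{C}_t$ is the candidate, all at time t. The activation functions $\sigma(\cdot)$ and $\tanh(\cdot)$ are defined in Equations (12) and (13).
$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad \text{(12)}$$
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad \text{(13)}$$
Step 3: Finally, we calculate the output of the LSTM using Equation (14).
$$h_t = O_t \odot \tanh(C_t) \qquad \text{(14)}$$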
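The three steps above can be traced in a minimal, dependency-free sketch of one LSTM time step. It follows the standard LSTM formulation (gate inputs computed from the previous hidden state joined with the current input, sigmoid gates, a tanh candidate). The parameter names (`W_f`, `b_f`, and so on) and the dictionary layout are assumptions for illustration and do not come from the source.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, z, b):
    """Compute W z + b for a weight matrix given as a list of rows."""
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and the new cell state."""
    z = h_prev + x_t  # list concatenation: previous hidden state joined with current input
    # Step 1 and 2: net inputs of the gates and candidate, then their activations.
    f_t = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]
    i_t = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]
    o_t = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]
    c_hat = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Step 3: the output (hidden state) is the gated tanh of the cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hidden size 1 and three input features, so each weight row has 1 + 3 entries.
zero_row = [[0.0, 0.0, 0.0, 0.0]]
params = {name: zero_row for name in ("W_f", "W_i", "W_o", "W_c")}
params.update({name: [0.0] for name in ("b_f", "b_i", "b_o", "b_c")})

h_t, c_t = lstm_step(x_t=[0.3, 0.6, 0.9], h_prev=[0.0], c_prev=[2.0], params=params)
print(h_t, c_t)
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate equals 0, so the cell state is simply halved; this makes the gating arithmetic easy to check by hand.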

PROBLEM FORMULATION
The training samples consist of training inputs and targets. The inputs are the hourly wind speed (HWS), the hourly temperatures (at 10m and 2m), and the hourly pressure from 1980 to 2019. The training and testing features are further described in the next subsections.

Training Parameters
The training parameters and their combinations in the different Cases are depicted in Fig. 8. Furthermore, the results of the Cases are compared and the effect of each exogenous parameter on the prediction of wind speed is investigated. A model is realized in each Case and tested to predict wind speed in four Scenarios, which are predictions 1 hour, 2 hours, 5 hours, and 7 hours ahead of time.

i. Case 1
In this case, the LSTM model is trained with a single feature, the historical HWS data measured at a height of 50m, which serves as both the training input and the target.

ii. Case 2
This is the first case with exogenous inputs. The input includes HWS in addition to temperatures measured at 10m and 2m. The target is the HWS used in Case 1.

iii. Case 3
This case utilizes the wind speed and the atmospheric pressure as input. The target sample corresponds to the hourly measured wind speed data.

iv. Case 4
In this case, the LSTM model utilizes wind speed, temperatures at 10m and 2m, and atmospheric pressure as input. It includes all features being considered in this study and the target is HWS.

v. Case 5
The input in this case includes only the temperatures at two levels and the pressure. The target is the measured HWS.
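The five Cases and four Scenarios listed above amount to a small experiment configuration, which can be written down as a lookup table. The column names (`wind_speed_50m`, `temp_10m`, `temp_2m`, `pressure`) and the sample record are hypothetical labels chosen for this sketch; the feature combinations themselves follow the Case descriptions.

```python
# Input features per Case; the target is the hourly wind speed in every Case.
CASE_FEATURES = {
    1: ["wind_speed_50m"],
    2: ["wind_speed_50m", "temp_10m", "temp_2m"],
    3: ["wind_speed_50m", "pressure"],
    4: ["wind_speed_50m", "temp_10m", "temp_2m", "pressure"],
    5: ["temp_10m", "temp_2m", "pressure"],  # exogenous inputs only
}

# Prediction horizon in hours for each Scenario.
SCENARIO_HOURS_AHEAD = {1: 1, 2: 2, 3: 5, 4: 7}


def select_features(record, case):
    """Pick the input values used by a given Case from one hourly record."""
    return [record[name] for name in CASE_FEATURES[case]]


# Made-up hourly record, used only to show the selection.
record = {"wind_speed_50m": 6.4, "temp_10m": 31.0, "temp_2m": 30.2, "pressure": 1005.0}
for case in sorted(CASE_FEATURES):
    print(case, select_features(record, case))
```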

Forecast Error Metrics
The forecast error for wind speed is defined as the difference between the measured and the predicted wind speed values. The metrics for evaluating the forecast error include mean absolute error (MAE), root-mean-squared error (RMSE), mean squared error (MSE), and symmetric mean absolute error. In this paper, only MAE and MSE are used as the error metrics, as defined in Equations (15) and (16).
$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N} e_t \qquad \text{(15)}$$
$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N} e_t^{2} \qquad \text{(16)}$$
where $e_t$ is the absolute difference between the forecasted and the measured values of the wind speed and $N$ is the number of samples in the testing period. The MSE is computed to measure the level of deviation between the forecasted and the measured wind speed. The forecast errors are desired to be less than 0.5m/s; otherwise the training is repeated until a better model is achieved, and then the final testing is done. Throughout the rest of this article, measured wind speed is used interchangeably with real speed or actual wind speed, and predicted value is used interchangeably with forecasted value.
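For concreteness, the two error metrics used in this paper can be computed as follows. This is a minimal sketch with hypothetical function names and made-up sample values; it uses the usual definitions of MAE (mean of the absolute errors) and MSE (mean of the squared errors).

```python
def mean_absolute_error(measured, predicted):
    """Average of the absolute differences between measured and predicted values."""
    errors = [abs(m - p) for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)


def mean_squared_error(measured, predicted):
    """Average of the squared differences between measured and predicted values."""
    errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)


# Made-up wind speeds in m/s, used only to show the arithmetic.
measured = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
mae = mean_absolute_error(measured, predicted)  # (1 + 0 + 2) / 3
mse = mean_squared_error(measured, predicted)   # (1 + 0 + 4) / 3
print(mae, mse)
```

Note that MAE is in m/s while MSE is in squared units, so the two are not directly comparable in magnitude; this also depends on whether the values are computed on scaled or unscaled data.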

RESULTS
The results of all the Cases are presented in this section. The training MSE and prediction scatter plots for all Cases are shown in Fig. 9 to Fig. 11. The MSE values decrease as the number of epochs increases and become constant (~0.0005) after 40 epochs. Fig. 9 shows the training MSE plot recorded in Case 1, while Fig. 10 depicts the scatter plot between the actual and the predicted wind speed values. The scatter plots show that both values are closely matched, confirming the effectiveness of the model. Also, Fig. 11 shows the training MSE plots for Cases 1-4, and Fig. 12 (a-d) shows the correlation scatter diagrams of each model in all Cases. The scatter plots tell at a glance the performance of each model. The accuracy of the forecast can be interpreted from the correlation and regression of each plot. Furthermore, the correlation coefficients of all Cases and Scenarios are given in Tables 1 to 5. Also, Table 6 summarizes all training parameters, network algorithms, and assumptions for all Cases and Scenarios discussed in this article. Each Case is further analyzed separately in Sections 5.1 to 5.6.

i. Case 1
In Case 1, the model is trained with hourly mean wind speed only, made of 100,000 data points, for one-hour-ahead predictions. The training performance of this case is shown in Figs. 9 and 10. The predictions are obtained for four different Scenarios: 1, 2, 5, and 7 hours ahead of time. The error metrics obtained from testing this model are presented in Table 1, where MSE has higher magnitudes compared to MAE values. However, these values tend to decrease as the prediction duration increases from 1 hour to 7 hours ahead. Fig. 13(a) compares the measured and the predicted wind speed values for one hour ahead. The predicted values are in close agreement with the measured values, with few exceptions. Overall, the predicted values follow the trend of the measured wind speed values, which is a strong justification of the accuracy of the model.
The same model is used to predict wind speed values 2, 5, and 7 hours ahead (Figs. 13(b), 13(c), and 13(d)). It is observed from these Figures that the model is capable of predicting the wind speed values accurately up to 7 hours ahead of time. In all of these Scenarios, with few exceptions, the predicted values followed the trends of the measured wind speed values. This demonstrates the robustness of the LSTM model in short-term predictions. Furthermore, the results demonstrate that a model realized by training on input data samples for one hour ahead using the LSTM method can be used to predict wind speed values up to 7 hours ahead without compromising the accuracy, as long as the input is correctly fed into the model.

ii. Case 2
In this case, the model is trained with a combination of wind speed and exogenous (temperatures at 10m and 2m) inputs. The resulting MSE and MAE values are summarized in Table 2. Compared to Case 1, the MSE and MAE values are lower in Case 2. This can be attributed to the effect of temperature on wind speed variations, because the uneven heating of the earth's surface with time causes the wind flow. Again, like in Case 1, MSE values are lower compared to MAE magnitudes and are seen to decrease with increasing prediction duration from 1 hour to 7 hours. The MSE value decreases from 0.1961 to 0.1824 (~7%) as the prediction duration goes from 1 to 7 hours. The one-hour-ahead prediction of wind speed is compared with the measured values in Fig. 14(a). The predicted values have an excellent match with the measured ones and, more importantly, follow the trend of the measured values with time. Furthermore, these results seem to be a bit better than those in Case 1 (Fig. 13(a)). Also, Figs. 14(c-d), which are obtained using the same model to predict the wind speeds at 2, 5 and 7 hours ahead, show that the model performed a little better than that in Case 1. Moreover, the error metrics summarized in Table 2 also indicate that a forecast with exogenous parameters as inputs, temperatures at 10m and 2m in this Case, can perform better in LSTM networks. Additionally, the observations show that the model realized for one-hour-ahead prediction can be used to predict wind speed up to 7 hours ahead accurately.

iii. Case 3
In this Case, the model is trained with wind speed and pressure, measured at 50m and near the ground surface respectively, as the inputs. Table 3 summarizes the magnitudes of MSE and MAE obtained in this Case. The values recorded are also below 0.5m/s and thus indicative of high performance. Fig. 11(c) shows that the MSE plot assumes a pattern similar to those obtained for Case 1 and Case 2. The predicted and measured wind speed values are compared in Fig. 15(a).

iv. Case 4
Under this case, the model is trained with a combination of all features in Cases 1, 2, and 3 (wind speed, temperatures at 2m and 10m, and atmospheric pressure) and is tested for all Scenarios, as in the previous cases. Table 4 provides the model performance in terms of MSE and MAE values. It is evident from the data in Table 4 that for Scenario 1 (one-hour-ahead predictions) and Scenario 2 (two-hours-ahead predictions), this model (Case 4) performed better than Cases 1 and 3. For 5-hours-ahead predictions (Scenario 3), the model in Case 4 performed equally well compared to the models for Cases 1 to 3. However, MAE values remained almost in the same range as in Cases 1, 2, and 3 for all Scenarios. To further assess the predictability of this model, the predicted values under all Scenarios are compared with the measured hourly mean wind speed values in Figs. 16(a-d). The patterns recorded in all Scenarios showed that the predicted values have an outstanding match with the measured values. It is further noted that the predicted values follow the changing trends of the measured wind speed values with time for all Scenarios. On visual inspection, the trend-following behavior in the present Case 4 seems to be even better than in Case 1 (Fig. 13), Case 2 (Fig. 14), and Case 3 (Fig. 15). Hence it can be said that adding temperature and pressure as input parameters along with the wind speed enhances the model predictability, and they can be used if measurements are available.

v. Case 5
This is a special case, included for the sake of completeness, where exogenous features, excluding the wind speed, are used as input to train the model while the target remains the wind speed. The error metrics, summarized in Table 5 for this Case 5, show very high values of MSE and MAE compared to all the Cases discussed earlier. Also, the plots of the measured versus predicted wind speed show a poor match, although the predictions still follow the trend of the measured values (Figs. 17(a-d)). It is observed that this model's predictive accuracy is poor relative to the other models presented in sub-sections 5.1 to 5.4. Therefore, for short-term wind speed prediction using an LSTM, the model should not be trained with exogenous parameters only as input.

In this study, the LSTM method is used for short-term (1, 2, 5, and 7 hours ahead) prediction of wind speed. Five models are developed depending on the training input parameters (Case 1: wind speed only; Case 2: wind speed and ambient temperatures at 10m and 2m; Case 3: wind speed and atmospheric pressure; Case 4: wind speed, temperatures, and pressure; and lastly Case 5: temperatures and pressure values only). The model performance is evaluated for four Scenarios (Scenario 1: 1 hour ahead of time, Scenario 2: 2 hours ahead of time, Scenario 3: 5 hours ahead of time, and Scenario 4: 7 hours ahead of time) using MSE and MAE values. Furthermore, the predicted values are compared with the measured wind speed values over time to confirm the trend predictability of the proposed models.

CONCLUDING REMARKS
The study found that LSTM is a powerful tool (based on MSE and MAE values and trend reproducibility) for short-term wind speed prediction. Furthermore, it is observed that the accuracy of LSTM improves as the number of exogenous training features increases. However, the algorithm requires that wind speed be part of the input training parameters in order to enhance its accuracy. Moreover, the accuracy of prediction by the LSTM network may be compromised if the prediction time ahead differs from the actual time step used for training the network. The accuracy of prediction deteriorates when only exogenous parameters are used as inputs for training the model.
Finally, the study recommends that LSTM may be used for short term prediction of wind speed with training input parameters as used in Cases 1 to 4. In further studies, an algorithm like the transformer can be chosen to replace the LSTM. Also, ensembling of sequence models may be considered to improve overall learning performance of the model.