Multi-step-ahead Combination Forecasting of Wind Speed Using Artificial Neural Networks

Wind speed plays a very important role in the scheduling of power systems and the dynamic control of wind turbines, and wind speed forecasting has recently become one of the most important issues in wind energy conversion. Because wind speed is stochastic and varies from time to time and from site to site, adaptive and reliable forecasting methods are urgently needed. The Back Propagation (BP) neural network, a commonly used computational intelligence method, has been widely applied in forecasting. However, it has some deficiencies and uncertainties; for example, the number of hidden nodes directly affects the network's generalization ability and accuracy, but there is not yet an effective theory for determining it. To address this problem, a combination forecasting model built from differently weighted BP networks is proposed in this study. Wind speed data collected from a New Zealand wind power plant are used in the experiments. Simulations show that the results of the combination forecasting method are better than those of a single BP network.


INTRODUCTION
In recent years, wind power has become one of the fastest-growing renewable energy generation technologies. The installed capacity grows by about 30% every year. It has been reported that wind power capacity in China will reach 30,000 MW in 2020.
Wind power generation has been studied extensively by scholars at home and abroad. Short-term wind speed forecasting plays a significant role in the dynamic control of wind power systems. Accurate wind speed forecasting is urgently needed for timely scheduling, capacity evaluation and the determination of a reasonable generation price (Li and Shi, 2010; Li et al., 2011).
Wind speed is difficult to predict because it is affected by a variety of complex factors, such as pressure, temperature, the earth's rotation and geomorphology. Existing methods for short-term wind speed forecasting include the Neural Network (NN), the Kalman filter, wavelet analysis (Pan et al., 2008), the moving average method, the spatial correlation model, fuzzy evaluation, linear prediction and the discrete Hilbert transform.
The Back Propagation (BP) NN has been widely used for prediction because of its good nonlinear mapping ability, high fitting accuracy, flexible and effective learning method, fully distributed storage structure and hierarchical model structure. However, its forecast accuracy is easily affected by factors such as the network structure, the learning rate and the numbers of input nodes and hidden nodes.
In the BP algorithm, the error is propagated back from the output layer to the input layer, so the more hidden layers there are, the less reliable the error becomes, especially near the input layer. If an unreliable error is used to adjust the weights, learning efficiency suffers, which can result in slow convergence or even no convergence. Another disadvantage of the BP algorithm is that the number of layers, the number of hidden nodes and the weight-adjustment settings are chosen manually by trial and error, which adds to the randomness of the algorithm. To improve the forecasting accuracy of BP, the number of hidden layers, the number of input nodes or the sample size is often increased. However, these measures also increase the system complexity and the amount of computation, slow down convergence and learning, and can reduce the generalization ability and prediction accuracy (Sfetsos, 2000; Sharm and Frieldander, 1984).
Systematic study of combination forecasting started with Bates and Granger, whose research attracted wide attention from scholars. The method was developed further in the 1970s. In 1989, the Journal of Forecasting, an authoritative international journal in the forecasting field, published a special issue on combination forecasting. Domestic scholars have also paid attention to this field and obtained some results in recent decades. Combination forecasting combines different prediction models in an appropriate weighted-average form to derive a new model that includes the information of all the individual models (Chen, 2008; Andrawisa et al., 2010). To improve the prediction accuracy effectively, the calculation of the weighted-average coefficients becomes the key problem. Combination forecasting has been shown to be an effective way to increase prediction accuracy (Andrawisa et al., 2010). Its most significant characteristic is that it can overcome the shortcomings of any single method. Many studies indicate that combination forecasting is more effective than a single prediction scheme. Structural change and parameter drift over time make it hard to select one single best prediction method for a time series, and combination forecasting can reduce these adverse effects.
To increase the accuracy and generalization ability of the wind speed prediction model and to overcome the difficulty of determining the number of hidden nodes of BP, a combination forecasting model for wind speed is proposed in this study. First, several BP models with different numbers of hidden nodes are developed. Then a multi-step combination forecasting model is established using combination forecasting theory. The results show that the combined model is more accurate and reliable.

BP NEURAL NETWORK
Principle of BP neural network: The BP network is a multi-layer feed-forward network. It systematically solves the problem of learning the connection weights of the hidden layers and has become the most widely used neural network learning method.
The central idea of the BP algorithm is to adjust the weights so as to minimize the total network error, that is, the mean squared error between the actual output and the desired output. The learning process is therefore a process of weight adjustment driven by the propagation of the error.
The learning process of a multi-layer BP network includes both forward propagation and back propagation. During forward propagation, the input information enters at the input layer, is processed by the hidden layer and is finally passed to the output layer. The neuron states of each layer affect only the neuron states of the next layer. If the expected output is not obtained at the output layer, the error is propagated back along the original connection path. During back propagation, the weights are adjusted to minimize the error. The typical structure of a BP neural network is shown in Fig. 1.
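As a minimal sketch of the forward propagation just described, the following Python function passes one input vector through a three-layer network. It assumes that each neuron subtracts its threshold from the weighted sum of its inputs, that the hidden layer uses a sigmoid activation and that the output layer is linear. The function and variable names are illustrative and are not taken from the original study.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def forward(
    x: Sequence[float],
    w_hidden: Sequence[Sequence[float]],  # w_hidden[i][j]: input node j -> hidden node i
    theta_hidden: Sequence[float],        # thresholds of the hidden nodes
    w_out: Sequence[Sequence[float]],     # w_out[k][i]: hidden node i -> output node k
    theta_out: Sequence[float],           # thresholds of the output nodes
) -> Tuple[List[float], List[float]]:
    """Return (hidden outputs, network outputs) for one input sample."""
    # Hidden layer: weighted sum minus threshold (assumed sign), then sigmoid.
    hidden = [
        sigmoid(sum(w * xj for w, xj in zip(row, x)) - theta)
        for row, theta in zip(w_hidden, theta_hidden)
    ]
    # Output layer: weighted sum minus threshold with a linear activation (assumed).
    outputs = [
        sum(w * h for w, h in zip(row, hidden)) - theta
        for row, theta in zip(w_out, theta_out)
    ]
    return hidden, outputs
```

Back propagation would then adjust `w_hidden`, `theta_hidden`, `w_out` and `theta_out` to reduce the squared difference between `outputs` and the desired output; that step is omitted here.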
For learning sample p, the input of the i-th neuron in the hidden layer is:

$$\mathrm{net}_i^p = \sum_{j=1}^{M} w_{ij} o_j^p - \theta_i = \sum_{j=1}^{M} w_{ij} x_j^p - \theta_i \tag{1}$$

where $x_j^p$ and $o_j^p$ denote the input and output of the j-th node, respectively, $w_{ij}$ is the connection weight between the j-th neuron in the input layer and the i-th neuron in the hidden layer, $\theta_i$ is the threshold of the i-th neuron in the hidden layer and M is the number of nodes in the input layer. The output of the i-th neuron in the hidden layer is:

$$o_i^p = g(\mathrm{net}_i^p) \tag{2}$$

where g(.) is the activation function. The total input of the k-th neuron in the output layer is:

$$\mathrm{net}_k^p = \sum_{i=1}^{q} w_{ki} o_i^p - \theta_k \tag{3}$$

where $w_{ki}$ is the connection weight between the i-th neuron in the hidden layer and the k-th neuron in the output layer, $\theta_k$ is the threshold of the k-th neuron in the output layer and q is the number of nodes in the hidden layer. The actual output of the k-th neuron in the output layer is:

$$o_k^p = g(\mathrm{net}_k^p) \tag{4}$$

Parameters selection of BPNN: A BP neural network is made up of multiple layers of nodes. Some parameters, i.e., the number of input-layer nodes, the number of hidden layers and hidden-layer nodes, the number of output-layer nodes and so on, need to be selected in advance (Zhang et al., 1998).
... layer and it is easier to train than a BP network with more hidden layers, so a typical three-layer network is used in this study.
There is as yet no general and reliable method for determining the number of neurons in the hidden layer. This number depends on many factors, such as the number of nodes in the input and output layers, the complexity of the problem to be solved, the transfer function and the characteristics of the sample data. Existing methods for calculating the number of hidden nodes include the Kolmogorov theorem, the one-way gradual change method, the two-way method and so on. However, most of them are designed for an arbitrary number of training samples and target the most adverse circumstances or samples containing noise. In fact, the numbers of hidden nodes obtained from these formulas sometimes differ significantly (Wu et al., 1998). Selecting the number of hidden nodes involves a trade-off: on the one hand, increasing the number of hidden nodes can improve prediction accuracy; on the other hand, too many hidden nodes can cause over-fitting and reduce the generalization ability of the network.

• Selection of node number of input layer:
Because the input nodes carry important information about the structure of the time series, choosing their number is very important. Many scholars have studied this problem over the past decades, but no method is better than the others in all cases. One widely used method is the Akaike Information Criterion (AIC), which is still under extensive discussion.
• Selection of node number of output layer: The choice of output nodes is relatively simple. For time series prediction, the number of output nodes is usually determined by the forecasting horizon. There are two main forecasting approaches at present (Taieb et al., 2012): rolling prediction (only one output node is used) and MIMO (multiple input multiple output) prediction. Both have been applied to multi-step prediction in the literature. In rolling prediction, each prediction result is fed back as an input of the prediction model for the next step. In MIMO prediction, the model has several output nodes and all prediction steps are obtained directly in a single calculation. It has been pointed out that rolling prediction is clearly better than MIMO prediction for sunspot forecasting (Weigend et al., 1992).
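The difference between the two strategies can be illustrated with a short Python sketch. Here `one_step_model` and `multi_output_model` are hypothetical callables standing in for trained networks; they are assumptions made for illustration and do not correspond to any interface in the original study.

```python
from typing import Callable, List, Sequence


def rolling_forecast(
    one_step_model: Callable[[Sequence[float]], float],
    history: Sequence[float],
    n_inputs: int,
    n_steps: int,
) -> List[float]:
    """Multi-step forecast with a single-output model.

    Each prediction is appended to the input window and the oldest
    value is dropped, so later steps depend on earlier predictions.
    """
    if len(history) < n_inputs:
        raise ValueError("history is shorter than the input window")
    window = list(history[-n_inputs:])
    predictions: List[float] = []
    for _ in range(n_steps):
        y = one_step_model(window)
        predictions.append(y)
        window = window[1:] + [y]  # feed the prediction back as an input
    return predictions


def mimo_forecast(
    multi_output_model: Callable[[Sequence[float]], Sequence[float]],
    history: Sequence[float],
    n_inputs: int,
) -> List[float]:
    """Multi-step forecast with a multi-output model: one call, all steps."""
    if len(history) < n_inputs:
        raise ValueError("history is shorter than the input window")
    return list(multi_output_model(list(history[-n_inputs:])))
```

In the rolling scheme the prediction errors of early steps enter the inputs of later steps, whereas the MIMO scheme needs one output node per prediction step.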

COMBINATION FORECASTING MODEL
Principle of combination forecasting: A combination forecasting model combines different individual prediction methods, taking the characteristics of each method into account. For example, suppose that a prediction method has a large error but contains independent information about the system; if it is combined with another prediction method with a relatively smaller error, the prediction performance of the system can still be assured (Chen, 2008). Let

$$\hat{x}_t = l_1 x_{1t} + l_2 x_{2t} + \cdots + l_m x_{mt}$$

be the combinational predictive value of $x_t$, where $x_{it}$ is the prediction of the i-th individual model at time t and $l_1, l_2, \ldots, l_m$ are the weighting coefficients, which are constrained by Eq. (5):

$$\sum_{i=1}^{m} l_i = 1, \quad l_i \ge 0, \quad i = 1, 2, \ldots, m \tag{5}$$

Definition 2: If $c_m$ cannot be decreased by adding a single prediction model to the combination model, then that single prediction model is called a redundant prediction model. That is to say, the optimal weight of the single prediction method is zero, which indicates that it provides only redundant information.
Inference 1: The simple average combination forecasting method is at least a non-inferior combination forecasting method.

Linear combination forecasting model with Sum Of Squared Error (SSE):
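It is well known that the SSE is one of the most important indexes of prediction accuracy. For a combination forecasting model in which the weight coefficients are not limited by nonnegative constraints, the optimal solution has an explicit mathematical expression, so it can be calculated directly from the formula. This kind of combination forecasting model has been widely applied in practical prediction. Let $e_{it} = x_t - x_{it}$ be the error of the i-th individual model at time t and let $e_t$ be the forecasting error of the combination at time t, then, because the weights sum to one, we have:

$$e_t = x_t - \hat{x}_t = \sum_{i=1}^{m} l_i e_{it}$$

where $\hat{x}_t$ is the combinational predictive value of $x_t$. With $L = (l_1, l_2, \ldots, l_m)^T$, $R = (1, 1, \ldots, 1)^T$ and the error information matrix $E = (E_{ij})_{m \times m}$, $E_{ij} = \sum_{t=1}^{N} e_{it} e_{jt}$, where N is the number of samples, the SSE of the combination is $J = \sum_{t=1}^{N} e_t^2 = L^T E L$ and the combination model is:

$$\min J = L^T E L \quad \text{s.t.} \quad R^T L = 1, \; L \ge 0$$

Notice that there are nonnegative constraints on L, thus the original problem is a nonlinear (quadratic) programming problem.
If we ignore the nonnegative constraints, the problem reduces to a quadratic program with a single linear equality constraint:

$$\min J = L^T E L \quad \text{s.t.} \quad R^T L = 1 \tag{7}$$

The optimal solution of Eq. (7) can be obtained by using the Lagrange method:

$$L^* = \frac{E^{-1} R}{R^T E^{-1} R}, \qquad J^* = \frac{1}{R^T E^{-1} R}$$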
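The unconstrained minimum-SSE weights can be sketched in Python as follows. The sketch assumes the standard closed-form result of the Lagrange method for minimizing the SSE subject only to the weights summing to one, namely weights proportional to the solution of E z = R, where E is the matrix of summed error products and R is a vector of ones. The function names are hypothetical, and the linear system is solved with a plain Gaussian elimination so that only the standard library is needed.

```python
from typing import List, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("error information matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def combination_weights(errors: Sequence[Sequence[float]]) -> List[float]:
    """Minimum-SSE weights without the nonnegativity constraint.

    errors[i][t] is the error of individual model i at time t.
    """
    m = len(errors)
    e_mat = [
        [sum(a * b for a, b in zip(errors[i], errors[j])) for j in range(m)]
        for i in range(m)
    ]
    z = solve_linear(e_mat, [1.0] * m)  # z = E^{-1} R
    total = sum(z)                      # R^T E^{-1} R
    return [zi / total for zi in z]


def combined_sse(errors: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """SSE of the weighted combination of the individual errors."""
    n = len(errors[0])
    return sum(
        sum(w * e[t] for w, e in zip(weights, errors)) ** 2 for t in range(n)
    )
```

For two models with errors `[1, -1, 1, -1]` and `[2, 2, -2, -2]`, the individual SSEs are 4 and 16, the weights come out as 0.8 and 0.2, and the combined SSE is 3.2, which is lower than that of either model. Because nonnegativity is not enforced, some weights can be negative for other error patterns.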

COMBINATION FORECASTING STEP WITH BP NETWORK
Prediction procedure: The combination forecasting model with BP networks is used for multi-step wind speed prediction in this study. The purelin (linear) function is used between the input layer and the output layer, and the sigmoid transfer function is used in the hidden layer. Compared with other training methods, the Levenberg-Marquardt (LM) training algorithm converges quickly and trains accurately, so it is used to train the BP networks. The specific steps are shown in Fig. 2.
Evaluation indexes: Prediction accuracy is closely related to the prediction error. To reflect the effect of the combination forecasting, the root mean square error (RMSE) and the mean absolute error (MAE) are used as evaluation indexes:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (x_t - \hat{x}_t)^2}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} |x_t - \hat{x}_t|$$

where $x_t$ is the actual value, $\hat{x}_t$ is the predicted value and N is the number of samples.

[Fig. 3]

The input dimension is determined as five by phase space reconstruction. The number of hidden nodes is selected from 5 to 20, and the prediction horizon ranges from one to six steps. The initial weights are chosen as the simple average combination weights according to Inference 1.
Table 1 shows the prediction results of five different models for prediction steps one to six. The models include:
• Persistence method: The measured value of the last step is taken as the prediction value at the present step.
• Combination model without the constraint of non-negative weights: The linear combination of prediction models that minimizes the RMSE without the constraint of nonnegative weights.
Figures 4 to 9 show the comparison between the combination forecasting model output and the actual values for prediction steps 1 to 6, respectively.
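The two evaluation indexes and the persistence baseline can be sketched in Python as follows. The sketch assumes the usual definitions of RMSE and MAE over paired actual and predicted values, and it assumes that an h-step persistence forecast for time t + h simply reuses the value measured at time t. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error of paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(
    series: Sequence[float], step: int
) -> Tuple[List[float], List[float]]:
    """Return aligned (actual, predicted) lists for a step-ahead persistence model.

    The prediction for time t + step is the value measured at time t.
    """
    if step < 1 or step >= len(series):
        raise ValueError("step must be between 1 and len(series) - 1")
    actual = list(series[step:])
    predicted = list(series[:-step])
    return actual, predicted
```

Calling `persistence_forecast(series, h)` for h = 1 to 6 and passing the result to `rmse` and `mae` gives baseline scores against which other models can be compared at each prediction step.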
In multi-step prediction, the persistence model performs worst and the combination forecasting model performs best. Although the practical meaning of negative weights is still under discussion in academia, the combination model without the nonnegative-weight constraint gives the best predictive results in this example, while the combination forecasting model with nonnegative weights gives the next best results. The experimental results show that, in multi-step wind speed forecasting, the combination forecasting model effectively improves the reliability and accuracy of the predictions. BP networks with zero weight in the combination model provide only redundant information and do not contribute to the accuracy or reliability of the combination forecast.

CONCLUSION
This study proposed a combination forecasting model using BPNNs with different numbers of hidden nodes, which is very helpful for wind power bidding strategies in the short-term electricity market (Hu et al., 2012; Varkani et al., 2009). The main conclusions are as follows:
• The combination forecasting model can effectively avoid the poor performance of a BP neural network that results from an inappropriate choice of the number of hidden nodes.
• The combination forecasting model can improve the accuracy and reliability of multi-step-ahead prediction.
• Methods with nonlinear weights need further study and will be the main focus of our future work.