A Combined Short-Term Forecast Model of Wind Power Based on Empirical Mode Decomposition and Augmented Dickey-Fuller Test

The high volatility of wind power time series is an important factor affecting forecasting results, so the historical data must be analyzed and preprocessed. To improve the accuracy of wind power forecasting, this paper proposes a two-predictor combined model based on two data processing algorithms: empirical mode decomposition and the augmented Dickey-Fuller test. First, the original wind power time series is decomposed into several sub-components by the empirical mode decomposition algorithm. Second, the augmented Dickey-Fuller test is employed to test the stationarity of each sub-component, and the sub-components are divided into two categories: stationary and non-stationary. Third, the stationary components are forecasted by the least-square support vector machine, while the non-stationary ones are forecasted by the persistence model. Finally, the prediction value is the sum of the forecasts of all sub-components. Three models, namely the least-square support vector machine, the persistence model and the empirical mode decomposition-least-square support vector machine, are compared with the proposed model on two real wind power datasets. The results indicate that the proposed model achieves higher forecast accuracy and stability than the other models.


Introduction
Owing to its advantages of being clean, abundant and renewable, wind energy is widely used in renewable energy generation. To ensure the efficient consumption of wind power without increasing the operating cost of the power grid, accurate prediction of wind power output is of great importance. However, wind energy is strongly influenced by natural factors, which makes it highly volatile and difficult to control. Hence, wind power forecasting has become a challenging research topic [1]. In recent years, researchers have proposed a number of prediction methods for wind power time series [2]-[5]. In terms of data analysis, these forecast methods can be classified into two categories: physical methods and statistical methods. Physical methods take abundant physical elements into consideration, such as air pressure, temperature, obstacles and geographic information, to achieve the best forecast performance [6]. Statistical models, by contrast, mine the characteristics of measured historical wind power data and then make predictions based on the results of the data analysis [6]. The persistence model is the simplest statistical forecast model. Its principle can be briefly stated as follows: the predicted value of the next sampling point is equal to the measured value of the current point. Owing to its ability to track the trend of a time series, the persistence model can even outperform other single forecast models in short-term and ultra-short-term wind power forecasting [7], [8]. The support vector machine has long been used in wind power forecasting. It takes historical data as input and establishes the prediction model by fitting and training on the data [9]. The least-square support vector machine (LSSVM) is the least-squares version of the support vector machine, which simplifies the original formulation of the latter while preserving its good prediction properties [10], [11].
LSSVM can fit the historical data well and give satisfactory short-term predictions of the time series. However, as the number of look-ahead steps increases, it cannot maintain its data-fitting performance. Therefore, to achieve both simplicity and forecast ability, this paper combines the persistence model and LSSVM to develop a two-predictor short-term wind power forecast model. The high volatility of wind power data is an important factor that affects forecast results [12], and reducing this volatility is an effective way to improve forecast accuracy [13]. Reference [14] uses the discrete wavelet transform to process wind speed data and selects an artificial neural network to forecast wind speed efficiently. So far, many decomposition algorithms have been developed to decompose the original time series into several components [15]-[23]. Reference [15] applies real-time decomposition to the original data, and reference [16] combines data decomposition with feature selection to improve forecasting accuracy. Other decomposition algorithms include variational mode decomposition [17], the autoregressive integrated moving average model [18], and fast ensemble empirical mode decomposition [19]. In addition, some researchers extract the mean trend component from the original data and predict the mean trend and stochastic components separately [20]-[22]. Empirical mode decomposition (EMD) can decompose a wind power signal into several intrinsic mode function components and a residue, thus decreasing the volatility of the original signal [23]. Nevertheless, the components do not all share the same stationarity. To achieve a satisfactory forecast result, it is necessary to choose a suitable forecast model for each of them. The augmented Dickey-Fuller (ADF) test is an efficient unit root test that evaluates the stationarity of a time series [24]; therefore, it is utilized in this research.
This paper employs EMD to decompose the original wind power signal and uses the ADF test to assess the stationarity of each component. The non-stationary components are then forecasted by the persistence model, owing to its forecast stability, while the stationary ones are predicted by the LSSVM. The final forecast value is the sum of the prediction results of all the components and the residue.
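The routing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the stationarity flags are taken as given inputs, a simple mean predictor stands in for the LSSVM, and the persistence forecast is assumed to repeat the last measured value for every look-ahead step.

```python
def persistence_forecast(series, steps=1):
    """Persistence model: repeat the last measured value (assumed for all steps)."""
    return [series[-1]] * steps


def mean_forecast(series, steps=1):
    """Hypothetical stand-in for the LSSVM predictor; this is NOT an LSSVM."""
    return [sum(series) / len(series)] * steps


def combined_forecast(components, stationary_flags, stationary_model, steps=1):
    """Forecast each component with the predictor chosen by its flag, then sum."""
    total = [0.0] * steps
    for comp, stationary in zip(components, stationary_flags):
        if stationary:
            pred = stationary_model(comp, steps)
        else:
            pred = persistence_forecast(comp, steps)
        total = [acc + p for acc, p in zip(total, pred)]
    return total


# Made-up components: one oscillating (flagged stationary), one trending.
components = [[1.0, -1.0, 1.0, -1.0], [10.0, 11.0, 12.0, 13.0]]
flags = [True, False]
forecast = combined_forecast(components, flags, mean_forecast, steps=2)
```

Any predictor with the same `(series, steps)` signature could be passed in place of `mean_forecast`, which keeps the routing logic independent of the model used for the stationary components.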

Empirical mode decomposition (EMD)
EMD is an effective method for dealing with non-stationary and nonlinear data by decomposing the signal into several intrinsic mode function components. An intrinsic mode function is a function that satisfies the following two conditions [23]: (1) over the whole data set, the number of local extrema and the number of zero crossings must be equal or differ by at most one; (2) at any point, the mean value of the envelope defined by the local maxima (upper envelope) and the envelope defined by the local minima (lower envelope) must be zero. Through decomposition, EMD can stabilize the non-stationary signal, and since the decomposition is based on the data itself, the method is intuitive and adaptive. Let $W(t)$ represent the original wind power time series. The implementation steps of empirical mode decomposition are as follows [23]:
Step 1: Identify all the local maxima and local minima of $W(t)$. Then connect the local maxima and the local minima, respectively, by cubic splines to obtain the upper and lower envelopes. The mean of these two envelopes is designated as $m_1(t)$, and the difference between $W(t)$ and $m_1(t)$ is the first candidate component:
$$h_1(t) = W(t) - m_1(t) \qquad (1)$$
Step 2: Treat $h_1(t)$ as the new signal and repeat step 1 until the standard deviation, $SD$, of $h_k(t)$ satisfies the following criterion:
$$SD = \sum_{t} \frac{\left| h_{k-1}(t) - h_k(t) \right|^2}{h_{k-1}^2(t)} < d \qquad (2)$$
where $k$ is the number of iterations, and a typical value for $d$ can be set between 0.2 and 0.3. Then $h_k(t)$ is taken as an intrinsic mode function component.
Step 3: Subtract the extracted component from the signal and repeat steps 1-2 on the remainder until the residue, $r_n(t)$, becomes a monotonic function from which no more intrinsic mode functions can be extracted. In this way, the original wind power signal is decomposed as follows:
$$W(t) = \sum_{i=1}^{n} c_i(t) + r_n(t) \qquad (3)$$
where $c_i(t)$ $(i = 1, 2, \ldots, n)$ are the intrinsic mode function components of $W(t)$ and $r_n(t)$ is the final residue.
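The sifting procedure in steps 1-2 can be sketched as follows. This is an illustrative sketch rather than a full EMD implementation, and it makes two simplifying assumptions that differ from the text: the envelopes are piecewise-linear instead of cubic splines (to keep the example dependency-free), and a small `eps` is added to the denominator of the SD criterion to avoid division by zero. All function names are hypothetical.

```python
from bisect import bisect_right


def local_extrema(x):
    """Indices of interior local maxima and minima of the sequence x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def linear_envelope(x, idx):
    """Piecewise-linear envelope through (i, x[i]) for i in idx, held flat
    outside the first and last extremum (stand-in for the cubic spline)."""
    env = []
    for t in range(len(x)):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            j = bisect_right(idx, t) - 1
            left, right = idx[j], idx[j + 1]
            w = (t - left) / (right - left)
            env.append((1 - w) * x[left] + w * x[right])
    return env


def sift_once(x):
    """One pass of step 1: subtract the mean of the two envelopes."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return list(x)  # no envelope can be built; nothing to sift
    upper = linear_envelope(x, maxima)
    lower = linear_envelope(x, minima)
    return [v - (u + l) / 2 for v, u, l in zip(x, upper, lower)]


def sd_criterion(h_prev, h_curr, eps=1e-12):
    """SD between successive sifting results (eps is an added safeguard)."""
    return sum((a - b) ** 2 / (a * a + eps) for a, b in zip(h_prev, h_curr))


def extract_imf(x, d=0.3, max_iter=100):
    """Step 2: repeat sifting until SD < d, with d typically 0.2 to 0.3."""
    h_prev = list(x)
    for _ in range(max_iter):
        h_curr = sift_once(h_prev)
        if sd_criterion(h_prev, h_curr) < d:
            return h_curr
        h_prev = h_curr
    return h_prev


# Toy signal: a symmetric oscillation riding on a constant offset of 2.
base = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
signal = [v + 2.0 for v in base]
imf = extract_imf(signal)
residue = [s - c for s, c in zip(signal, imf)]
```

On this toy signal the extracted component recovers the oscillation and the residue is the constant offset. Real wind power data would need spline envelopes and boundary handling, which this sketch deliberately omits.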

Augmented Dickey-Fuller (ADF) test
The ADF test is a widely used unit root test for evaluating the stationarity of a time series. By calculating the t statistic of a parameter of a time series model and comparing it with the critical value of the ADF distribution, the ADF test determines whether the time series is stationary. For a time series $Y_t$, the ADF test investigates the following three models [24], labeled model (a), model (b) and model (c):
$$\Delta Y_t = \delta Y_{t-1} + \sum_{i=1}^{p} \gamma_i \Delta Y_{t-i} + \varepsilon_t \qquad (4)$$
$$\Delta Y_t = \alpha + \delta Y_{t-1} + \sum_{i=1}^{p} \gamma_i \Delta Y_{t-i} + \varepsilon_t \qquad (5)$$
$$\Delta Y_t = \alpha + \beta t + \delta Y_{t-1} + \sum_{i=1}^{p} \gamma_i \Delta Y_{t-i} + \varepsilon_t \qquad (6)$$
where $\delta$ is the unit root test parameter, $\alpha$ is the constant term, $\beta t$ is the time trend term, $\Delta Y_{t-i}$ are the lagged difference terms, and $\varepsilon_t$ is the residual. The test proceeds as follows:
Step 1: Estimate the appropriate form of the three models. For every model, select an appropriate number of lagged difference terms so that the model's residual is white noise. Set the null hypothesis $H_0: \delta = 0$, i.e., a unit root exists. The testing order of the three models is model (c), model (b), and model (a).
Step 2: Test model (c) first. Calculate the t statistic $t_\delta$ of the parameter $\delta$ and compare it with the critical value $t_v$ of the ADF distribution. If $t_\delta < t_v$, the null hypothesis is rejected, which indicates that no unit root exists in the model and the time series is stationary; the ADF test ends here. Otherwise, move on to test the next model.
Step 3: Repeat step 2 until the null hypothesis is rejected or all three models have been tested. During the ADF test, as long as any one of the models rejects the null hypothesis, the time series can be considered stationary. Conversely, if none of the three models rejects the null hypothesis, the time series is considered non-stationary.
In this paper, the ADF test is used to test the stationarity of the components of the wind power time series obtained by EMD. Based on the test result, each component is then forecasted by either the LSSVM model or the persistence model. The final forecast value is the sum of the forecasting results of all the components. The framework of the proposed model is shown in Fig. 1.
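To make the decision rule concrete, the sketch below computes the t statistic for the simplest case only: a regression of the first difference on the lagged level with no constant, no trend and no lagged difference terms, i.e., the plain Dickey-Fuller regression rather than the full augmented test. The function names are hypothetical, and the default critical value of -1.95 is the approximate 5% asymptotic value for this no-constant case, used here as an assumption.

```python
import math


def dickey_fuller_t(y):
    """t statistic of delta in  dY_t = delta * Y_{t-1} + e_t
    (no constant, no trend, no lagged differences)."""
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in lagged)
    delta = sum(v * d for v, d in zip(lagged, diffs)) / sxx
    rss = sum((d - delta * v) ** 2 for v, d in zip(lagged, diffs))
    sigma2 = rss / (len(diffs) - 1)  # one estimated parameter
    return delta / math.sqrt(sigma2 / sxx)


def is_stationary(y, critical_value=-1.95):
    """Reject the unit-root null when the t statistic is below the critical value."""
    return dickey_fuller_t(y) < critical_value


# Made-up series: a mean-reverting oscillation and an upward ramp.
oscillating = [1.0, -1.0, 1.1, -0.9, 1.0, -1.1, 0.9, -1.0]
ramp = [1.0, 2.0, 3.0, 4.1, 5.0, 6.2, 7.0, 8.0]
osc_flag = is_stationary(oscillating)
ramp_flag = is_stationary(ramp)
```

The oscillating series is flagged stationary, while the ramp, which resembles a monotonic EMD residue, is not. In practice a library routine such as `adfuller` in `statsmodels.tsa.stattools` would be a more suitable choice, since it also handles lag selection and the constant and trend variants.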

Performance evaluation
Evaluation indicators are of great importance for verifying model performance. This paper selects two frequently used criteria to evaluate the performance of the models: normalized mean absolute error (NMAE) and normalized root mean square error (NRMSE). NMAE measures the absolute error between the predicted value and the actual value, which directly reflects the prediction ability of the model, while NRMSE reflects the stability of the prediction model. A smaller value of NMAE/NRMSE indicates higher prediction accuracy/stability of a model. The indicators are defined, respectively, as follows:
$$\mathrm{NMAE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| y_i - \hat{y}_i \right|}{Y} \qquad (7)$$
$$\mathrm{NRMSE} = \frac{1}{Y} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \qquad (8)$$
where $N$ is the size of the testing data set, $Y$ is the installed capacity of the wind farm, and $y_i$ and $\hat{y}_i$ are the prediction value and actual value, respectively.
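As a quick illustration, the two indicators can be computed as below. The sketch assumes the usual definitions, namely the mean absolute error and the root mean square error each divided by the installed capacity, and returns fractions rather than percentages. The function names and the sample numbers are made up for the example.

```python
import math


def nmae(actual, predicted, capacity):
    """Mean absolute error normalized by installed capacity."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / (n * capacity)


def nrmse(actual, predicted, capacity):
    """Root mean square error normalized by installed capacity."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mse) / capacity


# Made-up values in MW for a hypothetical 100 MW wind farm.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
nmae_value = nmae(actual, predicted, capacity=100.0)
nrmse_value = nrmse(actual, predicted, capacity=100.0)
```

Because squaring weights large deviations more heavily, NRMSE is never smaller than NMAE for the same forecasts, which is why it is read as a measure of stability.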

Forecast results and analysis
The wind power data collected in February and April from Elia are used to evaluate the proposed forecast model. For each dataset, four forecasting algorithms are selected for comparative analysis: LSSVM, the persistence model, EMD-LSSVM and the proposed EMD-ADF-LSSVM/persistence model. The forecast performance of the models for 1, 4, 12 and 20 look-ahead steps is displayed in Table 1. The proposed model maintains the smallest NMAE and NRMSE at all of these prediction steps for both datasets. Fig. 2 shows the 1-step-ahead forecasting results for the two datasets, which indicate that the proposed model based on EMD and the ADF test achieves more satisfactory forecast results than the other models, especially those that have only one predictor and lack data processing.
Further, we discuss the cumulative error of the proposed model. Compared with the other three models, two of which are single predictors and one of which is a hybrid with EMD, the proposed model can counteract the cumulative error caused by summing the forecasts of all the sub-components. This is because the prediction of the persistence model is a measured historical value. For the components forecasted by the persistence model, the sum of their forecasts is consistent with the value obtained by direct persistence prediction without decomposition. Hence, the forecast error of these components is not affected by the number of components. In this case, the cumulative error of the model is mostly caused by LSSVM. However, among the components obtained by EMD, there are fewer non-stationary components (only 1 of 9 for the first dataset and 3 of 10 for the second), so the final cumulative error is small. Compared with the persistence model, the proposed model, by incorporating EMD and LSSVM, achieves an average improvement of 41.60% and 33.67% in NMAE and of 37.17% and 32.62% in NRMSE for the two datasets, respectively.
Compared with EMD-LSSVM, the proposed model controls the cumulative error and improves forecast stability by incorporating the persistence model, achieving an average improvement of 11.69% and 12.32% in NMAE and of 12.69% and 9.89% in NRMSE for the two datasets, respectively. Therefore, the proposed model is shown to combine simple calculation, satisfactory prediction performance and stable behavior.
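The cumulative-error argument for the persistence-forecasted components rests on a simple identity: the sum of the last values of the components equals the last value of their sum. The short check below illustrates this with made-up component values. The function names are hypothetical, and the example says nothing about the error contributed by the LSSVM-forecasted components.

```python
def sum_of_component_forecasts(components):
    """Persistence forecast of each component (its last value), then summed."""
    return sum(comp[-1] for comp in components)


def forecast_of_sum(components):
    """Persistence forecast applied directly to the summed series."""
    summed = [sum(values) for values in zip(*components)]
    return summed[-1]


# Three made-up components of equal length.
components = [[1.0, 2.0, 4.0], [0.5, -0.5, 1.5], [3.0, 3.0, 2.5]]
decomposed = sum_of_component_forecasts(components)
direct = forecast_of_sum(components)
```

Both routes give the same value, so splitting a signal into more persistence-forecasted components does not change their combined forecast.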

Conclusion
This paper proposes a combined short-term wind power forecast model based on empirical mode decomposition and the augmented Dickey-Fuller test, with two predictors: the least-square support vector machine model and the persistence model. Empirical mode decomposition is used to decompose the wind power signal into several intrinsic mode function components and a residue. Afterwards, the augmented Dickey-Fuller test is conducted to evaluate the stationarity of each component obtained by empirical mode decomposition. Lastly, the stationary and non-stationary components are passed to the least-square support vector machine and the persistence model, respectively. The final forecast value is the sum of the forecast results of all the components. Three models, LSSVM, the persistence model and EMD-LSSVM, are used for a comparative analysis with the proposed method. The results indicate that: (1) empirical mode decomposition can significantly improve the forecast accuracy; (2) the application of the augmented Dickey-Fuller test and the pairing of the least-square support vector machine model with the persistence model can achieve better forecast ability than the other models. In further research, we will look for more data processing models to analyze the sub-components obtained by empirical mode decomposition, and try to reduce the number of components by merging some of them according to their features, thus lowering the computational cost. Moreover, we will seek to develop more suitable forecasting models to improve the prediction accuracy.
This project was supported by National Natural Science Foundation of China (No. 52077081).