A combined model for short-term wind speed forecasting based on empirical mode decomposition, feature selection, support vector regression and cross-validated lasso

Background The planning and control of wind power production rely heavily on short-term wind speed forecasting. Due to the non-linearity and non-stationarity of wind, it is difficult to model and predict wind speed accurately with traditional forecasting models. Methods In this paper, we combine empirical mode decomposition (EMD), feature selection (FS), support vector regression (SVR) and cross-validated lasso (LassoCV) to develop a new wind speed forecasting model that aims to improve prediction performance. EMD is used to extract the intrinsic mode functions (IMFs) from the original wind speed time series to reduce its non-stationarity. FS and SVR are combined to predict the high-frequency IMF obtained by EMD, and LassoCV is used to predict the low-frequency IMFs and the trend. Results Data collected from two wind stations in Michigan, USA are used to test the proposed combined model. Experimental results show that in multi-step wind speed forecasting, the proposed model outperforms classic individual models and traditional EMD-based combined models. Conclusions The proposed combined model can effectively improve wind speed forecasting.


INTRODUCTION
As a sustainable and renewable alternative to traditional fossil fuels, wind power has attracted widespread attention and developed rapidly in recent years (Hu et al., 2018). According to the statistical report of the Global Wind Energy Council, the global installed wind power capacity is about 650.8 GW (Fu et al., 2020), of which 59.7 GW was installed in 2019 (Global Wind Energy Council, 2020). However, as more wind power is connected to the grid, the stability of the power system will be challenged (Liu et al., 2018a), because wind power output follows the non-stationary wind speed. Accurate wind speed forecasting supports wind power planning and control, and can even help reduce the impact of unexpected events on power system stability (Liu et al., 2018b). However, due to the non-linearity and non-stationarity of wind, it is difficult to establish a satisfactory wind speed forecasting model. To this end, researchers have made great efforts to improve forecasting performance from different aspects, including basic predictive models, preprocessing methods, and combined or hybrid strategies.
For basic predictive models, a variety of methods have been proposed, mainly including physical models, statistical models, and machine learning models. Physical models usually use physical parameters such as temperature and pressure to predict wind speed (Heng et al., 2016), and Numerical Weather Prediction (NWP) is one of the representative techniques. However, due to the weak correlation between physical parameters and short-term wind speed, this type of model is suited to medium- and long-term forecasting rather than short-term forecasting. In short-term wind speed forecasting, wind speed is generally predicted by analyzing the inherent patterns of historical wind speed data (Liu et al., 2018b).
Statistical models are widely used in short-term wind speed forecasting; they use historical data to predict wind speed. Commonly used statistical models include autoregressive (AR) (Lydia et al., 2016a), autoregressive moving average (ARMA) (Torres et al., 2005) and autoregressive integrated moving average (ARIMA) models. Kavasseri & Seetharaman (2009) proposed an f-ARIMA model for wind speed forecasting and reported significantly improved prediction accuracy compared with the persistence model. Ait Maatallah et al. (2015) developed a Hammerstein autoregressive model to predict wind speed and showed that it achieves a lower root mean square error (RMSE) than ARIMA and artificial neural networks (ANN). Poggi et al. (2003) developed an AR-based model to predict wind speeds at three Mediterranean sites in Corsica and showed that the synthetic time series retain the statistical characteristics of the wind speeds. Lydia et al. (2016b) presented a short-term wind speed forecasting model combining linear and non-linear AR models. In general, statistical models assume that the data are linear, whereas wind speed series are non-linear, so these methods cannot effectively capture the non-linear characteristics of wind.
To address this problem, researchers have introduced machine learning to predict wind speed. Machine learning is normally used either as the predictive model or for parameter optimization, and mainly includes evolutionary algorithms, extreme learning machines (ELM), ANN and support vector machines (SVM). Wang (2017) presented a wind speed forecasting model combining SVM and particle swarm optimization (PSO). Zhang et al. (2019) combined an online sequential outlier-robust ELM with hybrid mode decomposition (HMD) to predict wind speed. Wang, Li & Bai (2018) developed an error-correction-based ELM model for short-term wind speed forecasting. Liu et al. (2020) introduced Jaya-SVM (a Jaya algorithm-based support vector machine) into wind speed forecasting. Krishnaveny et al. (Nair, Vanitha & Jisma, 2017) compared the performance of three models, i.e., ANN, ARIMA and a hybrid model, in wind speed forecasting. Azeem et al. (2018) investigated KNN-based and ANN-based models for wind speed forecasting. Recently, deep learning, a newer branch of machine learning, has received extensive attention and has been widely used for regression and classification problems. According to the literature, compared with shallow methods, deep learning can abstract the hidden structure and inherent characteristics of data. Khodayar & Wang (2019) introduced a scalable graph convolutional deep learning architecture (GCDLA) for wind speed forecasting. Wang et al. (2016a) investigated a deep belief network model for wind speed forecasting. Khodayar & Wang (2019) combined rough set theory and restricted Boltzmann machines to build a wind speed forecasting model. Hong & Satriani (2020) developed a day-ahead wind speed forecasting model based on a convolutional neural network. Although researchers report that deep learning can achieve better performance, these methods are computationally intensive and prone to overfitting on small data sets.
In addition to these basic forecasting models, preprocessing methods such as feature selection (FS) have also been introduced into wind speed forecasting. In short-term wind speed forecasting, lagged historical wind speeds are usually used as features, which may introduce a certain degree of redundancy. FS selects the best inputs for the basic predictive model so that the model achieves better generalization performance. For example, Paramasivan & Lopez (2016) employed the ReliefF feature selection algorithm to identify key features and then used a bagging neural network to predict wind speed. Niu et al. (2018) presented a multi-step wind speed forecasting model using optimal FS, a modified bat algorithm and a cognition strategy. Botha & Walt (2017) combined FS with SVM to predict short-term wind speed. Kong et al. (2015) combined feature selection and reduced support vector machines (RSVM) for wind speed forecasting.
Due to the unstable nature of wind, combined or hybrid models based on signal processing have become the mainstream in wind speed forecasting. In these models, signal processing is usually employed to decompose the wind speed series in order to reduce or eliminate its instability. Commonly used signal processing techniques include empirical mode decomposition (EMD), variational mode decomposition (VMD) and the wavelet transform (WT). Wang et al. (2016b) decomposed wind speed into stable signals using ensemble empirical mode decomposition (EEMD). Sun & Wang (2018) developed a fast ensemble empirical mode decomposition model to improve the accuracy of wind speed forecasting. Tascikaraoglu et al. (2016) proposed a WT-based wind speed forecasting model. The empirical wavelet transform (EWT) has also been adopted to extract key information from wind speed time series. Yu, Li & Zhang (2017) explored the performance of EMD, EEMD and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) in wind speed forecasting.
In wind speed forecasting, there are mainly three forecasting scenarios: short-term, medium-term and long-term. Among them, short-term wind speed forecasting is essential for estimating power generation, yet it is difficult to perform accurately because of the non-linearity and instability of wind speed. Therefore, in this study, we develop a new model to forecast short-term wind speed. The originality of this work lies in a combined model of EMD, FS, SVR and cross-validated Lasso (LassoCV) for multi-step wind speed forecasting.
The framework of our study is as follows: (a) EMD is used to extract the intrinsic mode functions (IMFs) from the original wind speed time series; (b) FS and SVR are combined to predict the high-frequency IMF; (c) LassoCV is used to predict the low-frequency IMFs and the trend.
The main contributions of this research are as follows:
1. A novel model based on EMD, FS, SVR and LassoCV is proposed to improve the accuracy of multi-step wind speed forecasting, where EMD is used to extract IMFs from the original wind speed data to reduce the non-stationarity of wind speed.
2. By the principle of EMD, the first IMF component contains most of the high-frequency information, and predicting it usually requires an algorithm with good generalization performance. We therefore combine FS and SVR to predict the high-frequency IMF (i.e., the first IMF) component.
3. Compared with the first IMF, the other IMF components decomposed by EMD have much lower frequencies and follow sine-like curves, for which linear regression usually performs well. We therefore introduce LassoCV to predict the low-frequency IMFs and the trend.
The rest of the paper is organized as follows. The framework of the proposed model and the underlying principles are introduced in 'Methods'. 'Results' describes the experimental data and compares the proposed model with classic individual models. 'Discussion' discusses the effectiveness of EMD. 'Conclusion' concludes the study.

The whole process of the proposed model
The architecture of our proposed model is shown in Fig. 1. The whole process is as follows:
1. Use EMD to decompose the wind speed into a series of IMFs. The EMD algorithm is introduced in 'Empirical mode decomposition'.
2. Combine FS and SVR to predict the high-frequency IMF obtained by EMD. The FS and SVR algorithms are described in 'Feature selection' and 'Support vector regression', respectively.
3. Use LassoCV to predict the low-frequency IMFs and the trend, and sum the predictions of all components to obtain the final wind speed forecast. The LassoCV algorithm is described in 'Cross-validated lasso'.
4. Evaluate performance. The performance indicators are introduced in 'Prediction performance criteria', and the experimental results and analysis are given in 'Results' and 'Discussion'.

Empirical mode decomposition
Due to the non-stationary and intermittent nature of wind speed, it is difficult to predict future wind speed directly. One possible solution is to decompose the chaotic wind data into components of different frequencies (Bokde et al., 2019) and to predict each component separately. Based on this idea, this study introduces signal processing to decompose wind speed. Common signal decomposition algorithms include the wavelet transform, morphological filters, EMD and many others. The wavelet transform is not adaptive: it relies on the prior choice of a mother wavelet, which somewhat limits its ability to extract non-linear and non-stationary components from the data. Similarly, morphological filters require the shape and length of the structural element to be selected.
There is no uniform standard for this choice, which depends on human experience. In contrast, EMD has received great attention from researchers because of its strong performance and ease of understanding. Therefore, in this study, we use EMD to preprocess the wind speed. EMD is essentially a non-linear signal analysis method that can handle non-linear and non-stationary time series (Huang et al., 1998). It uses the time-scale characteristics of the data to decompose the signal and does not require any basis functions to be set in advance. In theory, EMD can be applied to any type of signal. Since EMD was proposed, it has been rapidly applied in many engineering fields such as marine and atmospheric research, seismic record analysis and mechanical fault diagnosis (Gao & Liu, 2021).
The basic idea of EMD is to decompose a non-stationary time series into a series of IMFs along with a residue (Huang et al., 1998). An IMF must satisfy two conditions: (1) the number of extrema and the number of zero crossings must be equal or differ by at most one; (2) at any point, the mean of the upper and lower envelopes must be zero (Ziqiang & Puthusserypady, 2007). Let $s(t), t = 1, 2, \ldots, l$ be a time series. The EMD decomposition steps are as follows:
Step 1: Identify the local minima and maxima of the time series.
Step 2: Use cubic splines to interpolate the local minima and maxima, generating the lower envelope $s_l(t)$ and the upper envelope $s_u(t)$.
Step 3: Compute the mean of the upper and lower envelopes, $m(t) = \frac{s_u(t) + s_l(t)}{2}$.
Step 4: Subtract the mean envelope from the time series: $h(t) = s(t) - m(t)$.
Step 5: Check whether $h(t)$ satisfies the two IMF conditions. If so, take $h(t)$ as a new IMF $c(t)$ and compute the residual $r(t) = s(t) - c(t)$, where $s(t)$ is the signal at the start of this sifting process. Otherwise, replace $s(t)$ with $h(t)$ and repeat Steps 1 to 5.
Step 6: Set $r(t)$ as the new $s(t)$ and repeat Steps 1 to 5 until all IMFs are obtained, i.e., until the residual becomes monotonic and no further IMF can be extracted.
Through this process, a set of IMFs ordered from high to low frequency is extracted from the time series. The original time series can therefore be expressed as
$$s(t) = \sum_{i=1}^{n} c_i(t) + r_n(t),$$
where $n$ is the number of IMFs, $c_i(t)$ is the $i$-th IMF (the IMFs are nearly periodic and approximately orthogonal to each other), and $r_n(t)$ is the final residual, representing the trend of $s(t)$.
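The sifting steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the envelopes are SciPy `CubicSpline` fits through the interior extrema plus both end points (a simple, hypothetical boundary treatment), sifting stops on a simplified relative-change criterion or after a fixed number of iterations instead of an explicit check of the two IMF conditions, and the helper names `envelope_mean` and `emd` are illustrative. A maintained EMD library would normally be preferable in practice.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def envelope_mean(h):
    """Return m(t) = (s_u(t) + s_l(t)) / 2, or None if there are too few extrema."""
    t = np.arange(h.size)
    maxima = argrelextrema(h, np.greater)[0]  # interior local maxima (Step 1)
    minima = argrelextrema(h, np.less)[0]     # interior local minima (Step 1)
    if maxima.size < 2 or minima.size < 2:
        return None
    # Assumed boundary handling: pin both envelopes to the first and last samples.
    upper_knots = np.r_[0, maxima, h.size - 1]
    lower_knots = np.r_[0, minima, h.size - 1]
    s_u = CubicSpline(upper_knots, h[upper_knots])(t)  # upper envelope (Step 2)
    s_l = CubicSpline(lower_knots, h[lower_knots])(t)  # lower envelope (Step 2)
    return (s_u + s_l) / 2.0                           # mean envelope (Step 3)


def emd(s, max_imfs=10, max_sift=50, sd_tol=0.2):
    """Decompose s into IMFs c_1..c_n and a residual r_n with s = sum(c_i) + r_n."""
    residual = np.asarray(s, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residual) is None:  # (near-)monotonic residual: stop (Step 6)
            break
        h = residual.copy()
        for _ in range(max_sift):            # sifting loop (Steps 1 to 5)
            m = envelope_mean(h)
            if m is None:
                break
            h_new = h - m                    # Step 4
            # Simplified stopping rule: relative change between successive sifts
            sd = np.sum((h - h_new) ** 2) / np.sum(h ** 2)
            h = h_new
            if sd < sd_tol:
                break
        imfs.append(h)                        # accept h(t) as IMF c_i(t)
        residual = residual - h               # r(t) = s(t) - c_i(t)
    return imfs, residual


t = np.arange(1000)
s = np.sin(2 * np.pi * t / 10) + 0.5 * np.sin(2 * np.pi * t / 100) + 0.01 * t
imfs, trend = emd(s)
print(len(imfs), np.allclose(np.sum(imfs, axis=0) + trend, s))
```

Because each IMF is subtracted from the running residual, the identity $s(t) = \sum_i c_i(t) + r_n(t)$ holds exactly by construction, whatever the quality of the individual IMFs.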

Feature selection
After obtaining the IMF components of wind speed, we need to predict them. In this study, we use the current and lagged values of each IMF component as the raw features, forecast each IMF component separately, and add all the predicted IMF components to obtain the final wind speed forecast. Although the raw features contain sufficient information for forecasting, some irrelevant or only partially relevant features may have a negative impact on the model. To avoid this, a common strategy is to use feature selection to remove irrelevant features. Commonly used feature selection approaches include filter methods, wrapper methods, heuristic search algorithms and embedded methods (Chandrashekar & Sahin, 2014). In this study, we use a filter method. To score the candidate variables, we use a univariate linear regression test that computes the correlation between each feature and the output (Liu et al., 2019b):
$$\rho_j = \frac{\sum_{k=1}^{N} (X_{kj} - \bar{X}_j)(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{N} (X_{kj} - \bar{X}_j)^2}\,\sqrt{\sum_{k=1}^{N} (y_k - \bar{y})^2}}, \quad j = 1, \ldots, M,$$
where $X$ is an $N \times M$ matrix whose columns are the features, $X_{kj}$ is its $(k, j)$ entry and $\bar{X}_j$ the mean of column $j$, and $y$ is the $N \times 1$ vector of the output of interest with mean $\bar{y}$. Based on the ranking of these correlations, the irrelevant or partially relevant features are removed.
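A minimal sketch of building lagged features for one IMF and applying the univariate filter, assuming scikit-learn's `SelectKBest` with `f_regression` as a stand-in for the test described in this section. `f_regression` turns each feature's correlation with the target into an F-statistic, $F_j = \frac{\rho_j^2}{1 - \rho_j^2}(N - 2)$, which ranks features in the same order as $|\rho_j|$. The helper `lag_matrix`, the synthetic IMF and the choices of 12 lags and k = 5 are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression


def lag_matrix(series, n_lags, horizon=1):
    """Hypothetical helper: rows of n_lags consecutive values, target `horizon` steps ahead.

    Column j holds the value (n_lags - 1 - j) steps before the current time,
    so the last column is the current value.
    """
    series = np.asarray(series, dtype=float)
    rows = len(series) - n_lags - horizon + 1
    X = np.column_stack([series[j:j + rows] for j in range(n_lags)])
    y = series[n_lags + horizon - 1:n_lags + horizon - 1 + rows]
    return X, y


rng = np.random.default_rng(42)
t = np.arange(500)
imf1 = np.sin(2 * np.pi * t / 7) + 0.2 * rng.standard_normal(t.size)  # synthetic stand-in
X, y = lag_matrix(imf1, n_lags=12, horizon=1)

# Pearson correlation rho_j of every lag column with the target
Xc, yc = X - X.mean(axis=0), y - y.mean()
rho = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

selector = SelectKBest(score_func=f_regression, k=5).fit(X, y)
kept = selector.get_support(indices=True)
X_selected = selector.transform(X)
print("kept columns:", kept)
print("top-5 by |rho|:", np.sort(np.argsort(-np.abs(rho))[:5]))
```

Setting `horizon` to 2 or 3 gives the direct 2- and 3-step-ahead training sets from the same series.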

Support vector regression
The support vector machine (SVM) is a learning method based on the structural risk minimization principle, which minimizes the expected risk and obtains better generalization performance on unseen data. Support vector regression (SVR) is an extension of SVM to regression problems (Drucker et al., 1997). Given the non-linear and non-stationary nature of wind speed, SVR is widely used in short-term wind speed forecasting (Khosravi et al., 2018; Liu et al., 2019a; Santamaría-Bonfil, Reyes-Ballesteros & Gershenson, 2016). In this research, we use EMD to decompose wind speed into IMF components, and the high-frequency IMF component contains the non-linear and non-stationary part of wind speed. To obtain better generalization performance, we follow existing research and use SVR to predict it. The main idea of SVR is to perform linear regression in a high-dimensional feature space obtained by mapping the original input through a predefined function $\phi(x)$, while minimizing the structural risk. Given a set of samples $\{(x_i, y_i)\}, i = 1, 2, \ldots, N$, where $y_i$ is the output and $x_i$ is the input, the regression function is $f(x) = W^{T}\phi(x) + b$ and the objective is
$$\min R_f = \frac{1}{2}\|W\|^2 + C \sum_{i=1}^{N} L\big(x_i, y_i, f(x_i)\big),$$
where $W$ and $b$ are the regression coefficients and bias, respectively, $C$ is the penalty coefficient, $L(x_i, y_i, f(x_i))$ is the loss function and $R_f$ is the structural risk. With the $\varepsilon$-insensitive loss, the corresponding constrained optimization problem can be expressed as
$$\min_{W, b, \xi, \xi^*} \frac{1}{2}\|W\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad \begin{cases} y_i - W^{T}\phi(x_i) - b \le \varepsilon + \xi_i, \\ W^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i, \xi_i^* \ge 0, \end{cases}$$
where $\xi_i$ and $\xi_i^*$ are slack variables and $\varepsilon$ is the width of the insensitive zone. By introducing Lagrange multipliers, the regression function can be expressed as
$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) K(x_i, x) + b,$$
where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers, which satisfy $0 \le \alpha_i, \alpha_i^* \le C$ and $\sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0$, and $K(x_i, x) = \phi(x_i)^{T}\phi(x)$ is a kernel function satisfying Mercer's theorem.
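To make the dual form concrete, the sketch below fits scikit-learn's `SVR` with an RBF kernel $K(x_i, x) = \exp(-\gamma\|x_i - x\|^2)$ on synthetic data and checks that its predictions equal $\sum_i (\alpha_i - \alpha_i^*) K(x_i, x) + b$ computed by hand. In scikit-learn, `dual_coef_` holds the coefficients that multiply the kernel terms for the support vectors (the $\alpha_i - \alpha_i^*$ above) and `intercept_` holds $b$. The data and the values of `C`, `epsilon` and `gamma` are arbitrary illustrations, not the settings used in this study.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)

C, gamma = 10.0, 0.5
svr = SVR(kernel="rbf", C=C, epsilon=0.05, gamma=gamma).fit(X, y)

X_new = rng.uniform(-1.0, 1.0, size=(5, 3))
# K(x_i, x) between every new point and every support vector x_i
K = rbf_kernel(X_new, svr.support_vectors_, gamma=gamma)
# f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b
manual = K @ svr.dual_coef_.ravel() + svr.intercept_[0]

print(np.allclose(manual, svr.predict(X_new)))
print("support vectors:", len(svr.support_))
```

Only samples outside or on the boundary of the $\varepsilon$-tube become support vectors, so the sum runs over a subset of the training data.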

Cross-validated lasso
The Lasso is a regression method that performs feature selection and regularization at the same time. It was originally proposed by Robert Tibshirani of Stanford University and offers good prediction accuracy and interpretability (Tibshirani, 1996). In regression, we normally seek coefficients $\beta = (\beta_1, \ldots, \beta_p)$ that satisfy
$$Y = X\beta + \varepsilon,$$
where $Y$ is the dependent variable, $X$ is the $N \times p$ matrix of covariates and $\varepsilon$ is unobserved noise. Lasso minimizes the residual sum of squares while constraining the sum of the absolute values of the coefficients to be at most a fixed value $t$ (Hung, Yen & Li, 2016):
$$\hat{\beta}^{lasso} = \arg\min_{\beta} \|Y - X\beta\|_2^2 \quad \text{s.t.} \quad \sum_{j=1}^{p} |\beta_j| \le t.$$
Rewritten in Lagrangian form:
$$\hat{\beta}^{lasso} = \arg\min_{\beta} \left\{ \|Y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}.$$
Lasso uses an $L_1$-norm penalty instead of the $L_2$-norm penalty. Because the $L_1$ constraint region is diamond-shaped, the solution is likely to lie at a corner of the region. As a result, the lasso solution is sparse, with some coefficients set exactly to zero; that is, Lasso performs feature selection directly.
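The sparsity argument can be checked numerically. The sketch below solves the Lagrangian form by cyclic coordinate descent with soft-thresholding, one standard Lasso solver (not necessarily the one used in this study), and compares the result with scikit-learn's `Lasso`. Note that scikit-learn scales the squared-error term by $1/(2N)$, so its `alpha` corresponds to $\lambda/(2N)$ in the formulation above; the sketch uses the scikit-learn scaling, no intercept, and synthetic data in which only three of eight coefficients are non-zero.

```python
import numpy as np
from sklearn.linear_model import Lasso


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_sweeps=2000):
    """Minimise (1/(2N))||y - X b||^2 + alpha*||b||_1 (no intercept); alpha = lambda/(2N)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()          # residual y - X @ beta (beta starts at zero)
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]      # partial residual that excludes feature j
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            r -= X[:, j] * beta[j]
    return beta


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.standard_normal(100)

beta_cd = lasso_cd(X, y, alpha=0.1)
beta_sk = Lasso(alpha=0.1, fit_intercept=False, tol=1e-10, max_iter=100_000).fit(X, y).coef_
print(np.round(beta_cd, 4))
print(np.allclose(beta_cd, beta_sk, atol=1e-5))
```

The soft-thresholding step is where exact zeros arise: whenever a feature's partial correlation with the residual falls below `alpha`, its coefficient is set to zero rather than merely shrunk.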
To estimate $\hat{\beta}^{lasso}$, the value of the penalty parameter $\lambda$ is critically important, but the optimal $\lambda$ is not determined automatically. If $\lambda$ is chosen appropriately, Lasso achieves fast convergence under fairly general conditions; if it is chosen inappropriately, Lasso may be inconsistent or converge more slowly. In this paper, we adopt the cross-validated Lasso, in which the penalty parameter $\lambda$ is chosen by cross-validation; this is also the approach most commonly recommended in the theoretical literature (Park & Casella, 2008).
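A minimal sketch of the cross-validated Lasso on a synthetic low-frequency component, assuming scikit-learn's `LassoCV`. Because the data form a time series, the sketch uses `TimeSeriesSplit` so that each validation fold lies after its training fold; the text does not state which cross-validation scheme was used, so this choice, the six lags, the 300-point training split and the synthetic signal are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for a low-frequency IMF: a sum of two slow sinusoids
t = np.arange(400)
imf_low = np.sin(2 * np.pi * t / 60) + 0.3 * np.sin(2 * np.pi * t / 150)

n_lags = 6
rows = len(imf_low) - n_lags
X = np.column_stack([imf_low[j:j + rows] for j in range(n_lags)])  # past 6 values
y = imf_low[n_lags:]                                               # 1-step-ahead target

n_train = 300
model = LassoCV(cv=TimeSeriesSplit(n_splits=5), max_iter=50_000)
model.fit(X[:n_train], y[:n_train])   # lambda (alpha_) chosen on the validation folds

pred = model.predict(X[n_train:])
rmse = np.sqrt(np.mean((pred - y[n_train:]) ** 2))
print(f"chosen alpha = {model.alpha_:.2e}, test RMSE = {rmse:.4f}")
```

A sum of sinusoids obeys an exact linear recurrence in its past values, which is one way to see why a linear model such as LassoCV suits the smooth, sine-like components.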

Prediction performance criteria
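In this study, the mean absolute percentage error (MAPE), mean absolute error (MAE) and RMSE are used as performance indicators to evaluate the proposed wind speed forecasting model. They are defined as follows:
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right| \times 100\%,$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| Y_i - \hat{Y}_i \right|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Y_i - \hat{Y}_i \right)^2},$$
where $Y_i$ and $\hat{Y}_i$ are the observed and predicted wind speeds at data point $i$, respectively, and $N$ is the number of evaluated data points. For MAPE, MAE and RMSE, smaller values indicate better performance.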
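A direct NumPy transcription of the three criteria, with a small hand-checkable example. MAPE divides by the observed value, so it is undefined when an observed wind speed is exactly zero; the text does not say how such points were handled, and the sketch simply assumes all observed values are non-zero.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error in %; assumes no observed value is zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the data (e.g. m/s)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the data."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


observed = [2.0, 4.0, 5.0]
predicted = [2.5, 3.0, 5.0]
# Errors are -0.5, 1.0 and 0.0, so MAE = 0.5, RMSE = sqrt(1.25 / 3) and MAPE = 50 / 3 %
print(mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```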

Wind speed data
The wind speed data used in this study were gathered from two wind stations in Michigan, USA, from September 2019 to October 2019, and comprise 1,464 samples. The first 50 days, from September 1, 2019 to October 20, 2019, are used for model training, and the remaining days, from October 21, 2019 to October 31, 2019, are used for testing. Figure 2 shows the two wind speed time series, and the corresponding statistics are listed in Table 1.
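The chronological train/test split can be sketched with pandas as follows. The text does not state the sampling interval; the sketch assumes one sample per hour per station, which is consistent with 1,464 points spanning the 61 days from September 1 to October 31, 2019. The series values are placeholders, not the station data.

```python
import numpy as np
import pandas as pd

# Assumption: one sample per hour, starting at 2019-09-01 00:00
index = pd.date_range("2019-09-01", periods=1464, freq=pd.Timedelta(hours=1))
speed = pd.Series(np.full(len(index), np.nan), index=index)  # placeholder values

split = pd.Timestamp("2019-10-21")
train = speed[speed.index < split]   # September 1 to October 20 (50 days)
test = speed[speed.index >= split]   # October 21 to October 31 (11 days)
print(len(train), len(test), test.index[-1])
```

Splitting by time rather than at random keeps every test point strictly after the training period, as required for forecasting.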

Experiments and result analysis
To verify the effectiveness of the proposed model, we compare it with five classic individual models, namely Persistence, ELM, SVR, ANN and ARIMA. The 1- to 3-step forecasting results of these models for time series #1 and #2 are displayed in Figs. 3-4, and the corresponding error estimates are listed in Tables 2-5. It is worth noting
2. In the 2-step forecasting for wind station #1, the proposed model has the lowest error values, with RMSE, MAE and MAPE of 0.7531, 0.5848 and 24.78%, respectively. For wind station #2, the proposed model also achieves the lowest error values; for example, its MAPE is 22.99%, which is significantly lower than those of the other models.
3. In the 3-step forecasting, the proposed model is still the most accurate, with MAPEs of 27.55% and 24.59% for wind stations #1 and #2, respectively. Persistence has the worst RMSE among these models, with MAPEs of 57.64% and 47.99%, respectively.
In general, for 1- to 3-step forecasting, the proposed model obtains better prediction performance than the classic individual models.
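The Persistence benchmark is the simplest of the five: the forecast for time $t + h$ is the last observed value $y_t$. A minimal sketch, assuming a NumPy array of observations; the helper name `persistence_forecast` and the toy series are hypothetical.

```python
import numpy as np


def persistence_forecast(series, horizon):
    """Return (forecasts, targets) for h-step persistence: y_hat[t + h] = y[t]."""
    series = np.asarray(series, dtype=float)
    forecasts = series[:-horizon]   # value observed at time t
    targets = series[horizon:]      # value actually observed at time t + h
    return forecasts, targets


speeds = np.array([3.0, 4.0, 6.0, 5.0, 5.5])
for h in (1, 2, 3):
    f, y = persistence_forecast(speeds, h)
    print(h, f, y, np.sqrt(np.mean((f - y) ** 2)))
```

Because the forecast never changes within the horizon, its error typically grows quickly with the number of steps, which is why it serves as a lower bar for multi-step models.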

Compared with traditional EMD methods
As a signal analysis method for non-linear and non-stationary time series, EMD has been widely used in time series analysis. To further verify the effectiveness of our EMD-based model, we compare it with four widely used EMD-based models, namely EMD-ELM, EMD-SVR, EMD-SP-SVR and EMD-ANN. In this study, these methods follow the same procedure as our proposed model: EMD decomposes the wind speed, a single type of regression model predicts each IMF component separately, and all the predictions are added to obtain the final wind speed forecast. The prediction results and error estimates of these four EMD-based methods and the proposed method are displayed in Figs. 5-6 and Tables 6-9, from which it can be observed that:
1. Compared with the classic individual models above, the EMD-based methods perform significantly better. Taking wind station #1 as an example, in 1-step forecasting the RMSE of the EMD-based methods is around 0.60, while that of the classic individual models is around 1.20; that is, decomposing the wind speed with EMD roughly halves the RMSE.
2. For wind station #1, except for MAE in 3-step forecasting, the performance indicators of the proposed model are significantly better than those of the other EMD-based combined models. In 3-step forecasting, EMD-SVR and EMD-SP-SVR achieve a slightly better MAE than the proposed combined model, but the proposed model performs significantly better on the other indicators. Furthermore, EMD-ANN always has a worse MAPE than the other three combined models, with MAPEs of 23.55%, 27.67% and 29.31% for 1- to 3-step forecasting.
3. For wind station #2, the proposed combined model obtains the best prediction results in 1- to 3-step forecasting, with RMSE, MAE and MAPE of 0.5593, 0.419 and 17.10%, respectively, in 1-step forecasting.
In comparison, among the other four EMD-based combined models, the EMD-ELM and EMD-ANN models
Overall, the EMD-based methods have clear advantages over traditional methods, and the proposed method using EMD, FS, SVR and LassoCV achieves better performance.

Performance of SVR-SP and LassoCV on different IMFs
According to the EMD principle, the IMF components are ordered from high to low frequency. The non-linear and non-stationary information in wind speed data is mainly concentrated in the high-frequency IMF, while the low-frequency IMFs follow sine-like curves. Based on these characteristics, we use SVR-SP and LassoCV to predict IMFs of different frequencies. To verify the effectiveness of this hybrid EMD model, in this section we take wind station #2 as an example and analyze the performance of the two methods on different IMF components. Table 10 lists the RMSE of SVR-SP and LassoCV on each IMF component. Note that in multi-step prediction, the accuracy of the first step matters more than that of the other steps because it is crucial for accurate estimation of wind power. Table 10 shows that SVR-SP performs significantly better than LassoCV on the high-frequency component (IMF1), while LassoCV performs better on the low-frequency components (IMF2∼IMF7 and the trend), with an RMSE already close to zero at IMF4. Moreover, SVR-SP risks overfitting when predicting low-frequency components, which leads to poor performance. Overall, by combining the characteristics of the EMD decomposition with the strengths of each algorithm, the proposed model achieves better performance than the traditional EMD models.
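The division of labour described above (feature selection plus SVR for IMF1, LassoCV for the remaining IMFs and the trend, component forecasts summed) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the components are synthetic and assumed to be already decomposed, scikit-learn's `SelectKBest`/`f_regression`, `SVR` and `LassoCV` stand in for the FS, SVR and LassoCV steps, and the helper names, lag count, `k`, SVR hyperparameters and `TimeSeriesSplit` cross-validation are hypothetical choices.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR


def make_lagged(series, n_lags, horizon):
    """Hypothetical helper: n_lags past values as features, value `horizon` steps ahead as target."""
    rows = len(series) - n_lags - horizon + 1
    X = np.column_stack([series[j:j + rows] for j in range(n_lags)])
    y = series[n_lags + horizon - 1:n_lags + horizon - 1 + rows]
    return X, y


def forecast_combined(components, n_train, n_lags=6, horizon=1, k=4):
    """components[0] is the high-frequency IMF1; the rest are low-frequency IMFs and the trend."""
    total = 0.0
    for i, comp in enumerate(components):
        X, y = make_lagged(comp, n_lags, horizon)
        if i == 0:
            # High-frequency IMF: filter feature selection followed by SVR
            model = make_pipeline(SelectKBest(f_regression, k=k), SVR(C=10.0, epsilon=0.01))
        else:
            # Low-frequency IMFs and trend: cross-validated Lasso
            model = LassoCV(cv=TimeSeriesSplit(n_splits=5), max_iter=50_000)
        model.fit(X[:n_train], y[:n_train])
        total = total + model.predict(X[n_train:])   # sum of component forecasts
    return total


rng = np.random.default_rng(0)
t = np.arange(600)
imf1 = 0.5 * np.sin(2 * np.pi * t / 5) + 0.1 * rng.standard_normal(t.size)
imf2 = np.sin(2 * np.pi * t / 50)
trend = 5.0 + 0.002 * t
wind = imf1 + imf2 + trend                       # synthetic "wind speed"

n_train = 400
pred = forecast_combined([imf1, imf2, trend], n_train)
_, y_all = make_lagged(wind, n_lags=6, horizon=1)
rmse = np.sqrt(np.mean((pred - y_all[n_train:]) ** 2))
print(f"test RMSE of the combined forecast: {rmse:.3f}")
```

Because every component uses the same lag construction, the component targets line up row by row and their sum equals the target built from the original series, so the summed forecasts can be scored directly against it.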

Comparison of different signal decomposition techniques
Besides EMD, variational mode decomposition (VMD) and ensemble empirical mode decomposition (EEMD) are also widely used in short-term wind speed forecasting. Here, we analyze the impact of different signal decomposition techniques on the performance of the proposed method. Table 11 shows the prediction performance obtained with the three decomposition techniques at the two wind stations. For wind station #1, EMD obtains the best RMSE in 1-step forecasting compared with VMD and EEMD. VMD performs similarly in 1-step and 2-step forecasting, but its performance drops significantly in 3-step forecasting. EEMD, which builds on EMD, behaves like EMD: its performance decreases significantly as the forecasting horizon increases. For wind station #2, EMD also obtains the best predictive performance, and VMD behaves as it does at wind station #1, with similar performance in 1-step and 2-step forecasting. It should be pointed out that in multi-step forecasting, the 1-step forecast is usually used for wind energy estimation while the other steps assist decision-making, so more attention is paid to 1-step performance.

The impact of the number of selected features on performance
Feature selection is used in this study to remove redundant features. However, the number of selected features inevitably affects short-term wind speed forecasting. To ensure stable performance in complex industrial systems, we analyzed the performance of the proposed method with different numbers of selected features. Figure 7 shows the RMSE of the proposed method as a function of the number of selected features. Note that, based on the characteristics of the EMD decomposition, we use FS and SVR to predict the high-frequency component (i.e., IMF1) and LassoCV to predict the low-frequency components, so feature selection mainly affects the prediction of IMF1. Figure 7 shows that feature selection slightly improves the performance of 1-step forecasting but has little effect on 2-step and 3-step forecasting. Overall, as the number of selected features decreases, the generalization performance of the method improves, but when too few features are selected, performance drops sharply because useful features are removed. To determine an appropriate number of features, this study follows Bradley, Mangasarian & Street (1998) and Chizi, Rokach & Maimon (2009) and selects it by cross-validation.
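Choosing the number of selected features by cross-validation can be sketched with scikit-learn's `GridSearchCV` over the `k` of a `SelectKBest` + `SVR` pipeline. The synthetic IMF1 stand-in, the 12 candidate lags, the SVR settings and the use of `TimeSeriesSplit` are assumptions for illustration; the paper does not specify them.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

rng = np.random.default_rng(1)
t = np.arange(600)
imf1 = np.sin(2 * np.pi * t / 5) + 0.3 * rng.standard_normal(t.size)  # synthetic stand-in

n_lags = 12
rows = len(imf1) - n_lags
X = np.column_stack([imf1[j:j + rows] for j in range(n_lags)])  # 12 lagged values
y = imf1[n_lags:]                                               # 1-step-ahead target

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression)),
    ("svr", SVR(C=1.0, epsilon=0.05)),
])
search = GridSearchCV(
    pipe,
    param_grid={"select__k": list(range(1, n_lags + 1))},  # candidate feature counts
    cv=TimeSeriesSplit(n_splits=5),                          # validation always after training
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
best_k = search.best_params_["select__k"]
print("selected k:", best_k, "CV RMSE:", -search.best_score_)
```

Putting the selector inside the pipeline means features are re-ranked on each training fold only, so the validation folds never influence which lags are kept.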

Performance under different signal-to-noise ratios
When wind speed is collected, the measurements are often affected by the environment and by the anemometer itself, so the data contain a certain amount of noise. To verify the reliability of the method, we analyzed its prediction performance under different signal-to-noise ratios (SNRs). Figure 8 shows the 1-step to 3-step prediction performance of the method for SNRs from 30 to 60 dB. Taking wind station #1 as an example, Fig. 8 shows that the performance of the proposed method is relatively stable across SNRs: the RMSE is about 0.6 for 1-step forecasting, about 0.75 for 2-step forecasting, and about 0.85 for 3-step forecasting. In general, the prediction performance of the proposed method improves as the SNR increases. Similar behavior is observed at wind station #2. These experimental results show that the proposed method can accurately predict wind speed in the presence of a certain level of noise.
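One common way to inject noise at a prescribed SNR, assuming additive white Gaussian noise and $\mathrm{SNR_{dB}} = 10\log_{10}(P_{signal}/P_{noise})$ with power measured as the mean square of the series. The text does not say how the noise was generated or whether the signal power includes the mean wind speed, so both choices are assumptions, and the helper name `add_noise_snr` is hypothetical.

```python
import numpy as np


def add_noise_snr(signal, snr_db, rng):
    """Return signal + white Gaussian noise whose power is P_signal / 10**(snr_db / 10)."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)


rng = np.random.default_rng(0)
t = np.arange(100_000)
clean = 6.0 + 2.0 * np.sin(2 * np.pi * t / 24)      # placeholder series, not station data
for snr_db in (30, 40, 50, 60):
    noisy = add_noise_snr(clean, snr_db, rng)
    measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(snr_db, round(measured, 2))
```

Under this definition, 30 dB corresponds to a noise power one thousandth of the signal power, so the noise standard deviation is about 3% of the root-mean-square wind speed.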

CONCLUSIONS
As a sustainable and renewable energy source, wind power has attracted widespread attention and developed rapidly in recent years. Reliable and accurate wind speed forecasting supports wind power planning and control. Due to the non-linearity and non-stationarity of wind, forecasting remains a challenging problem. In this paper, we developed a new wind speed forecasting model based on EMD, FS, SVR and LassoCV. EMD is employed to extract IMFs from the original non-stationary wind speed time series, FS and SVR are combined to predict the high-frequency IMF, and LassoCV is adopted to predict the low-frequency IMFs and the trend. Experiments on wind speed data from two stations in Michigan, USA show that for 1- to 3-step forecasting the proposed model achieves better prediction performance than classic individual models and traditional EMD-based combined models. Although the proposed model performs well, it still has limitations: when new data arrive, the model needs to be retrained. In future research, we will try to integrate online learning into the proposed method.