Research on Hybrid Wind Speed Prediction System Based on Artificial Intelligence and Double Prediction Scheme

Wind energy analysis and wind speed modeling have a significant impact on wind power generation systems and have attracted considerable attention from researchers in recent decades. Because of the inherent characteristics of wind speed, such as nonlinearity and randomness, wind speed prediction is a challenging task. Previous studies have considered point prediction or interval prediction of wind speed separately and have not combined the two for prediction and analysis. In this study, we developed a novel hybrid wind speed double prediction system comprising a point prediction module and an interval prediction module to compensate for the shortcomings of existing research. For point prediction, a novel nonlinear integration method based on a backpropagation network optimized using the multiobjective evolutionary algorithm based on decomposition is implemented to derive the final prediction results, which further improves the accuracy of point prediction. Based on the point prediction results, we propose an interval prediction method that constructs different intervals according to the classification of different data features via fuzzy clustering, which provides reliable interval prediction results. The experimental results demonstrate that the proposed system outperforms existing methods in engineering applications and can be used as an effective technology for power system planning.


Introduction
Recently, owing to the exhaustion of fossil fuels and increasing demands for environmental protection, wind power and other new energy industries have developed rapidly [1]. Wind energy has the advantages of renewability and cleanliness, so the comprehensive development and utilization of wind energy have a wide range of social and economic benefits. Therefore, wind energy is a very promising resource around the world [2]. However, in practice, wind speed is inherently random and intermittent, which limits the effective and comprehensive development of wind systems and poses a major challenge to the operation and management of power grids, particularly when considering wind power integration [3].
Generally, effective wind speed prediction can reduce the risks of wind power generation associated with uncertainty. Predicting wind speed accurately is a difficult task and major focus for wind farm decision-makers. It is very important to establish a suitable wind farm architecture and determine the nonlinear dynamic modes of wind speed precisely for the sake of efficient management and minimizing potential risk [4]. Wind speed prediction methods can be divided into four different types: ultrashort term (several seconds to 4 h), short term (4 to 24 h), medium term (1 to 7 d), and long term (more than 7 d). Different prediction horizons have different application values. For short-term and ultrashort-term wind speed prediction, the most critical impact is its role in power system operation [5]. For example, the output power of a wind farm in the United States can fluctuate by several hundred megawatts within an hour, which has a significant impact on the safe and stable operation of the system. To avoid potential problems, short-term wind speed prediction is crucial for providing data support for the reasonable dispatching of power resources, improving the efficiency of optimal dispatching, and optimizing the use of various power generation methods [6].
In recent decades, many wind speed prediction methods have been proposed in three main categories: physical methods, statistical methods, and artificial intelligence methods [7][8][9]. Physical methods mainly use detailed data from the lower atmosphere for analysis, mining, and prediction [10]. Such models are based on the basic information of wind turbines provided by numerical weather forecasting systems and the parameterization of physical phenomena according to initial conditions and nonlinear partial differential equation systems, which can be used to obtain a series of different meteorological parameters [11]. However, such models do not analyze and mine historical data, so they ignore potentially useful information in historical data [12]. Additionally, based on the inclusion of different parametric models within a single larger model, there are some difficulties in applying such models to wind farms. Using such models alone for wind energy mining and prediction will produce relatively large system errors [13]. Statistical models use large amounts of historical data to perform prediction without considering the impact of many meteorological factors [14]. In early research on wind speed prediction, traditional statistical models largely relied on autoregression (AR) [15] and its extensions (e.g., ARIMA [14,16], real ARIMA [17], and ARIMA-ARCH [18]). However, for wind speed series containing complex information, such models have difficulty in accurately mining patterns, particularly nonlinear patterns.
Since the initial development of artificial intelligence technology, intelligent prediction models have been designed and applied to wind energy forecasting [19], including artificial intelligence systems [20,21], support vector machines [22], and fuzzy logic methods [23]. Additionally, because of the strong nonlinearity of wind energy data, only nonlinear models have reasonable prediction ability [24]. However, because of the inherent disadvantages of individual models, they cannot achieve the expected prediction results in all circumstances [25,26]. Given the increasing application of wind power generation in electronic systems, developing effective controls for prediction error is crucial [27]. To compensate for the shortcomings of individual models, some hybrid models have been designed for wind speed prediction to achieve better prediction performance [28]. Generally speaking, hybrid models achieve accurate wind speed predictions more easily than individual models. Therefore, many studies on hybrid forecasting models are published every year. Such studies tend to focus on data preprocessing [29,30], combining single models, and using heuristic algorithms, such as particle swarm optimization [31], genetic algorithms [32], and multiobjective algorithms [33], to optimize model parameters.
In recent years, hybrid models have been applied to both long-term and short-term wind speed prediction. For short-term wind speed forecasting, Ma et al. [34] developed a wind speed prediction model that uses singular spectrum analysis (SSA) to derive a denoised sequence from the real sequence and then predicts short-term wind speeds. Wang et al. [35] developed a hybrid wind speed prediction model using complete ensemble empirical mode decomposition (CEEMD), multiobjective whale optimization, and an Elman neural network. Meng et al. [36] proposed a short-term wind speed prediction hybrid model combining data preprocessing with an artificial neural network and various optimization methods. Assessments of the effectiveness of their model revealed that its prediction accuracy was significantly improved compared to several benchmark models. In [37], an effective short-term prediction framework for wind speed was proposed by combining a local linear fuzzy neural network, discrete wavelet transform, and singular spectrum analysis optimized by the seeker optimization algorithm. In [38], a hybrid wind speed prediction model was proposed that combines variational mode decomposition (VMD) with an extreme learning machine (ELM) optimized using the hybrid backtracking search optimization algorithm. This model achieved excellent performance in terms of describing nonlinear modes. These studies demonstrate not only that hybrid strategies are superior to individual models but also that such strategies can be used as an effective form of engineering application technology. Additionally, there have been a large number of studies on medium- and long-term wind speed prediction. For example, Wang et al. [39] combined support vector regression with seasonal index adjustment and an Elman recurrent neural network to construct hybrid models called PMERNN and PAERNN, which performed the mid-term prediction of wind speed effectively.
Ulkat and Günay [40] proposed a method that combines physical factors with an artificial neural network to determine wind speeds at specific positions without relying on previous wind speed data, which is effective for long-term wind speed prediction. The prediction results of the hybrid models discussed above demonstrate their effectiveness for both short- and long-term wind speed prediction.
Another problem regarding wind energy prediction is that many studies focus on a single mode of prediction. Specifically, most previous studies have focused on either point prediction or interval prediction of wind speed alone, without considering how both could be used together for predictive modeling and analysis. Therefore, existing models cannot meet the needs of engineering applications or guarantee the reliability of wind systems. Existing probability interval prediction methods can generate a large quantity of predictions that can help managers implement appropriate policies. However, the study of interval modeling and prediction is still insufficient. The main research direction for uncertainty quantification focuses on statistical methods, including quantile regression [41,42], bootstrapping [43], and kernel density estimation [44]. Additionally, several interval prediction methods have been proposed based on artificial neural networks, lower upper bound estimation (LUBE) [45], and so forth. Table 1 summarizes existing methods and models for wind speed point prediction and interval prediction, as well as the advantages and disadvantages of these methods. The main points in Table 1 can be summarized as follows.
In the area of point prediction, (1) although physical models provide good long-term prediction ability, their application is limited based on complicated meteorological conditions, difficult model initialization, and excessive computations. (2) Traditional statistical models, such as AR and ARIMA, have enhanced computational efficiency. However, the modeling of nonlinear time series, such as wind speed time series, is limited by the linear forms of such models. (3) An important issue related to artificial neural networks is that network iteration can easily fall into local optima, although such networks do provide good nonlinear time series modeling ability. (4) Although recent combined models successfully incorporate the advantages of individual models and improve prediction accuracy significantly, model combination technology in existing systems always revolves around linear combination. Based on the nonlinear characteristics of wind speed, this paper presents a method for combining individual prediction models in a nonlinear manner and optimizing model parameters using a multiobjective optimization algorithm to improve prediction effectiveness further.
Regarding interval prediction, (1) based on the unique advantages of quantile regression, most research has focused on this method. However, quantile regression is disadvantageous for developing prediction intervals because it must obtain a specific training dataset to establish prediction models. Additionally, every quantile must be considered, which increases computational complexity and the probability of discarding useful results during resampling [46]. (2) Bootstrapping methods are statistical methods that apply data resampling with replacement to evaluate the robustness of various statistics, including standard error, confidence interval parameters, correlation coefficients, and regression coefficients. Bootstrapping methods can compensate for the shortcomings of quantile regression methods but are only helpful for handling small sample sizes [47]. (3) Kernel density estimation can quickly calculate intervals based on point prediction results and a given statistical historical error distribution. However, such methods require strict distributional assumptions [48]. (4) The LUBE method eliminates the shortcomings of traditional interval prediction methods and is computationally efficient without hypothesizing distributions, but its complex objective function cannot be solved using conventional methods. In summary, there is no unified interval prediction method and further research and investigation are required to obtain more effective results [49]. Therefore, we developed a novel interval prediction architecture that outperforms most individual interval prediction models based on assumed distributions. In the proposed interval prediction architecture, there are no hypotheses regarding distributions and models. Therefore, the established interval structure possesses powerful anti-interference ability in the presence of outliers in interval data.
Based on our review of the literature and methods described above, the major contribution of this article is the presentation of a hybrid double prediction system that is designed to combine the point prediction and probability interval prediction of wind speed, which compensates for the shortcomings of existing research. The proposed system is composed of a wind speed point prediction module and an interval prediction module, which can provide numerous predictions for the managers of wind farms. Specifically, the proposed double prediction system includes a preprocessing module based on VMD, a prediction module based on a nonlinear combination model, an interval prediction module, and an evaluation module. As a relatively new signal processing technology, VMD decomposes wind speed sequences and then performs denoising and reconstruction to generate a time sequence with greater clarity. The nonlinear combination model proposed in this paper is an effective prediction approach. ELM [9], a generalized regression neural network (GRNN) [50], and ARIMA [51] are selected as the base models for combination. The prediction results from these three models are aggregated using a backpropagation (BP) network [52], which is a form of nonlinear combination. BP is very sensitive to the selection of parameters, which directly determine the effectiveness of point prediction and interval prediction. Therefore, to identify the optimal parameters for the BP model, we adopted the multiobjective evolutionary algorithm based on decomposition (MOEA/D). Additionally, to verify the performance of the proposed prediction architecture, we selected ten indicators to judge the accuracy of prediction. We present a thorough discussion on the verification of the effectiveness of the prediction system in this paper.
Wind speed data from Penglai in Shandong province were selected as experimental datasets on which to test the performance of the proposed system. Shandong province is located on the east coast of China and is rich in wind energy resources. To meet the needs of social development, energy conservation, and environmental protection, Shandong has developed many wind power stations. As a coastal province, Shandong has one of the largest wind farms in China. By the end of 2018, the total installed capacity of wind power was 11.26 million kilowatts. In this study, Penglai city, which is located in northern Shandong province, was selected as a research area based on its huge energy potential and valuable wind energy resources. The major innovations of the proposed predictive system can be summarized as follows.
(1) In this paper, a novel double prediction system for wind speed is established based on deterministic point prediction and probabilistic interval prediction. The goal of the proposed system is to enhance the accuracy of point prediction, the construction efficiency of prediction intervals, and the operation level of wind power systems. Numerical simulation results demonstrate that our model has satisfactory prediction abilities.
(2) A nonlinear combination method based on an optimized BP network is proposed. To determine the optimal combination mode for each model and overcome the limitations of existing linear combination models for nonlinear wind speed data, a nonlinear aggregation mechanism based on a BP network is used to combine different models to compensate for the inherent defects of individual models and linear combinations. The MOEA/D algorithm is applied to search for the best parameters for the BP network to improve prediction accuracy further.
(3) An interval prediction method based on fuzzy clustering is established. Compared to traditional parametric statistical models, one unique advantage of the proposed prediction model is its convenience because it does not need to know distribution shapes. This feature significantly reduces the complexity of the model and enhances the overall efficiency of the system.
The remainder of this article is organized as follows. Section 2 discusses the relevant methods applied in the proposed double prediction architecture. In Section 3, the double prediction model is established comprehensively. In Section 4, data are introduced and experimental results are analyzed. Further discussion is provided in Section 5. Finally, our conclusions are summarized in Section 6.

Knowledge and Tools for Model Preparation.
When constructing our model, several methods were selected based on their unique advantages and combined to enhance the overall performance of the model. Here, we introduce the two main methods, which are variational mode decomposition [53] and multiobjective evolutionary algorithm based on decomposition.
First, we will discuss variational mode decomposition (VMD).
VMD is an effective data preprocessing method proposed by Dragomiretskiy and Zosso [54] in 2014. The goal of VMD is to decompose a real input signal $f$ into a series of subsignals $y_k$, called modes, which have specific sparsity characteristics when reproducing the input. For completeness, the signal $f$ and the modes $y_k$ are required to be complete and square-integrable up to the second derivative (i.e., $f, y_k \in L^1 \cap W^{2,2}$). Each mode $y_k$ is mostly compact around a center pulsation $w_k$.
Step 1. Assessing the bandwidth of each mode.
For each mode $y_k$, the Hilbert transform is applied to compute the associated analytic signal and obtain a unilateral frequency spectrum. Next, each mode is mixed with an exponential tuned to its estimated center frequency to shift the spectrum of the mode to the baseband. The bandwidth is then estimated from the $H^1$ Gaussian smoothness of the demodulated signal (i.e., the squared $L^2$-norm of the gradient). The constrained variational problem is defined as follows:

$$\min_{\{y_k\},\{w_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * y_k(t) \right] e^{-j w_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k} y_k = f, \tag{1}$$

where $\delta(t)$ represents the Dirac distribution, $*$ denotes convolution, and $k$ and $t$ represent the mode index and time, respectively. Furthermore, $\{y_k\}$ is the set of modes $\{y_1, y_2, \ldots, y_k\}$ and $\{w_k\}$ is the set of center pulsations $\{w_1, w_2, \ldots, w_k\}$.
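Step 2. Defining the optimization problem. By introducing a quadratic penalty term and a Lagrange multiplier $\lambda$, the constrained optimization problem above can be redefined as follows:

$$L(\{y_k\}, \{w_k\}, \lambda) = \alpha \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * y_k(t) \right] e^{-j w_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} y_k(t) \right\|_2^2 + \left\langle \lambda(t), f(t) - \sum_{k} y_k(t) \right\rangle, \tag{2}$$

where $\alpha$ represents the balancing parameter of the data fidelity constraint.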
Step 3. Solving for the modes $\{y_k\}$ and center pulsations $\{w_k\}$. By adopting the alternating direction method of multipliers, the process of solving for $y_k$ and $w_k$ can be defined as follows.
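For $y_k$, the minimization problem is defined as

$$y_k^{n+1} = \arg\min_{y_k} \left\{ \alpha \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * y_k(t) \right] e^{-j w_k t} \right\|_2^2 + \left\| f(t) - \sum_{i} y_i(t) + \frac{\lambda(t)}{2} \right\|_2^2 \right\}, \tag{3}$$

where the superscripts $\cdot^{n}$ and $\cdot^{n+1}$ are omitted for the fixed directions $w_k$ and $y_{i \neq k}$. The problem is solved in the spectral domain as follows:

$$\hat{y}_k^{n+1} = \arg\min_{\hat{y}_k} \left\{ \alpha \left\| j w \left[ \left( 1 + \operatorname{sgn}(w + w_k) \right) \hat{y}_k(w + w_k) \right] \right\|_2^2 + \left\| \hat{f}(w) - \sum_{i} \hat{y}_i(w) + \frac{\hat{\lambda}(w)}{2} \right\|_2^2 \right\}, \tag{4}$$

which can be rewritten as

$$\hat{y}_k^{n+1} = \arg\min_{\hat{y}_k} \left\{ \int_0^{\infty} 4 \alpha (w - w_k)^2 \left| \hat{y}_k(w) \right|^2 + 2 \left| \hat{f}(w) - \sum_{i} \hat{y}_i(w) + \frac{\hat{\lambda}(w)}{2} \right|^2 dw \right\}. \tag{5}$$

Therefore, the final solution can be obtained as follows:

$$\hat{y}_k^{n+1}(w) = \frac{\hat{f}(w) - \sum_{i \neq k} \hat{y}_i(w) + \dfrac{\hat{\lambda}(w)}{2}}{1 + 2 \alpha (w - w_k)^2}. \tag{6}$$

For $w_k$, the minimization problem is defined as follows:

$$w_k^{n+1} = \arg\min_{w_k} \left\{ \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * y_k(t) \right] e^{-j w_k t} \right\|_2^2 \right\}. \tag{7}$$

In the Fourier domain, the problem is optimized as

$$w_k^{n+1} = \arg\min_{w_k} \left\{ \int_0^{\infty} (w - w_k)^2 \left| \hat{y}_k(w) \right|^2 dw \right\}. \tag{8}$$

Therefore, the final solution can be obtained as follows:

$$w_k^{n+1} = \frac{\int_0^{\infty} w \left| \hat{y}_k(w) \right|^2 dw}{\int_0^{\infty} \left| \hat{y}_k(w) \right|^2 dw}, \tag{9}$$

where $\hat{f}(w)$, $\hat{y}_i(w)$, $\hat{\lambda}(w)$, and $\hat{y}_k^{n+1}(w)$ represent the Fourier transforms of $f(t)$, $y_i(t)$, $\lambda(t)$, and $y_k^{n+1}(t)$, respectively, and $n$ represents the iteration number.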
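As a minimal sketch of Step 3, assuming the standard closed forms from the original VMD formulation (a Wiener-filter update for each mode spectrum and a power-weighted mean for each center frequency), the two updates can be written over a sampled nonnegative frequency grid. The function names, the list-based spectra, and the toy values are illustrative assumptions and are not taken from the paper.

```python
def update_mode(f_hat, other_modes_hat, lam_hat, freqs, omega_k, alpha):
    """Wiener-filter update of one mode spectrum.

    Assumed form: (f_hat - sum of other modes + lam_hat / 2)
                  / (1 + 2 * alpha * (w - omega_k) ** 2)
    """
    updated = []
    for i, w in enumerate(freqs):
        residual = f_hat[i] - sum(m[i] for m in other_modes_hat) + lam_hat[i] / 2
        updated.append(residual / (1 + 2 * alpha * (w - omega_k) ** 2))
    return updated


def update_center(y_hat, freqs):
    """Center frequency as the power-weighted mean frequency of the mode."""
    power = [abs(v) ** 2 for v in y_hat]
    return sum(w * p for w, p in zip(freqs, power)) / sum(power)


# Toy half-spectrum with a single peak at 0.2 (hypothetical values).
freqs = [0.0, 0.1, 0.2, 0.3, 0.4]
f_hat = [0j, 1 + 0j, 4 + 0j, 1 + 0j, 0j]
lam_hat = [0j] * len(freqs)

mode = update_mode(f_hat, [], lam_hat, freqs, omega_k=0.2, alpha=50.0)
center = update_center(mode, freqs)
```

In this toy case the filter leaves the peak at 0.2 untouched and halves the neighboring bins, so the updated center stays at the peak. A full VMD loop would alternate these two updates over all modes and then update the multiplier, which is omitted here.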
Next, we will discuss the multiobjective evolutionary algorithm based on decomposition (MOEA/D).
Recently, the MOEA/D proposed by Zhang and Li [55] has attracted significant interest because it is concise and effective, and many theoretical and practical achievements have been realized. The MOEA/D algorithm is detailed below.
A multiobjective optimization problem (MOP) with $m$ objectives and $n$ decision variables can be expressed as follows:

$$\min F(x) = \left( f_1(x), f_2(x), \ldots, f_m(x) \right)^T \quad \text{s.t.} \quad x \in \Omega, \tag{10}$$

where $\Omega \subseteq R^n$ is the decision space. The decision vector $x = (x_1, x_2, \ldots, x_n) \in \Omega$ is a candidate solution to the MOP. Here, the objective function $F: \Omega \to R^m$ comprises $m$ conflicting objective functions with continuous real values, where $R^m$ represents the objective space. The Pareto dominance relationship between individuals is defined as follows. If there are decision vectors $u$ and $v$ which satisfy the following two conditions simultaneously, we say that $u$ dominates $v$:

$$\forall i \in \{1, \ldots, m\}: f_i(u) \leq f_i(v), \qquad \exists j \in \{1, \ldots, m\}: f_j(u) < f_j(v). \tag{11}$$

In this case, $v$ is said to be dominated by $u$, which is denoted as $u \succ v$, where $\succ$ represents the dominance relation.
If there is no point $x \in \Omega$ that makes $F(x)$ dominate $F(x^*)$, then the point $x^* \in \Omega$ is Pareto optimal. The optimal compromise solutions form a set of nondominated solutions (i.e., solutions that are not dominated by any other solution). The sets of Pareto optimal solutions in the decision space and the objective space are defined as the Pareto solution set (PS) and Pareto frontier, respectively [56].
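The dominance test and the resulting nondominated set can be illustrated with a short sketch, assuming minimization of every objective. The helper names and the sample objective vectors are hypothetical.

```python
def dominates(u, v):
    """True if objective vector u Pareto-dominates v (minimization)."""
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def nondominated(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Four hypothetical objective vectors (f1, f2).
points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = nondominated(points)
```

Here (3.0, 3.0) is dominated by (2.0, 2.0), while the other three points are mutually incomparable and together form the front.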
MOEA/D has strong search ability for continuous optimization, combinatorial optimization, and problems with complex Pareto sets.
The main principle of this algorithm can be summarized as follows.
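If a multiobjective optimization problem (e.g., equation (10)) and a weight vector $\lambda = (\lambda_1, \ldots, \lambda_m)$ are given and the weight vector satisfies $\sum_{i=1}^{m} \lambda_i = 1$ with $\lambda_i \geq 0$ for $i = 1, \ldots, m$, then the MOEA/D can be applied. MOEA/D based on Tchebycheff decomposition uses weight vectors to decompose a MOP into several scalar subproblems of the following form:

$$\min g^{tc}(x \mid \lambda, z^*) = \max_{1 \leq i \leq m} \left\{ \lambda_i \left| f_i(x) - z_i^* \right| \right\} \quad \text{s.t.} \quad x \in \Omega, \tag{12}$$

where $z^* = (z_1^*, \ldots, z_m^*)^T$ is the reference point, with $z_i^* = \min \{ f_i(x) \mid x \in \Omega \}$ for each $i = 1, \ldots, m$. By solving multiple subproblems with different weight vectors based on equation (12), a Pareto optimal solution set [57] with good diversity can be obtained.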
It is known that $g^{tc}$ is continuous in $\lambda$, so if $\lambda^i$ is close to $\lambda^j$, then the optimal solution of $g^{tc}(x \mid \lambda^i, z^*)$ should be close to the optimal solution of $g^{tc}(x \mid \lambda^j, z^*)$.
Therefore, information regarding $g^{tc}$ with weight vectors near $\lambda^i$ is a useful tool for optimizing $g^{tc}(x \mid \lambda^i, z^*)$.
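A small sketch of the Tchebycheff scalarization, assuming the standard form in which a solution is scored by its largest weighted distance to the reference point, shows how different weight vectors turn the same objective vector into different scalar subproblem values. The function name and numbers are illustrative only.

```python
def tchebycheff(fx, weights, z_star):
    """Largest weighted absolute gap between objectives fx and reference z_star."""
    return max(w * abs(f - z) for f, w, z in zip(fx, weights, z_star))


fx = (2.0, 5.0)        # hypothetical objective values of one solution
z_star = (1.0, 1.0)    # hypothetical reference point

g_equal = tchebycheff(fx, (0.5, 0.5), z_star)    # second objective drives the score
g_skewed = tchebycheff(fx, (0.9, 0.1), z_star)   # first objective drives the score
```

With equal weights the second objective's larger gap sets the score, whereas the skewed weights shift the emphasis to the first objective, which is why neighboring weight vectors define neighboring subproblems.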
In the MOEA/D, the population is made up of the best solutions found so far for each subproblem. Each subproblem maintains a list of neighbors, and this list contains the subproblems whose weight vectors are similar to that of the current subproblem.
Therefore, under the assumption of continuity, two neighboring subproblems should have similar optimal solutions. In each generation of MOEA/D, each subproblem is optimized using only the information from its neighboring subproblems.
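For each generation $t$, MOEA/D using the Tchebycheff decomposition maintains the following.
(1) A population of $N$ points $x^1, \ldots, x^N \in \Omega$, where $x^i$ is the current solution to the $i$th subproblem.
(2) The values $FV^1, \ldots, FV^N$, where $FV^i = F(x^i)$ is the objective vector of $x^i$ for each $i = 1, \ldots, N$.
(3) The vector $z = (z_1, \ldots, z_m)^T$, where $z_i$ is the best value found so far for objective $f_i$.
(4) An external population, which is available to store the nondominated solutions found during the search.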
The pseudocode of MOEA/D is given in Algorithm 1.
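The initialization that MOEA/D performs before its main loop (weight vectors, neighborhoods of the closest weight vectors, and the reference point) can be sketched as follows. This is a simplified illustration for two objectives with evenly spaced weights; the function names and the sample objective values are assumptions, and the genetic update step is left out.

```python
import math


def uniform_weights(n):
    """n evenly spaced weight vectors for a two-objective problem (assumed layout)."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]


def neighborhoods(weights, t):
    """B(i): indexes of the t weight vectors closest to weight vector i."""
    result = []
    for wi in weights:
        order = sorted(range(len(weights)), key=lambda j: math.dist(wi, weights[j]))
        result.append(order[:t])
    return result


def reference_point(objective_values):
    """z_i: smallest value seen so far for each objective."""
    return [min(column) for column in zip(*objective_values)]


weights = uniform_weights(5)
B = neighborhoods(weights, 3)
z = reference_point([(3.0, 1.0), (2.0, 4.0), (5.0, 0.5)])
```

Each neighborhood includes the subproblem itself, because its own weight vector is at distance zero. During the update step, parents would be drawn from `B[i]` and `z` would be refreshed whenever a new solution improves an objective.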

Construction of the Wind Speed Double Prediction System
This section discusses the proposed wind speed double prediction system architecture, including system establishment and evaluation.

System Establishment.
The prediction system proposed in this paper consists of two modules: a point prediction module and an interval prediction module. The following subsections describe the system construction process, and the system structure is illustrated in Figure 1.

Point Prediction Module.
In this section, we propose a novel type of nonlinear hybrid point forecasting model using ELM, GRNN, and ARIMA, as well as a BP network, the MOEA/D, and a nonlinear combination mechanism, to achieve stable high-precision wind speed point prediction results. Considering the excellent prediction performance of BP networks, the proposed method adopts a BP network for nonlinear combination.
The point prediction module in the designed system is composed of four stages. The details of each stage are discussed below.
(i) First stage: wind speed data preprocessing.
To remove noise and extract helpful information from a wind speed sequence, we use VMD to decompose the original sequence and reconstruct a smooth time series. Specifically, the original sequence is decomposed into several intrinsic mode functions (IMFs). IMFs with higher frequencies are eliminated to filter the time series.
Here, we remove $\mathrm{IMF}_1$, $\mathrm{IMF}_2$, and $\mathrm{IMF}_3$, and the remaining IMFs are reconstructed to derive the final series.
(ii) Second stage: single-model prediction.
In the proposed method, we first use individual models for point prediction. Specifically, we use ELM, GRNN, and ARIMA as individual prediction models to construct a combined model. ELM and GRNN are adopted to handle the nonlinear characteristics of wind speed, and ARIMA is adept at discerning the linear characteristics of wind speed data. In this study, we divided 4464 wind speed samples into a training set $\mathrm{train}_1$ and a testing set $\mathrm{test}_1$, where $\mathrm{train}_1$ contained 3964 samples and $\mathrm{test}_1$ contained 500 samples. In general, there are no clear rules regarding the ratio of training sets to testing sets for neural networks. It is common practice to use approximately 2/3 to 4/5 of the sample data for training and the remaining samples for testing. When the quantity of data is large, the proportion of data in the training set can be increased appropriately [58]. As the proportion of training data increases, the neural network can achieve better prediction accuracy [59,60].
Therefore, this division of the data is sufficient to construct a model and verify its accuracy. The input and output structures of $\mathrm{train}_1$ are described in equations (13) and (14), respectively. By using ELM and GRNN models trained on $\mathrm{train}_1$ to predict the wind speeds in the test set, the prediction sequences $\mathrm{predict}_1$ and $\mathrm{predict}_2$ are obtained, respectively. Similarly, ARIMA is used to obtain the prediction sequence $\mathrm{predict}_3$.

$$\text{Input} = \begin{bmatrix} x(1) & x(2) & \cdots & x(l) \\ x(2) & x(3) & \cdots & x(l+1) \\ \vdots & \vdots & & \vdots \\ x(n-l) & x(n-l+1) & \cdots & x(n-1) \end{bmatrix}, \tag{13}$$

$$\text{Output} = \left[ x(l+1), x(l+2), \ldots, x(n) \right]^T, \tag{14}$$

where $n$ is the number of samples in $\mathrm{train}_1$, $l$ is the look-back time lag, and $x(k)$ is the wind speed value at time $k$. For example, consider $x(1)$ to $x(l)$ as inputs and $x(l+1)$ as an output. Here, we set $l = 5$.
(iii) Third stage: nonlinear combination model construction.
To obtain an effective combination of the individual models, a nonlinear decision-making method based on an optimized BP neural network is proposed. Specifically, we divide $\mathrm{test}_1$ into a training set $\mathrm{train}_2$ (356 samples) and a testing set $\mathrm{test}_2$ (144 samples) and then use $\mathrm{predict}_1$, $\mathrm{predict}_2$, and $\mathrm{predict}_3$ as BP network inputs and the 356 observed values in $\mathrm{train}_2$ as target outputs.
Algorithm 1: Pseudocode of MOEA/D.
(i) EP: external population
Setup:
Step 1: Initialization
(i) Initialize a primary internal population uniformly at random.
(ii) Initialize $z = (z_1, \ldots, z_m)^T$ by a problem-specific method.
(iii) Calculate the Euclidean distance between any two weight vectors, and then find the $T$ closest weight vectors to each weight vector.
Step 2: Updating
Genetic operators: randomly select two indexes $k$, $l$ from $B(i)$, and then generate a new solution $y$ from $x^k$ and $x^l$ by using genetic operators.

It is worth noting that it is difficult to determine the weights and thresholds for the neurons in each layer of the BP network, so MOEA/D is adopted to search for the best weights and thresholds for the neural network. The input and output structures for BP network training are defined in equations (15) and (16), respectively.

$$\text{Input} = \begin{bmatrix} x_1(1) & x_2(1) & x_3(1) \\ x_1(2) & x_2(2) & x_3(2) \\ \vdots & \vdots & \vdots \\ x_1(N) & x_2(N) & x_3(N) \end{bmatrix}, \tag{15}$$

$$\text{Output} = \left[ x(1), x(2), \ldots, x(N) \right]^T, \tag{16}$$

where $N$ represents the number of samples in $\mathrm{test}_2$ and $x_1(k)$, $x_2(k)$, and $x_3(k)$ represent the $k$th wind speed values predicted by ELM, GRNN, and ARIMA, respectively. By using the input and output, an optimized BP neural network can be trained.
(iv) Fourth stage: wind speed point prediction.
According to the established nonlinear prediction model, a rolling prediction method is used for multistep prediction and the final prediction results are obtained. Evaluation indexes are calculated using the prediction results and $\mathrm{test}_2$, and the performance of the model is evaluated. In particular, multistep forecasting means forecasting multiple future wind speed values. A time index $t$ is the forecast origin and a positive integer $l$ is the forecast horizon. It can be assumed that the time index $t$ is the current time point, and our target is to obtain the forecast value $y_{t+l}$ ($l \geq 1$). Here, $l = 1, 2, 3$ corresponds to 1-step, 2-step, and 3-step prediction, respectively.
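The lagged input and output structure used in the second stage and the rolling multistep prediction of the fourth stage can be sketched as follows. This is a hedged illustration: it assumes that rolling prediction feeds each forecast back as the newest input, and `trend_model` is a hypothetical stand-in for a trained ELM, GRNN, ARIMA, or BP combiner.

```python
def make_windows(series, lag):
    """Pair each block of `lag` consecutive values with the value that follows it."""
    count = len(series) - lag
    inputs = [series[i:i + lag] for i in range(count)]
    targets = [series[i + lag] for i in range(count)]
    return inputs, targets


def rolling_forecast(model, history, lag, steps):
    """Forecast `steps` values ahead, feeding each prediction back as an input."""
    window = list(history[-lag:])
    forecasts = []
    for _ in range(steps):
        y = model(window)
        forecasts.append(y)
        window = window[1:] + [y]
    return forecasts


def trend_model(window):
    """Hypothetical model: extend the average slope of the window by one step."""
    return window[-1] + (window[-1] - window[0]) / (len(window) - 1)


series = [float(v) for v in range(1, 11)]   # toy series 1.0 .. 10.0
X, Y = make_windows(series, lag=5)          # look-back lag of 5, as in the text
preds = rolling_forecast(trend_model, series, lag=5, steps=3)
```

The first input row is the first five values and its target is the sixth, which mirrors the example of using $x(1)$ to $x(l)$ as inputs and $x(l+1)$ as the output.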

Interval Prediction Module.
The interval prediction method in the proposed system was developed using point prediction results based on a fuzzy system. The three main steps in this module are summarized below.
(i) First stage: data classification.

In this step, the training set $\mathrm{train}_1$ of wind speed data is clustered into several classes using fuzzy c-means clustering. We assume that the data in each category follow the same normal distribution. Therefore, we can derive a set of interval classes $F_1, F_2, \ldots, F_k$. Here, we consider site 1 as an example. The data are divided into ten categories, where the scope of each category is defined in equation (17).
(ii) Second stage: wind speed interval estimation.
The confidence degree of each category interval is 95%. According to the mean and variance of each category of data, a corresponding confidence interval is constructed. Each category has its own uniform prediction interval width. This process of constructing different adaptive intervals according to different data characteristics is one of the main innovations of our model. For the point prediction results on the testing set $\mathrm{test}_2$ from the point prediction module, we identify the category $F$ to which each prediction value belongs.
Then, according to the constructed confidence interval for each category, the prediction interval for each prediction value is calculated, where $x_i$ is a point prediction value, $j$ is the category number of $x_i$, $s_j$ is the standard deviation of category $j$, and $n_j$ is the number of data samples in category $j$.
(iii) Third stage: sorting prediction results.
According to the prediction intervals derived above, final interval estimations for wind speed can be obtained.
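The three stages above can be illustrated with a compact sketch. Several parts are assumptions rather than the paper's exact procedure: the fuzzy c-means routine is a plain one-dimensional version with a deterministic start, each value is assigned to its nearest cluster center, and the interval is taken as $x_i \pm z \, s_j / \sqrt{n_j}$ with a normal quantile $z$, which is only one plausible reading of the description in terms of $x_i$, $s_j$, and $n_j$.

```python
import math
from statistics import NormalDist


def fuzzy_c_means_1d(data, c, m=2.0, iters=100):
    """Plain fuzzy c-means on scalars; returns the cluster centers."""
    ordered = sorted(data)
    # Deterministic start: centers spread evenly over the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * c)] for i in range(c)]
    for _ in range(iters):
        weight_sum = [0.0] * c
        weighted_x = [0.0] * c
        for x in data:
            dist = [max(abs(x - ck), 1e-12) for ck in centers]
            for i in range(c):
                u = 1.0 / sum((dist[i] / dj) ** (2 / (m - 1)) for dj in dist)
                weight_sum[i] += u ** m
                weighted_x[i] += (u ** m) * x
        centers = [wx / ws for wx, ws in zip(weighted_x, weight_sum)]
    return centers


def nearest(centers, x):
    """Index of the category whose center is closest to x (assumed hard assignment)."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def category_stats(data, centers):
    """Sample standard deviation s_j and size n_j of each category."""
    groups = [[] for _ in centers]
    for x in data:
        groups[nearest(centers, x)].append(x)
    stats = []
    for g in groups:
        mean = sum(g) / len(g)
        s = math.sqrt(sum((x - mean) ** 2 for x in g) / (len(g) - 1))
        stats.append((s, len(g)))
    return stats


def prediction_interval(x_i, centers, stats, confidence=0.95):
    """Assumed form: x_i +/- z * s_j / sqrt(n_j) for the category j of x_i."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    s_j, n_j = stats[nearest(centers, x_i)]
    half = z * s_j / math.sqrt(n_j)
    return x_i - half, x_i + half


# Toy wind speeds with two obvious regimes (hypothetical data).
speeds = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
centers = fuzzy_c_means_1d(speeds, c=2)
stats = category_stats(speeds, centers)
low, high = prediction_interval(1.05, centers, stats)
```

The sketch needs every category to contain at least two samples. In the paper's setting the clustering would run on $\mathrm{train}_1$ with ten categories, and the intervals would be built around the point predictions for $\mathrm{test}_2$.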

System Evaluation.
The evaluation indexes for the designed double prediction system are introduced in this section, including four indexes for point prediction and six indexes for interval prediction.

Point Prediction Evaluation.
Generally speaking, evaluation criteria are not unique to a given prediction system. This paper uses four common evaluation standards to evaluate the ability of the developed model and other comparative models, namely, mean absolute error (MAE) [61], mean squared error (MSE) [62], mean absolute percentage error (MAPE) [63], and direction change (DC). The smaller the values of MAPE, MSE, and MAE, the better the prediction performance. A relatively large DC value indicates that the predicted direction of change is consistent with that of the real values. Table 2 provides additional details regarding these four indexes.
In these formulas, $y_i$ and $\hat{y}_i$ represent the true value and predicted value of wind speed, respectively, and $N$ represents the number of samples in the testing set.
In addition, $a_i$ is the directional factor, which records whether the predicted direction of change at sample $i$ agrees with the observed direction.
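The four point prediction indexes can be sketched as follows. MAE, MSE, and MAPE follow their usual definitions. The directional factor is an assumption: it is taken as 1 when the predicted next value and the observed next value move in the same direction relative to the current observation, and DC is reported as a fraction rather than a percentage.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def direction_change(y_true, y_pred):
    """Share of steps whose predicted direction matches the observed direction."""
    hits = sum(
        1
        for i in range(len(y_true) - 1)
        if (y_true[i + 1] - y_true[i]) * (y_pred[i + 1] - y_true[i]) > 0
    )
    return hits / (len(y_true) - 1)


# Hypothetical observed and predicted wind speeds (m/s).
y_true = [4.0, 5.0, 4.0, 6.0]
y_pred = [4.0, 4.5, 5.5, 5.0]

scores = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "DC": direction_change(y_true, y_pred),
}
```

In this toy case the second step is predicted to rise while the observation falls, so two of the three direction checks succeed.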
In this paper, the prediction interval coverage probability (PICP) specifically refers to the PICP of the testing dataset, which is the main evaluation index for interval prediction. It indicates how well the obtained confidence intervals cover the target values. Given a confidence level, if the PICP is greater than or equal to $(1 - \alpha)$, then the constructed interval is valid. Otherwise, the constructed interval is invalid. The prediction interval normalized average width (PINAW) refers to the normalized average width of the prediction intervals of the testing dataset. Reducing the width comes at the cost of a lower probability of achieving the expected target coverage. Increasing coverage requires increasing the width of the interval, so PICP and PINAW are essentially contradictory [48]. The average coverage error (ACE) represents the difference between the coverage and confidence of a prediction interval. MPI represents the average width of an obtained interval [6]. Similarly, the quality of an interval can also be assessed by its Winkler score. A high-quality interval has a smaller absolute Winkler score for an assigned nominal confidence level [66]. The accumulated width deviation (AWD) refers to the AWD of the testing dataset, which can be obtained by calculating its relative deviation degree. The cumulative sum of $\mathrm{AWD}_i$ represents the relative deviation degree [67]. Table 3 provides specific descriptions of these formulas.
In these formulas, $U_i$ and $L_i$ represent the upper and lower limits of the forecasting interval, respectively, $c_i$ indicates whether the true value is contained in the constructed interval, and $N$ is the number of samples in the testing set. $y_{max}$ and $y_{min}$ are the maximum and minimum values of the targets over the whole prediction process.
In addition, $S_i$ is the Winkler score of the $i$-th sample, and $AWD_i$ is the width deviation of the constructed interval for the $i$-th sample.
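The interval indexes can be illustrated with a short Python sketch. The function name and return keys are hypothetical. PICP, PINAW, ACE, and MPI follow the verbal definitions above. The per-sample terms $AWD_i$ and $S_i$ are assumed to take forms that are common in the interval prediction literature and may differ in detail from Table 3: $AWD_i$ is the distance by which a target falls outside the interval, divided by the interval width (zero when the target is covered), and $S_i = -2\alpha(U_i - L_i) - 4m_i$, where $m_i$ is that same outside distance.

```python
from typing import Dict, Sequence


def interval_metrics(
    y: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    alpha: float = 0.10,
) -> Dict[str, float]:
    """Interval quality indexes for nominal confidence level 1 - alpha."""
    n = len(y)
    target_range = max(y) - min(y)  # y_max - y_min
    covered = 0
    width_sum = 0.0
    awd_sum = 0.0
    winkler_sum = 0.0
    for target, lo, up in zip(y, lower, upper):
        width = up - lo
        width_sum += width
        if target < lo:
            miss = lo - target      # target below the interval
        elif target > up:
            miss = target - up      # target above the interval
        else:
            miss = 0.0              # target covered
            covered += 1
        awd_sum += miss / width                          # assumed AWD_i
        winkler_sum += -2.0 * alpha * width - 4.0 * miss  # assumed S_i
    picp = covered / n
    mpi = width_sum / n
    return {
        "PICP": picp,
        "PINAW": mpi / target_range,
        "ACE": picp - (1.0 - alpha),
        "MPI": mpi,
        "AWD": awd_sum / n,
        "WS": winkler_sum / n,
    }


# Illustrative values only, not data from the paper.
result = interval_metrics(
    y=[5.0, 6.0, 7.0, 8.0],
    lower=[4.5, 5.5, 7.2, 7.0],
    upper=[5.5, 6.5, 8.2, 9.0],
)
print(result)
```

In this toy case three of four targets are covered, so PICP falls below the 90% nominal level and ACE is negative, which is the situation described above as an invalid interval.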

Experiments and Analysis
This section discusses the application of the double prediction model and several comparative models. The comparisons are divided into three experimental demonstrations. The experiments were run on a PC with a 2.40 GHz CPU, 4.00 GB of RAM, the Windows 7 operating system, and the MATLAB R2016a platform. To account for random factors and guarantee the reliability of the final results, 20 trials were conducted for each experiment and the average values were recorded.

Dataset Description.
The wind speed data for three sites at Penglai in Shandong Province were chosen as experimental datasets on which to test the performance of the established double prediction system. Basic information regarding these wind speed data is provided in Table 4. The descriptive statistical analysis uses four statistical indicators, namely, the maximum, minimum, and average values, as well as the standard deviation (Std.). The basic information and original data for the selected sites are presented in Figure 2.
To estimate the prediction effects of the models, 10 min wind speed data from the Penglai wind farm from January 1, 2011, to January 31, 2011, were selected as experimental data. This wind farm consists of three different sites. Each dataset contained 4464 data points, which were divided into a training set train_1 and a testing set test_1. The training set train_1 contained 3964 data points and the testing set test_1 contained 500 data points. For nonlinear aggregation, the testing set test_1 was subdivided into a training set train_2 and a testing set test_2 containing 356 and 144 points, respectively. For both the training and testing sets, we used a rolling forecasting mechanism to predict wind speed and produce one-step and two-step prediction results. The data structure details of the double prediction model are presented in Figure 2.
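The split sizes quoted above are mutually consistent: 31 days of 10 min readings give 31 × 144 = 4464 points, and 3964 + 500 = 4464 with 356 + 144 = 500. The following Python sketch reproduces the split and a simple rolling sample construction. The function names and the input window length are hypothetical, because the text does not state how many past observations feed each forecast.

```python
from typing import List, Sequence, Tuple

POINTS_PER_DAY = 24 * 60 // 10   # 144 ten-minute readings per day
DAYS_IN_JANUARY = 31


def split_series(series: Sequence[float], n_test1: int = 500, n_test2: int = 144):
    """Split a series into train_1, train_2, and test_2 as described in the text."""
    train1 = series[:-n_test1]
    test1 = series[-n_test1:]
    train2 = test1[:-n_test2]    # fits the nonlinear combination stage
    test2 = test1[-n_test2:]     # final evaluation set
    return train1, train2, test2


def rolling_samples(
    series: Sequence[float], window: int, horizon: int
) -> List[Tuple[Sequence[float], float]]:
    """Pair each block of `window` past values with the value `horizon` steps ahead."""
    samples = []
    for t in range(window, len(series) - horizon + 1):
        samples.append((series[t - window:t], series[t + horizon - 1]))
    return samples


# Placeholder series standing in for one site's wind speed record.
series = [float(i) for i in range(POINTS_PER_DAY * DAYS_IN_JANUARY)]
train1, train2, test2 = split_series(series)
print(len(series), len(train1), len(train2), len(test2))

# One-step and two-step samples with an assumed window of 5 past values.
one_step = rolling_samples(train1, window=5, horizon=1)
two_step = rolling_samples(train1, window=5, horizon=2)
print(len(one_step), len(two_step))
```

A two-step sample set is one element shorter than the one-step set for the same window, because the last usable target lies one step further ahead.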

Diebold-Mariano Test.
To determine whether the designed hybrid model provides better forecasting results than the comparative models, we adopted an effective verification method called the DM test, which was proposed by Diebold and Mariano [46]. The theory behind the DM test is summarized below.
Considering a significance level $\alpha$, the null hypothesis $H_0$ states that the predictive effectiveness levels of the proposed model and a comparative model are not significantly different. The alternative hypothesis $H_1$ states the opposite. The relevant formulas are defined as follows:

$$H_0: E[L(err_i^1)] = E[L(err_i^2)],$$
$$H_1: E[L(err_i^1)] \neq E[L(err_i^2)],$$

where $L$ represents the loss function for prediction error and $err_i^1$ and $err_i^2$ are the error sequences predicted by the selected models.
Additionally, the statistic of the DM test can be defined as follows:

$$DM = \frac{\frac{1}{N}\sum_{i=1}^{N} d_i}{\sqrt{S^2/N}},$$

where $d_i = L(err_i^1) - L(err_i^2)$, $S^2$ is the estimate of the variance of $d_i$, and $N$ is the length of the error sequences. Assuming a certain significance level $\alpha$, the obtained DM value is compared to $z_{\alpha/2}$. If the DM statistic falls outside the interval $[-z_{\alpha/2}, z_{\alpha/2}]$, $H_0$ can be rejected. This indicates that the predictive performances of the target model and a comparative model are significantly different, meaning $H_1$ is accepted.
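The test procedure can be sketched in Python using only the standard library. The function names are hypothetical, the squared error is assumed as the loss function $L$, and $S^2$ is taken to be the plain sample variance of $d_i$, which ignores the autocorrelation correction sometimes applied for multistep forecasts.

```python
import math
from statistics import NormalDist
from typing import Callable, Sequence


def dm_statistic(
    err1: Sequence[float],
    err2: Sequence[float],
    loss: Callable[[float], float] = lambda e: e * e,
) -> float:
    """DM statistic: mean loss differential divided by its standard error."""
    d = [loss(a) - loss(b) for a, b in zip(err1, err2)]
    n = len(d)
    d_bar = sum(d) / n
    s2 = sum((x - d_bar) ** 2 for x in d) / (n - 1)  # sample variance of d_i
    return d_bar / math.sqrt(s2 / n)


def reject_h0(dm: float, alpha: float) -> bool:
    """Two-sided decision: reject H0 when |DM| exceeds z_(alpha/2)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(dm) > z


# Illustrative error sequences only, not data from the paper.
errors_benchmark = [1.0, 2.0, 1.0, 2.0]
errors_proposed = [0.0, 0.0, 0.0, 0.0]
dm = dm_statistic(errors_benchmark, errors_proposed)
print(dm, reject_h0(dm, alpha=0.05))
```

A positive statistic here means the first error sequence carries the larger average loss, so passing the benchmark errors first and the proposed model's errors second makes large positive values favor the proposed model.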

Results and Analysis of Point Prediction.
To verify the applicability of the proposed point prediction module, two experiments are presented in this section, denoted as experiment I and experiment II. The main purpose of experiment I was to demonstrate the superiority of the nonlinear combination model in the point prediction module over a single model, thereby supporting the validity of hybrid modeling. Additionally, the results of experiment I demonstrate the necessity of data preprocessing. Similarly, to demonstrate the suitability and superior ability of the VMD technology adopted in our system, it was compared to other common data preprocessing methods in experiment II. A detailed analysis of each experiment is provided below.

Experiment I: Comparison to Individual Models.
In this experiment, all experimental datasets were considered to assess the effectiveness of the point prediction module based on three comparisons. In the first comparison, the proposed model was compared to three models using preprocessed data, namely, VMD-ARIMA, VMD-GRNN, and VMD-ELM, to analyze the advantages of the combination model and nonlinear combination method. In the second comparison, the three VMD-based models were compared to ARIMA, GRNN, and ELM, respectively. In the third comparison, the effectiveness of the designed prediction model was evaluated further by using the traditional wavenn model and BP as comparative methods. The predicted results are presented in Table 5 and Figure 3, and the comparison results are summarized below.
(1) Regarding the first comparison, the hybrid nonlinear model yielded the best results for one-step and two-step wind speed prediction on all three datasets according to the error indexes. For example, for one-step prediction, the MAPE value of the developed model is approximately 2% to 3%, while even the most accurate of the VMD-based models is more than 1% less accurate than the developed model. Two-step prediction yields similar results.
(2) Regarding the second comparison, when comparing VMD-ARIMA, VMD-GRNN, and VMD-ELM to ARIMA, GRNN, and ELM, respectively, without data preprocessing, one can see that data preprocessing is very important for enhancing wind speed prediction. For site 1, the MAPE values of ARIMA and ELM are 4.2190% and 6.7442% higher than those of VMD-ARIMA and VMD-ELM, respectively, for one-step prediction and 4.2927% and 6.4793% higher, respectively, for two-step prediction. The accuracy of VMD-GRNN is also slightly improved. For sites 2 and 3, the results are very similar.
(3) Regarding the third comparison, based on the four indexes of MAPE, MAE, MSE, and DC, one can see that the developed model is more accurate than other individual models, such as wavenn and the BP neural network. Additionally, the individual models with the highest prediction accuracy are ARIMA, BP, and GRNN. Therefore, we selected BP as the model for nonlinear combination. Because ARIMA is a linear model, it can determine whether the wind speed data have certain linear characteristics, so it is intuitive to consider ARIMA in the proposed model. The other three models, namely, ARIMA, GRNN, and ELM, are submodels of the combined model.

Experiment II: Testing Data Preprocessing Methods.
This experiment aimed to compare the effectiveness of the VMD selected in this study to that of other common data preprocessing technologies, such as EMD, EEMD, CEEMD, and SSA. Therefore, the point prediction models based on different data preprocessing methods are the EMD-based model, EEMD-based model, CEEMD-based model, and SSA-based model. These models differ only in the decomposition method used during the data preprocessing stage. In this experiment, we tested whether the proposed prediction model is reasonable and identified the best method for removing noise to improve prediction effectiveness. The results obtained by models using different data preprocessing methods are listed in Table 6. Figure 4 presents a clearer and more intuitive comparison. In Table 6 and Figure 4, one can see that the model based on VMD technology has superior performance compared to the other decomposition-based prediction models. The MAPE value of the VMD-based proposed model is 0.3 to 4 percentage points lower than those of the EMD-based model, EEMD-based model, CEEMD-based model, and SSA-based model. Of all the benchmark models, the SSA-based model performs the worst. Compared to the other models, the MAE, MSE, and DC values for one-step and two-step prediction by the proposed model are improved to some extent, which demonstrates the superiority of the data preprocessing method adopted in our hybrid model.
Note on the comparative models: ARIMA is a single linear prediction model; GRNN, ELMNN, wavenn, and BPNN are single nonlinear prediction models. VMD-ARIMA, VMD-GRNN, and VMD-ELM are single prediction models after data preprocessing.
(1) Remark Regarding the Point Prediction Module. Experiments I and II focused on demonstrating the advantages of the proposed point prediction module and verifying it from the perspectives of single prediction models, combination models, and data preprocessing. The results show that, in both experiments, the proposed point prediction model is superior to all the comparative models. This indicates that the combination of data preprocessing technology, optimization algorithms, and nonlinear combined methods can successfully resolve the issues of wind energy prediction, provided that appropriate prediction methods are selected. Given the superior effectiveness of the designed point prediction model, it has very promising application potential.
In particular, EMD, EEMD, and CEEMD are a family of methods built on the same principle, and the progression EMD ⟶ EEMD ⟶ CEEMD can be summarized as follows. Let $x(t)$ be the original signal, $n_i(t)$ a noise sequence, and $n_i^{+}(t)$ and $n_i^{-}(t)$ positive and negative noise sequences. EMD operates on the original signal $x(t)$. EEMD is formed by adding the noise sequence $n_i(t)$ on the basis of EMD. CEEMD further decomposes the noise sequence into the positive noise sequence $n_i^{+}(t)$ and the negative noise sequence $n_i^{-}(t)$.

Results and Analysis of Interval Prediction (Experiment III).
Based on wind speed point prediction results, probabilistic interval prediction can provide additional wind speed information. In this study, we developed a method based on fuzzy clustering that performs interval prediction based on point prediction results. Three datasets were considered in this experiment. To verify the effectiveness of the designed interval prediction module, we used all of the comparative models for point prediction and performed multistep prediction to verify the interval prediction results. The results of the proposed interval prediction model and the other models are listed in Table 7. Owing to space limitations, Table 7 only lists the results for site 3. We set the confidence level to 90% to assess the effectiveness of the interval prediction model. From Table 7, one can draw the following conclusions. The best values for all indexes among all models are obtained by the proposed prediction model. The coverage probability of the prediction interval (PICP) is 96.5278% for one-step prediction and 90.1944% for two-step prediction. According to MPI, the average width of the interval is 1.3399 for one-step prediction and 1.2158 for two-step prediction. Relative to the absolute value of wind speed, this interval width is reasonably tight. The AWD is 0.0066 for one-step prediction and 0.0264 for two-step prediction, indicating that the deviation degree of the constructed interval is small. All indexes indicate that the predicted interval is qualified. In contrast, for the individual prediction models, none of the one-step PICPs exceeds 90% and all of the two-step PICPs are below 80%. Considering PINAW together with PICP, the proposed model achieves a very high PICP with a relatively small PINAW, which demonstrates its superiority. For one-step and two-step prediction, the AWD values of most other benchmark models are more than ten times that of the proposed model. ACE is the difference between the coverage and the confidence level of the prediction interval.
The ACE values of all models except for the proposed model are negative, indicating that the coverage of the developed model is much better than that of the other models. The absolute value of the WS index of the proposed model is the smallest, indicating its reliability. These six indexes fully reflect the superior prediction performance of the proposed model.
To present the comparison results intuitively, the results of the designed module and comparative methods are visualized in Figure 5. These observations are consistent with the results in Table 7, providing intuitive evidence of the superior abilities of the proposed system for wind speed interval prediction. As shown in Figure 5, compared to the other methods, the proposed model yields superior interval prediction results. The prediction range not only covers most of the wind speed values but is also the smoothest range among all models. This demonstrates that the proposed model is more stable than the other models. Therefore, our model is more advantageous for the three experimental datasets.
(1) Remark Regarding the Interval Prediction Module. As with the comparison models used for point forecasting, 12 different models were compared based on three datasets and multistep forecasting. The results demonstrate that the designed interval model is superior to all the comparative models. Given the excellent results of the designed interval prediction module using fuzzy clustering, it is a very promising interval prediction method for wind speed.
Note on the preprocessing-based models: the EMD-based model, EEMD-based model, CEEMD-based model, and SSA-based model are prediction models after data preprocessing. These models use the same combination and optimization methods and differ only in their data preprocessing methods.
Figure note (panels: time series diagram and error diagram for Step 1 and Step 2): the purpose of this experiment is to compare the performance of the VMD used in this study with other well-known data preprocessing technologies, including EMD, EEMD, CEEMD, and SSA. Given its excellent performance, VMD is a good choice as the data preprocessing method of the model.
All of the comparison models above are those used in experiments I and II. We reuse them here to compare interval prediction performance and thereby demonstrate the interval prediction performance of the developed model. In particular, the WS values in the table are bracketed to indicate absolute values.

Discussion
To discuss our experimental conclusions in detail and reduce the error of wind speed forecasting, this section discusses the validity of the established model, the combination mechanism of the combined model, and its practical application to wind power systems.

DM Test.
First, the validity of the proposed model was verified via DM testing, in which all of the other models were compared to the proposed double prediction model. According to DM testing theory, the null hypothesis is that the forecasting results of two models contain no significant differences, and the alternative hypothesis is the opposite. We chose two significance levels, $\alpha = 0.05$ and $\alpha = 0.1$, as the criteria for judging the significance of results, with critical values $Z_{0.05/2} = 1.96$ and $Z_{0.1/2} = 1.645$, respectively. Table 8 lists the DM statistics and averages for the three test sites. Table 8 reveals that most of the DM test values calculated for the developed model and comparative models are greater than the upper limit at a 5% significance level. However, for the VMD-ARIMA, VMD-ELM, and EEMD-based models, the results do not reveal significant differences from the proposed model at this level; for these models, the null hypothesis can be rejected only at a 10% significance level. For example, the DM test statistic for the VMD-ELM model at site 1 is 1.7554, which lies below 1.96 but above 1.645, so the difference from the developed model is not significant at a 5% significance level but is significant at a 10% significance level. At a 10% significance level, all distinctions between the designed model and benchmark models are significant. Therefore, it can be concluded that the designed hybrid double prediction model is preferable to the other models.
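The VMD-ELM example can be checked by converting the reported statistic into a two-sided p-value under the standard normal distribution. The short Python sketch below does this; the function name is hypothetical, and only the value 1.7554 comes from the text.

```python
from statistics import NormalDist


def two_sided_p_value(dm: float) -> float:
    """Two-sided p-value of a DM statistic under the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(dm)))


# DM statistic reported for the VMD-ELM model at site 1.
p_value = two_sided_p_value(1.7554)
print(p_value)
print("reject at 5%:", p_value < 0.05)
print("reject at 10%:", p_value < 0.10)
```

The p-value lies between 0.05 and 0.10, which is equivalent to the statistic lying between the two critical values 1.645 and 1.96.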

Combination Mechanism of the Combined Model.
To verify the effectiveness of the designed nonlinear combination mechanism (MOEA/D-BP), a simple averaging strategy and a linear combination mechanism were selected as comparative methods in this study. The simple averaging strategy computes the mean value of the prediction results of each model, while the linear combination mechanism uses the MOEA/D as a weight determination method to derive the final prediction results. Comparative results for the developed model and the other two methods are listed in Table 9.
The effects of each combination mechanism are compared based on four point prediction error measures and six interval prediction error measures. One can see that the nonlinear combination model is more effective than the simple averaging strategy and the linear combination mechanism, regardless of the location and prediction step. The linear combination mechanism is often more effective than the simple averaging strategy; in other words, the simple averaging strategy performs the worst. Therefore, the developed MOEA/D-BP mechanism successfully improves forecasting effectiveness for wind speed.
The simple average method uses the arithmetic mean to calculate the final predicted value. The formula is as follows:

$$\hat{y} = \frac{1}{n}\sum_{i=1}^{n} p_i,$$

where $p_i$ is the prediction result of the $i$-th model and $n$ is the number of models. The linear combination of the models is the weighted combination of the results of the three single models, from which a final prediction value is obtained. The weights are determined by the multiobjective optimization algorithm, which increases the intelligence of the method.
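The two baseline combination strategies can be illustrated with a short Python sketch. The function names, submodel predictions, and weights below are hypothetical; in the paper the weights come from the MOEA/D, which is not reproduced here.

```python
from typing import Sequence


def simple_average(predictions: Sequence[float]) -> float:
    """Arithmetic mean of the submodel predictions for one time step."""
    return sum(predictions) / len(predictions)


def linear_combination(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the submodel predictions for one time step."""
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical one-step forecasts (m/s) from three submodels.
submodel_forecasts = [6.1, 5.8, 6.4]
# Hypothetical weights; the paper obtains them by multiobjective optimization.
weights = [0.5, 0.3, 0.2]

print(simple_average(submodel_forecasts))
print(linear_combination(submodel_forecasts, weights))
```

The simple average is the special case of the linear combination in which every weight equals $1/n$, which is one way to see why optimized weights can match or beat it on the data used to fit them.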

Performance Testing of Optimization Algorithms.
This section first introduces the parameter settings for the BP network and MOEA/D and then presents convergence testing results for metaheuristic algorithms.

Parameter Settings.
An artificial intelligence algorithm called BP was used to combine wind speed results. In a BP neural network, the weights and thresholds of the input, hidden, and output layers play crucial roles in network performance. To determine the appropriate connection weights and node thresholds efficiently, we adopted the MOEA/D for parameter optimization. The parameters for BP and the MOEA/D are listed in Tables 10 and 11, respectively.

Convergence Testing of Optimization Algorithms.
To analyze the performance of the MOEA/D, different population sizes were selected to test its abilities using four test functions. Three multiobjective optimization algorithms, namely, MOGWO, MOALO, and MODA, were used as comparative models. Table 12 contains the details of the test functions, and the results are reported in Table 13.
We selected two performance indexes as the criteria for evaluating the optimization algorithms, namely, the IGD index and SP index. Additionally, the running times of different algorithms were compared. IGD is an indicator of the convergence of an algorithm, and it can be used to judge the robustness and stability of algorithms. The smaller the IGD value, the better the performance of an algorithm. In a Pareto set, SP is typically used to evaluate the distribution of solutions. If SP is equal to zero, then all nondominated solutions are equidistant. The final simulated results are listed in Table 13. For all of the algorithms, as the population size increases from 100 to 150, 200, and 300, convergence is enhanced. The MOEA/D yields the best performance for ZDT1, ZDT2, ZDT3, and ZDT6. The IGD of the MOEA/D is far less than that of the other algorithms, indicating that the MOEA/D provides the best convergence performance. MODA is the second-best algorithm. The convergence of the MOGWO algorithm is much weaker than that of the other algorithms. For SP, the MOEA/D yields the best distribution performance. The running time of the MOEA/D is significantly lower than the running times of the other three algorithms, which demonstrates that the MOEA/D is the fastest and most efficient algorithm.
IGD and SP are two important performance evaluation indexes for the solution set of a multiobjective algorithm. They are calculated as follows [60]:

$$IGD(P, Q) = \frac{\sum_{v \in P} d(v, Q)}{|P|},$$

where $P$ is the set of points uniformly distributed on the real Pareto surface and $|P|$ is the number of individuals in that set. $Q$ is the optimal Pareto solution set obtained by the algorithm. $d(v, Q)$ is the minimum Euclidean distance between an individual $v$ in $P$ and the population $Q$. Therefore, IGD evaluates the comprehensive performance of the algorithm by calculating the average of the minimum distances from the point set on the real Pareto surface to the obtained population.

$$SP = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{d} - d_i\right)^2},$$

where $d_i$ represents the minimum distance from the $i$-th solution to the other solutions in the solution set, $\bar{d}$ represents the mean value of all $d_i$, and $n$ is the number of individuals in the solution set. SP measures the standard deviation of the minimum distance from each solution to the other solutions. The smaller the SP value, the more uniform the solution set.
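Both indexes are straightforward to compute for small point sets, as the following Python sketch shows. The function names are hypothetical. Euclidean distance is used for both indexes, and the SP normalization by $n - 1$ is an assumption based on the description of SP as a standard deviation; some definitions in the literature use a different distance or normalization.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def igd(reference: Sequence[Point], obtained: Sequence[Point]) -> float:
    """Average distance from each reference (true Pareto) point to the obtained set."""
    total = sum(min(math.dist(v, q) for q in obtained) for v in reference)
    return total / len(reference)


def spacing(solutions: Sequence[Point]) -> float:
    """Standard deviation of each solution's distance to its nearest neighbour."""
    nearest = [
        min(math.dist(s, t) for j, t in enumerate(solutions) if j != i)
        for i, s in enumerate(solutions)
    ]
    mean_nearest = sum(nearest) / len(nearest)
    variance = sum((mean_nearest - d) ** 2 for d in nearest) / (len(nearest) - 1)
    return math.sqrt(variance)


# Illustrative two-objective points only, not results from the paper.
true_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
found = [(0.1, 1.0), (0.6, 0.6), (1.0, 0.2)]
print(igd(true_front, found))
print(spacing(found))
```

An obtained set that coincides with the reference front gives an IGD of zero, and evenly spaced solutions give an SP of zero, which matches the interpretation of the two indexes given above.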

Practical Application to a Power System.
Wind power forecasting systems are of great importance for large-capacity wind power systems. Effective wind speed forecasting can be helpful in many areas, such as timely maintenance scheduling and power grid safety management. The contributions of an accurate wind speed forecasting model to a power system can be summarized as follows [50]:
(1) To guarantee the best wind energy output quality, it is very important to assess the quantity of wind power. Wind power depends directly on wind speed, so the evaluation of wind power can be accomplished based on wind speed prediction. Therefore, precise wind speed forecasting can enhance decision-making for wind farms and is conducive to smart grid planning.
(2) Accurate wind speed forecasting can provide essential guidance for the dispatching and control of wind turbines. Based on predicted wind speeds, administrators can control wind turbines immediately to ensure the best wind energy output quality. If the wind speed exceeds the capacity of a turbine, the turbine should be shut down to avoid damage and reduce operating costs.
(3) Wind speed prediction effectiveness has an important impact on power grid dispatching and supervision. Wind power output fluctuates significantly and intermittently, which makes power system operation very challenging. Therefore, accurate prediction models can assist decision-makers in making timely decisions to avoid the problems discussed above.

Conclusions
With the depletion of traditional energy sources, wind energy is considered to be a promising alternative energy source because of its sustainability and cleanliness. However, because of its inherent intermittence and randomness, the extraction of wind energy is very limited, which can endanger the dispatching and management of wind power systems.
To analyze the uncertainty characteristics of wind speed more comprehensively, a double prediction system was successfully developed in this study. The proposed system compensates for the shortcomings of previous methods. It consists of two main parts: a point prediction module based on nonlinear combination and an interval prediction module based on fuzzy clustering. It is of great significance to explore the predictability and modeling of wind speed comprehensively. Unlike previous works, we implemented a BP neural network using MOEA/D optimization as a novel nonlinear combination mechanism to derive final prediction results, which enhances the accuracy of point prediction. To improve the accuracy of interval prediction, wind speed data were divided into different categories based on fuzzy clustering, and different intervals were constructed according to the prediction data in different categories. This method of constructing different intervals according to different data characteristics has proven to be an effective interval prediction method. Finally, a large number of experiments were conducted using quantitative indexes, which demonstrated the effectiveness and superiority of the proposed system. Additionally, because the proposed system provides reliable performance, it can also be applied to load prediction, wind power forecasting, economic forecasting, and other fields.