Crude oil prices and volatility prediction by a hybrid model based on kernel extreme learning machine

Given the important position of crude oil in the national economy and its contribution to various economic sectors, the prediction of crude oil prices and volatility has become an increasingly active topic for practitioners and researchers. In this paper, a new hybrid forecasting model based on variational mode decomposition (VMD) and kernel extreme learning machine (KELM) is proposed to forecast the daily prices and 7-day volatility of Brent and WTI crude oil. KELM has the advantages of lower time consumption and lower parameter sensitivity, and thus shows good prediction ability. The effectiveness of the VMD-KELM model is verified by a comparative study with other hybrid models and their single-model counterparts. In addition to various commonly used evaluation criteria, a recently developed multi-scale composite complexity synchronization (MCCS) statistic is used to evaluate the degree of synchrony between the predicted and actual values. The empirical results verify that 1) the KELM model performs better than ELM and BP in crude oil price and volatility forecasting; 2) VMD-based models outperform EEMD-based models; 3) the developed VMD-KELM model exhibits clear superiority over other popular models, not only for crude oil price prediction but also for volatility prediction.


Introduction
As the benchmark of the oil market, crude oil has a strong impact on global economic growth, social stability and national security [1]. In the last two decades, the prediction of crude oil prices and volatility has attracted extensive attention from scholars. Accurate prediction of crude oil prices helps to improve plans for production, marketing and investment, to regulate market risks and to enhance the future gains of oil-related industries [2], while oil price volatility is at the core of asset pricing, asset allocation and risk management. In practice, however, crude oil prediction is always a very challenging task [3]. One reason is that numerous factors affect crude oil prices, including the fundamental supply-demand relationship [4], external uncertainty factors [5] and the impact of unexpected events such as epidemics. For instance, affected by the coronavirus disease 2019 (COVID-19) pandemic, crude oil prices fell dramatically on April 20, 2020 and even reached a historic negative value. The market exhibited great uncertainty and volatility. These factors increase the uncertainty of prediction results and lower the prediction accuracy. Therefore, scholars are always seeking better and more effective forecasting methods. Against this backdrop, this paper proposes an effective crude oil prediction model that can better extract the real information in crude oil prices and volatility so as to achieve accurate forecasting.
In the literature, many kinds of prediction algorithms have been proposed, which can be mainly classified into three groups, namely econometric approaches, artificial intelligence (AI) and hybrid models. Because crude oil prices are highly nonlinear, irregular and complex, econometric models cannot effectively extract these features. Artificial intelligence algorithms have become popular for dealing with nonlinear and non-stationary time series, such as artificial neural networks (ANNs) [6], support vector regression (SVR) [7], least squares support vector regression (LSSVR) [8] and various deep learning models [2]. However, these AI techniques often suffer from long running time and parameter sensitivity [9,10]. For example, ANNs use an iterative learning process such as gradient descent to adjust parameters, which requires a lot of time. Besides, they are easily trapped in a local optimum, and the fixed number of hidden neurons also affects the result. SVR and LSSVR rely on iterative procedures, such as grid search or trial and error, to determine the regularization and kernel parameters, and thus also face the problems of time consumption and parameter sensitivity.
In recent years, the idea of randomization and some non-iterative algorithms have been proposed to overcome the limitations of AI models, and they display excellent performance in speed and prediction accuracy [9-11]. These algorithms feature randomly fixed parameters and random mapping, and they do not require a stopping condition, learning rate, number of training epochs or other such parameters to be set during the training procedure [12]. Among them, Huang et al. introduced a novel machine learning algorithm known as the Extreme Learning Machine (ELM), which randomly chooses the weights between the input layer and the hidden layer, leading to lower time consumption [9]. Meanwhile, the weights between the hidden layer and the output layer are computed through matrix inversion, which involves lower computational complexity. However, the randomly selected weights cause the output to change between different trial runs, so that the system is not robust.
Later, an improved ELM model called the kernel-based extreme learning machine (KELM) was developed [13,14], in which the hidden layer feature map is defined by a kernel matrix. After the kernel function is introduced into ELM, the stability of forecasting is greatly improved. KELM has seen extensive application in many fields, with higher performance, easier implementation and faster training speed [15-19].
Since single prediction models are limited, more and more hybrid models that combine various single algorithms are used for predicting crude oil prices, particularly following the decomposition-ensemble learning paradigm. Typical decomposition techniques include wavelet decomposition [20], empirical mode decomposition (EMD) and their extensions [9]. However, EMD-based methods have been shown to have some shortcomings, such as boundary effects, noise sensitivity, mode overlap and the lack of a rigorous mathematical basis. These may have a negative impact on the precision of the decomposition, resulting in distorted results. Unlike EMD, variational mode decomposition (VMD) is a completely non-recursive model that decomposes the original data into multiple components with specific bandwidths in the spectral domain [21]. Compared with existing decomposition algorithms such as EEMD and EMD, VMD is more robust to noise and sampling. The superiority of the VMD method has been demonstrated in a few works on VMD-based decomposition-ensemble models for crude oil prices [6,22-24].
The main contributions of this work can be briefly described in three aspects. Firstly, we follow the "Decomposition-Ensemble" framework and develop a hybrid model, VMD-KELM, which shows excellent applicability in forecasting international crude oil prices. The VMD algorithm can effectively extract intrinsic features and smooth the nonlinear and complex characteristics of crude oil data, while the KELM prediction model is capable of overcoming the time consumption and parameter sensitivity problems of iterative processes. Compared with other VMD-based and ensemble empirical mode decomposition based (EEMD-based) hybrid models as well as single models, the VMD-KELM model demonstrates strong predictive capability for crude oil time series. To the best of our knowledge, the VMD-KELM model has not yet been applied to crude oil data. Secondly, this paper also focuses on the prediction of crude oil volatility. Existing works concentrate mostly on crude oil price forecasting, and relatively few address volatility with non-econometric models. This paper validates that the proposed VMD-KELM model has superior performance in volatility prediction. Thirdly, a recently developed multi-scale composite complexity synchronization (MCCS) statistic [25] from complexity theory is used to evaluate the model, which offers a new perspective on forecasting performance. Overall, this study, on the one hand, complements the existing decomposition-ensemble learning paradigm in terms of the precision of crude oil prediction. On the other hand, it extends the methodological literature on crude oil volatility forecasting (using decomposition technology plus promising randomized algorithms).
The remainder of this paper is organized as follows. Section 2 reviews the related literature. The main algorithms and performance evaluation measures are given in Section 3. Section 4 describes the dataset. In Section 5, the prediction performance of the VMD-based KELM model for crude oil prices and volatility is analyzed empirically, and comparisons with the EEMD-based hybrid models and single models are presented. Section 6 concludes.

Literature review
In crude oil price prediction, scholars have proposed a large number of algorithms. In general, these algorithms can be divided into three categories: econometric approaches, artificial intelligence (AI) and hybrid models, which integrate two or more single models of any of the above types. In the first category of traditional econometric models, autoregressive integrated moving average (ARIMA), random walk (RW), vector autoregression (VAR), generalized autoregressive conditional heteroskedasticity (GARCH) and error correction models (ECM) are widely used in forecasting crude oil prices [26-28] as well as volatility [29-32]. For example, Kanjilal and Ghosh [26] used ECM to explore the fluctuation of crude oil prices. Xiang and Zhuang [28] predicted the Brent crude oil price with an ARIMA model and suggested that the ARIMA(1,1,1) model can be used for short-term prediction of international crude oil prices. Marchese et al. [29] compared the prediction ability of short-memory multivariate GARCH models and long-memory multivariate models, showing the superiority of long-memory multivariate models in predicting crude oil data. Wang and Wu examined the prediction effectiveness of univariate and multivariate GARCH-class models for energy market volatility and found that univariate models allowing for asymmetric effects have higher prediction accuracy than other models [31]. Klein and Walther showed that the mixture memory GARCH (MMGARCH) outperforms other models (GARCH, EGARCH, APARCH, etc.) in predicting volatility and value at risk [32]. Traditional econometric models require the processed data to be linear, an assumption that is very difficult to satisfy. Therefore, they cannot predict nonlinear and non-stationary time series well.
To avoid the shortcomings of econometric models, nonlinear and emerging artificial intelligence algorithms have gained popularity in crude oil price forecasting. The mainstream artificial intelligence algorithms widely adopted include artificial neural networks (ANNs) [6,20,25,33-41], support vector machine (SVM) [7,43] and least squares support vector regression (LSSVR) [8,44]. For instance, Lahmiri applied the generalized regression neural network (GRNN) to forecast day-ahead energy prices and showed that GRNN is a promising tool for the prediction of energy prices [6]. Azadeh et al. proposed a flexible algorithm based on an artificial neural network (ANN) to predict long-term oil prices [34]. Chiroma et al. applied a genetic algorithm and neural network (GANN) to forecast WTI crude oil prices and showed that GANN outperforms ten BP models in prediction accuracy and computational efficiency [36]. Yu et al. used an LSSVR ensemble learning paradigm with uncertain parameters to forecast the WTI price, and the empirical results verified the prediction effectiveness of the proposed model [44]. Wu et al. added crude oil news as input data and used an ANN to predict crude oil prices, with good results [41]. They also applied a convolutional neural network to extract text features from online crude oil news to show the explanatory power of text features for crude oil price prediction [42]. However, these AI algorithms often suffer from long running time and parameter sensitivity [9,10], and more stable models are still needed.
In recent years, hybrid models have become more and more popular. Following the "Decomposition-Ensemble" framework, there are models based on typical decomposition techniques, namely wavelet decomposition [20,25,45], empirical mode decomposition (EMD) and their extensions [9,46-49]. For example, Jammazi et al. implemented an HTW-MBPNN model combining a multilayer back propagation neural network and Haar à trous wavelet decomposition to predict crude oil prices and showed that HTW-MBPNN performs better than the traditional BPNN [20]. Tang et al. tested the prediction of crude oil prices by employing several randomized algorithms, such as extreme learning machine (ELM), random vector functional link network (RVFLN) and random kitchen sinks (RKS), combined with the EEMD method [11]. Wu et al. improved the ensemble empirical mode decomposition (EEMD) model and proposed a novel EEMD-LSTM model to predict the crude oil spot price of West Texas Intermediate (WTI) [48]. Recently, a novel decomposition technique originating in signal processing, named variational mode decomposition (VMD), has been adopted as an effective decomposition approach. Lahmiri [6] employed VMD and a neural network for day-ahead energy price forecasting, and the results showed the superiority of VMD in decomposition. Bisoi et al. predicted crude oil prices based on VMD and the robust random vector functional link network (RVFLN) [23]. Li et al. proposed VMD-AI models for crude oil price forecasting and compared the results of VMD-AI, EEMD-AI and AI models [22]. The empirical results implied that the hybrid VMD models are superior to the hybrid EEMD models and single models.

Variational mode decomposition (VMD)
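Variational mode decomposition (VMD) is a non-recursive and adaptive data decomposition technique with multi-resolution [21]. It aims to decompose an input data series $X$ into several discrete subseries called intrinsic mode functions (IMFs) $u_k$ ($k = 1, 2, \cdots, K$), where each IMF has limited bandwidth in the spectral domain and is required to be mostly compact around a center pulsation $\omega_k$ determined along with the decomposition. The bandwidth of every mode is computed in the following steps. Firstly, for a mode $u_k$, the associated analytic signal is calculated through the Hilbert transform,

$$\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t), \tag{1}$$

where $*$ denotes convolution and $\delta$ is the Dirac distribution. The frequency spectrum of the mode is then shifted to its estimated center frequency,

$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t}.$$

Finally, the $H^1$ Gaussian smoothness of the demodulated signal, that is, the squared $L^2$ norm of its gradient, is used to compute the mode bandwidth. After the bandwidth estimation, the resulting constrained variational problem is expressed as

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = X,$$

where $X$ denotes the original data to be decomposed, $K$ is the number of modes, $\{u_k\} = \{u_1, u_2, \cdots, u_K\}$ is the set of IMFs and $\{\omega_k\} = \{\omega_1, \omega_2, \cdots, \omega_K\}$ is the set of center pulsations. By combining a quadratic penalty term with Lagrange multipliers, the constrained problem can be converted into an unconstrained problem:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| X(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, X(t) - \sum_{k=1}^{K} u_k(t) \right\rangle,$$

where $\lambda(t)$ is the Lagrangian multiplier and $\alpha$ represents the balance parameter of the data fidelity constraint. To solve this problem, the alternating direction method of multipliers (ADMM) is employed to find the saddle point of the augmented Lagrangian. The modes $u_k$ and the center pulsations $\omega_k$ are updated alternately, and their solutions are expressed as follows:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{X}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2},$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k(\omega)\right|^2 d\omega},$$

where $\hat{X}(\omega)$, $\hat{u}_i(\omega)$, $\hat{\lambda}(\omega)$ and $\hat{u}_k(\omega)$ denote the Fourier transforms of $X(t)$, $u_i(t)$, $\lambda(t)$ and $u_k(t)$ respectively, and $n$ refers to the number of iterations.

Figure 1. Architecture of ELM model.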
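As a rough illustration of the two VMD update rules (the frequency-domain mode update and the center pulsation update), the following sketch applies one update to a toy two-tone signal. It is not the authors' implementation: the function names, the toy signal, the use of frequencies in cycles rather than radians, and the replacement of the integrals by discrete sums over the one-sided spectrum are all assumptions made for this example.

```python
import numpy as np


def update_mode(x_hat, other_modes_hat, lam_hat, omega, omega_k, alpha):
    """Wiener-filter style update of one mode spectrum (frequency domain)."""
    residual = x_hat - other_modes_hat + lam_hat / 2.0
    return residual / (1.0 + 2.0 * alpha * (omega - omega_k) ** 2)


def update_center(u_hat_k, omega):
    """Center pulsation as the power-weighted mean frequency of the mode."""
    power = np.abs(u_hat_k) ** 2
    return float(np.sum(omega * power) / np.sum(power))


# Toy signal (hypothetical data): tones at 5 and 40 cycles per unit time.
n_samples = 512
t = np.arange(n_samples) / n_samples
signal = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 40 * t)

x_hat = np.fft.rfft(signal)                             # one-sided spectrum
omega = np.fft.rfftfreq(n_samples, d=1.0 / n_samples)   # 0, 1, 2, ... cycles

# One update of a mode initialised near the low tone, with no other modes
# extracted yet and a zero Lagrangian multiplier.
zeros = np.zeros_like(x_hat)
u_hat_low = update_mode(x_hat, zeros, zeros, omega, omega_k=4.0, alpha=1.0)
omega_low = update_center(u_hat_low, omega)
```

Starting from an initial guess of 4, the updated center moves to roughly 5, the frequency of the nearby tone, because the filter in `update_mode` strongly attenuates spectral content far from the current center.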

Kernel-based extreme learning machine (KELM)
Extreme learning machine (ELM) [9] is an improved learning algorithm for the single hidden layer feedforward neural network (SLFN). Figure 1 shows its architecture. ELM randomly selects the weights between the input layer and the hidden layer without iterative learning, and determines the weights between the hidden layer and the output layer by matrix inversion. The output function of ELM with $L$ hidden nodes is

$$f_L(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\beta,$$

where $\beta = [\beta_1, \beta_2, \cdots, \beta_L]^T$ is the vector of output weights that connects the hidden nodes to the output nodes, and $h(x) = [h_1(x), h_2(x), \cdots, h_L(x)]$ is the ELM feature mapping function that maps the data from the $N$-dimensional input space to the $L$-dimensional hidden layer feature space. Stacking $h(x)$ over the training samples gives the randomized hidden layer output matrix $H$, and the training targets satisfy the linear system

$$H\beta = T. \tag{8}$$

The output weights $\beta$ are determined by the least squares (LS) approach:

$$\beta = H^{\dagger} T, \tag{9}$$

where the norm of $\beta$ is the minimum and unique among all the LS solutions of the linear system $H\beta = T$ (Eq. (8)), $T$ indicates the target matrix and $H^{\dagger}$ represents the Moore-Penrose generalized inverse [9] of the hidden layer output matrix $H$, which is given as

$$H^{\dagger} = H^T \left(HH^T\right)^{-1}.$$

To obtain more stable generalization, a regularization parameter $C$ is usually added to the diagonal of $HH^T$, and the output weight $\beta$ is computed by

$$\beta = H^T \left(\frac{I}{C} + HH^T\right)^{-1} T.$$

Nevertheless, ELM may still face the problems of time consumption and poor stability caused by the random assignment of parameters. A kernel extreme learning machine (KELM) was developed [13] by combining kernel function theory with ELM. The random feature mapping in ELM is replaced by a kernel mapping, and the kernel matrix based on Mercer's theorem is given by

$$\Omega_{ELM} = HH^T, \qquad \Omega_{ELM\,i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j).$$
Hence, the output function can be written as

$$f(x) = h(x) H^T \left(\frac{I}{C} + HH^T\right)^{-1} T = k(x) \left(\frac{I}{C} + \Omega_{ELM}\right)^{-1} T,$$

where $k(x)$ is the row vector of kernel values $K(x, x_i)$ between $x$ and each training input $x_i$. Commonly used kernel functions that meet the Mercer condition include the sigmoid kernel, polynomial kernel, radial basis function kernel and wavelet kernel, among others. This paper selects the radial basis function (RBF) kernel, as it can realize the nonlinear mapping and improve the generalization capability of ELM [13,14]. In its usual Gaussian form with kernel width $\sigma$, it is given by

$$K(x_i, x_j) = \exp\left(-\frac{\left\|x_i - x_j\right\|^2}{2\sigma^2}\right).$$

The optimal regularization factor $C$ and kernel width $\sigma$ are determined by trial and error.

VMD-KELM hybrid model framework
VMD-KELM follows a typical decomposition-ensemble training paradigm, which consists of three major processes: data decomposition through the VMD technique, individual forecasting by the KELM algorithm and results integration through linear aggregation. Figure 2 displays a schematic of the implementation procedure for the VMD-KELM model. Specifically, it proceeds in the following steps:
1) Data decomposing. The historical data series $X(t)$, $t = 1, 2, \cdots, T$, is separated by the VMD technique into a set of modes $\{u_k(t) \mid k = 1, 2, \cdots, K\}$, each of which is a new time series that KELM forecasts separately.
2) Individual forecasting. KELM is used to predict all the extracted IMF series. Each series $u_k(t)$ is split into a training set and a testing set. The KELM model is constructed on the training data and then employed to predict the testing data, which yields the prediction output for that mode.
3) Results ensemble. All the individual forecasted outputs are added linearly to form the final integrated prediction result, $\sum_{k=1}^{K} \hat{u}_k(t)$.
To illustrate the process more clearly, the pseudo-code of the model is described as follows:
Algorithm
// The meaning of the symbols is given in the Note. The number of input nodes is N.
// Data decomposition
1) Use VMD to decompose the original data series $X(t)$ into the modes $u_k(t)$, $k = 1, 2, \cdots, K$.
// Individual forecasting
2) For each mode $u_k(t)$, $k = 1, 2, \cdots, K$, use its training-set values as input to train the model.
// Forecast the crude oil price for each day of the testing set.
3) count ⇐ 1. // prediction counter, which is a temporary variable
Repeat
4) count ⇐ count + 1.
5) Use the well-trained model Model() to predict the value of crude oil prices for the corresponding day of the testing set, which gives Output.
Until all days of the testing set are predicted.
Note: the training and testing lengths refer to the training set and testing set of $X(t)$, Model() is the well-trained model, $u_k$ are the modes decomposed by VMD, $K$ is the number of decomposed modes and Output is the prediction value of the well-trained model.
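The KELM output function and the three-step decomposition-ensemble procedure can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the class and function names are hypothetical, the RBF kernel uses the $2\sigma^2$ normalization, and two synthetic sinusoids stand in for VMD modes so that the example is self-contained (no VMD routine is called here).

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


class KELM:
    """Kernel ELM: solve (I/C + Omega) alpha = T, predict with k(x) @ alpha."""

    def __init__(self, C=100.0, sigma=1.0):
        self.C, self.sigma = C, sigma

    def fit(self, X, y):
        self.X_train = X
        omega = rbf_kernel(X, X, self.sigma)
        self.alpha = np.linalg.solve(np.eye(len(X)) / self.C + omega, y)
        return self

    def predict(self, X):
        return rbf_kernel(X, self.X_train, self.sigma) @ self.alpha


def lagged(series, n_lags):
    """Rows of n_lags consecutive values and the value that follows each row."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    return X, series[n_lags:]


def forecast_ensemble(modes, n_train, n_lags=5, C=100.0, sigma=1.0):
    """Train one KELM per mode, predict its test part, and sum the outputs."""
    total = 0.0
    for mode in modes:
        X, y = lagged(mode, n_lags)
        split = n_train - n_lags          # targets with index below n_train
        model = KELM(C, sigma).fit(X[:split], y[:split])
        total = total + model.predict(X[split:])
    return total


# Hypothetical stand-ins for VMD modes: a slow and a fast oscillation.
t = np.arange(400)
modes = [np.sin(2 * np.pi * t / 100), 0.3 * np.sin(2 * np.pi * t / 12)]
series = modes[0] + modes[1]
pred = forecast_ensemble(modes, n_train=320)
rmse = float(np.sqrt(np.mean((pred - series[320:]) ** 2)))
```

Solving the regularized linear system with `np.linalg.solve` avoids forming an explicit matrix inverse. Each test prediction here uses the true lagged values of the mode as inputs, which corresponds to one-step-ahead forecasting.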

Performance evaluation
Commonly-used metrics
This work adopts seven commonly-used criteria to examine the robustness and superiority of the model from different aspects, as listed in Table 1. MAE, MAPE, RMSE, TIC and the correlation coefficient R are selected to measure level accuracy, and Dstat is used to measure directional accuracy. Better performance corresponds to smaller MAE, MAPE, RMSE and TIC, and larger R and Dstat. The remaining criteria compare each model with a benchmark: the closer they are to 0, the smaller the difference between the two models. In this paper, the VMD-KELM model is taken as the benchmark method.
Table 1. Commonly-used performance evaluation metrics.
Note: $T$ is the data length, $x_t$ is the real data, and $\hat{x}_t$ is the predicted data. The benchmark terms in the relative criteria denote the corresponding criteria of VMD-KELM.
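Since the formulas of Table 1 are not reproduced here, the following sketch computes the six level and directional criteria under their standard textbook definitions, which are assumed to match the table. The function name and the sample numbers are hypothetical, and MAPE is returned as a fraction rather than a percentage.

```python
import numpy as np


def evaluate(actual, predicted):
    """Level and directional accuracy under the standard definitions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    rmse = np.sqrt(np.mean(err ** 2))
    # A direction is correct when the forecast moves the same way as the
    # actual series, both measured from the previous actual value.
    same_dir = (actual[1:] - actual[:-1]) * (predicted[1:] - actual[:-1]) >= 0
    norm_sum = np.sqrt(np.mean(actual ** 2)) + np.sqrt(np.mean(predicted ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / actual))),
        "RMSE": float(rmse),
        "TIC": float(rmse / norm_sum),
        "R": float(np.corrcoef(actual, predicted)[0, 1]),
        "Dstat": float(np.mean(same_dir)),
    }


actual = np.array([50.0, 52.0, 51.0, 53.0, 55.0])       # hypothetical prices
predicted = np.array([50.5, 51.5, 51.5, 52.5, 55.5])    # hypothetical forecasts
scores = evaluate(actual, predicted)
```

In this toy case every forecast is off by 0.5, so MAE and RMSE both equal 0.5, and every forecast moves in the same direction as the actual series, so Dstat equals 1.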

Diebold-Mariano (DM) test
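To illustrate the superiority of the proposed model from a statistical perspective, we apply the DM test to evaluate the predictive performance of VMD-KELM against other models [47]. The DM test examines the null hypothesis of equal forecast accuracy against the alternative of different forecasting capabilities between the target model A and its benchmark model B. With the squared error as the loss function, the DM statistic is written as

$$DM = \frac{\bar{d}}{\sqrt{\hat{V}_{\bar{d}}/T}}, \qquad \bar{d} = \frac{1}{T}\sum_{t=1}^{T} d_t, \qquad d_t = \left(x_t - \hat{x}_{A,t}\right)^2 - \left(x_t - \hat{x}_{B,t}\right)^2,$$

where $\hat{V}_{\bar{d}} = \gamma_0 + 2\sum_{l=1}^{\infty} \gamma_l$ with $\gamma_l = \mathrm{cov}(d_t, d_{t-l})$, and $\hat{x}_{A,t}$ and $\hat{x}_{B,t}$ denote the predicted values of $x_t$ by the forecasting models A and B respectively.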
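A minimal sketch of the DM test for one-step-ahead forecasts is given below. It assumes a squared-error loss and keeps only the lag-0 autocovariance of the loss differential, which is the usual simplification for one-step forecasts. The function name and the simulated forecasts are hypothetical, and the p-value comes from the standard normal approximation.

```python
import math

import numpy as np


def dm_test(actual, pred_a, pred_b):
    """Diebold-Mariano statistic for one-step forecasts, squared-error loss."""
    actual = np.asarray(actual, dtype=float)
    loss_a = (actual - np.asarray(pred_a, dtype=float)) ** 2
    loss_b = (actual - np.asarray(pred_b, dtype=float)) ** 2
    d = loss_a - loss_b
    # For one-step-ahead forecasts only the lag-0 autocovariance is kept.
    variance_of_mean = np.var(d) / len(d)
    stat = float(np.mean(d) / math.sqrt(variance_of_mean))
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))   # two-sided normal p-value
    return stat, p_value


rng = np.random.default_rng(0)
actual = rng.normal(size=200)
pred_a = actual + 0.1 * rng.normal(size=200)   # hypothetical accurate model A
pred_b = actual + 0.5 * rng.normal(size=200)   # hypothetical noisier model B
stat, p_value = dm_test(actual, pred_a, pred_b)
```

A negative statistic with a small p-value indicates that model A has significantly lower squared error than model B.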

Multi-scale composite complexity synchronization
Multi-scale composite complexity synchronization (MCCS) is a recently proposed method for measuring the synchronization of two data series, which can be used to evaluate the degree of synchronization between the prediction results and the original time series [24]. The MCCS algorithm combines the theory of sample entropy (SampEn) and complexity-invariant distance (CID), and can be briefly described in the following steps:

1) Given the real data $\{x_1, x_2, \cdots, x_T\}$ and the predicted data $\{\hat{x}_1, \hat{x}_2, \cdots, \hat{x}_T\}$, each of length $T$, the generalized complexity-invariant distance (GCID) between them is calculated, where the power exponent $q$ is set to 2 here.
2) Calculate the composite complexity synchronization (CCS) between the real and predicted data by combining SampEn and GCID. Firstly, the SampEn is computed for the real data (and likewise for the predicted data), where $m$ is the embedding dimension, set to 2, and $\varepsilon$ is the tolerance, equal to $k$ ($0.1 \le k \le 0.25$) times the standard deviation of the data. Lastly, the CCS is measured from these quantities.
3) Compute the MCCS values. The MCCS approach considers CCS over multiple time scales. Firstly, for the real and predicted data, the coarse-grained sequences with scale factor $\tau$ are obtained respectively, where, for any $\tau$, several coarse-grained sequences are separated from the original sequence. Then, the MCCS between the actual and predicted data is computed from the CCS values of the coarse-grained sequences. The smaller the MCCS value is, the higher the synchronization of the two time series is.
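The GCID and CCS formulas are not reproduced here, so the sketch below covers only the two standard building blocks that the steps above rely on: sample entropy with tolerance $k$ times the standard deviation, and coarse-graining at scale $\tau$. Taking one coarse-grained sequence per starting offset follows common composite multiscale entropy practice and is an assumption, as are the function names and the test signals.

```python
import numpy as np


def sample_entropy(series, m=2, k=0.2):
    """SampEn(m, eps) with tolerance eps = k * standard deviation."""
    x = np.asarray(series, dtype=float)
    eps = k * np.std(x)
    n_templates = len(x) - m

    def matches(length):
        # Count template pairs whose Chebyshev distance is within eps.
        tpl = np.array([x[i:i + length] for i in range(n_templates)])
        dist = np.max(np.abs(tpl[:, None, :] - tpl[None, :, :]), axis=2)
        return (np.sum(dist <= eps) - n_templates) / 2   # drop self-matches

    return float(-np.log(matches(m + 1) / matches(m)))


def coarse_grain(series, tau, offset=0):
    """Average consecutive non-overlapping blocks of length tau from offset."""
    x = np.asarray(series, dtype=float)[offset:]
    n_blocks = len(x) // tau
    return x[:n_blocks * tau].reshape(n_blocks, tau).mean(axis=1)


rng = np.random.default_rng(1)
noise = rng.normal(size=300)                          # irregular series
sine = np.sin(2 * np.pi * np.arange(300) / 50)        # regular series
se_noise, se_sine = sample_entropy(noise), sample_entropy(sine)
coarse = [coarse_grain(noise, tau=3, offset=j) for j in range(3)]
```

A regular series such as the sine wave yields a much lower sample entropy than white noise, which is the property the synchronization measure builds on.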

Data description and preparation
The daily closing prices and volatility of two representative energy commodities, Brent crude oil spot and WTI crude oil spot, are selected as the prediction samples. The original price datasets comprise 2000 daily observations for Brent oil from Oct 07, 2013 to Aug 16, 2021, and 2000 daily observations for WTI oil from Aug 28, 2013 to Aug 16, 2021, which are gathered from the Energy Information Administration (EIA). The realized volatility at time $t$ is calculated from the deviations of the logarithmic returns of the daily prices from their mean $\bar{r}$ over the $n$ days remaining after time $t$, where the logarithmic return is defined as $r_t = \log P_t - \log P_{t-1}$ ($t = 1, 2, \cdots$) and $P_t$ is the daily closing price. The realized volatility thus measures how much the crude oil price has changed during $n$ days, and is regarded as the historical volatility. We take the 7-day volatility as the prediction target. Figure 3 shows the evolution of the daily closing prices and 7-day volatility. Further, the datasets of daily prices and volatility are partitioned into a training set comprising the first 80% of the samples, with 1600 data points (1594 data points for volatility), and a testing set comprising the last 20% of the samples, with 400 data points (399 points for volatility). That is, the training set of Brent crude oil prices runs from Oct 07, 2013 to Jan 15, 2020, and the testing set runs from Jan 16, 2020 to Aug 16, 2021. For WTI oil, the price dataset is trained from Aug 28, 2013 to Jan 10, 2020 and tested from Jan 11, 2020 to Aug 16, 2021.
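The sample counts above can be checked with a short sketch: 2000 prices give 1999 log returns, and a 7-day window then gives 1993 volatility values, which split into 1594 training and 399 testing points. The code assumes the volatility is the sample standard deviation of the log returns in each window (the `ddof=1` normalization is an assumption), and the simulated price path and function names are hypothetical.

```python
import numpy as np


def rolling_volatility(prices, window=7):
    """Standard deviation of log returns over each run of `window` days."""
    returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    n_out = len(returns) - window + 1
    # ddof=1 (sample standard deviation) is an assumption of this sketch.
    return np.array([returns[i:i + window].std(ddof=1) for i in range(n_out)])


def split_80_20(series):
    """First 80% of the points for training, the remaining 20% for testing."""
    cut = int(round(0.8 * len(series)))
    return series[:cut], series[cut:]


rng = np.random.default_rng(42)
# Hypothetical price path standing in for 2000 daily closing prices.
prices = 60.0 * np.exp(np.cumsum(0.02 * rng.normal(size=2000)))
volatility = rolling_volatility(prices)
price_train, price_test = split_80_20(prices)
vol_train, vol_test = split_80_20(volatility)
```

The split is chronological rather than random, so the testing set always follows the training set in time.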

Forecasting prices by different models
In this subsection, we employ the proposed VMD-KELM hybrid model to predict crude oil prices. The performance of VMD-KELM is analyzed by comparing it with three other types of approaches: single models (KELM, ELM and back propagation neural network (BPNN)), VMD-based models (VMD-ELM and VMD-BPNN) and EEMD-based models (EEMD-KELM, EEMD-ELM and EEMD-BPNN). According to the "decomposition and ensemble" learning paradigm, the prices are first decomposed by the VMD technique. For comparison, the number of decomposed components $K$ in VMD is set to be the same as that obtained by the EEMD technique [50], which adaptively decomposes the original series without any preset parameters. Both Brent and WTI oil have 11 decomposed IMFs representing different local oscillations embedded in the price series. Then, a corresponding KELM prediction model is constructed for each decomposed IMF subseries. In the parameter setting of the KELM model, a historical lag of order 5 is taken for the energy price series, which means there are five input nodes. This lag is determined by autocorrelation and partial autocorrelation analysis. For KELM, when the number of input nodes is limited, the number of hidden layer nodes is the same as the number of input nodes under the effect of the kernel function, which is also 5. We therefore also set the number of hidden layer nodes to 5 for ELM and BPNN for experimental comparison. That is, a 5 × 5 × 1 structure (where the input and output parameters are numeric) is set for all the prediction models. Suitable values of the regularization parameter $C$ and kernel width $\sigma$ in KELM are selected by trial and error. $C$ is searched within the range [10, 100] with an interval of 10, and the value 100 is found for all the models. The kernel width $\sigma$ is searched within the range [0, 1] with an interval of 0.1. Suitable $\sigma$ values of 0.1, 0.3 and 0.1 are found for KELM, EEMD-KELM and VMD-KELM respectively for Brent oil prices, and 1, 0.9 and 0.2 for WTI oil prices.
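The trial-and-error search over $C$ and $\sigma$ can be sketched as a plain grid search. The paper does not state the selection criterion, so choosing the pair with the lowest RMSE on a held-out tail of the training data is an assumption, as are the function names and the synthetic series. The value $\sigma = 0$ is skipped because the RBF kernel is undefined there.

```python
import numpy as np


def kelm_predict(X_train, y_train, X_test, C, sigma):
    """Fit an RBF-kernel ELM on the training data and predict X_test."""
    def kernel(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2.0 * sigma ** 2))

    system = np.eye(len(X_train)) / C + kernel(X_train, X_train)
    alpha = np.linalg.solve(system, y_train)
    return kernel(X_test, X_train) @ alpha


def grid_search(X, y, n_valid):
    """Pick (C, sigma) with the lowest RMSE on the last n_valid rows."""
    X_fit, y_fit = X[:-n_valid], y[:-n_valid]
    X_val, y_val = X[-n_valid:], y[-n_valid:]
    best = None
    for C in range(10, 101, 10):                    # 10, 20, ..., 100
        for sigma in np.arange(1, 11) / 10.0:       # 0.1, 0.2, ..., 1.0
            pred = kelm_predict(X_fit, y_fit, X_val, C, sigma)
            rmse = float(np.sqrt(np.mean((pred - y_val) ** 2)))
            if best is None or rmse < best[0]:
                best = (rmse, C, float(sigma))
    return best


# Hypothetical smooth series embedded with a historical lag of order 5.
series = np.sin(np.arange(300) / 8.0)
X = np.array([series[i:i + 5] for i in range(len(series) - 5)])
y = series[5:]
best_rmse, best_C, best_sigma = grid_search(X, y, n_valid=60)
```

The validation rows are taken from the end of the training data so that the search never looks at the testing set.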
Figure 4 compares the predicted results with every original subcomponent on the testing set for Brent crude oil prices by VMD-KELM. Roughly, the prediction results for the lower-frequency subsequences are closer to the true values than those for the higher-frequency ones, which suggests that the VMD-KELM algorithm has higher prediction accuracy for low-frequency information. Table 2 further lists the MAPE values of VMD-KELM in predicting all the IMF series of Brent and WTI oil, together with the results of the other hybrid models for comparison. It can be clearly seen that the smoothest component, IMF1, with the lowest frequency, has the lowest MAPE value in almost all sequences, indicating that the hybrid models predict low-frequency data better than high-frequency data. Taking Brent crude oil as an example, the VMD-KELM, VMD-ELM and VMD-BPNN models have the maximum MAPE value for IMF7, while IMF11 produces the maximum MAPE value in the EEMD-based hybrid models.
Figure 5 shows the prediction results on the testing data of Brent and WTI oil prices for the different models. Due to COVID-19, crude oil prices fluctuated sharply around April 20, 2020. The curves are all clustered together, indicating that the predicted curve of each model is very close to the real price curve. In the enlarged view, we can see that VMD-KELM performs well during the large fluctuation, and its predicted value is much closer to the true value. This suggests that the target model has the highest accuracy for crude oil price prediction.
Note: IMF1-IMF11 of EEMD, which originally run from high frequency to low frequency, are arranged in reverse order here, from low frequency to high frequency, to be consistent with the results of VMD.
The boxplots of the relative forecasting errors for each model are shown in Figures 6 and 7. The prediction errors of the single models are much larger than those of the hybrid models. The median for the VMD-KELM model is closest to 0, and the absolute values of its upper and lower quartiles are the smallest. The absolute values of its upper and lower limits, as well as its number of outliers, are also the smallest. All of this indicates that the relative error of the target model is smaller and more concentrated, showing that the predicted values of the VMD-KELM model are the closest to the true values. Similar results are observed for WTI crude oil, where the single models clearly have larger predictive errors than the hybrid models.
To quantitatively measure the predictive performance of each model, various evaluation metrics are calculated in Table 3, and their bar graphs are shown in Figure 8. In the figure, the MAE, MAPE, RMSE and TIC bars for the single models are almost twice as high as those for the hybrid models, while the R and Dstat bars are considerably lower for the single models. This indicates that the hybrid models based on the EEMD and VMD algorithms perform better than the single models. Specifically, taking the three KELM-related models for WTI crude oil as an example (in Table 3) [...] 29.4% and the TIC value by about 29.5%, but increases R by about 1.5% and Dstat by about 9.2%. These results illustrate that the prediction accuracy of each model is indeed improved by decomposing the original time series. Therefore, it is meaningful to choose the decomposition-ensemble prediction strategy in this paper. Besides, the error criteria of the VMD-based hybrid models are smaller than those of the EEMD-based hybrid models. Taking Brent oil as an example, the MAPE values of the VMD-KELM, VMD-ELM and VMD-BPNN models are reduced by about 28.5, 27.6 and 20.1% respectively, compared with those of the corresponding EEMD-KELM, EEMD-ELM and EEMD-BPNN models. This shows that the VMD algorithm eliminates the noise of the original data better than the EEMD algorithm, resulting in better prediction accuracy. Moreover, the KELM model performs best among the three single models, which shows that choosing KELM as the basic target model is appropriate. Among the three VMD models, the first four evaluation criteria of the target model hold the smallest values, while the latter two are slightly higher, suggesting that VMD-KELM gives the best prediction of crude oil prices. The benchmark-relative criteria of each model are listed in Table 4, where VMD-KELM is taken as the benchmark. These metrics quantitatively show how much higher the error of each model is than that of VMD-KELM.
Taking the MAPE of Brent crude oil as an instance, VMD-KELM improves on the prediction accuracy of VMD-ELM, VMD-BPNN, EEMD-KELM, EEMD-ELM, EEMD-BPNN and the single models by 10.83, 50.21, 30.00, 35.95, 56.56% and more than 60% respectively. Besides, VMD-ELM shows smaller values of the benchmark-relative criteria than the other models, implying that the forecasting performance of VMD-ELM is second only to VMD-KELM in Brent price prediction; the corresponding ranking for WTI price prediction is given in Table 4. The DM test is further performed to verify the superiority of the proposed VMD-KELM model over the compared models from a statistical point of view. The results are listed in Table 5. For the Brent data, with VMD-KELM as the target model, the p-values (except for VMD-ELM) are much smaller than the significance level of 1%, demonstrating that the VMD-KELM model has statistically better prediction performance at the 99% confidence level. Compared with VMD-ELM, VMD-KELM is statistically significantly better at the 90% confidence level. For the WTI data, the confidence level reaches 95%. The results statistically confirm the superiority of VMD-KELM. Furthermore, the MCCS is applied as an accuracy evaluation method to describe the degree of synchronization between the predicted and actual data series. Figure

Forecasting volatility by different models
In this subsection, we investigate the prediction performance of the nine models on the 7-day volatility of crude oil. The volatility prediction results indicate whether the proposed model generalizes to other series. Figure 10 depicts the 7-day realized volatility and the predicted volatility of energy prices on the testing set for the different models. The realized volatility fluctuates frequently, which makes it more difficult to predict than the price. All the forecasting curves in the graph are tightly clustered and lie very close to the curves of the actual volatility. The enlarged view of the peak shows that VMD-KELM fits the sharp volatility caused by COVID-19 very well: its predicted value is the closest to the actual value. Besides, the ELM model exceeds the true value of the peak, while BP is much lower than the true value of the peak. This suggests that VMD-KELM, the target model, predicts volatility series better when the fluctuation is large.
Figure 10. Predictive value of volatility of Brent and WTI crude oil for different models.
Figure 11 shows bar graphs of the performance evaluation metrics for each model. The first four criteria of the single models are clearly higher than those of the hybrid models, while the latter two are lower, which implies that the overall prediction performance of all hybrid models is superior to that of the single models. This illustrates that the prediction accuracy of each model for volatility is indeed improved when the volatility time series is decomposed by the VMD algorithm. It can also be observed that a single KELM model predicts extreme data well. Therefore, the decomposition-ensemble prediction strategy is promising for predicting volatility. The error of the VMD-based hybrid models is smaller than that of the EEMD-based hybrid models.
Taking Brent volatility as an example, the MAPE value of the VMD-KELM model (7.4215) is about 13.1% lower than that of the EEMD-KELM model (8.5444), the MAPE value of the VMD-ELM model (7.9708) is about 12.8% lower than that of the EEMD-ELM model (9.1458), and the MAPE value of the VMD-BPNN model (13.1988) is about 32.7% lower than that of the EEMD-BPNN model (19.6297). This shows that the VMD algorithm extracts the features of the original time series better, leading to higher prediction accuracy. Besides, the errors of VMD-KELM are the smallest. Among the three VMD models, VMD-KELM has the smallest value of the level accuracy measure MAPE and the largest value of the direction accuracy measure. For WTI crude oil, the conclusions are similar. Therefore, we can conclude that VMD-KELM is superior in predicting crude oil volatility.
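The percentage reductions quoted above can be checked with a few lines of arithmetic. The sketch below assumes the standard definition of MAPE (in percent) and measures the reduction relative to the EEMD-based model's value; the function names are illustrative and do not come from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Brent volatility MAPE values quoted in the text: (EEMD-based, VMD-based)
brent_mape = {
    "KELM": (8.5444, 7.4215),
    "ELM": (9.1458, 7.9708),
    "BPNN": (19.6297, 13.1988),
}
reductions = {
    name: relative_reduction(eemd, vmd) for name, (eemd, vmd) in brent_mape.items()
}
```

The values in `reductions` agree with the quoted figures to within rounding.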

Conclusions
The prediction of crude oil price and volatility is of great significance for guiding investors and government decision-making. To improve prediction accuracy, this paper introduces an effective decomposition-ensemble model, VMD-KELM, to predict both crude oil price and volatility. In the model, the VMD method is used to decompose the original series into several subseries with different frequencies, and each subseries is then predicted by the kernel extreme learning machine. Finally, the prediction result is obtained by aggregating the results for the subseries. The Brent and WTI daily prices and their 7-day volatility series are used to evaluate the forecasting accuracy. The following conclusions are obtained: 1) The VMD-KELM model, which combines the advantages of variational mode decomposition and the kernel extreme learning machine, possesses more robust decomposition ability and strong prediction ability. 2) Compared with the ELM and BPNN algorithms, the KELM model shows better prediction performance, with relatively low values of MAE, MAPE, RMSE and TIC and higher values of R and the direction accuracy measure.
3) The superiority of the decomposition-ensemble strategy is demonstrated by the fact that the prediction accuracies of the hybrid models are much higher than those of the single models. Besides, the VMD-based model predicts better than its EEMD-based counterpart, especially for high-frequency time series. 4) The proposed VMD-KELM model is more effective than the other models in improving the precision of crude oil volatility forecasts. However, this work only considers the impact of historical price or volatility data on crude oil forecasting. Many other potential factors (including structured and non-structured text data) affect crude oil price and volatility. We believe they deserve further analysis and hope this work will be instructive for future research.
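The decomposition-ensemble procedure summarized in these conclusions can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the subseries are taken as given (two synthetic sine components stand in for VMD output, and VMD itself is not implemented here), the KELM uses an RBF kernel with hypothetical values for the regularization parameter `C` and the kernel width `gamma`, each subseries is forecast one step ahead from its own lagged values, and the final forecast is the sum of the subseries forecasts.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

class KELM:
    """Kernel extreme learning machine: f(x) = K(x, X) (K(X, X) + I/C)^-1 y."""

    def __init__(self, C=100.0, gamma=1.0):  # hypothetical parameter values
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X_train = X
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form output weights; no iterative training is needed
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, y)
        return self

    def predict(self, X):
        return rbf_kernel(X, self.X_train, self.gamma) @ self.beta

def lagged(series, n_lags):
    """Build (inputs, targets) pairs for one-step-ahead prediction."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    return X, series[n_lags:]

def decompose_ensemble_forecast(modes, n_lags, n_test):
    """Forecast each subseries separately and sum the forecasts."""
    total = np.zeros(n_test)
    for mode in modes:
        X, y = lagged(mode, n_lags)
        model = KELM().fit(X[:-n_test], y[:-n_test])  # train on earlier data only
        total += model.predict(X[-n_test:])
    return total

# Synthetic stand-in for the decomposition output (not real VMD modes)
t = np.arange(300)
modes = np.vstack([np.sin(2 * np.pi * t / 50), 0.5 * np.sin(2 * np.pi * t / 7)])
series = modes.sum(axis=0)
forecast = decompose_ensemble_forecast(modes, n_lags=4, n_test=20)
mae = np.mean(np.abs(forecast - series[-20:]))
```

Forecasting each subseries separately lets every model fit a single, more regular frequency band, which is the motivation the paper gives for the decomposition-ensemble strategy.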