A Hybrid Forecast Model of EEMD-CNN-ILSTM for Crude Oil Futures Price

Abstract: Crude oil has dual attributes of finance and energy. Its price fluctuation significantly impacts global economic development and financial market stability. Therefore, it is necessary to predict crude oil futures prices. In this paper, a hybrid forecast model of EEMD-CNN-ILSTM for crude oil futures price is proposed, based on Ensemble Empirical Mode Decomposition (EEMD), Convolutional Neural Network (CNN), and Improved Long Short-Term Memory (ILSTM). ILSTM improves the output gate of Long Short-Term Memory (LSTM) and adds important hidden state information to the original output. In addition, ILSTM adds the learning of the cell state at the previous time to the forget gate and input gate, which allows the model to learn more fully from historical data. EEMD decomposes time series data into a residual sequence and multiple Intrinsic Mode Functions (IMF). The IMF components are then reconstructed into three sub-sequences of high-frequency, middle-frequency, and low-frequency, which makes it convenient for CNN to extract the input data's features effectively. The forecast accuracy of ILSTM is improved efficiently by learning historical data. This paper uses the daily crude oil futures data of the Shanghai Energy Exchange in China as the experimental data set. The EEMD-CNN-ILSTM is compared with seven prediction models: Support Vector Regression (SVR), Multi-Layer Perceptron (MLP), LSTM, ILSTM, CNN-LSTM, CNN-ILSTM, and EEMD-CNN-LSTM. The experimental results show that the proposed model is the most effective and accurate among the compared models.


Introduction
Crude oil plays an essential role in human survival and development. It is a vital strategic resource and one of the most important energy sources [1]. Since the beginning of the 21st century, China's rapid industrial development has required more crude oil, and energy security has become increasingly important [2]. Therefore, if the crude oil price changes significantly, it will impact the economic development and financial markets of China, and even the whole world, in a short time [3,4].
The development of machine learning has made it possible to predict nonlinear and medium- to long-term time series data, thus compensating for the limitations of traditional econometric models. A traditional fully connected neural network has many parameters and quickly loses time series information. The Recurrent Neural Network (RNN) extracts information along the time dimension and can memorize earlier data. In recent years, this model has achieved significant breakthroughs in speech recognition [5], text recognition [6], machine translation [7], and time series data analysis [8]. Still, RNN has apparent shortcomings, namely "gradient vanishing" and "gradient explosion". When a time series is long, the inputs to the activation function become very small or very large, and the weight updates lose their effect as gradients are multiplied through the chain rule. LSTM mitigates these shortcomings of RNN through a gated mechanism.
The main contributions of this paper are as follows. (1) This paper applies the signal decomposition method to time series prediction. The IMF components are integrated into low-frequency, medium-frequency, and high-frequency groups according to the Zero-Crossing Rate (ZCR). Thus, the number of components input into the model is fixed, which ensures the high availability of the model and solves the problem of long training times caused by too many input components. (2) By studying the structure and principles of the RNN and LSTM models, this paper proposes ILSTM. ILSTM adds the calculation of the cell state at the previous time to the forget gate and input gate of LSTM, and improves the output gate by adding important hidden state information to the original output, so that the model can learn more fully from historical data. (3) This paper proposes a hybrid model for crude oil futures price prediction based on EEMD-CNN-ILSTM. The introduction of EEMD helps CNN better extract the features of signals at different frequencies. Comparative experiments verify that the EEMD-CNN-ILSTM hybrid model achieves the highest prediction accuracy for crude oil futures prices and is superior to the other seven prediction models.

Related Work
EMD and EEMD are practical tools for analyzing nonlinear and non-stationary signal sequence data. For instance, Yu et al. forecasted crude oil prices using an EMD-based neural network [10]. Chen et al. used EMD decomposition to forecast the China containerized freight index [11]. Shu et al. combined EMD decomposition with CNN and LSTM to forecast stock prices [12]. Wu et al. predicted crude oil prices with an EEMD and LSTM hybrid model [13]. Tang et al. used EEMD combined with a randomized algorithm to predict oil and natural gas prices [14]. A review of the relevant literature shows that the decomposition-and-integration method is usually adopted when applying EEMD, and its feasibility in time series data prediction has been confirmed [15]. However, in practice, when a new day's data is inputted, the number of IMF components decomposed by EEMD may change, meaning the model needs to be retrained. In this paper, IMF components are combined into three categories of low-frequency, medium-frequency, and high-frequency according to the ZCR for training, thus improving the model's usability and reducing its training time.
In recent decades, many domestic and foreign researchers used traditional statistical and econometric methods to predict crude oil prices. For example, the autoregressive integrated moving average model [16], generalized autoregressive conditional heteroscedasticity [17], vector autoregressive [18], random walk [19], and error correction model [20] were used to forecast crude oil price. However, traditional statistical methods cannot thoroughly study the nonlinear and non-stationary characteristics of time series data, and the forecast accuracy is relatively poor.
The artificial neural network can learn complex nonlinear data, which solves the shortcomings of traditional statistical methods in forecasting time series data [21]. RNN can remember previous data, so it can be used for time series forecasting. However, the RNN cannot learn the long-term dependencies in the data, and the problems of "gradient disappearance" and "gradient explosion" will occur [22].
LSTM effectively overcomes the shortcomings of RNN and adopts gated technology to learn long-term dependencies in the data. For instance, Manowska et al. used LSTM to forecast crude oil consumption [23], and Liu et al. used LSTM to predict and analyze renewable energy supply risk [24]. However, a single prediction model often extracts data features insufficiently, making high-precision forecasts challenging. Yang et al. used EEMD, LSTM, linear regression, and Bayesian ridge regression to demonstrate the superiority of decomposing the original data into multiple sub-sequences with EEMD, predicting each sub-sequence with LSTM, and then integrating the results [25]. CNN can also be used for predicting time series data. Sun et al. proposed a CNN-LSTM soybean yield prediction model, confirming that the hybrid model combined with CNN outperformed a single LSTM model [26]. Zhang et al. used the CNN-LSTM hybrid model to predict air quality [27]. These studies used the feature extraction capability of CNN to further improve prediction accuracy.

EEMD
In 1998, Huang and Shen et al. proposed the EMD method, which can be used to decompose time series data [28]. EMD decomposes a set of time series data into several IMF components, each representing the change frequency and trend of the original data on a different scale. However, EMD suffers from mode aliasing: an IMF component may contain signals of different frequencies or scales, or signals of the same frequency or scale may be decomposed into different IMF components. To address this problem, in 2009, Wu and Huang et al. [29] put forward EEMD, a noise-assisted data analysis method. EEMD decomposes the initial time series data into one residual sequence (Res) and n IMF components. The noise amplitude and the number of ensemble members are crucial parameters that affect the modal decomposition. According to previous research results [30,31] and the characteristics of crude oil futures data, the noise amplitude is set to 0.04 and the ensemble number to 100. Finally, we use the daily closing price data of crude oil futures from 26 March 2018 to 18 November 2022 in the Shanghai Energy Exchange of China as the experimental data set. Figure 1 shows the decomposition of the closing price into one residual sequence and eight IMF components.
As the data decomposed by EEMD is historical, the number of decomposition results may change when future data is inputted, which means that a model trained on the decomposed historical data may be unable to handle future inputs. At the same time, the more components there are, the longer training takes. Therefore, this paper calculates the ZCR [32] of each component by

$$\mathrm{ZCR} = \frac{1}{2(N-1)}\sum_{n=2}^{N}\left|\operatorname{sgn}(x(n)) - \operatorname{sgn}(x(n-1))\right|$$

where $x(n)$ is the $n$-th data value, $N$ is the total length of a frame of data, and $\operatorname{sgn}(\cdot)$ is formulated as:

$$\operatorname{sgn}(x) = \begin{cases} 1, & x \geq 0 \\ -1, & x < 0 \end{cases}$$

ZCR represents the ratio of sign changes in each component's data, which approximately reflects the fluctuation frequency of the data. As shown in Table 1, the ZCR of IMF1 is higher than 40%, making it a high-frequency component; the ZCR of IMF2 is between 10% and 40%, making it a medium-frequency component; and the ZCR of IMF3~IMF8 is less than 10%, making them low-frequency components [33]. Xu et al. utilized the EMD-CNN-LSTM composite model to predict short-term power load by categorizing the IMF components and residual sequence into three categories: high-frequency, medium-frequency, and low-frequency [33]. On the other hand, Li employed the EEMD-ARIMA-LSTM combination model to forecast crude oil futures prices by categorizing the IMF components into high-frequency and low-frequency, then adding the residual sequence, resulting in three categories in total [34]. Experimental results, as shown in Table 2, demonstrate that the EEMD-CNN-ILSTM model performs best when utilizing the four components of low-frequency, medium-frequency, high-frequency, and a residual sequence. Therefore, the number of input components of the prediction model is reduced from nine to a fixed number of four. Only the four components of low-frequency, medium-frequency, high-frequency, and the residual sequence need to be trained, greatly reducing the training time, as shown in Figure 2.
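A minimal sketch of the ZCR computation and the frequency grouping described above, assuming the IMF components are rows of a NumPy array (here fabricated sinusoids stand in for real EEMD output):

```python
import numpy as np

def zero_crossing_rate(x: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    signs = np.where(x >= 0, 1, -1)          # sgn() as defined in the text
    return np.mean(signs[1:] != signs[:-1])

def group_by_zcr(imfs: np.ndarray) -> dict:
    """Reconstruct high/medium/low-frequency sub-series from IMF components
    using the thresholds quoted in the paper (40% and 10%)."""
    groups = {"high": np.zeros(imfs.shape[1]),
              "medium": np.zeros(imfs.shape[1]),
              "low": np.zeros(imfs.shape[1])}
    for imf in imfs:
        zcr = zero_crossing_rate(imf)
        if zcr > 0.40:
            groups["high"] += imf
        elif zcr > 0.10:
            groups["medium"] += imf
        else:
            groups["low"] += imf
    return groups

# Toy stand-ins for IMF components: fast, medium, and slow oscillations.
t = np.linspace(0, 1, 512)
imfs = np.stack([np.sin(2 * np.pi * 200 * t),   # high ZCR (> 40%)
                 np.sin(2 * np.pi * 40 * t),    # medium ZCR (10%-40%)
                 np.sin(2 * np.pi * 3 * t)])    # low ZCR (< 10%)
groups = group_by_zcr(imfs)
```

Because the grouped sub-series are sums of IMF components, adding the three groups (plus the residual, omitted here) reproduces the decomposed signal exactly.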

CNN
CNN has a good feature extraction function. Three-dimensional convolution can extract temporal and spatial features at the same time and can be used for behavior recognition and video processing. Two-dimensional convolution can extract local features of pictures and is used in image processing. One-dimensional convolution has only one convolution direction and can be used in sequence models. In this model, CNN extracts the frequency features of the low-frequency, medium-frequency, high-frequency, and residual sequences, then inputs them into ILSTM. Its convolution operation C is formulated as:

$$C = f(x \otimes w + b)$$

where b is the bias vector, x is the input feature, w is the weight of a one-dimensional convolution kernel, f is the activation function, and ⊗ denotes the convolution operation.
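As a concrete illustration, the one-dimensional convolution above can be sketched in NumPy. The kernel, bias, and input values are made up for demonstration; note that deep learning frameworks actually compute cross-correlation, i.e., the kernel is not flipped:

```python
import numpy as np

def conv1d(x, w, b, f=np.tanh):
    """Valid 1-D convolution C = f(x (*) w + b) as in the text.
    x: input feature sequence, w: kernel weights, b: scalar bias,
    f: activation function."""
    k = len(w)
    out = np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])
    return f(out + b)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, -0.5])                        # kernel of size 2
y = conv1d(x, w, b=0.0, f=lambda z: z)           # identity activation
# each output is 0.5*x[i] - 0.5*x[i+1] = -0.5 for this ramp input
```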

ILSTM
LSTM has advantages in analyzing long-term dependencies of time series data because it adopts the ideas of gates and a cell state, avoiding the long-term dependency problem of RNN. There are three gates in LSTM: the forget gate, the input gate, and the output gate. Figure 3 illustrates the structural unit of LSTM. The structural unit of ILSTM is shown in Figure 4. Compared with LSTM, ILSTM introduces a candidate hidden state $\tilde{h}_t$ in the output gate, stores the original hidden state information in $\tilde{h}_t$, then adds $i_t \times \tanh(\tilde{h}_t)$. This calculation strengthens the information with greater weight in the hidden state. In addition, the previous cell state $C_{t-1}$ is added to the input gate and the forget gate, which affects the data update. The detailed calculation process is as follows [23]. The memory cell of LSTM receives the output value $h_{t-1}$ at the previous time and the current input value $x_t$, passes them to the forget gate, and obtains the information $f_t$ that needs to be forgotten.
Compared with LSTM, $C_{t-1}$ is added to the forget gate of ILSTM, which impacts the current data forgetting. The forget gate of ILSTM is formulated as:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t, C_{t-1}] + b_f\right)$$

where $C_{t-1}$ is the cell state at the previous time, $\sigma$ is the sigmoid activation function, $b_f$ is the bias of the forget gate, and $W_f$ is the weight of the forget gate. The input gate $i_t$ of the LSTM updates the cell state: $i_t$ determines how much data can enter the memory cell $C_t$. Then, the current input data $x_t$ and the previous hidden state $h_{t-1}$ are passed through the $\tanh$ function to obtain a new candidate value $\tilde{C}_t$.
Compared with LSTM, $C_{t-1}$ is added to the input gate of ILSTM. The input gate of ILSTM is formulated as:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t, C_{t-1}] + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)$$

where $b_i$ is the bias of the input gate, $C_{t-1}$ is the cell state at the previous time, $W_i$ is the weight of the input gate, $W_c$ is the weight of the candidate cell state, and $b_c$ is the bias of the candidate cell state. By adding the operation on $C_{t-1}$, the input gate provides greater weight for retaining current data. Then, the outputs of $i_t$ and $f_t$ are combined to obtain the current cell state $C_t$:

$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$$

The output gate $o_t$ of the LSTM determines the part of the output. The term $\tanh(C_t)$ scales the cell state value to $[-1, 1]$, preventing gradient disappearance.
Compared with LSTM, ILSTM introduces $\tilde{h}_t$, which stores the original hidden state information, then adds the term $i_t \times \tanh(\tilde{h}_t)$, thus enriching the hidden state information. The term $\tanh(\tilde{h}_t)$ extracts essential hidden state information, and $i_t$ determines which crucial hidden state information is retained. The output gate of ILSTM is formulated as:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

$$\tilde{h}_t = o_t \times \tanh(C_t)$$

$$h_t = \tilde{h}_t + i_t \times \tanh(\tilde{h}_t)$$

where $b_o$ is the bias of the output gate and $W_o$ is the weight of the output gate.
In summary, based on LSTM, ILSTM adds the calculation of $C_{t-1}$ to the forget gate and the input gate, introduces $\tilde{h}_t$ into the output gate, then uses $i_t \times \tanh(\tilde{h}_t)$ to further extract data features. Here, the $i_t$ from the input gate controls the weight allocation of the input information in the cell state, and the $\tanh$ function helps alleviate the vanishing gradient problem. Therefore, adding $i_t \times \tanh(\tilde{h}_t)$ to the original output determines which important parts of the hidden state are passed to the model's output or the next time step. Thus, ILSTM achieves better prediction performance than LSTM.
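One ILSTM step can be sketched in NumPy under the description above. This is an illustrative reading, not the authors' implementation: the gate weights are assumed to act on concatenated inputs, and all values are random stand-ins.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ilstm_step(x_t, h_prev, C_prev, W, b):
    """One ILSTM step: the forget and input gates also see C_{t-1},
    and the output adds i_t * tanh(h~_t) to the usual hidden state."""
    zc = np.concatenate([h_prev, x_t, C_prev])   # gates that see C_{t-1}
    z = np.concatenate([h_prev, x_t])            # gates that do not
    f_t = sigmoid(W["f"] @ zc + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ zc + b["i"])          # input gate
    C_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde           # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate
    h_cand = o_t * np.tanh(C_t)                  # original LSTM output
    h_t = h_cand + i_t * np.tanh(h_cand)         # ILSTM addition
    return h_t, C_t

rng = np.random.default_rng(0)
n_in, n_h = 9, 4                                 # 9 features, 4 hidden units
W = {"f": rng.normal(size=(n_h, 2 * n_h + n_in)),
     "i": rng.normal(size=(n_h, 2 * n_h + n_in)),
     "c": rng.normal(size=(n_h, n_h + n_in)),
     "o": rng.normal(size=(n_h, n_h + n_in))}
b = {k: np.zeros(n_h) for k in W}
h, C = np.zeros(n_h), np.zeros(n_h)
for _ in range(4):                               # four time steps
    h, C = ilstm_step(rng.normal(size=n_in), h, C, W, b)
```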

EEMD-CNN-ILSTM
Crude oil futures data exhibits characteristics such as periodic fluctuations, nonlinearity, and non-stationarity. The EEMD-CNN-ILSTM model demonstrates strong nonlinear modeling capabilities and the ability to learn complex relationships within sequences. Figure 5 shows the overall structure of EEMD-CNN-ILSTM. The model mainly consists of seven parts. The input layer is the first part: crude oil futures data is the research object, with the WTI closing price, DJIA, and USDX as the influencing factors. The EEMD decomposition layer is the second part, which decomposes the crude oil futures closing price into components of various frequencies by EEMD, then reconstructs them into four components (high-frequency, medium-frequency, low-frequency, and a residual sequence) for subsequent input to the neural network. We combine the Open, High, Low, Settle, Close, USDX, DJIA, and WTI with one of the high-frequency, medium-frequency, low-frequency, and residual components to form nine features, and input these nine features into the CNN-ILSTM model to predict the crude oil futures closing price for each of the four components separately. The data preprocessing layer is the third part, which standardizes the data, performs preprocessing operations, and constructs three-dimensional time series data, ensuring that the data remains within the same order of magnitude. The feature extraction layer is the fourth part, which uses the CNN convolution operation to extract hidden features from the four components. The prediction layer is the fifth part, which uses the optimized ILSTM model to predict the four components. The sixth part is the combination layer, which sums the prediction results of the components. The last is the output layer, which applies reverse scaling to the summed results to obtain the final prediction.
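The combination and output layers can be illustrated with a small sketch. The per-component predictions and the scaler bounds below are fabricated placeholders, not results from the paper; the sketch only shows the summation and reverse-scaling steps described above:

```python
import numpy as np

# Each of the four CNN-ILSTM sub-models predicts one component in scaled
# space; the combination layer sums them and the output layer inverts the
# MinMax scaling. All numbers here are illustrative stand-ins.
preds = {"high": np.array([0.10, 0.12]),
         "medium": np.array([0.30, 0.28]),
         "low": np.array([0.40, 0.41]),
         "residual": np.array([0.05, 0.06])}

close_min, close_max = 350.0, 780.0   # assumed scaler bounds for the close

combined = sum(preds.values())        # combination layer: sum components
forecast = combined * (close_max - close_min) + close_min   # reverse scaling
# forecast -> [715.5, 724.1]
```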


Experimental Environment
The experiment was carried out on the Windows 10 operating system with an Intel(R) Core(TM) i5-12400F, an NVIDIA GTX 1660S, and 16.00 GB of RAM. The programming language used for the experiment was Python 3.8.0, and the IDE was PyCharm 2022.1.0 (64-bit). Anaconda 22.9.0 was used as the platform for deep learning training, and Keras 2.9.0 and TensorFlow 2.9.1 were used as the deep learning framework.

Data Collection and Analysis
This paper uses the crude oil futures data of the Shanghai Energy Exchange in China as the experimental data set. This experiment obtains data from https://tushare.pro/ (accessed on 19 November 2022). Tushare is an open big-data community that provides abundant financial data, such as market data (including stocks, funds, futures, and digital cash) and fundamental data of companies (including corporate finance and fund managers). In this experiment, the DJIA, S&P 500, USDX, WTI closing price, Russell 2000, NASDAQ, and crude oil futures closing prices are selected for correlation analysis by PCC [35]. The correlation coefficient R can be calculated by:

$$R = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where $\bar{x}$ and $x_i$ are the average value and the i-th value of a candidate influencing factor, respectively, and $\bar{y}$ and $y_i$ are the average value and the i-th value of the crude oil futures' closing price, respectively. The closer the absolute value of the PCC is to 1, the stronger the correlation. The PCC values obtained are shown in Table 3. Finally, the WTI closing price, USDX, and DJIA are selected as the relevant influencing factors. It can be seen from Figures 6-8 that there is a strong correlation between them. All the data mentioned above are daily data.
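The PCC computation can be sketched in NumPy; the series below are toy values, not the actual market data:

```python
import numpy as np

# Toy stand-ins: a candidate influencing factor and the futures close.
x = np.array([70.2, 71.0, 69.8, 72.5, 73.1])       # e.g. WTI closing price
y = np.array([452.0, 455.3, 449.9, 461.2, 464.0])  # crude oil futures close

def pearson(x, y):
    """R = sum((x_i - x_bar)(y_i - y_bar)) / sqrt(sum sq * sum sq)."""
    xm, ym = x - x.mean(), y - y.mean()
    return np.sum(xm * ym) / np.sqrt(np.sum(xm**2) * np.sum(ym**2))

r = pearson(x, y)
assert np.isclose(r, np.corrcoef(x, y)[0, 1])   # agrees with NumPy's PCC
```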

Data Preprocessing and Scaling
Furthermore, 26 March 2018 is the first trading date of Shanghai crude oil futures, so it is essential to align the data of the influencing factors with the crude oil futures data. In addition, because opening and closing dates differ at home and abroad, the initial data needs to be filled: missing values are filled with the data of the preceding two days. Finally, the influencing factors are combined with the crude oil futures data, and Table 4 presents some samples of the experimental data. Thus, the experimental data set consists of the Open, High, Low, Settle, and Close of crude oil futures, USDX, DJIA, and WTI from 26 March 2018 to 18 November 2022. There are 1131 records in total, of which the training set is the first 70% of the data set with 792 records, and the test set is the last 30% with 339 records.

When the difference between feature values in the data set is significant, the algorithm may fail to learn the data features thoroughly, reducing the model's training accuracy. Therefore, MinMaxScaler is chosen to process the data in this experiment [12]. The feature scaling is formulated as:

$$X' = \frac{X - X_{min}}{X_{max} - X_{min}}$$

where $X_{min}$ and $X_{max}$ refer to the data's minimum and maximum boundaries, and X is each item in the data set. In this experiment, the features are scaled to the interval [0, 1] so that all data are mapped to the same scale, thus accelerating the convergence of the model.
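The scaling and its inverse can be sketched in NumPy (the sample values are illustrative, not actual futures prices):

```python
import numpy as np

# Min-max scaling to [0, 1] and its inverse, as applied before training.
X = np.array([402.3, 455.1, 380.0, 510.8])

X_min, X_max = X.min(), X.max()
X_scaled = (X - X_min) / (X_max - X_min)            # forward: map to [0, 1]
X_restored = X_scaled * (X_max - X_min) + X_min     # reverse scaling
```

In practice, the bounds used to invert the test-set predictions must be the ones fitted on the training data, not recomputed on the test set.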

Constructing Time Series Data
The data used in this experiment is transformed into three-dimensional time series data according to the step size and sequence length. As shown in Figure 9, the sequence length is set to 4 and the step size to 1. With a data length of 1131, the experimental data is constructed in three dimensions with shape (1128, 4, 9): a time series data set of 1128 samples, where each sample consists of 4 time steps, each time step corresponds to one day's data, and each time step has nine features. The nine features are formed by combining the Open, High, Low, Settle, Close, USDX, DJIA, and WTI with one of the high-frequency, medium-frequency, low-frequency, and residual components. Following the time order, the data from DATA1 to DATA4 forms the first group, the data from DATA2 to DATA5 forms the second group, and so on. Finally, the training set is the first 70% of the data set, and the test set is the last 30%.
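The windowing described above can be sketched in NumPy; the random matrix stands in for the 1131 days of nine-feature data:

```python
import numpy as np

# Build 3-D time series samples with sequence length 4 and step size 1:
# 1131 daily rows of 9 features become an array of shape (1128, 4, 9),
# where each sample covers 4 consecutive days.
data = np.random.default_rng(0).normal(size=(1131, 9))   # stand-in data

seq_len, step = 4, 1
windows = np.stack([data[i:i + seq_len]
                    for i in range(0, len(data) - seq_len + 1, step)])

split = int(len(windows) * 0.7)          # first 70% for training
train, test = windows[:split], windows[split:]
```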

Parameter Tuning
The neural network parameters comprise model hyperparameters and model parameters. Hyperparameters include the number of epochs, the batch size, the learning rate, and the number of neurons in the hidden layer. Model parameters include biases, weights, and other values that are obtained automatically during training. In this experiment, we first narrow the hyperparameters to a promising range by experience. Then, the grid search algorithm is used to tune the parameters further so that the model generalizes well. Grid search exhaustively evaluates every candidate combination of parameter values in the given ranges and takes the combination with the best result as the final parameters. The final settings are a batch size of 32, a step size of 4, 50 epochs, and 64 neurons in the hidden layer. Table 5 presents the detailed experimental parameters.
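The grid search procedure can be sketched as below. The candidate value lists are illustrative assumptions (the paper does not report its full search space); only the finally selected values (batch size 32, 50 epochs, 64 hidden neurons) come from the text, and `evaluate` stands in for training the model and returning a validation error such as RMSE:

```python
from itertools import product

# Hypothetical candidate grid; only the winning values are from the paper.
grid = {
    "batch_size": [16, 32, 64],
    "epochs": [50, 100],
    "hidden_units": [32, 64, 128],
}

def grid_search(evaluate, grid):
    """Exhaustively evaluate every parameter combination; return the best one."""
    keys = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)  # e.g., validation RMSE of the trained model
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Because every combination is trained once, the cost grows multiplicatively with each added hyperparameter, which is why the search range is first narrowed by experience.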

Model Comparison
To verify the accuracy of the crude oil futures price prediction model based on EEMD-CNN-ILSTM, EEMD-CNN-LSTM, CNN-ILSTM, CNN-LSTM, ILSTM, LSTM, MLP, and SVR are used as comparison models. In this paper, we take R², RMSE, and MAE as the criteria to evaluate each model's overall performance. Table 6 shows the evaluation results using the test data set. Table 6 shows that the traditional regression model SVR has the worst prediction result, ILSTM is better than LSTM, CNN-ILSTM is better than CNN-LSTM, and EEMD-CNN-ILSTM has the best prediction effect. Figures 10-12 illustrate the values of MAE, RMSE, and R², respectively. R² increases from 0.9625 to 0.9648. The feature extraction function of CNN can further improve the forecast accuracy, and ILSTM improves the forecast accuracy compared with LSTM. The predicted values of LSTM, ILSTM, CNN-LSTM, and CNN-ILSTM versus the true values are shown in Figure 13.

To assess the reliability of the models, the Friedman analysis is applied [36]. EEMD-CNN-ILSTM is compared to the seven other models: EEMD-CNN-LSTM, CNN-ILSTM, CNN-LSTM, LSTM, ILSTM, MLP, and SVR. The average ranking $R_j$ is calculated to demonstrate that EEMD-CNN-ILSTM outperforms the other models. The calculation of $R_j$ is formulated as

$$R_j = \frac{1}{N}\sum_{i=1}^{N} r_i^j,$$

where $r_i^j$ represents the rank of the $j$-th model on the $i$-th performance measure and $N$ represents the number of performance measures. The final average rankings are shown in Figure 15. The Friedman statistic is formulated as

$$\chi_r^2 = \frac{12N}{k(k+1)}\left(\sum_{j=1}^{k} R_j^2\right) - 3N(k+1),$$

where $k$ is the number of models and $\chi_r^2$ follows a chi-squared distribution with $k-1$ degrees of freedom. With a significance level of 0.05 and 7 degrees of freedom [36], the critical value is 14.067. The Friedman statistic was calculated as 16.0, which exceeds the critical value, indicating significant differences among the models.

Discussion
The results of this experiment show that the comprehensive evaluation indexes of EEMD-CNN-ILSTM for crude oil futures price prediction are optimal. Compared with LSTM, the forecast accuracy of ILSTM is improved due to the calculation of C_{t−1} in the input gate and forget gate and the improvement of the output gate. After CNN is combined with LSTM or ILSTM, the feature extraction of the CNN convolution operation improves the prediction accuracy. After EEMD is combined with CNN-LSTM or CNN-ILSTM, the forecast precision is further improved because EEMD decomposes and reconstructs the original data.
The improvement in EEMD-CNN-ILSTM's forecast accuracy for crude oil futures prices lies in the following: (1) Decomposition of the original data by EEMD. After EEMD decomposition, the low-, middle-, and high-frequency components of the IMFs and the residual sequence are input into CNN to extract hidden features, making up for the shortcoming of LSTM and ILSTM in extracting features. (2) ILSTM adds the calculation of C_{t−1} in the input gate and the forget gate to ensure complete learning of the historical state, and introduces crucial hidden state information in the output gate, so that the model learns more fully from historical data. However, ILSTM has the limitation of a longer training time than LSTM.
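The gate modifications described above can be sketched as one ILSTM step. The exact formulation is our assumption based on the description (peephole-style use of C_{t−1} in the forget and input gates, plus an extra hidden-state term added to the standard output); the paper's precise equations are not reproduced here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ilstm_step(x_t, h_prev, c_prev, p):
    """One ILSTM step (illustrative sketch, not the paper's exact equations).

    p holds parameters: W_* of shape (hidden, hidden + input),
    peephole vectors V_* of shape (hidden,), biases b_* of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])
    # Forget and input gates additionally learn from the previous cell state C_{t-1}.
    f_t = sigmoid(p["W_f"] @ z + p["V_f"] * c_prev + p["b_f"])
    i_t = sigmoid(p["W_i"] @ z + p["V_i"] * c_prev + p["b_i"])
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])
    c_t = f_t * c_prev + i_t * c_tilde
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])
    # Output adds hidden-state information on top of the standard o_t * tanh(C_t).
    h_t = o_t * np.tanh(c_t) + i_t * np.tanh(h_prev)
    return h_t, c_t
```

The extra terms add a few parameters per gate, which is consistent with the observed limitation that ILSTM trains more slowly than a plain LSTM.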

Conclusions
This paper presents a hybrid prediction model of crude oil futures prices based on EEMD-CNN-ILSTM. Compared with EEMD-CNN-LSTM, CNN-ILSTM, CNN-LSTM, ILSTM, LSTM, MLP, and SVR, its prediction results are the best in the overall evaluation. This proves that the hybrid model integrates the characteristics of EEMD, CNN, and ILSTM and effectively improves forecasting accuracy. Comparing the prediction results of ILSTM and LSTM shows that ILSTM has higher prediction accuracy. After introducing CNN and EEMD, the models combined with ILSTM remain better than the corresponding models combined with LSTM. EEMD decomposition and reconstruction help CNN further extract hidden features from the data. The hybrid EEMD-CNN-ILSTM model performs best on the comprehensive evaluation indexes of MAE, RMSE, and R². In summary, the strengths of this paper are as follows: (1) ILSTM is an improvement of LSTM. ILSTM adds learning of the cell state at the previous time in the input gate and forget gate, and adds $i_t \times \tanh(h_t)$ in the output gate to further extract important hidden state information, which further eases the "gradient vanishing" and "gradient explosion" problems of RNN and improves the forecasting accuracy.
(2) The number of components after EEMD decomposition is fixed. After EEMD decomposition and reconstruction, the low-, medium-, and high-frequency components of the IMFs and the residual sequence are obtained, which ensures a fixed number of inputs, increases the usability of the model, and reduces the overall prediction time. However, the hybrid EEMD-CNN-ILSTM model for crude oil futures price prediction has several limitations. First, the model is complex and requires significant computational resources and time. In addition, crude oil futures prices are influenced by various external factors, such as global economic conditions, political events, and natural disasters. These factors are not fully captured by the model, which limits the accuracy of its predictions. In future research, we plan to analyze news text and use it as an influencing factor to improve prediction accuracy.
Author Contributions: Conceptualization, J.W.; methodology, J.W. and Z.X.; software, T.Z.; validation, T.L.; investigation, T.L. and T.Z.; writing-original draft preparation, T.Z. and T.L.; writing-review and editing, J.W. and Z.X.; visualization, T.Z. and T.L.; supervision, Z.X. and J.W.; project administration, T.Z. and T.L.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: The data presented in this study are available on request from the corresponding author, due to privacy restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.