Load Prediction Based on Hybrid Model of VMD‐mRMR‐BPNN‐LSSVM

State Key Laboratory of Eco‐hydraulics in Northwest Arid Region, Xi’an University of Technology, Xi’an 710048, China e Institute of Water Resources and Hydro‐Electric Engineering, Xi’an University of Technology, Xi’an 710048, China e Shaanxi Gas Storage Transportation and Comprehensive Utilization Engineering Research Center, Xi’an 710016, China e Shaanxi Gas Group New Energy Development Co., Ltd., Xi’an 710016, China


Introduction
Power load forecasting is the basis for power system to arrange power generation plan, economically dispatch the grid, and determine spare capacity, and a high forecasting accuracy plays an important role in the safe and reliable operation of the power grid [1]. With the continuous expansion of the power grid and the rapid development of science and technology, the grid system can obtain a large amount of data, thus providing data support for the research load forecasting system.
In recent years, with the rapid development of computer technology, arti cial intelligence algorithms have been introduced into the eld of load forecasting. At present, the main power load forecasting techniques are mainly divided into two categories: parametric model based methods and nonparametric model based methods [2]. Among them, the parametric model based methods mainly include time series model [3], regression analysis model [4], trend extrapolation model [5], and other methods. However, the nonlinearity, uncertainty and time-varying nature of electrical load cannot be completely expressed by mathematical formulas, therefore, such models are complicated and di cult to implement. While the nonparametric model method overcomes these weaknesses, and it can adapt to electrical load forecasting with nonlinearity, uncertainty and time-varying nature [6]. e methods based on nonparametric models mainly include wavelet analysis method [7,8], grey model method [9], support vector machine (SVM) [10] and artificial neural network method [11][12][13], etc. ese methods could consider the nonlinearity of the electrical load, and the law of the electrical load could be found through fitting and approximating of the original data. erefore, compared with the traditional method, such a model is superior and the prediction result is more accurate.
Among them, the wavelet analysis methods applying in the electrical load prediction, different wavelets are used to identify and classify loads of different natures and predict them separately, finally, the predicted sequences are integrated as the final load prediction results [14]. In literature [7,8], the wavelet transform is used to project each sequence component onto different scales, the data processing on different sub-load sequences is performed, and the models matching them are used for prediction, finally, through the wavelet reconstruction, we obtain the load prediction results. However, in the wavelet transform, once the wavelet base is selected, it cannot be replaced during the whole decomposition and reconstruction process. erefore, it is possible that the wavelet base is only optimal globally. In addition, what criteria are used to select the wavelet base in theoretically and practical applications are still a difficult point [15]. e grey model is a research method suitable for systems with small sample data and poor information. It can obtain useful information from the grey system and build a model based on this useful information to predict unknown data [16]. Literature [9] combined with the new initial conditions and rolling mechanism, designed a new optimized gray prediction model, which could accurately predict industrial electricity consumption and total electricity consumption. However, for the grey model, the greater the degree of data dispersion, the worse the prediction accuracy. And its differential equation exponential solution is more suitable for the load index with exponential growth trend. e SVM model uses a kernel function to replace the nonlinear mapping, with few controlled parameters, the algorithm is simple and easy to implement [17]. It can establish a decision function based on a small number of support vectors, effectively avoiding dimensionality disasters. In literature [10], the improved bird swarm algorithm (IBSA) is used to optimize the parameters of the LS-SVM, and then conduct the prediction using the optimized model. However, SVM is limited to small cluster samples. When observing too many samples, it is less efficient, and relatively difficult to find a suitable kernel function. Artificial neural networks have powerful nonlinear processing capabilities, parallel computing capabilities, selflearning and self-adaptive capabilities, and better memory ability and fault tolerance. When using neural network for electrical load forecasting, there is no need to special processing the data or model, the neural network can achieve better fitting effect by its self-learning ability and self-adaptive ability. erefore, neural networks have a wide range of applications in the field of power load forecasting. For example, literature [11] proposed a short term electrical load forecasting model based on the hybrid GA-PSO-BPNN algorithm. In literature [12], considering the influencing factors such as temperature, relative humidity, date type, and historical load, a new shortterm load prediction method based on kernel function extreme learning machine model is proposed. Literature [13] uses wavelet neural networks to predict electrical load.
However, the artificial neural network has the disadvantage of a local minuscule and determining difficulty the number of hidden layer units.
In addition, in order to make full use of the multifrequency and nonlinearity of power system load, some scholars use the first decomposition and then integration prediction method to predict the power system load. For example, the literature [18] uses empirical mode decomposition (EMD) to decompose the electrical load and then use support vector regression (SVR) to conduct the modeling prediction, in literature [19], the original load data is decomposed using the ensemble empirical mode decomposition (EEMD), and then predicted using the Elman neural network. Although EMD and EEMD can automatically decompose modal components based on data, it is convenient to study, however, the EMD and the improved EEMD that is adding the white noise suffer "endpoint effects" during the decomposition process where divergence phenomena occur at the edge of the data. As the iterations progress, the data sequence may grow severely distorted with modal aliasing and spurious components [20]. erefore, the variational mode decomposition method that can overcome the end-effect is applied in many fields. For example, the literature [21] uses EMD and VMD to decompose the price data respectively, then uses GRNN to predict each component and uses GRNN to integrate each component. e experimental results show that the prediction accuracy obtained by using the VMD method is higher. Literature [22] uses VMD to decompose the power load data and then predict the individual sub-components using the extreme learning machine (ELM), and then obtain the final power load prediction value by integrating the individual sub-components. Although these two literatures have improved the prediction accuracy compared with the single prediction model, they do not consider the information loss that may occur in the decomposition process and the influence of other factors on the data. As we all know, the load on the power system is also affected by factors such as holidays and temperature. erefore, some scholars have begun to consider the impact of these factors on the power load. For example, the literature [23] uses the autocorrelation function (ACF) to find the dependency relationship between one time and other times in the time series, then the model is established combining with the LS-SVM to predict the electrical load. However, this method only considers the relationship between the power load time series and does not further consider the influence of other influencing factors on the load. e literature [24] uses the mutual information method to consider the impact of influencing factors on power system load. However, mutual information will generate relatively greater redundancy, increase the model input, and reduce the run speed of the model [25].
Based on the discussions of the above methods, this paper firstly uses the VMD which is completely different from the EMD and EEMD theory to decompose the original loads and to improve the modal aliasing appearing in EMD and EEMD, a er obtaining IMFs with different fluctuation characteristics, all IMFs are divided into high-frequency component, intermediate-frequency component and low-frequency component, and then different prediction methods are used to predict each kinds of frequency components to improve the limitation of the single prediction model. In the prediction of high frequency components, this paper uses BPNN for prediction.
is is because high frequency components have strong nonlinearity, and a single hidden layer BPNN can usually approximate any nonlinear function with arbitrary precision. is characteristic makes BPNN very popular in predicting complex nonlinear systems [26]. Although BPNN has the problem of easily falling into the local minimum, and the RBF neural network can improve this problem, the number of hidden layer neurons in the RBF neural network is much higher than that of BPNN when the training samples increase, which makes the complexity of the RBF neural network increased, the structure is also too large, so the amount of calculation has also increased. For the convolutional neural network (CNN), which is the representative algorithm of deep learning algorithm, although sharing the convolution kernel, that is, has no pressure on high-dimensional data processing, and there is no need to manually select features, it requires large amount of samples to adjust parameters because it has more structure layer compared to the BPNN network. Moreover, in order to overcome the shortcomings of BPNN, many scholars have proposed di erent methods to optimize the initial connection weight value and thresholds of traditional BPNN. For example, the literature [11] uses genetic algorithms and particle swarm optimization (PSO) to optimize BPNN parameters. Both of the convergence speed and prediction accuracy of optimized model are superior to the traditional BP neural network model [26]. For intermediate and low frequency components, LS-SVM is used for prediction. LS-SVM is a kind of extension based on SVM, which is obtained by transferring standard SVM inequality constraints into equality constraints. e basic idea is to map the input vector to the high-dimensional feature space by nonlinear transformation. When constructing the optimal decision function in this space, the principle of structural risk minimization is applied, and the kernel function of the original space is used to replace the dot product operation in the high-dimensional space [27]. e LS-SVM has a simple structure and few parameters under controlled, which is bene cial for predicting intermediate and low frequency components with moderate uctuations. In addition, in order to improve the prediction accuracy of LS-SVM, some scholars use optimization algorithms to optimize the parameters of LS-SVM. For example, the literature [10] uses IBSA to optimize the parameters to improve the accuracy of load prediction. In the process of predicting each frequency component, in order to analyze the in uence of time, holiday and other in uencing factors on each component to compensate for the information that may be missing in the process of power load decomposition and eliminate the large redundancy caused by mutual information, and by this, the input quantity is reduced, this paper will use the maximum relevance minimum redundancy (mRMR) based on mutual information to select the set of characteristic factors that have the greatest in uence on each component from the feature set of in uential factors established in advance, and then use the set together with each component as the input of model. Finally, BP neural network is used to integrate the prediction results of each component to obtain the nal power system load prediction value.

VMD.
In order to resolve the endpoint e ect caused by EMD and EEMD and make full use of the multi-frequency and volatility of power system load, this paper uses VMD to decompose the power system load data. VMD was proposed by Dragomiretskiy and Zosso in 2014 [28], the essence is a set of adaptive Wiener lter banks, which could use nonrecursive mode decomposition to estimate the modalities of di erent center frequencies simultaneously. e decomposition process of VMD algorithm is the process of solving the variational problem. e algorithm can be divided into the construction and seeking solution of the variational problem. It mainly involves three important concepts: classical Wiener ltering, Hilbert transform, and frequency mixing [29].

e Construction of Variational Problems.
Assuming that each modality is a nite bandwidth with a center frequency, the variational problem can be described as seeking to preset modal functions ( ) under the condition that makes the sum of the estimated bandwidths of each modality smallest, with the constraint condition being the sum of each modal is the original input signal and the speci c construction steps are as follows: (1) rough the Hilbert transform, the analytical signal of each modal function ( ) is obtained, in order to obtain its unilateral frequency spectrum: (2) Adding an estimated center frequency − to the parsed signals of each mode, so that the spectrum of each modal can be modulated to the corresponding base frequency band: (3) Calculate the square norm L2 of the above-mentioned demodulated signal gradient, and estimate the bandwidth of each modal signal. e variational problem is expressed as follows: is the central frequency set; is the partial derivative of the function for time ; ( ) is the unit pulse function; is the imaginary unit; * indicates the convolution.

Seeking Solution of Variational
Problem. An augmented Lagrangian function is introduced to transform the above optimizati on problem into a problem of obtaining the (1) ( ) + * ( ).
(4) Double promotion for all ≥ 0: where represents the noise tolerance coe cient, which is set to zero to ensure e ective denoising.

mRMR.
e mRMR is a method of using mutual information to measure the dependencies between two variables, considering not only the relevant information between the feature and the target variable, but also obtaining the redundant information between the features.

e Solution of Maximum
Relevance. Based on the mRMR method, the maximum relevance criterion can be expressed as the average of the mutual information between the feature and the target variable as [31]: here: is the characteristic, indicating the in uencing factors of each component; is the component; is the feature set, which is the collection of features ; | | is the number of features in feature set ; is the mean of the mutual information between each feature in feature set and the target variable ; , is the mutual information between the feature and the target variable , and its expression is as follows [31]: where: , , and , are the edge probability density function and the joint probability density function of the random variable , , respectively. e greater the correlation between the variable and the variable , the larger the value , of the mutual information; when the two variables are independent of each other, the mutual information value is zero, meaning that there is no interdependence between the two variables.

e Solution Process of Minimum Redundancy.
Since the features selected by the maximum relevance criterion may have some redundancy, while the input of the redundant features not only increases the number of input features, but also reduces the accuracy of the prediction model. erefore, in the feature selection process, the redundancy degree between features needs to be calculated. e overlapping information between any two feature variables, that is, redundant information.
minimum value of via alternating direction multiplier method where represents the bandwidth parameter; ( ) represents the Lagrange multiplier. e speci c solution steps are as follows [30].
(1) Find the minimum value of the function : (2) Calculate the minimum value of the function : (3) Apply the following iteration constraint conditions: e implementation process of the VMD method is as follows [30].

Decompose the Original Load Data.
In this paper, EMD, EEMD, and VMD will be used to decompose the data in this paper, then compare and analyze them. e original data is shown in Figure 2, and the exploded view of each model is shown in Figure 3.
In this paper, there are 96 data points per day. It can be seen from Figure 2 that the local power load value began to decline a er entering February, and then in mid-February, it increased again and the uctuation tended to be stable. In addition, by observing the data, it can be found that the power system load data is with a cycle of 96, that is, a similar uctuation occurs every 96 points, and it can be seen that the daily load change situation are very similar. For a more intuitive view of the data, Table 1 describes the basic information of the load data.
It can be clearly seen from Table 1 that the maximum load value of this group of load value is 5355.7 MW, and the minimum value is 1788.3 MW, and the two di er by 3567.4 MW, it can be seen that the volatility of the load is very strong. In addition, the average value of the load value of the group is 3305.3 MW, and the standard deviation is 794.6 MW, it can be seen that the measures of dispersion of the data is large, which further explains the strong volatility and nonlinearity of the load data.
As can be seen from Figure 3, the EMD decomposes the load data into 10 modal components and 1 remainder. e uctuations of the rst four components (IMF1 -IMF4) are relatively strong, and starting from the component IMF5, the uctuation begins to moderate, and the uctuations of the IMF8-IMF10 and the remainder are all moderate. e EEMD decomposes the load data into 12 modal components and 1 remainder. e uctuations of IMF1-IMF5 are relatively strong, and the uctuation of components from IMF6-IMF8 begins to ease, and the uctuations of IMF9-IMF12 and the remainder are all moderate. Since EMD and EEMD have the same decomposition abort condition, their remainders are monotonic functions.
The VMD decomposes the load data into 10 modal components. Since there are different decomposition termination conditions comparing with EMD and EEMD, e minimum redundancy requires that the dependency relationship between each feature x be minimized, it can be expressed by the following Equation [31]: en mRMR can be expressed by Equations (11) and (13) as: In general, the features can be searched using the incremental search method [29]. Assuming that the − 1 features that have been selected from the feature set together constitute the feature set −1 , the selection of the n-th feature formula from the set − −1 according to the incremental search method, the formula is as follows:

Various Frequency Component Prediction
Model. e BPNN has the ability to process di erent information in parallel, excellent nonlinear fitting ability, self-learning ability, and very e ective ability to deal with complex problems. erefore, BPNN is used to predict high frequency components.
e least squares support vector machine (LS-SVM) model has a simple structure, fast learning speed, good generalization, and good regression prediction performance. erefore, the LS-SVM is used to predict the intermediate and low frequency components. e method of BPNN and LS-SVM are very mature and have been widely used in power system load forecasting [11,23,25], and will not be described here.

Prediction Model.
is paper uses the prediction model of "decomposition-feature analysis-multi-frequency predictionintegration". The model first uses VMD to decompose the original power system load data to obtain the modal components of di erent uctuation characteristics, and divides these components into high frequency components, intermediate frequency components and low frequency components according to their uctuation characteristics. en, through mRMR, the factor set that have the greatest in uence on each component is selected, and the set and the corresponding modal component are simultaneously used as model inputs. Finally, BPNN is used to predict the high frequency components, LS-SVM is used to predict the intermediate and low frequency components, and BPNN is also used to integrate the predicted values of each frequency component to obtain the nal power system load prediction value. e model ow chart is shown in Figure 1.
In order to evaluate the performance of the model, three evaluation indicators: mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used. e formula is as follows: , .    9 Complexity uctuation characteristics. In addition, although EEMD is theoretically improved on the basis of EMD, the empirical decomposition mode makes it inevitable that illusive components and modal aliasing occur. However, VMD improves this phenomenon well, enabling the power system load data to be more fully decomposed and to obtain more physically meaningful components, to make a good foundation of the final component is not a monotonic function, it can be seen that VMD has got rid of the empirical decomposition method. Similarly, it also decomposes the load data into components with higher fluctuation (IMF1-IMF5), moderate components (IMF6-IMF9), and more moderate components (IMF10). For further analysis, it will be further analyzed by spectrograms. The spectrum decomposition diagrams of EMD, EEMD, and VMD are shown in Figure 4.
It can be seen from Figure 4 that EMD and EEMD inevitably arise modal aliasing in the decomposition process, that is, one modal component is decomposed into multiple decomposition results. For example, the frequency band of IMF4 in Figure 4(a) overlaps with that of IMF3 and IMF5; the frequency band of IMF3 and IMF4 in Figure 4(b) overlaps. At the meanwhile, the above-mentioned overlap phenomenon does not occur in the Figure 4(c), that is, the VMD decomposition method.
It can be seen that the VMD improves the modal aliasing well during the decomposition process. Based on Figures 3  and 4, we can conclude that EMD, EEMD, and VMD can e ectively decompose the power system load with multi-frequency and volatility and obtain the components of di erent search is as shown in Figure 5. e sorting names of each feature in Figure 5, that is, the order of the IMF6 candidate feature sets are as shown in Table 3.
Combined with Table 3 and Figure 5, it can be seen that the characteristics of mRMR values exceeding 1 is By observing these characteristics, it can be found that except that the third is the month, all the other are load history values, and these the historical load value is concentrated on the load data from 2 hours to 4 hours before moment , which shows that IMF6 is greatly a ected by historical load data 2 hours to 4 hours before.
As described above, the mRMR values of the candidate feature set are sorted in descending order and input into the prediction model one by one, and then the MAPE is used to calculate the error, and the set composed of the number of features with the smallest error is taken as the input feature set . e relationship between IMF6 input dimension and error is shown in Figure 6.
It can be seen from Figure 6 that as the number of input features increases, the prediction error of IMF6 is constantly uctuating, and a small error occurs when the number of inputs is 11, 19, and 24, but in comparison, when the input number of features is 11, the error is the smallest, so the number of features of the input feature of the IMF 6 is 11. Its input features are shown in Table 4.
As can be seen by observing Table 4, the input feature of IMF6 are included in the 15 components of the previously analyzed mRMR value exceeding one. e features of mRMR values who is in descending order are input one by one into the prediction model to calculate the error and obtain the number of features with minimum error, which greatly reduces the matrix dimension and increases the running speed.
e input feature set of other components is obtained by using the method that obtains the IMF6 component input feature set. e relationship between the input dimension of each component and the error is shown in Figure 7.
It can be seen from Figure 7 that the relationship between the number of input features and the error can be roughly divided into the following categories: the error rstly decrease and then increase, for example, Figures 7(a) and 7(d), the error rst decreases, and a er the input features quantity reaches a certain value, it starts to increase again; the error is always increasing, for example, Figures 7(b) and 7(c), the error reaches a minimum at the beginning or a er a small uctuation, and then increases continuously with no trend of decreasing or the trend is not obvious; the error is always decreasing, for example, Figures 7(e) and 7(g), their nal input characteristic quantity is determined according to the minimum error; the change trend of error is uctuating, that is, as the number of features increases, the error curve uctuates continuously, as shown in  Table 5.
For convenience of representation in Table 5, the feature name order is not expressed in descending order of mRMR values like the IMF6. Table 5 shows that except for IMF8-IMF10, other components have historical load conducting load prediction with fully utilizing the volatility and multi-frequency of the power system load data.

Feature Information Selection.
e magnitude of the power system load value is a ected by factors such as season, time, and holidays. erefore, this paper will establish a set of in uencing factor features and select the input feature set suitable for each component that is a er VMD decomposition. e in uencing factor set built and the representation method are shown in Table 2.
In Table 2, represents time. In this paper, there are 96 data points in one day, that is, four data points in one hour. Each day from 0.00 to 23.45 indicates the change of time; represents the month, where 1 represents January and 2 represents 2 Month and so on; stands for the week, where Monday is 1, Tuesday is 2, and so on; indicates whether it is a holiday (1 means that it is holiday); indicates season, spring, summer, autumn and winter respectively are represented by 1, 2, 3, 4; represents the load value before moment , for example, −15 represents the value before 15 min of moment , and so on. A er establishing the set of in uencing factor features, the incremental search method is used to nd the feature satisfying Eq. (15) to form candidate feature set , and then calculate the mRMR value of each feature in , and then sort the mRMR values in descending order and input mRMR one by one into prediction mode to calculate the MAPE error, and the calculation formula is as shown in the Eq. (18). Finally, the number of features with the smallest error is taken as the nal input feature set .
is article uses IMF6 as an example. e mRMR value of the candidate feature set obtained a er the incremental    Table 6. Comparing with Table 5 and it can be seen from Table 6, for the correlation coe cient method, the number of features of the IMF1, IMF4, and IMF5 is close to that of the mRMR, and the number of features of other components is greater than that of the mRMR. Similarly, the number of features selected by MI, except for that of IMF4 and IMF5, is greater than mRMR. Therefore, these two methods increase the matrix dimension and reduce the speed of the operation. In addition, comparing with Table 5, it can be found that except for IMF2, IMF4, and IMF5, other components will be a ected by features other than historical load values, but a er correlation coe cient selection and MI selection, the e ects of other features are not obvious. is is due to the correlation coe cient method and mutual information only considering the correlation between variables but neglecting the characteristics and account for a large proportion of input characteristics. It can be seen that historical load values have a greater impact on IMF1-IMF7, especially IMF1, IMF5, and  of the training set increases, the prediction accuracy of the model also increases. e error in the back starting to increase is because the training set data is too much, and the model has over-tting in the training process, which makes the generalization ability of the model worse, the prediction error increases again. In summary, in this paper, among a total of 11,520 load data points from January 2, 2016 to April 30, 2016, 10,752 load data points from January 2, 2016 to April 22, 2016 will be used as intra-sample data, and then use the selected model to conduct the prediction on a total of 768 load data points from April 23, 2016 to April 30, 2016. Besides, before adding the in uencing factors, BPNN and LS-SVM are used to directly predict the various frequency components decomposed by VMD in 4.1, and use MAPE to count the prediction error. e results are shown in Table 8.
It can be seen from Table 8 that for high frequency components IMF1-IMF5, the prediction accuracy of BPNN is signi cantly higher than that of LS-SVM. For the intermediate frequency and low frequency components, at IMF6, the prediction accuracy of them are very close, at other IMF, for the prediction accuracy of intermediate and low frequency components, LS-SVM is signi cantly higher than BPNN, especially a er the IMF8, it is becoming more and more obvious. In summary, in comparison, for the highfrequency component, the prediction performance of BPNN is better than that of the LS-SVM, while for the high frequency and intermediate frequency components, the prediction performance of the LS-SVM is better than that of the BPNN. redundancy between the two variables, while mRMR not only considers the correlation between the two variables but also considers their redundancy, so the selected in uencing features are more diversifying than the other two methods. In summary, using mRMR for feature selection reduces the matrix dimension and improves the speed of operation and better re ects the in uence of various feature factors on each component.

Various Frequency Component Prediction.
According to the uctuation characteristics of each frequency component in Figure 3(c), the rst ve components with small uctuation periods (IMF1-IMF5) are classified as high frequency components, the IMF10 with longest period is classi ed as low frequency data, and the remaining components (IMF6-IMF9) are as intermediate frequency data. Regarding the division of training sets and test sets for a total of 11520 load data from January 2, 2016 to April 30, 2016, this paper divides them using multiple proportions, and on the premise of considering the in uencing factors, uses the data of training set to train the model and conduct the prediction on the test set, and evaluates the nal prediction results by MAPE. e comparison results are shown in Table 7.
It can be seen from Table 7 that as the proportion of the training set increases and the proportion of the test set decreases, the prediction error results rst decrease and then increase. is is because the training set data is less at the beginning, the model training accuracy is not enough, thus resulting in a large prediction error. en, as the proportion   Table 10.  Table 10. e prediction results are shown in Figure 9.
As can be seen from Figures 9 and 8, the uctuation of intermediate frequency component (IMF6-IMF9) and the low frequency component (IMF10) are much more moderate than the high frequency component uctuation. erefore, as its uctuation gradually eases, the prediction results are getting better and better. It can be seen that using the LS-SVM with fast learning speed and good generalization to predict the intermediate and low frequency components that the uctuation is moderate have smaller error.

Combination of Each Frequency Component Result.
In this paper, BPNN is used to integrate high, intermediate, low frequency components, and compare the integrated results with that of using LS-SVM, in the process of integration, with the real value of each component in training data set as input, the real load value of training data set as the output to train it, and nally use the predicted value of each component in test data set as input to obtain the nal power system load prediction value. e combination result is shown in Figure 10. e upper one of Figure 10 is the power load prediction result, and the lower one is the error between the actual value and the predicted value, where the green point is the distribution of the predicted value. As can be seen by observin Figure 10, most of the predicted points can be distributed around the actual value except that few points deviate from the actual values. However, in Figure 10, predicted value distribution point in (a) is signi cantly more concentrated than (b), especially the head and the end, which indicates that the integration e ect of BPNN is better than LS-SVM.
A er completing the above discussion, the feature set of various frequency component, selected in 4.2, is taken into consideration in the prediction process.

e Prediction of High Frequency Component.
According to the above, the high frequency component is predicted by BPNN, and the predicted result is shown in Figure 8. In the modeling process, the number of iterations is set to 1000, the learning speed rate is 0.1, and the expected error is 0.0004. e structure of the neural network is 3-input and 1-output. e three inputs in this paper refer that the three original points of certain component and the features corresponding to the three original points are together as inputs. In addition, in this paper, we use the method of literature [32], which combines the "trial and error method" with the "ten-fold cross-validation method" to determine the number of hidden layer nodes of the BP neural network. at is, in the vicinity of the number of hidden layer nodes calculated by formula (19), the BP neural network is constructed respectively, and then the mean square error (MSE) of each group of data is obtained by using the "ten-fold cross-validation method", nally take the number of nodes, that has the smallest average value of the MSE, as the number of hidden layer nodes of the BP neural network. Where Equation (19) is as follows [32].
where is the number of hidden layer nodes, is the number of input layer nodes, is the number of output layer nodes. According to the above method, the number of hidden layer nodes of each high frequency component is as shown in Table 9.
A er the above prediction model is obtained by training, each high frequency component is predicted separately. e prediction results are shown in Figure 8.
It can be seen from Figure 8 that the high frequency component has strong uctuation and high frequency, especially the IMF1, IMF2, and IMF3 have the strongest uctuation, so the error in the prediction map is larger than other components (IMF4, IMF5). As the uctuation of IMF4 and IMF5 diminishes, the error will decrease. But in general, BPNN's highly self-learning and adaptive capabilities are suitable for predicting high-frequency components with strong uctuation and short periods.  Figure 11.
It can be clearly seen from Figure 11 that the prediction points of the single prediction model deviate farther comparing with multi-frequency combination prediction models (EMD, EEMD, and VMD), while in the multi-frequency combined prediction model, it can be seen that the improved EEMD based on EMD is better than EMD, and the prediction results of VMD is the best, the principle of VMD is completely di erent from the other two. is shows that the prediction

Comparison of Multiple Prediction Model Results.
In order to test the prediction e ect of the proposed model of this paper and further analysis of the prediction results of Figure 10, we compared the proposed model against the single model (ARMA, BPNN, LS-SVM) and decomposition models (EMD, EEMD, VMD) to validate its e ectiveness. Among them, the EMD, EEMD and VMD decomposition models all use the decomposition method to decompose the original load data, then divide the sub-component into high, intermediate and low frequency components, and use BPNN to predict high-frequency components and use  the prediction values of each components are used as inputs to predict the final power system load prediction value. Finally, we use the MAE, RMSE, and MAPE to compare the prediction results of prediction model in this paper with that of single prediction model (BPNN, ARMA, LS-SVM) and multifrequency combined prediction model of (EMD, VMD, EEMD) and the models that VMD multi-frequency combined prediction model respectively combined with correlation coefficient and MI. By comparison, it is found that the prediction accuracy of multi-frequency combined prediction model is higher than that of single prediction model, and the prediction accuracy of model in this paper is higher than that of other multi-frequency combination prediction model. In the prediction process, it is found that when the mRMR values in descending sort are input into the prediction model to calculate the MAPE value, there is a case where the MAPE fluctuates as the number of input features increases. For example, in Figure 7 (i), it can be seen that when the number of features is 2 and 3, the MAPE values are very close, while in a sense, there is not much difference in the input dimension when the number of input features is 2 and 3. en how to deal with this situation better in the feature selection process and whether there is a more reasonable feature selection method to analyze the relationship between features and power load will become the future research object of this paper.
Data Availability e power load data used to support the findings of this study have not been made available because policy limitation of China's Power Grid.

Conflicts of Interest
e author declare that they have no conflicts of interest. accuracy of the multi-frequency combined prediction model is greater than that of the single prediction model, while the VMD multi-frequency combination prediction model that resolves the problem of modal aliasing and illusive components has the highest prediction accuracy. e results of the above-mentioned evaluation indexes which we used to ensure a comprehensive comparison are shown in Table 11.
rough the comparison of model results of Figure 10 and Figure 11 (the comparison as shown in Table 11) can clearly see that the error of the single prediction model is larger than that of the multi-frequency combination prediction model. For the prediction accuracy of different multi-frequency combination prediction models, the VMD model is better than the EEMD model and the EMD model. In addition, a er mRMR is combined with VMD, the prediction accuracy of mRMR-VMD is higher than the VMD model, and for models considering influencing factors, the effect of using BPNN integration is better than that of using LS-SVM. For the VMD multi-frequency combined prediction model using different feature selection methods, the prediction accuracy of the VMD multi-frequency combined prediction model using mRMR is better than the VMD multi-frequency combined prediction model using the correlation coefficient method and MI. In summary, the multi-frequency combined prediction model based on mRMR feature selection and VMD proposed in this paper is superior to other models.

Conclusion
is paper proposes a power system load forecasting model based on VMD and feature selection. By comparing the decomposition results of three decomposition methods of EMD, EEMD, and VMD, it can be seen that VMD can resolve the problem of modal aliasing and illusive components well.
rough the example analysis, it can be seen that the prediction accuracy is improved a er resolving the problem of modal aliasing and illusive components.
In this paper, a er using VMD to decompose the original loads, all IMFs are divided into high frequency component, intermediate frequency component and low frequency component according to the fluctuation characteristics of each frequency component. In addition, when using mRMR to pick features related to each component, the candidate feature set is first established by the incremental search method, then the mRMR values of each feature are calculated and arranged in descending order. Finally, the arranged features are input into the prediction model one by one to calculate MAPE and choose the minimum MAPE as the number of input features of the component. In addition, this paper also compares the three feature selection methods containing mRMR, correlation coefficient method and MI, the results show that mRMR can reduce the matrix dimension and better select the influencing feature of each component.
A er completing the above work, each component is input into the model together with the corresponding feature matrix for prediction. In the prediction process, BPNN is used to predict the high frequency components and LS-SVM predicts