A novel ensemble learning-based grey model for electricity supply forecasting in China

Abstract: Electricity consumption is one of the most important indicators reflecting the industrialization of a country. The supply of electric power plays an important role in guaranteeing the running of a country. However, under complex circumstances, it is often difficult to make accurate forecasts with limited reliable data sets. In order to take full advantage of the existing grey system models, ensemble learning is adopted to provide a new strategy for building forecasting models for the electricity supply of China. The nonhomogeneous grey model with different types of accumulation is first fitted with multiple settings of the accumulation degree. Then majority voting is used to select and combine the most accurate and stable models validated by grid search cross validation. Two numerical validation cases are used to validate the proposed method in comparison with other well-known models. Results of the real-world case study of forecasting the electricity supply of China indicate that the proposed model outperforms the other 15 existing grey models, which illustrates that the proposed model can make much more accurate and stable forecasts in such real-world applications.


Introduction
Energy is the foundation of a country's economic development. Electricity is an essential energy source, and the electric energy policy will guide the development of a country to a certain extent [1]. Sustainable supply of electricity is one of the biggest challenges. Effective forecasting of electric energy supply has therefore become a prerequisite for formulating energy policies. If the electricity supply is underestimated, it will not only fail to meet the regular power demand of the country, but also pose a threat to the security of the electricity system. In contrast, if the electricity supply is overestimated, the national economy will suffer severe losses due to the difficulty of large-scale storage of electric energy. Therefore, it is important to have a high-precision system of electricity supply forecasting. Among numerous forecasting models, support vector machines [2] involve a complicated calculation process and artificial neural networks [3] demand large data sets, so they cannot always obtain the ideal forecast accuracy. Power supply forecasting faces the problems of small sample size and incomplete information, and the grey system proposed by Deng is a tool designed to deal with them, so the grey forecasting method is suitable for such problems [4]. The grey system model has been applied in various fields in recent years. For example, Zeng et al. applied a multivariate grey model to China's grain production [5]. Ding et al. applied the improved Simpson grey model to the prediction of electric vehicles through an adaptive method of generating dynamic weighted sequences [6]. In terms of environmental pollution, in order to predict carbon dioxide emissions in the BRICS countries, Wu et al. established a new conformable fractional-order nonhomogeneous grey model [7]. Hu et al. proposed a novel time-delayed fractional grey model, which took into account the time delay effect and applied it to the natural gas consumption forecast of the manufacturing industry in China [8].
Driven by the research of many scholars, the grey system has been developed vigorously, which is mainly reflected in the three aspects of accumulative order, background value optimization and expansion form. Wu et al. designed a grey model of fractional accumulation to eliminate the randomness of the original data [9]. Zhou et al. used the exponential weighted average method to define the cumulative generation of new information priority with parameters [10]. Ma et al. designed a conformable fractional grey model [12]. As more accumulation methods are proposed, the grey model can achieve stronger and more accurate predictions. Zeng et al. developed a structurally compatible multivariate grey model to improve compatibility [11]. Ma et al. applied the Simpson formula to the model and proposed a background value optimization model [13]. Wei et al. also established a new method of optimizing background value by using the integral median theorem [14].
With the discrete grey model proposed by Xie et al., there are more ways to deal with various data types [15]. In order to fit a time series that approximates the nonhomogeneous exponential law, Cui et al. designed NGM to solve the limitations of the traditional grey model [16]. Chen et al. combined the Bernoulli equation to improve the prediction accuracy by controlling the power exponent n to adjust the curvature of the curve [17]. To overcome the defects of the traditional Verhulst model's misalignment substitution of parameters and the unreasonable selection of initial values, Zeng et al. proposed a new prediction model for tight gas production [18]. These methods of improving the grey model greatly expand its application range and enrich the grey system theory.
In response to the formulation of energy policies in developing countries, Li et al. applied the adaptive grey model to electricity consumption forecasting in the Asia-Pacific region and obtained effective results [1]. Xu et al. designed a grey model with an optimal time response function, used particle swarm optimization to optimize nonlinear parameters, and verified the reliability of the model with an example of China's electricity consumption [19]. Wang et al. focused on the problem of forecast stability and developed a hybrid forecast based on an improved grey forecasting model based on a multi-objective ant optimization algorithm, which dynamically selects the best input training set and takes the annual electricity consumption in various regions of China as the research object for practical evaluation [20]. Ding et al. used the particle swarm optimization algorithm to optimize the new initial conditions and combined with the rolling prediction mechanism to obtain the future trend of China's total electricity consumption and industrial electricity consumption [21]. Liu et al. predicted the electricity consumption in China and India with a new grey multinomial model of time electricity term score [22]. Yu et al. developed a highly flexible grey model with time-delayed power-driven for photovoltaic power generation [23]. The above research shows that although the single improved model performs well in electric energy, the traditional single-model prediction method also has certain limitations in different studies.
As a branch of machine learning, ensemble learning aims to get better results than the single estimator [24]. The integration schemes have many types [25], roughly divided into classification and regression. Yu et al. used a simple addition strategy set to output the prediction results [26]. The dynamic ensemble model proposed by Chen applied to the prediction of wind speed showed strong competitiveness [27]. Dong et al. first built a decision tree to mine energy consumption patterns and used ensemble learning methods to establish building energy consumption prediction models for different modes [28]. Lin et al. designed an air quality monitoring tool by using multiple linear regression technology to integrate multiple deep learning prediction models [29]. At the level of classification, Arangarajan uses voting methods combined with discrete wavelet transform analysis to detect and classify different power quality disturbances [30]. The ensemble idea can not only effectively improve the prediction accuracy, but also has a lot of room for development in the integration of multiple models.
According to the literature study above, although there have been many cases combining the grey system with machine learning in recent years [31], ensemble learning has not yet been used for grey system models. Given the significant performance gains that ensemble learning brings to existing machine learning models, it can also be expected to enhance the performance of grey system models. Therefore, a novel ensemble strategy is proposed in this work to integrate different forms of grey system models. The ensemble learning-based grey model is then applied to the electricity supply of China, and the effectiveness of the new model is verified.
The remainder of this paper is organized as follows. Section 2 describes the accumulation methods and diverse forms of the grey model. Section 3 introduces the construction of the novel ensemble theory. In Section 4, the verification of numerical cases is given. The electricity supply of China is discussed in Section 5. Finally, the conclusions are given in Section 6.

Method of grey model
The main methods used in this work will be presented in this section. Section 2.1 presents the different processing methods of the raw data in the modelling process. Section 2.2 presents the establishment of the nonhomogeneous grey model and discusses the form of different models.

The different accumulation methods
The method of processing raw data in the grey model determines the accuracy of the forecast. For a set of raw time series data $Y^{(0)} = \{y^{(0)}(1), y^{(0)}(2), \ldots, y^{(0)}(n)\}$, the corresponding accumulation generation sequence is defined as $Y^{(\aleph)} = \{y^{(\aleph)}(1), y^{(\aleph)}(2), \ldots, y^{(\aleph)}(n)\}$, where $\aleph$ is the accumulative generator. The following three accumulation methods are considered in this paper.
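The expressions of the three accumulation methods are not reproduced in this text. As a minimal sketch, assuming they are the first-order accumulation, the fractional accumulation of order r [9] and the new information priority accumulation with parameter λ [10] in their commonly published forms, each generator can be written as a lower-triangular matrix acting on the raw series. The function names and the sample values are hypothetical and serve only as an illustration.

```python
import numpy as np

def fractional_accumulation_matrix(n, r):
    """Lower-triangular matrix A with A[k, i] = C(k - i + r - 1, k - i).

    Assumed form of the r-order accumulation; r = 1 gives the ordinary
    first-order accumulation (ones on and below the diagonal).
    """
    coef = np.ones(n)
    for j in range(1, n):
        # Recurrence for the generalized binomial coefficient C(j + r - 1, j).
        coef[j] = coef[j - 1] * (r + j - 1) / j
    A = np.zeros((n, n))
    for k in range(n):
        for i in range(k + 1):
            A[k, i] = coef[k - i]
    return A

def nip_accumulation_matrix(n, lam):
    """Lower-triangular matrix A with A[k, i] = lam ** (k - i) (assumed form)."""
    A = np.zeros((n, n))
    for k in range(n):
        for i in range(k + 1):
            A[k, i] = lam ** (k - i)
    return A

# Illustrative raw series (not data from the paper).
y = np.array([2.0, 3.0, 5.0, 8.0])
ago = fractional_accumulation_matrix(len(y), 1.0) @ y   # first-order accumulation
frac = fractional_accumulation_matrix(len(y), 0.5) @ y  # fractional, r = 0.5
nip = nip_accumulation_matrix(len(y), 0.5) @ y          # new information priority, lambda = 0.5

# The accumulation of order -r undoes the accumulation of order r.
restored = fractional_accumulation_matrix(len(y), -0.5) @ frac
```

Writing the generators as matrices keeps the inverse operation explicit: for the fractional case the inverse is the accumulation of order −r, and for the other two cases it can be obtained by solving the triangular system.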

The base estimator
In this work, the nonhomogeneous grey model (NGM) will be selected as the basic estimator for ensemble learning because it has a simple form but is more flexible than the classic grey model. The NGM is based on multiple accumulations, which makes the form of the model more abundant and can fully reflect the various characteristics of the data. Therefore, in the process of ensemble learning, it can comprehensively incarnate the essence of the system to improve forecasting performance. Other common grey models are also given in the form of the NGM conversion.
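The fitting and forecasting steps of the base estimator can be sketched as follows. This is only an illustration under stated assumptions: the whitening equation is taken as dy/dt + αy = βt + γ on the first-order accumulated series, the discrete form uses the trapezoidal background value with the term βk + γ, and the parameters are estimated by ordinary least squares. The function names `fit_ngm` and `predict_ngm` and the synthetic series are hypothetical.

```python
import numpy as np

def fit_ngm(y):
    """Estimate (alpha, beta, gamma) by least squares on the 1-AGO series."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    y1 = np.cumsum(y)                      # first-order accumulation
    z = 0.5 * (y1[1:] + y1[:-1])           # background values z(k), k = 2..n
    k = np.arange(2, n + 1)
    # Assumed discrete form: y(k) = -alpha * z(k) + beta * k + gamma.
    B = np.column_stack([-z, k, np.ones(n - 1)])
    alpha, beta, gamma = np.linalg.lstsq(B, y[1:], rcond=None)[0]
    return alpha, beta, gamma

def predict_ngm(y, params, horizon):
    """Return fitted values for k = 1..n followed by `horizon` forecasts."""
    alpha, beta, gamma = params
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + horizon + 1)
    # Solution of dy/dt + alpha*y = beta*t + gamma with y(1) equal to the first datum.
    particular = beta / alpha * t - beta / alpha**2 + gamma / alpha
    const = y[0] - beta / alpha + beta / alpha**2 - gamma / alpha
    y1_hat = const * np.exp(-alpha * (t - 1)) + particular
    # Difference the accumulated predictions to return to the raw scale.
    return np.concatenate([[y1_hat[0]], np.diff(y1_hat)])

# Synthetic exponential series used only for illustration.
k = np.arange(1, 11)
series = 10.0 * np.exp(0.1 * k)
params = fit_ngm(series[:8])
pred = predict_ngm(series[:8], params, horizon=2)
mape = np.mean(np.abs(pred - series) / series) * 100
```

The sketch breaks down when the estimated α is close to zero, because the response function divides by α; a practical implementation would need to guard that case.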

Relationship between the nonhomogeneous grey model and other existing grey models
• Denote βt + γ as the grey action quantity W. When W reduces to the constant γ (that is, β = 0), the NGM degenerates into GM (1,1), whose continuous-time response function is Eq (2.14).
• Taking the discrete form of the GM gives the recursive function of the discrete grey model (DGM).
• In the same way, the discrete form of the NGM gives the recursive function of the nonhomogeneous discrete grey model (NDGM).
• When the grey action quantity W is made nonlinear through the parameter τ, the model becomes the nonlinear grey Bernoulli model (NGBM) with its own continuous-time response function; when τ = 2, the NGBM is the Verhulst grey model.
The basic forms and solutions of the models are shown in Table 1. In summary, the NGM has better generality because it can degenerate into various basic models. Using it as the base estimator of ensemble learning can therefore give full play to the advantages of several grey models.
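For orientation, the degenerations can be illustrated with the textbook whitening equations and their solutions. The block below is a sketch under the assumption that the models take their standard published forms; it is not a reproduction of Table 1, and y stands for the accumulated series with initial value y(1).

```latex
% Assumed standard forms; y denotes the accumulated series, tau is not equal to 1.
\begin{align*}
\text{NGM:}\quad & \frac{dy}{dt} + \alpha y = \beta t + \gamma,
 & \hat{y}(t) &= \Bigl(y(1) - \frac{\beta}{\alpha} + \frac{\beta}{\alpha^{2}} - \frac{\gamma}{\alpha}\Bigr) e^{-\alpha (t-1)}
   + \frac{\beta}{\alpha} t - \frac{\beta}{\alpha^{2}} + \frac{\gamma}{\alpha}, \\
\text{GM(1,1)}\ (\beta = 0):\quad & \frac{dy}{dt} + \alpha y = \gamma,
 & \hat{y}(t) &= \Bigl(y(1) - \frac{\gamma}{\alpha}\Bigr) e^{-\alpha (t-1)} + \frac{\gamma}{\alpha}, \\
\text{NGBM:}\quad & \frac{dy}{dt} + \alpha y = \gamma y^{\tau},
 & \hat{y}(t) &= \Bigl[\Bigl(y(1)^{1-\tau} - \frac{\gamma}{\alpha}\Bigr) e^{-\alpha (1-\tau)(t-1)} + \frac{\gamma}{\alpha}\Bigr]^{\frac{1}{1-\tau}}.
\end{align*}
```

Setting β = 0 in the NGM solution recovers the GM(1,1) solution term by term, and the NGBM solution follows from the substitution u = y^{1−τ}, which turns the Bernoulli equation into a linear one.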

The theory of ensemble grey model
Ensemble learning is a commonly used machine learning method that uses specific strategies to combine multiple learners to complete the learning task. Based on the conventional ensemble method, a novel ensemble strategy is constructed to improve prediction performance. The new strategy will simplify the ensemble method and apply it to the grey system to obtain forecasting results by combining several basic models. This section introduces the necessary elements of the conventional ensemble model in the first part, then gives an improved ensemble strategy, discusses the advantages of the new method in the second part, and ultimately introduces the construction of the ensemble grey model in this paper.
Table 1. Basic form and solution of the models (grey model, nonhomogeneous grey model, nonhomogeneous discrete grey model, nonlinear grey Bernoulli model).

Conventional ensemble learning strategy
In ensemble learning theory, the main operation is divided into the construction of the base models and the ensemble approach. This section presents ensemble learning in these two stages. In the first stage, a single data set is used by different base models to obtain trained base models. In the second stage, the forecasting results of the models are combined by the averaging method to improve the forecasting accuracy.

Construction of the base model
The raw data sequence is assumed to be $Y^{(0)} = \{y^{(0)}(1), y^{(0)}(2), \ldots, y^{(0)}(n)\}$, the base model is denoted by $e$, and the parameter grid is $S$. The data set $Y^{(0)}$ is divided into two parts, the modelling set $Y_M$ used to obtain the trained model and the forecasting set $Y_F$ used to compare the forecast performance of the model:
$$Y_M = \{y^{(0)}(1), \ldots, y^{(0)}(m)\}, \qquad Y_F = \{y^{(0)}(m+1), \ldots, y^{(0)}(m+\wp)\}.$$
A value of the parameter $\chi$ is selected from the parameter grid $S$ of model $e$ to define a single base model $e(\chi)$. By fitting the data on $Y_M$, the trained base model $\hat{e}(\chi)$ is obtained. The number of distinct base models is determined by the number of times $\chi$ is sampled from the parameter grid $S$.

Ensemble approach
This paper adopts a simple averaging method to combine the forecasting results of the base models. After the trained base models are obtained, the set of base models is recorded as $E$ and the results on the forecasting set $Y_F$ are recorded as $\hat{Y}_F = \{\hat{y}^{(0)}(m+1), \hat{y}^{(0)}(m+2), \ldots, \hat{y}^{(0)}(m+\wp)\}$. Then the final ensemble forecasting results can be expressed as
$$\hat{y}_H(k) = \frac{1}{\operatorname{card}(E)} \sum_{e \in E} \hat{y}^{(0)}_e(k), \qquad k = m+1, \ldots, m+\wp,$$
where $\operatorname{card}(E)$ represents the cardinality of $E$, $\hat{y}^{(0)}_e(k)$ is the forecasting result of a single base model and $\hat{y}_H(k)$ is the forecasting result of ensemble learning.
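The simple averaging step can be illustrated with a short sketch. The function name `average_ensemble`, the model labels and the forecast values are hypothetical and are not taken from the paper.

```python
import numpy as np

def average_ensemble(forecasts):
    """Average the forecasts of the base models, one row per model."""
    stacked = np.vstack([np.asarray(f, dtype=float) for f in forecasts])
    return stacked.mean(axis=0)

# Hypothetical forecasts of three trained base models on the forecasting set.
base_forecasts = {
    "model_a": [100.0, 110.0, 121.0],
    "model_b": [102.0, 108.0, 119.0],
    "model_c": [98.0, 112.0, 123.0],
}
ensemble = average_ensemble(base_forecasts.values())
```

Each base model receives the same weight 1/card(E), so a single poorly generalizing model shifts every averaged forecast, which is the weakness the improved strategy addresses.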

Improved ensemble strategy
In the conventional ensemble operation above, the average of the prediction results is used as the ensemble output. If one of the input base models generalizes poorly, it pulls down the performance of the other models on the data set. Furthermore, if the parameter grid is sampled many times, the number of base models to be input becomes excessive. Therefore, the structure of the ensemble strategy is improved: first, different parameter values χ are sampled from the parameter grid S at equal intervals s. Then the forecasting model with the optimal parameter χ * is obtained by setting the objective function and the optimization conditions. This section focuses on the search for the optimal base model.
For the models with nonlinear parameters (such as the order r, the order λ or the nonlinear parameter τ), the modelling set $Y_M$ is further divided into two data blocks, the training set $Y_{train}$ and the validation set $Y_{valid}$. $Y_{train}$ is used to estimate the model parameters, and $Y_{valid}$ is used to validate the performance of the model outside the training sample. The objective function is then formulated, with its optimality conditions given in Eq (3.4). To obtain a relatively superior parameter value in the interval, the parameter values are sampled at equal intervals s and the forecasting results of the model are calculated for each value. Following Section 2, the parameter value in the interval S that satisfies the optimality conditions Eq (3.4) is taken as the approximate optimal parameter χ * of the model, and the trained model $\hat{e}(\chi = \chi^{*})$ is taken as the parametric base model of the ensemble model. The process of parameter searching is presented in Algorithm 1.
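The parameter search can be sketched as a grid search with a hold-out validation block. Since the objective function is not reproduced in this text, the sketch assumes that the objective is the MAPE on the validation set; the callable `fit_forecast`, the toy base model and the sample values are hypothetical.

```python
import numpy as np

def mape(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((predicted - actual) / actual)) * 100.0)

def grid_search(fit_forecast, y_train, y_valid, lower, upper, step):
    """Return (best_param, best_error) over an equally spaced grid.

    fit_forecast(param, y_train, horizon) is a hypothetical callable that fits
    a base model with the given parameter and returns `horizon` forecasts.
    """
    count = int(round((upper - lower) / step)) + 1
    grid = lower + step * np.arange(count)
    best_param, best_error = None, np.inf
    for param in grid:
        forecast = fit_forecast(param, y_train, len(y_valid))
        error = mape(y_valid, forecast)
        # Skip parameter values whose forecasts are not finite.
        if np.isfinite(error) and error < best_error:
            best_param, best_error = float(param), error
    return best_param, best_error

def toy_model(param, y_train, horizon):
    # Toy base model: extend the last training value with growth rate `param`.
    return y_train[-1] * (1.0 + param) ** np.arange(1, horizon + 1)

train = np.array([100.0, 110.0, 121.0])
valid = np.array([133.1, 146.41])
best, err = grid_search(toy_model, train, valid, lower=0.0, upper=0.2, step=0.05)
```

After the search, the selected parameter would be used to refit the model on the whole modelling set before forecasting, as described in the text.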

The ensemble of the nonhomogeneous grey model with a different accumulative approach
In this section, the ensemble learning-based grey model (ELGM) is presented. The three accumulation forms of the nonhomogeneous grey model are used as the base models, i.e., E = {NGM, FNGM, NIPNGM}.
According to Section 2.1, when e = NGM, we have n = m; when e = FNGM or NIPNGM, we have n = ς. When e = NGM, the values on $Y_F$ are calculated directly and the forecasting result $\hat{y}^{(0)}_e$ is obtained through Eq (2.2). When e = FNGM or NIPNGM, the optimized parameter ℵ * (ℵ = r, λ) is solved first. The optimal parameter ℵ * is then used to refit the training set and calculate the values on $Y_F$, which gives the forecasting result $\hat{y}^{(0)}_e$.
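The forecasting results $\hat{y}^{(0)}_e$ of the base models in $E$ are combined by the averaging method to obtain the final results $\hat{y}_H(k)$ of the ELGM:
$$\hat{y}_H(k) = \frac{1}{\operatorname{card}(E)} \sum_{e \in E} \hat{y}^{(0)}_e(k).$$
The process of building the ELGM is shown in Figure 1.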
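The whole procedure can be sketched end to end. The code below is an illustration under several assumptions rather than the authors' implementation: the accumulations take their commonly published forms, the NGM is fitted by least squares with trapezoidal background values, the validation objective is the MAPE, and the grids for r and λ are arbitrary. All function names and the synthetic series are hypothetical.

```python
import numpy as np

def frac_matrix(n, r):
    """Assumed r-order accumulation matrix; r = 1 is the first-order accumulation."""
    coef = np.ones(n)
    for j in range(1, n):
        coef[j] = coef[j - 1] * (r + j - 1) / j
    return np.array([[coef[k - i] if i <= k else 0.0 for i in range(n)]
                     for k in range(n)])

def nip_matrix(n, lam):
    """Assumed new information priority accumulation, weights lam ** (k - i)."""
    return np.array([[lam ** (k - i) if i <= k else 0.0 for i in range(n)]
                     for k in range(n)])

def ngm_forecast(y, horizon, acc_matrix):
    """Fit an NGM on the accumulated series and return `horizon` forecasts."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    acc = acc_matrix(n) @ y
    z = 0.5 * (acc[1:] + acc[:-1])                       # background values
    B = np.column_stack([-z, np.arange(2, n + 1), np.ones(n - 1)])
    a, b, c = np.linalg.lstsq(B, np.diff(acc), rcond=None)[0]
    t = np.arange(1, n + horizon + 1)
    with np.errstate(all="ignore"):
        acc_hat = ((acc[0] - b / a + b / a**2 - c / a) * np.exp(-a * (t - 1))
                   + b / a * t - b / a**2 + c / a)
        # Invert the accumulation to return to the scale of the raw data.
        restored = np.linalg.solve(acc_matrix(n + horizon), acc_hat)
    return restored[n:]

def mape(actual, predicted):
    return float(np.mean(np.abs((predicted - actual) / actual)) * 100.0)

def best_parameter(y_model, n_valid, grid, make_matrix):
    """Pick the grid value with the lowest MAPE on the validation block."""
    y_train, y_valid = y_model[:-n_valid], y_model[-n_valid:]
    errors = []
    for value in grid:
        pred = ngm_forecast(y_train, n_valid, lambda n: make_matrix(n, value))
        err = mape(y_valid, pred)
        errors.append(err if np.isfinite(err) else np.inf)
    return float(grid[int(np.argmin(errors))])

def elgm_forecast(y_model, horizon, n_valid=2,
                  r_grid=np.linspace(0.1, 1.0, 10),
                  lam_grid=np.linspace(0.1, 1.0, 10)):
    y_model = np.asarray(y_model, dtype=float)
    r_best = best_parameter(y_model, n_valid, r_grid, frac_matrix)
    lam_best = best_parameter(y_model, n_valid, lam_grid, nip_matrix)
    forecasts = [
        ngm_forecast(y_model, horizon, lambda n: frac_matrix(n, 1.0)),      # NGM
        ngm_forecast(y_model, horizon, lambda n: frac_matrix(n, r_best)),   # FNGM
        ngm_forecast(y_model, horizon, lambda n: nip_matrix(n, lam_best)),  # NIPNGM
    ]
    return np.mean(forecasts, axis=0), r_best, lam_best

# Synthetic series for illustration only: 10 modelling points, 2 forecasting points.
k = np.arange(1, 13)
series = 10.0 * np.exp(0.1 * k)
forecast, r_best, lam_best = elgm_forecast(series[:10], horizon=2)
```

Only three base models enter the average, one per accumulation form, which reflects the simplification of the input that the improved strategy aims for.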

Model validation
This section gives several numerical examples to verify the different accumulation forms of NGM under the new ensemble strategy. In each case, the proposed ELGM is compared with the different models in different accumulation forms presented in Section 2.2.2. In addition, the first subsection briefly presents the model evaluation criteria used in this work and the last subsection gives a summary of the numerical examples.

Evaluation criteria for model prediction performance
To compare the generalization ability between the ELGM and other models, the prediction performance of the model is quantified by MAPEPR (Mean absolute percentage error for the prior-sample period), MAPEPO (Mean absolute percentage error for the post-sample period) and MAPE (Mean absolute percentage error) [33,34]. The specific mathematical expressions are as follows:
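The three criteria can be computed as in the following sketch. It assumes that MAPEPR averages the absolute percentage errors over the m modelling points, MAPEPO over the remaining forecasting points, and MAPE over all points; the function name `mape_metrics` and the sample values are hypothetical.

```python
import numpy as np

def mape_metrics(actual, predicted, m):
    """Return MAPEPR, MAPEPO and MAPE in percent; m is the modelling set size."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ape = np.abs((predicted - actual) / actual) * 100.0   # pointwise errors
    return {
        "MAPEPR": float(ape[:m].mean()),   # prior-sample (modelling) period
        "MAPEPO": float(ape[m:].mean()),   # post-sample (forecasting) period
        "MAPE": float(ape.mean()),         # whole series
    }

# Illustrative values only.
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [110.0, 190.0, 400.0, 450.0]
scores = mape_metrics(actual, predicted, m=2)
```

Some studies exclude the first point from MAPEPR because grey models reproduce it exactly; whether that convention applies here is not stated in this text.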

Numerical cases
Numerical case 1: (The renewable energy consumption of the Czech Republic) We consider a sample for establishing a grey model provided in the literature [35]. The raw data is divided according to the parameter values given in Table 2. Figure 2 shows the comparison of the forecasting results of the five models with the smallest MAPE values. The MAPEPR, MAPEPO, and MAPE obtained from the ELGM are 5.93%, 2.82%, and 5.37%, respectively, which are better than those of the other comparison models. The novel model has the smallest evaluation criterion values in the modelling and forecasting stages, which shows that the ELGM is better than the other competing models in this case.
Numerical case 2: (The biodiesel production of the United States) The data on biodiesel energy in the literature [36] is used to verify the grey model again. The data is divided as defined in Table 2. As in numerical case 1, the forecasting results of the five models with the smallest MAPE values are plotted in Figure 3. The bottom of the figure shows that the new model has not only achieved good results in the modelling and forecasting stages but also has a MAPEPO value of 0.73% in the forecasting stage, which is much better than the other comparable models. It indicates that the ELGM can produce better predictive power than the base model under certain circumstances.

Summary of the numerical cases
The performance of the ELGM has been evaluated in this section through two real numerical cases. It is worth noting that these data contain not only monotonic growth but also gradual fluctuations over time. In both situations the ELGM obtains the best forecasting results among the compared models. However, there are also significant differences in the results obtained when dealing with trends that change over time. In numerical case 1, besides the ELGM, the NIPNGBM with the optimal parameters also obtains similar prediction performance. Nevertheless, comparing the forecasting results of the base models with those of the ELGM shows that the novel ensemble strategy optimizes the forecasting results to a certain extent, so that the prediction performance is further improved. In numerical case 2, where the ELGM faces disturbed data, the overall prediction accuracy is not as good as for data with a monotonic trend, yet it can reach as much as ten times the prediction performance of other models.
In a nutshell, for the cases verified in this work, the ELGM has the lowest MAPE compared with the other fifteen models in the modelling and forecasting stages. It can be clearly seen that the ELGM also shows higher prediction accuracy in short-term prediction when processing data with different features.

Data collection and division
The raw data of electricity supply in China is collected from the website [37] until the last update, which includes electricity supply data from 2000 to 2018 in China. The data is divided from a total of 19 time nodes from 2000 to 2018 by the division method introduced in Section 2.1 to obtain two data sets: modelling set and forecasting set without optimized parameter models; training set, validation set and forecasting set with optimized parameter models. The parameters of the division method are shown in the Table 3.

Forecasting results
In this paper, the three accumulation forms of NGM are used as three basic models to train and integrate the forecasting results of the models to get the final results. At the same time, the forecasting results of five different models under three accumulation forms are analyzed. The forecasting results, optimized parameter values and evaluation metrics values are shown in the Table 4. MAPEPR, MAPEPO and MAPE of the ELGM are 2.34%, 0.20% and 2.55%, respectively, which are the minimum values among the metrics, and it indicates that the ELGM has the best prediction performance.
Taking the MAPE value as the benchmark, for each of the five models the accumulation form with the minimum MAPE value is selected, and its forecasting results are plotted together with those of the ELGM in Figure 4. It can be seen that the prediction ability of the ELGM is slightly better than the best case of the other models, which illustrates that the ensemble strategy can further improve the traditional prediction method and optimize the prediction results.
Figure 4. Comparison of optimal forecasting results between ELGM and various theoretical models.
The ELGM is compared with the original three basic models. It can be concluded from Figure 5 that the ELGM can not only ensure the minimum error between the forecasting set and the original data, but also make up for the poor prediction effect of the early samples due to the minimum error in the later samples. The evaluation criteria values of the top ten models with better prediction performance are visualized in Figure 6. From the evaluation criteria values in the figure, it can be clearly observed that the prediction of the ELGM is better than that of the other models, which further verifies the feasibility of the ensemble learning-based grey prediction.
Figure 6. The MAPEPR, MAPEPO, MAPE of models with superior forecast performance.

Brief summary and short discussion
According to the forecasting results in Section 5.2, models with the same basic form but different accumulation are called a group. The GM, DGM and NGBM groups fit better than the NGM group, but all of them are less effective than the NGM group in the forecasting stage, which indicates that the NGM group has high stability. The NDGM group performs worse on both the modelling and forecasting sets. It can be seen that the forecasting ability of the ELGM benefits from the base estimator.
However, compared with the forecasting results of the basic models, the forecasting accuracy of ELGM obtained by the new ensemble strategy exceeds all basic models, which verifies that the strategy achieves the ensemble effect of ensemble learning while theoretically simplifying the input of the basic model.
The above discussion shows that the proposal of ELGM not only verifies that the effect of ensemble learning is very significant, but also provides a new way for the selection of ensemble strategies. The application results also fully illustrate that ELGM could be used as a reliable forecasting tool in the energy field.

Conclusions
A nonhomogeneous grey model based on a novel ensemble strategy is proposed in this work, abbreviated as ELGM. At first, the grid search cross validation is used to obtain the optimal parameters of each basic model, and the nonhomogeneous grey models under the three accumulation forms are taken as the ensemble objects, and the ensemble is carried out in the way of average prediction results. In the case study, compared with fifteen general models, the results show that the ELGM has the highest generalization ability in small-sample time series forecasting, and can also have the same excellent performance when dealing with different trend data. The applied research of electricity supply shows that the proposed ELGM has the highest prediction accuracy in short-term electricity prediction and is expected to become a reliable tool for energy prediction.