Prediction of energy consumption: Variable regression or time series? A case in China

Energy consumption is closely related to industrial structure, economic prosperity, and population. Because a Granger causal relationship exists between GDP and energy consumption, many researchers predict energy consumption by regression on economic variables. Other researchers instead treat energy consumption as a time series and forecast it from its own history. To assess the advantages and disadvantages of the two methods, we applied deep learning (DL) to both, using China as a case. The experimental results show that the accuracy of the time series forecast is much higher than that of the regression methods. We also explored the relationship between industry, economy, population, and energy consumption, and found that GDP, population, the secondary sector, and the tertiary sector are significantly related to energy consumption. Growth of the tertiary sector can help reduce energy consumption to some extent, which offers guidance on adjusting the industrial structure.
Energy intensity has attracted increasing attention. A strategic approach to sustainable development requires reducing energy intensity while developing the economy. Since China is committed to reaching its carbon emission peak by 2030, this paper predicts China's energy intensity over the next decade with a DL algorithm. The prediction results show that energy intensity will decrease slightly over the next decade. Different types of energy reflect the industrial structure behind economic growth. Therefore, we used ridge regression (RR) to explore the relationship between different types of energy and energy intensity, including the effect (positive or negative) of each energy source on reducing energy intensity. We found that the best way to develop the economy while reducing energy consumption is to use natural gas and hydroelectric power instead of coal. This can guide policy on changing the energy source structure.

The policy of "Reform and Opening-up" has greatly promoted economic growth in China, whose economy has been growing rapidly since 1978 [1]. However, this rapid growth has come at a huge energy cost. Many researchers have found that economic growth has a strong relationship with energy consumption. Wang et al [2] found a Granger causal relationship between economic growth and energy consumption and considered it bidirectional. Energy consumption has therefore become a double-edged sword for a country's development. On the one hand, energy consumption can bring economic growth. On the other hand, increasing energy demand puts a heavy burden on the environment. Energy consumption is generally recognized as the main factor causing CO2 emissions [3]. Taking China as an example, although it has not yet completed industrialization and urbanization despite its rapid economic growth, it has become the world's largest energy consumer and CO2 emitter [4]. The tension between eliminating poverty and improving citizens' lives on the one hand and limiting environmental pollution on the other means that China's economic development still faces an arduous journey. For sustainable development and environmental responsibility, one of China's commitments under the Paris Accord is to reach peak carbon dioxide emissions by 2030 [5]. Therefore, we take China as a case to study the relationship between energy consumption and other factors and to find a reasonable way to predict energy consumption.
Energy consumption prediction has always been a hot issue. Much of the literature has focused on the relationship between energy consumption and economic growth using Granger causality [6,7], seeking factors related to energy consumption. However, with the development of artificial intelligence, more and more scholars use computational methods to find factors related to energy consumption and to predict it. Artificial neural networks (ANN), support vector machines (SVM), and many other algorithms are the most common tools for this prediction [8][9][10]. These methods generally outperform linear models when handling large amounts of nonlinear data. (1) Some researchers use regression to predict energy consumption. Pao [11] examined the effect of national income (NI), population (POP), GDP, and consumer price index (CPI) on electricity consumption using linear regression and ANN. Szoplik [12] predicted natural gas demand in Szczecin (Poland) with a multilayer perceptron (MLP) model using temperature and other factors. Günay [13] used population, GDP per capita, inflation, and average summer temperature to model the annual gross electricity demand of Turkey for the years 2014-2028 with ANN. (2) Other researchers have used time series to predict energy consumption. Jebaraj et al [14] treated coal consumption in India as a time series and predicted it with ANN. Wang et al [15] used SVM to predict the annual electricity consumption of Beijing.
Some studies have shown the superiority of ANN in predicting energy consumption. However, with the rapid development of ANN, deep learning (DL) methods have become a more powerful way to solve the problems that ANN can address [16]. A conventional ANN is a shallow neural network, whereas DL uses deep neural networks with stronger learning ability, which means it can find more precise patterns in large amounts of nonlinear data.
Therefore, this paper used a DL algorithm to compare the regression method with time series prediction. On the one hand, variables related to energy consumption, such as GDP and population, were mapped to energy consumption through DL to achieve prediction. On the other hand, we constructed a DL model that predicts energy consumption by treating it as a time series. Long short-term memory (LSTM) [17], a kind of DL method, was chosen because it is widely used in modeling many aspects of the economy [18] and energy [19,20]. Selecting variables before performing regression is also an important problem. Traditional methods such as principal components analysis (PCA) [21], kernel principal components analysis (KPCA) [22], or Bayesian methods [23] are widely used in the field of energy economics. However, these methods do not give the statistical significance of the relationship between energy consumption and each variable. Therefore, and to overcome the multicollinearity between variables, ridge regression (RR) was used in this paper. RR not only selects significant variables for regression but also gives the P-value of each variable. It is widely used for variable selection in various fields [24][25][26]. In this paper, we first performed RR to select variables and to identify the relationship between the variables and energy consumption. Then, we used LSTM to predict energy consumption in two ways: variable regression and time series prediction. Next, we compared the results of the two methods and selected the better one to predict energy intensity. Finally, we used RR to find the relationship between energy sources and both GDP and energy intensity. This result suggests how to change the industrial structure and the energy consumption structure so that China can promote its economy while reducing energy consumption.

| Work frame
Our work consists of three parts. (1) Model comparison. We compared the regression algorithm with the time series prediction algorithm and tried to identify the reason for the difference between them. We also identified the relationship between the variables and energy consumption. (2) Prediction of energy intensity. (3) Identification of the relationship between energy type and energy intensity and between energy type and GDP.
To explain our work, the methods we used are described in detail in the sections below.

| Algorithms used in part 1
The workflow of part 1 is shown in Figure 1. First, we chose GDP, population, and the primary, secondary, and tertiary sectors of the economy as the potential factors that affect energy consumption. Then, we used RR to select variables. For variable regression, we built three models: the first used all variables, the second used only the selected variables, and the third used only GDP.
For time series prediction, we only used energy consumption data to build a model. Finally, we compared the results of these four models.

[Figure 1. Workflow of part 1. Labels: Feature selection (the relationship between variables and energy consumption could be obtained by this step); Series model.]

Since we processed the energy consumption data as a time series, we had to reshape the series into a format that LSTM can process. We used 5 years of data to predict the 6th year, so the length of each sample was 5. The relation between samples and target variables is shown in Formula (1):

$$
\begin{bmatrix}
\mathrm{data}_1 & \mathrm{data}_2 & \cdots & \mathrm{data}_m \\
\mathrm{data}_2 & \mathrm{data}_3 & \cdots & \mathrm{data}_{m+1} \\
\vdots & \vdots & & \vdots \\
\mathrm{data}_{n-m} & \mathrm{data}_{n-m+1} & \cdots & \mathrm{data}_{n-1}
\end{bmatrix}
\rightarrow
\begin{bmatrix}
\mathrm{data}_{m+1} \\ \mathrm{data}_{m+2} \\ \vdots \\ \mathrm{data}_n
\end{bmatrix}
\tag{1}
$$

In Formula (1), the left side contains the samples and the right side contains the corresponding target variables. In this case, m is 5 and n is the total size of our data.
To ensure that the prediction does not break off after all the raw data have been used, the predicted value $\mathrm{data}_{n+1}$ is used to generate a new sample, $[\mathrm{data}_{n-3}, \mathrm{data}_{n-2}, \mathrm{data}_{n-1}, \mathrm{data}_n, \mathrm{data}_{n+1}]$, from which we obtain the (n + 2)th value $\mathrm{data}_{n+2}$. Theoretically, by repeating this cycle, the energy consumption at any future moment can be obtained. In practice, because of the effects of various factors, the predictions become unstable, so long-term prediction is difficult. We therefore only predicted energy consumption from 2018 to 2028.
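The sample construction in Formula (1) and the recursive prediction step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `make_samples` and `recursive_forecast` are hypothetical, the toy series is made up, and `mean_of_window` is only a stand-in for a trained LSTM so that the example runs without a DL library.

```python
from typing import Callable, List, Sequence, Tuple


def make_samples(series: Sequence[float], m: int = 5) -> Tuple[List[List[float]], List[float]]:
    """Build windows of length m and, for each window, the value that follows it."""
    samples = [list(series[i:i + m]) for i in range(len(series) - m)]
    targets = [series[i + m] for i in range(len(series) - m)]
    return samples, targets


def recursive_forecast(series: Sequence[float],
                       predict: Callable[[List[float]], float],
                       steps: int, m: int = 5) -> List[float]:
    """Forecast `steps` values, feeding each prediction back in as new input."""
    history = list(series)
    forecasts = []
    for _ in range(steps):
        next_value = predict(history[-m:])  # most recent m values form the sample
        forecasts.append(next_value)
        history.append(next_value)          # the prediction becomes part of the next sample
    return forecasts


def mean_of_window(window: List[float]) -> float:
    """Placeholder predictor (assumption): stands in for the trained LSTM."""
    return sum(window) / len(window)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # made-up toy data
samples, targets = make_samples(series, m=5)
forecasts = recursive_forecast(series, mean_of_window, steps=2)
```

Because each forecast is built on earlier forecasts, errors can compound with the number of steps, which is consistent with the instability of long-term prediction noted above.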

Basic principles of LSTM
Long short-term memory is a specific form of recurrent neural network (RNN). Generally, an RNN has three features:
(A) It can produce an output at each time node, and the connections between hidden units are cyclic.
(B) The output at one time node is connected only to the hidden unit of the next time node.
(C) It contains hidden units with cyclic connections, can process sequential data, and can output a single prediction.
Unlike a traditional RNN, LSTM has three "gates": the forget gate, the input gate, and the output gate. The forget gate is what makes LSTM better than a traditional RNN. Because LSTM adds the forget gate, the original neuron gains a state, C, to "reasonably" preserve its long-term state. The added state C is called the "cell state"; it replaces the traditional hidden neuron nodes and is also called the "memory block." It is responsible for carrying memory information from the beginning of the sequence to the end.
As we can see in Figure 2, the current neuron has three inputs: the output from the previous moment, the input at the current moment, and the cell state from the previous moment.
The forget gate is crucial for our work, since data from a few decades ago would not provide much useful information for predicting the future. However, we cannot know in advance from which year the amount of useful information begins to decrease, so we rely on LSTM to learn which older data to forget.
$C_t$ is the cell state (where the loop occurs), and $H_t$ denotes the output of the current cell. $\sigma$ denotes the sigmoid function, and tanh denotes the hyperbolic tangent function. $X_t$ denotes the input vector. Figure 3 shows how LSTM uses the three pieces of information mentioned above in its calculation. Figure 2 contains two frames, and Figure 3 shows one of them in detail. To explain the calculation process of Figure 3, we list the mathematical formulas below.
$f_t$ denotes the forget gate, $i_t$ denotes the input gate, $\hat{C}_t$ denotes the candidate cell state computed at the current moment, $C_t$ denotes the cell state at the current moment, $O_t$ denotes the output gate, and $H_t$ denotes the output. Each $W$ denotes the weight of the corresponding gate, and each $b$ denotes its bias.
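The symbols defined here match the standard LSTM formulation, which can be written as follows. This is a sketch under that assumption rather than a transcription of the authors' formulas: $[H_{t-1}, X_t]$ denotes the concatenation of the previous output and the current input, $\odot$ denotes element-wise multiplication (neither notation is defined in the text), $\hat{C}_t$ is taken as the candidate cell state, and the equation numbers (2)-(7) are inferred from the surrounding numbering.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [H_{t-1}, X_t] + b_f\right) && (2) \\
i_t &= \sigma\left(W_i \cdot [H_{t-1}, X_t] + b_i\right) && (3) \\
\hat{C}_t &= \tanh\left(W_C \cdot [H_{t-1}, X_t] + b_C\right) && (4) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \hat{C}_t && (5) \\
O_t &= \sigma\left(W_O \cdot [H_{t-1}, X_t] + b_O\right) && (6) \\
H_t &= O_t \odot \tanh(C_t) && (7)
\end{aligned}
```

In Equation (5), a forget gate value $f_t$ close to 0 discards the previous cell state $C_{t-1}$, which is the mechanism by which older observations can be forgotten.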

Building model by LSTM
We built a three-layer network in which each layer is an LSTM layer. The input is a time series vector with dimension (1, 5), and the output is the prediction result. The parameters are set as in Table 1.

Feature selection
We identified five variables related to energy consumption from three aspects: industry, economy, and population. They are GDP, population, the primary sector of the economy, the secondary sector of the economy, and the tertiary sector of the economy.

[Figure 3. The diagram of an LSTM building block.]
Because the values of these variables differ by orders of magnitude, we used Z-score normalization to rescale each variable to zero mean and unit standard deviation:

$$
x' = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)} \tag{8}
$$

where x′ is the data after normalization, mean(x) is the mean value of x, and std(x) is the standard deviation of x.
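As a small illustration of the Z-score normalization, the following Python sketch standardizes one variable. The function name `z_score` and the sample values are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or the sample form was used.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_score(values: Sequence[float]) -> List[float]:
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population form of the standard deviation
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return [(v - mu) / sigma for v in values]


gdp = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values, not the paper's data
gdp_scaled = z_score(gdp)
```

Note that this rescaling changes the location and spread of a variable but not the shape of its distribution.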

Ridge regression
Energy sources are highly correlated with each other, and so are GDP and the other economic factors. Such multicollinearity would cause serious problems for least squares estimation. Therefore, we used RR instead of least squares, because RR is an important way to overcome multicollinearity.
Ridge regression is used to select the variables that are significantly related to energy consumption. Since we had five candidate variables, selecting the significant ones can improve the accuracy of prediction.
Unlike least squares regression, RR adds a regularization term that represents the complexity of the model. In other words, RR considers not only the goodness of fit but also the complexity of the model. It automatically shrinks the weights of variables that are not significantly related to energy consumption.
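Our aim is to obtain the equation

$$
y' = w^{T} x \tag{9}
$$

where $w$ is the vector of coefficients, $x$ is the vector of variables, and $y'$ denotes the regression result. We try to make $y'$ as close as possible to the real $y$, so the least squares objective function can be written as

$$
\min_{w} \sum_{i} \left(y_i - w^{T} x_i\right)^2 \tag{10}
$$

Since RR also considers the complexity of the model, its objective function becomes

$$
\min_{w} \sum_{i} \left(y_i - w^{T} x_i\right)^2 + \lambda \lVert w \rVert_2^2 \tag{11}
$$

where $\lambda \lVert w \rVert_2^2$ denotes the regularization term and $\lambda \ge 0$ controls its strength. To solve (11), we differentiate it with respect to $w$ and set the derivative to zero, which gives

$$
w = \left(X^{T} X + \lambda I\right)^{-1} X^{T} y \tag{12}
$$

where $X$ is the matrix whose rows are the samples $x_i^{T}$ and $I$ is the identity matrix.
Finally, we obtain the value of $w$ and thus learn which variables are significantly related to energy consumption.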
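To make the role of the regularization term concrete, the sketch below solves the standard closed-form ridge solution $w = (X^{T}X + \lambda I)^{-1}X^{T}y$ in plain Python on two perfectly collinear variables. It is an illustration under stated assumptions, not the authors' code: the names `ridge_fit` and `solve_linear_system` are hypothetical, the data are made up, no intercept is fitted (the variables are assumed to be standardized), and P-values are not computed.

```python
from typing import List


def solve_linear_system(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; a larger lam is needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def ridge_fit(X: List[List[float]], y: List[float], lam: float) -> List[float]:
    """Return w solving (X^T X + lam * I) w = X^T y (no intercept)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    return solve_linear_system(A, b)


# Two identical (perfectly collinear) columns: X^T X alone is singular.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]
w_weak = ridge_fit(X, y, lam=1.0)     # mild shrinkage
w_strong = ridge_fit(X, y, lam=10.0)  # stronger shrinkage, smaller weights
```

With `lam=0.0` the same call raises `ValueError` because the system is singular, which is the multicollinearity problem that the regularization term removes; increasing `lam` shrinks the weights further toward zero.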

Building model by LSTM
After selecting variables, we put the significant variables into LSTM to build a model. The input is the significant variables, and the output is energy consumption.
The parameters are set as in Table 2.

| Algorithms used in part 2
Energy intensity is predicted in this part. Energy intensity is calculated as units of energy per unit of GDP.
We chose the prediction method based on the result of part 1.

| Algorithms used in part 3
We used RR to obtain the relationship between energy type and energy intensity and the relationship between energy type and GDP. Our aim is to find the significant factors that affect GDP and energy intensity. We can also obtain the effect (positive or negative) of different energy sources on reducing energy intensity and promoting the economy.

| Data description
We obtained data on China's energy consumption, GDP, population, and the primary, secondary, and tertiary sectors of the economy from 1965 to 2017. The dataset contains six kinds of energy consumption data: coal, oil, hydroelectric, nuclear, natural gas, and other (other renewables). GDP and energy consumption are plotted in Figure 4. The blue line is energy consumption, in billion tons of oil equivalent, and the green line is GDP, in billion RMB. As Figure 4 shows, the two series follow the same trend, which suggests that they are highly correlated.

| Results of model comparison
First, we used RR to select the most relevant variables. As Table 3 shows, all the P-values are quite low except that of the primary sector of the economy. This makes sense, since the primary sector mainly includes farming, forestry, mining, and fishing, which do not consume much energy. Another important point is that the coefficient of the tertiary sector is negative, which means that the development of the tertiary industry tends to reduce energy consumption.
After screening variables, we used 70% of the data to build a model and the remaining 30% for validation; that is, we trained on the data from 1965 to 2001 and judged the accuracy of the model on the data from 2002 to 2017. We built four models in total. (1) Since GDP is generally considered highly correlated with energy consumption, we used only GDP to predict energy consumption. (2) We used all variables to predict energy consumption. (3) We used the significant variables (GDP, population, secondary sector, and tertiary sector) to predict energy consumption. (4) We treated the energy consumption data as a time series for prediction.
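The 70%/30% split is chronological rather than random, so the test years all follow the training years. A minimal Python sketch (the function name `chronological_split` is hypothetical) reproduces the year ranges stated above:

```python
from typing import List, Tuple


def chronological_split(years: List[int], train_fraction: float) -> Tuple[List[int], List[int]]:
    """Split an ordered list so that the earliest part is used for training."""
    cut = int(len(years) * train_fraction)  # number of training years, rounded down
    return years[:cut], years[cut:]


years = list(range(1965, 2018))  # 1965 to 2017 inclusive, 53 years
train_years, test_years = chronological_split(years, 0.7)
```

With 53 annual observations, the cut falls after 37 years, giving 1965-2001 for training and 2002-2017 for testing.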
The results of the prediction are shown in Figure 5.
As we can see in Figure 5, the precision of the four methods appears almost the same in the plot. To identify the best one, we calculated the errors of the four results.
As Table 4 shows, there are three evaluation indicators: mean absolute error (MAE), mean square error (MSE), and maximum error (MaxError). Time series prediction is the best, and prediction by GDP alone is the worst. Regressing on only the most relevant variables increases the precision.
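The three evaluation indicators follow their usual definitions and can be computed as in the Python sketch below. The function names and the sample values are illustrative only and are not taken from Table 4.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def max_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Largest absolute error over all predictions."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


y_true = [3.0, 5.0, 8.0]  # made-up observations
y_pred = [2.5, 5.5, 6.0]  # made-up predictions
scores = (mae(y_true, y_pred), mse(y_true, y_pred), max_error(y_true, y_pred))
```

MSE penalizes large deviations more heavily than MAE, and MaxError reports only the single worst year, so the three indicators can rank models differently.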
These experiments show that the precision of time series prediction is higher than that of the regression method (at least for China's data), since we used the same algorithm (LSTM) in both approaches. The superiority of time series prediction lies not only in its high accuracy but also in its convenient calculation. First, it needs only energy consumption data to predict future development, while regression methods need more variables. Second, to predict energy consumption in 2020 with a regression method, we must at least know the GDP and other factors for that year, whereas time series prediction needs only the energy consumption data from past years (the more data, the higher the accuracy).
Due to the superiority of time series prediction, we used it to predict energy intensity.

| Prediction of energy intensity
First, we converted the units of energy consumption to tons of oil equivalent and the units of GDP to billion dollars. Then, we calculated energy intensity as energy consumption/GDP. Finally, we used LSTM to predict the energy intensity of China in the next decade.
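The energy intensity calculation is a year-by-year ratio, as in the following Python sketch. The function name `energy_intensity` and the numbers are placeholders, not values from the dataset; the inputs are assumed to be already converted to tons of oil equivalent and billion dollars.

```python
from typing import List, Sequence


def energy_intensity(energy: Sequence[float], gdp: Sequence[float]) -> List[float]:
    """Energy consumed per unit of GDP for each year."""
    if len(energy) != len(gdp):
        raise ValueError("energy and gdp must cover the same years")
    return [e / g for e, g in zip(energy, gdp)]


energy_toe = [100.0, 120.0]      # placeholder values, tons of oil equivalent
gdp_billion_usd = [50.0, 80.0]   # placeholder values, billion dollars
intensity = energy_intensity(energy_toe, gdp_billion_usd)
```

In this toy example energy consumption rises while intensity falls, because GDP grows faster than energy use.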
To verify the accuracy of our method, we first used 80% of the data to build a model and the remaining 20% to test its precision. The result is shown in Figure 6.
As Figure 6 shows, the blue line (test result) almost coincides with the red line (true energy intensity), which means that our prediction is very accurate. The green line is the training result. The MAE of this experiment is 0.059, which indicates that our method can be used to predict the energy intensity of China in the next decade (2018-2028). Figure 7 shows the prediction. Energy intensity is predicted to decrease over the next decade, but the decline is not very large.
The decline in energy intensity reflects China's rising awareness of environmental protection and the changes in its energy consumption structure. China has gradually reduced its dependence on coal and has developed and applied clean energy. Correspondingly, the proportion of the secondary industry is gradually decreasing, while the proportion of the tertiary industry is rising very fast. According to this trend, China is very likely to reach its peak of carbon emissions before 2030.

| Energy type and energy intensity & GDP
We also wanted to know which kind of energy contributes most to GDP growth. Therefore, we used RR to calculate the relationship between energy types and GDP.
As Table 5 shows, to our surprise, coal and oil are not identified as main factors affecting GDP. The reason may be that they have always been the major energy sources and their use has not changed significantly.
Put another way, coal and oil are important supports for national economic development, but raising the upper limit of economic development requires other clean and efficient energy sources. Another important finding in the table is that the development of other renewables is associated with a decrease in GDP (the coefficient of "Other" is negative). The reason might be the contradiction between high-cost development technology and a low utilization rate. Renewable energy technologies in China are still immature, which has an impact on China's economy.
The other important finding concerns the relationship between energy types and energy intensity.
As Table 6 shows, oil and hydroelectric are significantly related to energy intensity, and natural gas is related less significantly. Notably, natural gas and hydroelectric are related to both energy intensity and GDP, which means that developing these two energy sources not only increases GDP but also reduces the energy cost of growth. In addition, the coefficient of natural gas is negative, which means that it can help reduce energy intensity. This points China toward a path of low-carbon, high-speed development.

| DISCUSSION
In this paper, we focused on three problems. (1) We compared the variable regression method with time series prediction for predicting energy consumption and analyzed the relationship between industrial variables, economy, population, and energy consumption. (2) We predicted the energy intensity of China using the algorithm that performed better in the first part. (3) We analyzed the relationship between energy types and GDP, identified the energy types most relevant to GDP and its growth, and identified the energy types most related to energy intensity.
From these analyses, we obtained five results.
(1) The precision of time series prediction is higher than that of variable regression. The reason might be that many factors affect energy consumption, such as national policies, mining technology, improvements in energy efficiency, and even the exchange rate, and we cannot take all of them into account. These omissions make variable regression less accurate than time series prediction. Viewed differently, the energy consumption series already reflects the influence of all the variables related to it, and time series prediction uses past energy consumption to predict future energy consumption according to the time trend. In essence, therefore, using energy consumption data to predict energy consumption implicitly uses all the related variables.
Time series prediction is also easy to implement, since it needs only energy consumption data. Another important advantage is that it can make genuine forecasts: it can provide the 2020 value given the data up to 2019, whereas variable regression needs the 2020 values of the variables to predict energy consumption in 2020. However, variable regression can show the relationship between the variables and energy consumption, which makes its prediction results easy to interpret.
(2) Energy consumption is significantly related to GDP, population, the secondary sector, and the tertiary sector. The primary sector does not contribute much to energy consumption. Another important finding is that the tertiary sector can reduce energy consumption. Developing the tertiary sector might be the best way to promote GDP and reduce energy consumption.
(3) The energy intensity of China will continue to decrease in the next few years and then stabilize, which means that China's economic development will gradually shed its heavy dependence on energy consumption. Less and less energy is needed to produce each unit of GDP.
(4) Hydroelectric, nuclear, and natural gas are significantly related to GDP, and other renewables are less so. This means that developing new clean energy sources can also support economic prosperity. However, because of their development cost, other renewables will lead to a decline in GDP in the short run.
(5) Oil and hydroelectric are significantly related to energy intensity, and natural gas less so. Combined with (4), this shows that developing natural gas and hydroelectric power can promote economic development while reducing energy intensity.

| CONCLUSIONS
China is experiencing rapid economic growth accompanied by great energy consumption. Since China has promised to reach its carbon emission peak by 2030, changing the energy structure and the industrial structure is essential. In this paper, we give suggestions on changing both structures based on data analysis.
We used LSTM, a popular deep learning algorithm, to predict energy consumption in two ways: variable regression and time series prediction. We compared the two methods and found that time series prediction is more accurate and easier to implement. We also found that energy consumption is significantly related to GDP, population, the secondary sector, and the tertiary sector. The tertiary sector is the key to promoting GDP while reducing energy consumption. Since time series prediction is better than variable regression, we used it to predict energy intensity. The results showed that the energy intensity of China will continue to decline and then remain stable within a few years. Finally, we found that hydroelectric, nuclear, and natural gas are significantly related to GDP, and other renewables are less so. Oil and hydroelectric are significantly related to energy intensity, and natural gas is less so. Hydroelectric and natural gas might be the key to promoting GDP while reducing energy consumption.
Overall, the best way to develop the economy and reduce energy consumption is to develop the tertiary industry and to use natural gas and hydroelectric power to replace coal.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.