Research on Prediction Methods of Energy Consumption Data

Abstract: This paper analyzes energy consumption in Beijing by comparing common energy consumption prediction methods: multiple linear regression, grey prediction, BP neural network prediction, a combined grey BP neural network method, and the LSTM (long short-term memory) network. First, each model is explained theoretically and its advantages and disadvantages are analyzed before modeling. The models are then used to build forecasting models of Beijing's energy consumption, with several years held out as test samples to assess prediction accuracy. Finally, all models are used to predict the trend of Beijing's total energy consumption from 2018 to 2019, and corresponding energy-saving suggestions are given.


Introduction
Forecasting energy consumption helps the government analyze, evaluate, and regulate a city's energy consumption more conveniently and accurately. Analyzing the overall state of energy consumption with actual data shows why consumption reaches a given level and which factors act favorably or unfavorably on it. The next step is then to reduce energy consumption.
Beijing is a city with relatively scarce energy resources and a very high dependence on energy. As the economy continues to develop, the contradiction between energy supply and demand in Beijing will become more and more prominent. Studying Beijing's energy demand and predicting its future energy consumption is therefore urgent. Based on data on the total energy consumption of Beijing and its composition over the past 38 years, this paper classifies the energy types and analyzes and models the energy consumption of key energy-using enterprises in Beijing. Total energy consumption results from many complicated processes and covers many kinds of energy. It is generally divided into 39 types, and together with the total there are more than 40 categories. The data therefore involve many variables with complicated relationships.
In summary, this article explores Beijing's energy consumption patterns and characteristics, together with domestic and international methods for assessing energy consumption and energy efficiency. At a later stage, the patterns in the consumption data for electric power, coal, coke, natural gas, and liquefied petroleum gas are analyzed to study energy consumption prediction. The prediction models forecast energy consumption for the target period using regression prediction, neural networks, and three other algorithms from deep learning, and the paper studies methods for analyzing the city's energy consumption data. The results also indicate directions for energy saving in the city and should be of corresponding help [1].

Related Work
Zhang et al. [2] analyzed the error and accuracy of several energy consumption prediction models for metallurgical enterprises, such as a neural network prediction model and a combined genetic algorithm and BP neural network prediction model. They discussed the advantages and disadvantages of the methods and summarized problems that need attention when building such models.
Ding et al. [3] proposed a short-term energy consumption prediction model based on a genetically optimized decision tree. A genetic algorithm was used to optimize the subtree generation process of the gradient decision tree, with the prediction result of each iteration checked against the evaluation criteria until the prediction goal was met.
Wang [4] used a multiple linear regression model to forecast the total output value of the construction industry in Jiangsu. Based on construction industry data for Jiangsu Province from 2000 to 2016, he built a multiple regression model with explanatory variables such as wages, electricity consumption, and current assets, which are important for the province's overall GDP. For validation, MPE and MAPE (the percentage of the absolute value of the error) were used to express the errors as percentages, and the multiple linear regression was found to give good prediction results.
Researchers abroad have used artificial neural networks to model the solar potential of the Souss-Massa region in Morocco, with particular attention to applications of artificial neural networks in renewable energy, especially the prediction of meteorological data such as solar radiation. A model based on a multi-layer perceptron was developed to predict the evolution of global solar radiation in the Souss-Massa region (southwest Morocco) [5]. The study used a large database covering a wide range of periods, containing meteorological and geographic data such as latitude, longitude, month of the year, average temperature, sunshine duration, relative humidity, and average monthly global solar radiation. After testing a number of models, the authors selected the one with the minimum root mean square error. They also found good correlation coefficients between the measured and predicted values.
Ramos et al. [6] proposed a combination of the grey prediction method and a back-propagation neural network, called the grey neural network and input-output combination prediction model. The validity of the proposed model was verified on a real energy prediction case. The tests show that the proposed model has higher simulation and prediction accuracy for energy consumption than the grey model and other combined methods.
Drawing on the methods above and on the energy consumption situation of Beijing, we select corresponding methods to make the predictions. A brief theoretical description of each method is given in Chapter 3.

Research Methods and Research Objectives
After analyzing the energy consumption data of Beijing, statistical data from 1980 to 2017 were selected. To build the prediction models, the regression algorithm, grey prediction, and the combined neural network models from deep learning are used to model the data. The resulting models are then verified and compared to obtain the prediction accuracy of each. MSPE is used for verification, and finally an overall model suitable for predicting Beijing's energy consumption is obtained. The models, fitted on historical data, are used to predict energy consumption for the next several years, and corresponding suggestions on energy consumption are given based on the predictions and earlier data.
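The text names MSPE as the verification measure without defining it. The following is a minimal sketch that assumes MSPE means the mean squared percentage error; the function name `mspe` and the sample values are illustrative and do not come from the paper.

```python
def mspe(actual, predicted):
    """Mean squared percentage error (assumed reading of 'MSPE')."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Relative error of each prediction, squared and then averaged.
    squared = [((a - p) / a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)


# Made-up values for illustration only (not data from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
score = mspe(actual, predicted)
```

Because each error is divided by the actual value, the measure does not depend on the unit of the series, which makes models fitted on differently scaled data comparable.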

Multiple Linear Regression
Regression techniques predict a response variable with a numerical output from explanatory variables that may be numeric or categorical; what these algorithms have in common is that the output is a numerical value. Linear regression is one of the oldest statistical prediction methods, dating back to the work of Carl Friedrich Gauss in 1795. Its goal is to fit a linear model between the response variable and the independent variables and to use it to predict the output for given observations of the independent variables. Regression analysis places few requirements on the sample data and can process and predict medium- and long-term data with high accuracy [7]. The principle and the calculation are simple, and the method applies in many settings. Its explanatory variables may be correlated with one another, but well-established tests are available at a later stage to check the model, which supports the credibility of the predictions.
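As an illustration of fitting a linear model between a response and several independent variables, the sketch below uses NumPy's least-squares solver. The helper names `fit_ols` and `predict_ols` and the synthetic data are assumptions made for this example; the paper itself fits its model in STATA15.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Apply fitted coefficients to new rows of explanatory variables."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Synthetic, noise-free data generated from y = 2 + 3*x1 - x2.
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
y_demo = 2.0 + 3.0 * X_demo[:, 0] - X_demo[:, 1]
coef_demo = fit_ols(X_demo, y_demo)
```

On noise-free data the solver recovers the generating coefficients; on real data the residuals would then be examined with the tests mentioned in the text.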

Grey Prediction
The grey prediction model is built for systems in which part of the information is known and part, including the relationships to be predicted, is uncertain. The model mainly uses the accumulation method to pre-process the raw data and generate a sequence with strong regularity.
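The text does not state which grey model is used. The sketch below assumes the common GM(1,1) form, in which the accumulated series is fitted with a first-order model and forecasts are recovered by differencing. The function names and the demonstration series are hypothetical.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from a positive series x0."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 points")
    # Accumulated generating operation: running sums of x0.
    x1 = []
    total = 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least squares for x0[k] = -a * z[k] + b, solved in closed form.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m
    return a, b


def gm11_predict(x0, a, b, steps):
    """Return fitted values for x0 followed by `steps` forecasts."""
    def x1_hat(k):
        # Time response of the accumulated series (requires a != 0).
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    out = [x0[0]]
    for k in range(1, len(x0) + steps):
        # Inverse accumulation: difference of consecutive accumulated values.
        out.append(x1_hat(k) - x1_hat(k - 1))
    return out


# Synthetic series growing 5% per step (not data from the paper).
demo_series = [100.0 * 1.05 ** k for k in range(8)]
a_demo, b_demo = gm11_fit(demo_series)
demo_forecast = gm11_predict(demo_series, a_demo, b_demo, steps=2)
```

A negative `a` indicates a growing series. The model suits short, roughly exponential series, which is why it is often applied to annual totals.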

Artificial Neural Networks
An artificial neural network is a mathematical model that processes information by simulating a biological neural network. It builds on the results of physiological research on the brain, and its ultimate goal is to reproduce certain specific functions by simulating some of the brain's mechanisms.

BP Neural Network
The BP neural network is a network trained with the back-propagation algorithm. The algorithm uses the error observed at the output layer to estimate the error of the layer directly preceding it, and then uses that estimate to estimate the error of the layer before it. Repeating this for every layer passes the output-layer error toward the input layer, in the reverse direction of the input signal. Propagation in a BP neural network therefore runs in two directions: forward and backward. The network consists of three parts: the input layer, the hidden layer, and the output layer; the hidden part may be a single layer or several layers. In forward propagation, the known information enters at the input layer, is transformed by the hidden layer according to its weights and thresholds, and yields the result at the output layer [8]. Backward propagation mainly applies gradient descent to each neuron, adjusting the weights according to the computed error so that the error signal becomes smaller and smaller.
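The forward and backward passes described above can be sketched with a single hidden layer in NumPy. The layer size, learning rate, sigmoid activation, squared-error loss, and the toy data are all assumptions chosen for illustration; they are not the settings used in the paper.

```python
import numpy as np


def train_bp(X, y, hidden=5, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network with plain gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> sigmoid hidden layer -> linear output.
        h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
        out = h @ W2 + b2
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the output error toward the input.
        grad_out = 2.0 * err / len(X)
        grad_W2 = h.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_h = (grad_out @ W2.T) * h * (1.0 - h)
        grad_W1 = X.T @ grad_h
        grad_b1 = grad_h.sum(axis=0)
        # Gradient-descent update of the weights and thresholds (biases).
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return (W1, b1, W2, b2), losses


# Toy regression task: learn y = x^2 on [0, 1].
X_bp = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y_bp = X_bp ** 2
params_bp, losses_bp = train_bp(X_bp, y_bp)
```

The list `losses_bp` records the mean squared error per epoch, so plotting it shows how the error signal shrinks as the weights are adjusted.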

LSTM Long-Term and Short-Term Memory Neural Network
LSTM is a special kind of recurrent neural network (RNN). It was designed to address problems encountered when training on long sequences, such as vanishing and exploding gradients, and it performs better on long sequences than an ordinary recurrent neural network. Building on the RNN, the LSTM network adds a state unit c_t to the hidden layer and controls it with three gates: the forget gate f_t, the input gate i_t, and the output gate o_t. Like storage in a computer, these gates support storing, writing, and reading [9]. The LSTM modules use them to control the flow of information: each module has gates that decide on storing, reading, and writing, and the gate values are computed with the sigmoid function, so each is a normalized number between 0 and 1 [10].
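To make the roles of c_t, f_t, i_t, and o_t concrete, the sketch below performs one LSTM step with scalar input, hidden state, and cell state. It follows the standard LSTM update; the scalar sizes, the parameter layout in `params`, and the candidate value `g_t` are assumptions for illustration, since the text does not give the equations.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step with scalar input, hidden state, and cell state."""
    # Each gate has an input weight w, a recurrent weight u, and a bias b.
    def gate(name, activation):
        w, u, b = params[name]
        return activation(w * x_t + u * h_prev + b)

    f_t = gate("f", sigmoid)        # forget gate: share of c_prev to keep
    i_t = gate("i", sigmoid)        # input gate: share of new content to write
    o_t = gate("o", sigmoid)        # output gate: share of the cell to expose
    g_t = gate("g", math.tanh)      # candidate content for the cell
    c_t = f_t * c_prev + i_t * g_t  # updated state unit c_t
    h_t = o_t * math.tanh(c_t)      # hidden output passed to the next step
    return h_t, c_t, (f_t, i_t, o_t)


# Hypothetical parameters and a short made-up input sequence.
demo_params = {name: (0.5, 0.5, 0.0) for name in ("f", "i", "o", "g")}
h_demo, c_demo = 0.0, 0.0
for x in [0.1, 0.4, 0.3]:
    h_demo, c_demo, gates_demo = lstm_step(x, h_demo, c_demo, demo_params)
```

Because the cell state is updated additively through `f_t * c_prev`, gradients can flow across many steps without shrinking as quickly as in a plain recurrent network.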

Definition of the Concept of Total Energy Consumption
The research object of this paper is the total energy consumption of Beijing. In much of the current literature, total energy consumption refers to the sum of the various energy sources consumed by a region (geographical or administrative) in daily life and production during a certain period. It comprises three parts: terminal energy consumption, losses in energy processing and conversion, and losses in energy transportation and management. The research object of this paper is therefore the sum of these three parts for Beijing over a given period [11].

Energy Consumption Prediction Model Based on Multiple Linear Regression
The multiple linear regression model has some shortcomings, so we first analyze the advantages and disadvantages of the model as a whole. Regression analysis is simple and convenient for multi-factor models. With a regression model, as long as the model and the data are the same, standard statistical methods yield a unique result [12]. With graphs and tables, by contrast, the interpretation of the relationship between the data often varies from person to person, and the fitted curves drawn by different analysts are likely to differ. However, which factors to include in a regression analysis, and which expression to use for each, is often only a conjecture. This limits the variety of factors that can be considered and leaves some of them unmeasurable, so regression analysis is restricted in some cases.
Many factors affect total energy consumption, and the relationships between them are very complicated. They include the regional industrial structure, local policy, the local economic level, energy pricing, and scientific and technological capability. Many of these factors are not quantifiable, so they are not considered when selecting regression variables for the prediction model of Beijing's energy consumption. Based on the compiled data and overall considerations (the unit of all variables here is 10,000 tons of standard coal), the following factors are selected for modeling. The economic level is represented by the GDP of Beijing in recent years, denoted x1.
The industrial structure is represented by the consumption of the tertiary industry, which has accounted for a large share of total output value in recent years, denoted x2. The per capita income level is selected as a further influencing factor and denoted x3.
The proportion of the urban population in the total population is also a very influential factor and is denoted x4.
Y is the actual total energy consumption in each year. Based on the above factors, a multiple linear regression model for Beijing's total energy consumption Y is first established:
Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + ε (1)
where β0 to β4 are the regression coefficients to be estimated and ε is the error term. Data on Beijing's total energy consumption and the explanatory variables from 1980 to 2017 were imported into STATA15, and the multiple regression model was fitted by the least squares method. After tests and corrections for multicollinearity, heteroscedasticity, and autocorrelation, the final regression model of Beijing's energy consumption was determined (due to space limitations, only the data for the most recent ten years are listed). The error table of this linear regression shows several years with very large errors, but these fall in the 1980s and 1990s. After 2010 the linear regression model fits well. The large errors in some years may be due to abrupt changes in the explanatory variables or to other causes, but the model can still be accepted.
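The text mentions a multicollinearity test without naming it. One common diagnostic is the variance inflation factor (VIF), sketched below with NumPy. Using the VIF here is an assumption, not the paper's documented procedure, and the data are synthetic.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(len(X)), others])
        coef, _, _, _ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # VIF = 1 / (1 - R^2); a perfect fit means perfect collinearity.
        factors.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return factors


# Synthetic columns: x2 is almost exactly 2 * x1, x3 is unrelated.
x1 = np.arange(1.0, 9.0)
x2 = 2.0 * x1 + np.array([0.1, -0.1] * 4)
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
vif_demo = vif(np.column_stack([x1, x2, x3]))
```

A frequently quoted rule of thumb treats values above 10 as a sign of serious collinearity. Variables such as GDP and per capita income tend to move together, which is why a check of this kind matters before interpreting the coefficients.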

Energy Consumption Prediction Model Based on Grey Prediction and Neural Network
For Beijing's energy consumption statistics from 1980 to 2017, consumption is split into the primary, secondary, and tertiary industries to compute the total, and a grey neural network model is built on the data. The variables are as follows: energy consumption is y; energy consumption per 10,000 yuan of GDP is recorded as x1; the per capita income level is selected as an influencing factor and denoted x2; and the industrial structure is represented by the consumption of the tertiary industry, which has accounted for a large proportion of GDP in recent years, denoted x3. A further factor is the total population of the city [13]. The grey BP model combines the grey prediction model with the BP neural network [14]. The resulting model contains information from both prediction models, which helps to avoid the limitations of prediction with a single model. The specific steps are as follows. First, the 38 years of Beijing energy consumption data from 1980 to 2017 are taken as the original sequence, and the grey prediction method is used to normalize the energy consumption data. For the neural network, the weights and thresholds are initialized and parameters such as the learning rate are set. The original sequence is divided into training data and test data (the test data do not take part in neural network training): the training data cover 1980-2015 and the test data cover 2016-2017. Finally, the forecast data are obtained.
A grey BP neural network prediction model is then constructed according to the flow chart. With the flow of the grey BP neural network model established, the model is built as follows. First, grey prediction is applied to each data series. Five neurons were selected, and the BP neural network prediction model was trained.
First, the data were standardized to zero mean and fed into the three-layer neural network prediction model built for the Beijing energy consumption forecast (5 nodes in the input layer, 10 nodes in the hidden layer, and 1 node in the output layer), giving a predicted energy consumption for 2019 of 80,679,300 tons of standard coal.
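The year-based split into training data (1980-2015) and test data (2016-2017) and the zero-mean standardization can be sketched as follows. Scaling both parts with the statistics of the training part only is an assumption made here to keep test data out of training. The function names and the placeholder series are hypothetical.

```python
def split_by_year(years, values, last_train_year):
    """Split paired (year, value) data into training and test parts."""
    train = [v for yr, v in zip(years, values) if yr <= last_train_year]
    test = [v for yr, v in zip(years, values) if yr > last_train_year]
    return train, test


def zero_mean_standardize(train, test):
    """Scale both parts with the mean and standard deviation of the training part."""
    mean = sum(train) / len(train)
    std = (sum((v - mean) ** 2 for v in train) / len(train)) ** 0.5
    train_z = [(v - mean) / std for v in train]
    test_z = [(v - mean) / std for v in test]
    return train_z, test_z, (mean, std)


def inverse_standardize(z, stats):
    """Map a standardized value back to the original unit."""
    mean, std = stats
    return z * std + mean


# Placeholder series for 1980-2017 (not the paper's data).
years_demo = list(range(1980, 2018))
values_demo = [100.0 + 10.0 * i for i in range(38)]
train_raw, test_raw = split_by_year(years_demo, values_demo, 2015)
train_z, test_z, stats_demo = zero_mean_standardize(train_raw, test_raw)
```

Network outputs are produced on the standardized scale, so `inverse_standardize` is needed to report predictions in tons of standard coal.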
The following figures compare the values predicted by the BP neural network with the actual values. After grey prediction is used to forecast the characteristic data for 2018 and 2019, the generated data are read in again and the trained BP neural network model is used to predict energy consumption. A comparison table (in units of 10,000 tons of standard coal) with the predictions for the following two years was produced; due to space limitations, only the data for the most recent ten years are listed. The maximum error and the maximum error rate of this model occur in 1983, a relatively distant year. For the most recent five to ten years, the average error rate is 0.1%. Over time, the performance of the model improves and the error rate falls. The combined grey and BP neural network model can therefore be regarded as the best of the models studied here.

LSTM-Based Energy Prediction Model
LSTM is an effective variant of the RNN that inherits most features of RNN models and addresses the vanishing gradient problem caused by the gradual shrinking of gradients during back-propagation. For language processing tasks in particular, LSTM is very suitable for problems that are highly correlated with time series. Although feed-forward networks represented by CNN still appear to have a performance advantage on classification problems, LSTM has greater long-term potential for complex tasks than CNN [15]. It characterizes or simulates human behavior, logical development, and the cognitive processes of neural organization more realistically. Since 2014 in particular, LSTM has become a very active research model within RNNs and deep learning as a whole [16] and has received a great deal of attention.
First, the energy consumption data are entered into a CSV file by year and saved. The first step in the code is to obtain the pre-processed data. Pre-processing has three steps: grouping the loaded data, normalizing the grouped data, and dividing the data set [17]. The hyperparameters of the LSTM network are then set initially, the pre-processed data are fed to the network for training, and the loss function and gradients are computed during training. To build the LSTM energy consumption prediction model, five parameters must be considered: the sample length used for training, the dimension of the input layer, the number of hidden layers, the dimension of the hidden layer, and the dimension of the output variable. The sample length, set through seq_len, determines whether the model can be trained effectively. If the value is too small, the model cannot learn the underlying pattern; if it is too large, more irrelevant sequence content is included and the predictions become inaccurate. Different sample lengths are therefore tried before a final value is fixed. Since the only input variable in this paper is energy consumption, the input dimension is set to 1. In theory, a network with multiple hidden layers can fit any function, but to avoid over-fitting the number of layers is chosen moderately, following resources and recommendations available online. The training curves for 30, 300, 600, and 900 epochs show that the prediction is best and the error smallest at about 600 epochs.
The model trained for 600 epochs was therefore chosen as the final energy consumption model.
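The three pre-processing steps (grouping, normalizing, dividing) and the role of seq_len can be sketched in plain Python. Min-max scaling, the next-value target, and holding out the last windows as test data are assumptions for illustration; the paper does not give these details.

```python
def min_max_normalize(series):
    """Scale a series to the range [0, 1] and return the bounds used."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series], (lo, hi)


def make_windows(series, seq_len):
    """Group a series into (window of seq_len values, next value) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - seq_len):
        inputs.append(series[start:start + seq_len])
        targets.append(series[start + seq_len])
    return inputs, targets


def split_windows(inputs, targets, n_test):
    """Hold out the last n_test pairs, keeping the time order."""
    cut = len(inputs) - n_test
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Made-up yearly series; the real input would be read from the CSV file.
series_demo = [float(v) for v in range(10)]
norm_demo, bounds_demo = min_max_normalize(series_demo)
inputs_demo, targets_demo = make_windows(norm_demo, seq_len=3)
(train_in, train_out), (test_in, test_out) = split_windows(
    inputs_demo, targets_demo, n_test=2
)
```

A series of length n yields n - seq_len training pairs, so with only 38 annual values a large seq_len leaves very few samples. This is one practical reason for trying several sample lengths.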

Conclusion
The results of the above models lead to the following conclusions. Among the single prediction models, the grey prediction model performs worst and the linear regression model performs best; its overall fit and prediction accuracy are better than those of the LSTM and grey models. Across all models, however, the combined grey prediction and BP neural network model is the best. With continued social development and steady economic growth, energy consumption will increase in the future. According to the grey BP model in Tab. 2 (in units of 10,000 tons of standard coal), if Beijing's energy consumption and the other factors do not change significantly, Beijing's total energy consumption in 2019 will be around 80,694,913 tons of standard coal. Judging from the current energy supply and demand situation in Beijing, future energy supply will face severe pressure. We suggest that the relevant departments make the corresponding preparations and supply arrangements.