Short-term load forecasting using machine learning and periodicity decomposition

Accurate electricity consumption forecasts are of paramount importance in energy planning, as they provide strong support for effective energy demand management. In this work, we propose a load forecasting approach based on decomposing the historical time series according to the historical evolution of each hour of the day. The outputs of this decomposition serve as inputs to different machine learning algorithms. We tested our model with five machine learning methods, and the results were examined with three of the most commonly used evaluation measures in forecasting. The obtained results were very satisfactory, with the best method (the multilayer perceptron) reaching a MAPE of 0.96%.
Keywords: load forecasting; machine learning; periodicity decomposition; time series; smart grid


Introduction
The smart grid is developing, benefiting from the progress of information and communication technologies, and is increasingly becoming an efficient and robust system. In this environment, energy management systems are developed to monitor, optimize and control the energy market of smart grids. Demand management, considered an essential part of the energy management system, provides the means to make appropriate decisions on the exchange of electrical energy between different entities of the electrical grid while ensuring the stability and reliability of the operation of the electrical system [1].
Today's electricity grids are growing rapidly, raising many concerns about the environment, efficient use, sustainability and energy independence. Electric load forecasting systems are conceived for the primary purpose of energy demand and supply management [2][3][4].
Accurate and reliable forecasting techniques can contribute to: supply and demand planning; strengthening the reliability of the electricity grid by making it easier for operators to plan and for market players to make strategic decisions; optimizing the load required at peak times by ensuring that the energy offered by producers is minimized; the harmonious integration of renewable resources, which helps to achieve environmental and economic objectives; and savings in operating and maintenance costs, keeping the system running at a lower cost and reducing network reinforcement investments.
As the term indicates, a time series is a presentation of data arranged in order of time (years, months, days, hours...). Time series analysis makes it possible to describe and explain a phenomenon over time in order to make decisions, including predictive decisions. The methodologies established in this framework aim to build models that capture the mechanisms that generated the collected time series. Within this framework, several approaches from statistics and machine learning have been proposed to address the prediction problem [5,6].
In our study, we forecast the load from a decomposition of the historical load data, whose time series exhibits periodic variation. The decomposition subdivides the time series with reference to each hour of the day, yielding 24 time series, each of which represents the history of one hour of the day. The 24 time series are the input of five machine learning methods (multilayer perceptron, support vector machine regression, RBF regressor, RepTree and Gaussian process). The mean absolute percentage error (MAPE), the mean square error (MSE) and the mean absolute error (MAE) are the evaluation measures used to test the accuracy of the obtained results.
Section 2 presents the work related to electrical load forecasting. Section 3 describes our approach to predicting the electrical load time series. Section 4 presents the machine learning methods used in this work. Section 5 presents the experimental results and their interpretation. We conclude the paper with Section 6.

Related work
Early work approached the forecasting problem with mathematical methods such as regression, multiple regression, exponential smoothing and the iterative weighted least squares technique; later work moved to machine learning and fuzzy logic.
Among the first studies on load forecasting, the authors of [7] used linear regression, while Hyde et al. and Broadwater et al. developed methods based on non-linear load regression [8,9]. Several studies used autoregressive modeling, including R. Huang [10], who proposed an autoregressive model for short-term load forecasting. El-Keib AA et al. [11] worked on short-term forecasting models using exponential smoothing. Chen J et al. used an adaptive ARMA model for load forecasting, in which the model is updated by learning from forecast errors [12], while Barakat EH et al. [13] fitted an ARMA (1,6) model after analyzing the properties of seasonally adjusted loads for California steps. The ARIMA model was introduced to predict load while taking seasonal variation into account. Taylor JW [14] implemented an ARIMA-based method that accommodates both within-day and within-week seasonality, and adapted Holt-Winters exponential smoothing to handle these two seasonalities. A probabilistic approach was used by Hyndman RJ et al. [15] to predict long-term load: their method forecasts the probability distribution of annual and weekly peak electricity demand up to ten years ahead, is applied to the Australian grid, and divides the model into two effects (annual and half-hourly) that are estimated separately.
The authors of [16] used an artificial neural network and fuzzy logic to predict the short-term electrical load. Using the Asia-Pacific Economic Cooperation Energy Database, Li D et al. [17] addressed short-term forecasting with grey theory, which makes it possible to build a model from limited samples. Yang HY et al. [18] opted for very short-term load prediction by chaotic dynamic reconstruction using the Grassberger-Procaccia algorithm; least squares regression is applied to obtain the value of the correlation dimension, which is the basis of the FNS model. Al-kandari AM et al. [19] worked with a fuzzy linear regression model for the summer and winter seasons, solved using the simplex method based on linear programming. Smith D [20] used a Bayesian semi-parametric regression method to identify the daily, weekly and temperature-sensitive periodic components of the load in order to model intra-day electricity load data and obtain short-term load forecasts. Amina M et al. [21] implemented a fuzzy wavelet neural model on an hourly basis, in which a wavelet network replaces the classical linear model that usually appears in the consequent part of a neuro-fuzzy scheme; the fuzzy rules are derived by the Expectation-Maximization algorithm. Hsu C et al. [22] proposed a model based on grey theory using a technique that combines residual modification with artificial neural network sign estimation.
Output kernel learning techniques are used by Fiot J et al. [23] to predict the electricity demand measured over several lines of a distribution network. These techniques are suited to modelling the complex seasonal effects that characterize electricity demand data, while learning and exploiting correlations between several demand profiles. Gonzalez-Romera E et al. [24] used neural networks for monthly energy forecasting: two neural networks are trained to predict separately the trend and the fluctuation around it, which are separated in advance, and the two predictions are summed to obtain a global forecast.
Zahedi G et al. [25] opted for a neuro-fuzzy structure, which can be defined as an artificial neural network (ANN) trained on experimental data to find the parameters of a fuzzy inference system. A random forest model for short-term electrical load prediction was discussed by Dudek G et al. [26]; this is an ensemble learning method that generates many regression trees (CART). Chaturvedi DK et al. [27] present a solution methodology using fuzzy logic for short-term load forecasting.
Load forecasting models based on deep neural networks (DNNs) were applied to an empirical database of demand-side load by Ryu S et al. [28], who trained the DNNs in two different ways: with restricted Boltzmann machine pre-training, and with the rectified linear unit without pre-training.

Periodicity decomposition
We used the hourly electrical load data of the Moroccan electricity system for the period 2014-2016. In Figure 1, we present a 100-hour view of the load evolution that shows the periodic variations for each hour of the day.
In this context, we use a daily cycle of 24 hours: we decompose the initial series into 24 sub-series, each containing the sequence of values observed at one given hour from day '1' to day 'n'.
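As a minimal sketch of this decomposition, the following Python snippet splits an hourly series into 24 sub-series by slicing and then interleaves them back. The function names `decompose_by_hour` and `recompose` and the synthetic data are illustrative assumptions, not part of the original study.

```python
def decompose_by_hour(load, period=24):
    """Split an hourly series into `period` sub-series, one per hour of the day."""
    if len(load) % period != 0:
        raise ValueError("series length must be a multiple of the period")
    # Sub-series h collects the value at hour h of every day:
    # load[h], load[h + 24], load[h + 48], ...
    return [load[h::period] for h in range(period)]


def recompose(sub_series):
    """Interleave the per-hour sub-series back into one chronological series."""
    days = len(sub_series[0])
    return [sub_series[h][d] for d in range(days) for h in range(len(sub_series))]


# Synthetic example: 3 days of hourly values 0..71 (not real load data).
load = list(range(72))
hourly = decompose_by_hour(load)
print(hourly[0])   # values observed at the first hour of each day
print(hourly[23])  # values observed at the last hour of each day
```

Each sub-series can then be forecast independently, and `recompose` restores the chronological order of the forecasts.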
Assume that n = 24 × m. This decomposition allows us to predict a day through a separate forecast of each hour of the day j + n, where n denotes the number of days to be predicted. The first day of prediction can be written as follows:

Multi-Layer Perceptron (MLP)
Paul Werbos developed the MLP in 1974. It generalizes the simple perceptron to the non-linear case by using the logistic function $f(x) = 1/(1 + e^{-x})$ (8) or the hyperbolic tangent $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$. It has become one of the most popular neural networks for supervised learning. The MLP consists of an input layer, an output layer and an intermediate part formed by at least one hidden layer; information is transmitted in one direction, from the input layer to the output layer.
Through a set of adjustment iterations that compare the network outputs with the desired outputs, the MLP adjusts the weights of the neural connections in order to find an optimal weight structure by the gradient backpropagation method. The network generally converges to a state where the calculation error is low.
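The MLP is given by:
$$y = v_0 + \sum_{j=1}^{N_H} v_j \, g(w_j^{T} x')$$
where $N_H$ is the number of hidden nodes and:
$x'$: the input vector $x$ augmented with 1
$w_j$: the weight vector for the j-th hidden node
$v_0, v_1, \ldots, v_{N_H}$: the weights of the output node
$y$: the output of the network
$g(\cdot)$: the function representing the hidden nodes, in this case a sigmoid function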
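The forward pass of a single-hidden-layer MLP with sigmoid hidden nodes and a linear output node can be sketched as follows. This is only an illustration: the function names and the weight values are arbitrary assumptions, and no training (backpropagation) is shown.

```python
import math


def sigmoid(z):
    """Logistic activation used by the hidden nodes."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_output(x, hidden_weights, output_weights):
    """Output of a one-hidden-layer MLP.

    hidden_weights[j] is the weight vector of hidden node j, applied to the
    input vector with a constant 1 appended (the bias term).
    output_weights[0] is the output bias; output_weights[1:] weight the hidden nodes.
    """
    x_aug = list(x) + [1.0]
    hidden = [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x_aug)))
              for w in hidden_weights]
    bias, weights = output_weights[0], output_weights[1:]
    return bias + sum(v_j * h_j for v_j, h_j in zip(weights, hidden))


# Arbitrary illustrative weights: two inputs, two hidden nodes.
hidden_weights = [[1.0, 1.0, 0.0], [2.0, -1.0, 0.0]]
output_weights = [0.1, 1.0, 2.0]
y = mlp_output([0.0, 0.0], hidden_weights, output_weights)
print(y)
```

With a zero input both hidden nodes output 0.5, so the network output is the bias plus half the sum of the hidden-node weights.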

Support Vector Machine (SVM)
SVM methods are discrimination techniques. Their principle is to separate two or more sets of points optimally by a hyperplane, after projecting the data into a very high-dimensional space in which they become linearly separable. Among all possible separators, a particular one is chosen: the one that maximizes the margin. An important and unique feature of this approach is that the solution is based only on the data points that lie on the margin. These points are called support vectors [29].
The solution of the classification problem is thus determined entirely by the points on the margin. SVM can also be extended to a non-linear classifier through a non-linear transformation of the initial problem, carried out by a kernel function.
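To illustrate the role of a kernel, the sketch below computes a Gaussian (RBF) kernel, which measures the similarity of two points as if they had been mapped into a high-dimensional space, without computing that mapping explicitly. The function names and the value of `gamma` are assumptions chosen for the example.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma=0.5):
    """Matrix of pairwise kernel values, the only view of the data an SVM needs."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


points = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
K = gram_matrix(points)
for row in K:
    print([round(v, 4) for v in row])
```

The kernel value is 1 for identical points and decays towards 0 as the points move apart; `gamma` controls how fast it decays.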

Radial Basis Function (RBF)
The RBF (radial basis function) network is a type of feedforward neural network, with a simpler structure than the MLP and a much faster training process. The RBF neural network has a three-layer structure: the input layer is connected to the hidden intermediate layer, which performs a non-linear transformation of the inputs; the third layer is the output layer, which provides the response to the activation pattern applied to the input layer [30].

Reptree
The aim of a decision tree is to build a tree-structured supervised learning model in which each node applies a test function to the input vector. The structure of the decision tree consists of branches, which represent the attributes of the observed data, and leaves, which hold the target values of the data.
Through a set of iterations, RepTree builds several trees and selects the best one, based on information gain computed from entropy, or variance reduction for regression, with reduced-error pruning as proposed by Quinlan [31].
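For regression, the split criterion can be illustrated by variance reduction on a single numeric attribute. The sketch below is a simplified illustration with hypothetical function names; it covers only the choice of one split and does not implement pruning or the full RepTree algorithm.

```python
def variance(values):
    """Population variance of a list of target values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys):
    """Return (threshold, variance_reduction) of the best binary split on one attribute."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = variance([y for _, y in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Variance remaining after the split, weighted by branch size.
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / n
        gain = total - weighted
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


# Synthetic data: the target jumps from 5 to 20 between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 5, 5, 20, 20, 20]
threshold, gain = best_split(xs, ys)
print(threshold, gain)
```

The selected threshold falls midway between the two groups, where the split removes all of the variance in the target.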

Gaussian process (GP)
The GP is a probabilistic model that describes the evolution of a process through time; it can therefore determine the probability of each possible sequence of states. To overcome the limitations of parametric models when the underlying function is unknown, the Gaussian process is constructed as a collection of random variables, any finite number of which have a consistent joint Gaussian distribution [32].
The GP allows Bayesian inference to be performed directly in the function space. It allows a regression function to be deduced from a training set of input-output pairs by selecting a covariance function, which defines how the output varies when the input vector changes.
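As a hedged illustration of GP regression, the sketch below computes the posterior mean of a zero-mean GP with a squared-exponential covariance function on scalar inputs. The covariance choice, the length scale, the noise level and all function names are assumptions made for the example; they are not the settings used in this study.

```python
import math


def sq_exp(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior_mean(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean at x_new: k_new^T (K + noise * I)^(-1) y."""
    K = [[sq_exp(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    alpha = solve(K, y_train)
    return sum(sq_exp(x_new, a) * w for a, w in zip(x_train, alpha))


x_train = [0.0, 1.0, 2.0]
y_train = [0.0, 1.0, 0.0]
mean_at_1 = gp_posterior_mean(x_train, y_train, 1.0)
mean_far = gp_posterior_mean(x_train, y_train, 100.0)
print(mean_at_1, mean_far)
```

With very small noise the posterior mean nearly interpolates the training outputs, and far from the data it falls back to the prior mean of zero.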

Database processing and settings of the machine learning algorithms
The data are derived from the Moroccan electrical load data. The period from 01/01/2014 to 30/11/2016 was used as the training interval for each predictive variable, while the trials were evaluated using data from the following month. Our goal is to predict the load for the next 100 hours. This approach shows how simple and successful the model can be. Our task was to predict the electricity consumption for each hour of the day and then form the daily, weekly and monthly consumption. Machine learning algorithms predict the future values of a time series by identifying the relationships between the characteristics of historical data and using these relationships to make predictions.
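The way per-hour forecasts can be assembled into a 100-hour horizon is sketched below. The function `forecast_horizon` and the naive `persistence` model are hypothetical stand-ins: in the study each sub-series is forecast by a trained machine learning model, whereas here the last observed value is simply repeated so that the example stays self-contained.

```python
import math


def forecast_horizon(load, horizon, predict_next, period=24):
    """Forecast `horizon` future hours from per-hour sub-series.

    `predict_next(sub_series, steps)` is a placeholder for any trained model;
    it must return `steps` future values of one hourly sub-series.
    """
    if len(load) % period != 0:
        raise ValueError("history must contain whole days")
    days_ahead = math.ceil(horizon / period)
    # One independent forecast per hour of the day.
    per_hour = [predict_next(load[h::period], days_ahead) for h in range(period)]
    # Interleave the forecasts day by day, then cut to the requested horizon.
    forecast = [per_hour[h][d] for d in range(days_ahead) for h in range(period)]
    return forecast[:horizon]


def persistence(sub_series, steps):
    """Naive stand-in model: repeat the last observed value."""
    return [sub_series[-1]] * steps


# Synthetic history of 3 days in which the load equals the hour of the day.
history = [float(h) for _ in range(3) for h in range(24)]
forecast = forecast_horizon(history, 100, persistence)
daily_total = sum(forecast[:24])  # consumption of the first predicted day
print(len(forecast), daily_total)
```

Daily, weekly or monthly consumption can be formed in the same way, by summing the corresponding slices of the hourly forecast.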
For the MLP, we varied the number of neurons in the hidden layer: this number must be high enough to model the problem, but not so high as to cause overfitting. The learning algorithm used is the iterative backpropagation algorithm. To minimize the cost function with respect to the connection weights, gradient descent is used in conjunction with backpropagation.
The SVM learning algorithm used is SMOreg, a supervised machine learning algorithm that implements the support vector machine for regression. The accuracy of SVM regression depends on the selection of an appropriate kernel function and its parameters. The kernel function transforms the data of the input space into a high-dimensional feature space. In our tests we chose the RBF kernel, which gave more precision than the linear, Gaussian and polynomial functions.
The RBF regressor used in this study is a supervised algorithm that minimizes the squared error, in which each hidden node is a Gaussian function whose centre vector is found by SimpleKMeans. The initial sigma values are set to the maximum distance between any centre and its nearest neighbour in the set of centres.
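The sigma initialization rule described above can be sketched as follows. The centres are hand-picked for illustration rather than produced by k-means, and the function names are assumptions.

```python
import math


def initial_sigma(centres):
    """Largest nearest-neighbour distance among the centres."""
    nearest = []
    for i, centre in enumerate(centres):
        # Distance from this centre to its closest other centre.
        nearest.append(min(math.dist(centre, other)
                           for j, other in enumerate(centres) if j != i))
    return max(nearest)


def gaussian_basis(x, centre, sigma):
    """Activation of one Gaussian hidden node."""
    return math.exp(-math.dist(x, centre) ** 2 / (2 * sigma ** 2))


# Hand-picked centres (in practice they would come from a clustering step).
centres = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
sigma = initial_sigma(centres)
print(sigma)
print(gaussian_basis((1.0, 0.0), centres[1], sigma))
```

Using the largest nearest-neighbour distance makes every basis function wide enough to overlap with at least its closest neighbour.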
The RepTree algorithm is a variant of the C4.5 algorithm. In our tests we varied the minimum proportion of the variance of the data that must be present at a node for a split to be performed in regression trees.
For the GP, the observed input and output data are assumed to arise from an underlying functional mapping; this underlying function is estimated via Bayesian inference in order to make predictions.
In all the tests performed, we calculated three measures of accuracy (MSE, MAE and MAPE) over the entire predicted series of size n. MAPE (mean absolute percentage error) is the most popular measure of forecast accuracy in demand forecasting. The MSE (mean square error) is related to the standard deviation of forecast errors; because of the squaring, it is more sensitive to outliers and gives less weight to errors smaller than 1. In contrast, MAE (mean absolute error) is less sensitive to outliers and is expressed on the same scale as the forecast data.
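The three measures are calculated as follows:
$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|, \qquad \mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|$$
where $\hat{y}_t$ is the predicted value, $y_t$ is the actual value and $n$ is the size of the predicted series.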
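The three accuracy measures can be computed as in the following sketch, which uses their standard definitions with MAPE expressed as a percentage. The sample values are synthetic and serve only to show the calculation.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error, on the same scale as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic values, not taken from the study.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(actual, predicted), mse(actual, predicted), mae(actual, predicted))
```

The same absolute error of 10 contributes 10% to the MAPE at a load of 100 but only 5% at a load of 200, which is why MAPE is convenient for comparing errors across load levels.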

Results
Forecast results are generated for 100 future hours. These results are compared with actual measurements and presented first in the same graph and then separately. From the extracted curves we clearly notice that the MLP is the closest to the actual load curve. It is followed by the support vector machine (SMO), then, slightly further away, by the RBF and RepTree curves, while the Gaussian process is the furthest from the real curve. Figure 3 shows the evolution of the forecasts of the different methods compared to the actual load for 100 hours; the MLP faithfully follows the load curve.
Table 1 presents the accuracy measurements of the five methods (MLP, RBF, SVM, RepTree and GP), calculated over the 100 predicted hours. The results show that the MLP performs best, with a MAPE of 0.96%. Although the SVM is far less accurate than the MLP, it gives better results than the RBF, RepTree and GP; the GP is the farthest from the actual data.

Conclusion
In this work, we proposed an approach based on a periodic decomposition of the series, which led us to work on 24 time series, each representing the historical evolution of one hour of the day. Following this decomposition, we obtained the forecasts for each hour and then assembled them to form the entire day. We tested this decomposition with five machine learning algorithms (MLP, SVM, Gaussian process, RBF and RepTree). The results were evaluated with three error measures (MAPE, MSE and MAE), which gave good results for the MLP and SVM and confirmed the robustness of the MLP, although these results were obtained at a high training time and computational cost.
According to Figures 1 and 2, the real and predicted curves diverge mainly at peak hours and at the hours of lowest consumption. In future work, we will take this divergence into account in order to reduce it.