1. Introduction
According to the latest Intergovernmental Panel on Climate Change (IPCC) report [1], there is strong evidence of human influence on global climate change, which is characterized by the rise in global average temperature, higher ocean levels and the occurrence of catastrophic events, among others. The rise in greenhouse gas (GHG) concentration in the atmosphere is most likely the main driver of the changes in the Earth's climate. In this sense, the energy sector accounts for a quarter of the world's GHG emissions [1], the largest share of any sector. Therefore, it is not possible to discuss climate change without considering the energy sector.
Changes in climate conditions can affect physical, biological and human systems at global and regional scales and require adaptation and mitigation measures. In order to reduce GHG emissions, several national, regional and global agreements have been signed, such as the European Union (EU) Renewable Energy Directive 2009/28/EC and its 2018 recast [2], under which EU countries must fulfil at least 32% of their total energy needs with renewable energy by 2030. More recently, as part of the European Green Deal, the European Commission proposed the first European Climate Law to enshrine the 2050 climate neutrality target into law [3]. This type of measure is in accordance with the findings presented in [4], which state that large-scale changes in the energy system are necessary to achieve this goal and mitigate the global climate crisis. However, the energy transition towards cleaner sources demands profound and challenging changes in the sector's infrastructure, policies, regulations, market design and operation.
The aging of conventional power plants, technological advances and cost reductions are allowing cleaner sources, mainly solar- and wind-based systems, to boost their share in the electricity mix at the expense of fossil fuels [5]. Simultaneously, the electrification of significant sectors, such as transport and heating, is increasing load demand, while system decentralization is altering load patterns and energy flows, as consumers change their roles to become prosumers, i.e., agents that both produce and consume energy. Moreover, the growth of digital and storage technologies also increases system complexity and can expose it to cyber threats [6]. All of this transformation requires various adaptation measures, involving technical, economic and political issues [6], that must be applied to ensure reliable, affordable, safe and high-quality electricity.
The uncertainty of energy supply and demand can cause grid instability issues, such as overvoltage and frequency deviations. To overcome this situation, the system must be flexible and resilient enough to cope with rapid generation and load changes and to balance them at every moment. In this regard, recent technologies and approaches such as small-scale energy storage systems (e.g., batteries) and demand response programs have been gaining increased attention [6]. However, they are not always feasible, as they can be extremely expensive and still need further improvement to be applied at a large scale. On the other hand, one common way for system operators to deal with this variability is by defining a certain amount of energy reserve that can be used to adjust the system frequency. In liberalized markets, such as the Iberian Electricity Market (MIBEL), the procurement and use of this reserve can represent an extra cost to the system operation and lead to an increase in the electricity price.
The above-mentioned solutions have in common the fact that they try to deal with system variability by giving immediate responses to instantaneous deviations, for example by discharging batteries, turning off electrical appliances or increasing the power of a generator. Nonetheless, their behavior is typically planned based on power generation and load demand predictions. Therefore, accurate forecasts are essential for these tools to optimize their performance and, consequently, the whole system operation.
The interest in energy demand and supply forecasts has significantly increased since the oil crisis in the 1970s [7]. At that time, most of the employed models were statistics-based, such as linear regression, ordinary least squares and stepwise regression, among others. However, the growing complexity of electrical systems and the unknown relationships between multiple variables impose great difficulties that these simpler models cannot handle.
On the other hand, a boost in the application of computational methods has been observed in recent years. Technological developments in computing have allowed the creation of sophisticated and efficient computational methods that typically use advanced statistical concepts. They can be combined with statistical models, used to estimate their parameters or to make forecasts with reduced computational time and improved performance [7].
In the present paper, a machine learning model based on a feed-forward neural network (FFNN) for load demand forecasting is proposed. The main novelty of this method is that the FFNN is first applied to the historical data of load measurements and, in a second step, the same method is applied to the results of the load forecast to estimate the errors of the initial load forecast. Finally, the initial load forecast is adjusted considering the error forecast, providing a very accurate load demand forecast. The consumption forecast can be used by several stakeholders for different purposes, such as transmission and distribution system management, support to market participation or energy management in energy communities.
The increased need for electricity and the change in the power generation mix and load patterns are some of the already observed transformations. With all this, electricity grid management becomes more unpredictable and its operation and control more complex, which can lead to greater supply instability. Therefore, techniques and methods capable of increasing system reliability are extremely necessary.
One way to ensure more trustworthiness and better management of the system is by anticipating the load demand. When accurate forecasts are made, decisions regarding power system operation, maintenance and planning become more efficient [8,9]. Furthermore, improvements in energy policies and tariffs can be achieved. In recent years, much of the research has focused on the development of models to forecast the electrical load over different time horizons. These periods are often classified as short term, which goes up to 1 week ahead; medium term, from weeks to 1 year; and long term, for future years [10]. Additionally, each of these timeframes has different applications, with the first being more important for daily operation and cost minimization [11] and the others for fuel reserve estimation, maintenance and capacity expansion planning [12].
Several approaches have been employed recently to make these forecasts. They can be separated into three categories: statistics-based, computational intelligence-based and hybrid approaches [7]. Statistical models usually embrace uni- or multivariate time-series models and regression techniques, such as Autoregressive Integrated Moving Average (ARIMA) and Linear Regression, while computational intelligence models are mainly related to ML approaches. Commonly, statistics-based methods are less memory intensive due to their simplicity and, thus, faster to execute. On the other hand, ML models are capable of identifying nonlinear relationships between inputs and outputs but can be extremely time consuming. However, this level of complexity can be necessary to achieve better results [10]. Finally, hybrid models combine features from statistical and computational intelligence models. They generally use the former to preprocess and/or select the input data that will be fed to the latter.
These forecast models can be implemented using a large range of inputs that can be divided into four major categories: socio-economic, such as the regional average income and GDP; environmental, such as the mean temperature; building and occupancy, which is related to building sizes and dwelling types; and time index, which is related to the date stamps used as inputs [13]. Additionally, historical electricity demand data are generally taken into consideration. However, the choice of these inputs will depend on the time scale and on the type of region under study. Usually, historical, environmental and time index data are more common for short-term forecasts at a regional scale [13].
Recently, several neural network structures have been employed for short-term load forecasting. In [14], a model composed of a Convolutional Neural Network (CNN), a bidirectional Gated Recurrent Unit (GRU) and a Long Short-Term Memory (LSTM) recurrent neural network (RNN) was presented. First, the authors computed autocorrelation coefficients of the hourly loads and temperature, which were used to set the kernel size of the convolutional layers. Later, two-dimensional convolutional layers with the load and temperature time series were used to extract features from these data [14,15].
In [16], a hybrid Artificial Neural Network (ANN) to forecast the day-ahead load in a smart grid was proposed. The strategy was divided into three modules: pre-processing, forecasting and optimization. The goal of the first module was to remove irrelevant and redundant features from the dataset by applying a mutual information-based technique. Mutual information (MI) is a concept from information theory that represents the uncertainty reduction about one variable as a consequence of observing another one [17]. In [18], the authors introduced an Advanced Wavelet Neural Network (AWNN) to forecast one-step-ahead load demand. The proposed approach was composed of four stages: load decomposition, feature selection, prediction model creation and forecasting. The main idea of data decomposition is to break the data into their constituent parts to find universal or functional properties that are not observed in their usual representation [15].
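The MI-based selection idea can be illustrated with a minimal histogram estimator that ranks candidate features by how much information they share with the load. This is a sketch on synthetic data; the estimator and variable names are illustrative and not taken from [16]:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) in nats: sum p(x,y) log[p(x,y)/(p(x)p(y))]."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)        # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)        # marginal p(y)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(0)
load = rng.normal(size=5000)
relevant = load + 0.1 * rng.normal(size=5000)    # strongly related candidate
irrelevant = rng.normal(size=5000)               # independent candidate
# Observing the relevant feature reduces uncertainty about the load far more,
# so an MI-based filter would keep it and discard the independent one.
```

A pre-processing module of the kind described in [16] would compute such scores for every candidate input and drop those below a threshold.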
In [19], an LSTM network using a cross-correlation-based transfer learning approach was proposed to forecast 15-min-ahead load demand. Transfer learning is a methodology that identifies similarities in different datasets and allows the use of knowledge from other tasks on related ones [15]. This approach can be extremely useful when the available data are scarce. In this work, energy demand data from several randomly selected buildings over approximately one year were collected for the transfer learning step, while the load demand to be estimated came from data collected over one month from a university building in Turkey. Both datasets were sampled every 15 min. The results showed that the proposed model was able to outperform the benchmark models in terms of the Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Moreover, the contribution of the transfer learning approach was evident, as a significant improvement in the LSTM model could be verified. The authors also observed that the best proposed models came from the weights obtained with the data from the buildings with the highest cross-correlation. However, the MAPE was higher than those observed in other works (about 8%), and details about the performance of these models for each day and time were not provided.
In [12], ML approaches including ANN, Multiple Linear Regression (MLR), Adaptive Neuro-Fuzzy Inference System (ANFIS) and Support Vector Regression (SVR) were employed to forecast week- and year-ahead electricity demand in Cyprus. The inputs for the models were time, environmental data, such as temperature, humidity and solar irradiation, and socio-economic data, such as population, gross national income per capita and electricity price. The results showed that the ANN and SVR models performed significantly better than the other two for both short-term and long-term forecasts, with a MAPE of around 2% in the first scenario and 5% in the second.
In [20], a hybrid model that combined ARIMA with SVM was applied to forecast hourly load demand. This work used historical data from a state in the south of India to estimate the ARIMA parameters and generate initial load forecasts. Then, the outliers of the initial predictions were detected by means of the percentage error method and corrected using the deviation method. The forecast error data of the corrected ARIMA model output and two other variables, namely the day and the average temperature of the week, were given as inputs to the SVM model to estimate the initial forecast error. Finally, the initial forecasts and the expected errors were added to obtain the final load prediction. The authors found that the proposed ARIMA–SVM model was able to outperform single ARIMA and SVM models in terms of MAPE (4.15% versus 5.16% and 4.97%, respectively). Furthermore, the performance of the proposed model without the outlier detection approach was worse (6.23% versus 4.15%). In [21], SVM was proposed for fault prediction of specific loads.
Another hybrid model was proposed in [22], which combined an SVR model with a two-step parameter optimization algorithm using the Grid Traverse Algorithm (GTA) and Particle Swarm Optimization (PSO) to forecast the load demand at several short-term scales (from 5 min to 16 h ahead). The test data comprised 80 days of load from a distribution feeder. First, these data were pre-processed to eliminate excessively deviating samples using a mapping algorithm. Then, a GTA designed with cross-validation was used to narrow the search area of the SVR parameters. Afterwards, the PSO searched for the best parameters of the SVR model in the GTA solution space, also using cross-validation.
In [23], three ensemble learning algorithms, namely Random Forest (RF), Gradient Boosted Regression Tree (GBRT) and Adaboost (AR2), were employed to forecast one-hour-ahead electricity load. Ensemble models combine several models that are trained separately in order to reduce the generalization error [15]. The main advantage of this type of model is that, on average, it performs at least as well as any of its members and, if the errors of its members are independent, it will perform significantly better. In this work [23], historical electricity demand from an office building was collected every 10 s and averaged to a one-hour basis. Different training strategies merging various features, such as historical, time index and environmental data, were tested to forecast every hour of a single day. The authors found that using the time index data with the most recent temperature measurements and a few past load observations provided the best forecasts in general. The results of this work were compared with those in [24], which generated forecasts for the same building using SVM and three fuzzy rule-based models. It was observed that the AR2 model outperformed the SVM model in terms of MAPE (5.34% vs. 5.82%). Moreover, the other two ensemble models performed better than the fuzzy rule-based models.
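The variance-reduction argument behind ensembles can be checked numerically. The following is a synthetic sketch, not the models of [23]: five hypothetical members each see the true signal corrupted by independent noise, and simple averaging shrinks the error roughly by the square root of the number of members:

```python
import numpy as np

rng = np.random.default_rng(4)
truth = np.sin(np.linspace(0, 10, 1000))
# Five members, each modeled as the truth plus independent zero-mean noise.
members = truth + 0.5 * rng.normal(size=(5, truth.size))
ensemble = members.mean(axis=0)            # simple averaging ensemble

def rmse(pred):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# With independent errors, the ensemble RMSE is roughly member RMSE / sqrt(5),
# so the average beats even the best individual member.
```

When member errors are correlated, the gain shrinks accordingly, which is why diverse base learners are preferred in practice.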
In [25], a short-term load forecasting model based on error correction using dynamic mode decomposition (DMD) was proposed. Using two years of electricity demand data from a city in China, the authors built several load forecasting models, including ANN, SVR and ARIMA. They used historical data, namely the previous day, the same day in the previous week and similar-day loads, as inputs. The latter were obtained by grey relational analysis, which is a method that aims to find highly correlated data. The DMD, a data-driven method that can extract complex spatiotemporal features from data, was then applied to forecast the errors achieved by these models. Extensive experiments at small and large geographical scales were conducted to evaluate the proposed methodology. In nearly all cases, combining the DMD for error correction with the load forecasting model led to better results. Additionally, different decomposition techniques, including the Wavelet Transform, were tested and generally had a worse performance. Finally, it was possible to notice a lower prediction accuracy for the small-area forecasts, which was probably caused by their higher load variability. In Table 1, a summary of previous works can be seen.
Several works have been published in the field of consumption forecasting. Nevertheless, most of the methods are either very simple and not accurate, or very complex and not useful in the many applications where computational resources or time are limited. The present paper has the following main contributions:
- Proposing a consumption forecast method that assures a balance between accuracy and computational effort/costs;
- Proposing a new methodology comprising the forecast of consumption based on historical data and the forecast of the error of that initial forecast. By combining the two forecasts, the accuracy of the final results can be improved.
2. Proposed Methodology
In the present work, the FFNN was selected to develop the proposed methodology. The term neural network comes from the fact that these models were inspired by biological brains in the sense of how they process information. A neural network model is typically composed of nodes (or neurons) distributed across different layers, namely the input, hidden and output layers. Each node in a layer is linked to those in the next layer by means of a weight parameter that measures the strength of that connection, forming a fully connected network structure that resembles the nervous system. In Figure 1, a general neural network is illustrated.
The operating principle of neural networks can be described as a sequence of functional transformations [17]. For a given layer $l = 1, \dots, L$, where $L$ is the number of layers, a quantity called the activation value $a_j^{(l)}$ can be calculated as a linear combination of the inputs $z_i^{(l-1)}$ and weights $w_{ji}^{(l)}$ in the form

$$a_j^{(l)} = \sum_{i=1}^{N^{(l-1)}} w_{ji}^{(l)} z_i^{(l-1)} + b_j^{(l)}, \qquad (1)$$

where $z^{(0)}$ corresponds to the input features and $b_j^{(l)}$ is a parameter known as bias, which is used to adjust the output. The subscripts $i$ and $j$ run over the nodes of layers $l-1$ and $l$, whose dimensions are $N^{(l-1)}$ and $N^{(l)}$, respectively. By defining an additional input $z_0^{(l-1)} = 1$ with weight $w_{j0}^{(l)} = b_j^{(l)}$, the bias can be absorbed into the sum, as in Equation (2):

$$a_j^{(l)} = \sum_{i=0}^{N^{(l-1)}} w_{ji}^{(l)} z_i^{(l-1)}. \qquad (2)$$

Then, the activation value $a_j^{(l)}$ is transformed by a nonlinear, differentiable function $h(\cdot)$, named the activation function, as in Equation (3), resulting in the next layer input vector $z^{(l)}$:

$$z_j^{(l)} = h\left(a_j^{(l)}\right). \qquad (3)$$

For hidden layers, the activation function is a hyperbolic tangent function (tanh) or a rectified linear unit (ReLU), while for the output layer it is the identity function.
Equations (1) and (3) present recursive calculations that constitute a process known as forward propagation [17]. This name comes from the fact that information flows forward through the network, which is why this type of model is called an FFNN. Some particularities about these equations should be mentioned: the first input vector $z^{(0)}$ comes from the features selected from the dataset, while the remaining ones result from the calculations. Additionally, the final result, observed in $z^{(L)}$, is the output of the model.
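The forward-propagation recursion can be sketched in a few lines of NumPy. This is a minimal illustration with arbitrary random weights, not the network configuration used in this work:

```python
import numpy as np

def forward(x, weights, biases):
    """Forward propagation: z^(0) is the feature vector; each layer computes
    a = W z + b (Equation (1)) and z = h(a) (Equation (3))."""
    z = x
    activations = [z]
    for l, (W, b) in enumerate(zip(weights, biases)):
        a = W @ z + b                              # linear combination (Eq. (1))
        # tanh in the hidden layers, identity in the output layer
        z = np.tanh(a) if l < len(weights) - 1 else a
        activations.append(z)
    return activations

# Tiny example: 3 inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(1)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [np.zeros(4), np.zeros(1)]
acts = forward(rng.normal(size=3), weights, biases)
# acts[-1] is the model output z^(L), here a single forecasted value.
```

Each intermediate vector in `acts` corresponds to one $z^{(l)}$ of the recursion, which the backpropagation step below also needs.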
The parameter optimization is performed with gradient descent-based calculations, as in Equation (4):

$$p_{i+1} = p_i - \eta \nabla E(p_i), \qquad (4)$$

where $p$ represents the model parameters, $\eta$ is the learning rate (LR) and $E(p)$ is the loss function in Equation (5), which is the Mean Squared Error (MSE), also called the Euclidean or L2 norm:

$$E(p) = \frac{1}{n} \sum_{i=1}^{n} \left(x_i - t_i\right)^2, \qquad (5)$$

where $x_i$ is the forecasted value, $t_i$ is the target value and $n$ is the number of points in the dataset. With this approach, the required partial derivatives are related to the two parameters of an FFNN model: the weights and the biases. Applying Equation (4) to the last layer of the model results in Equations (6) and (7):

$$w^{(L)}_{i+1} = w^{(L)}_i - \eta \frac{\partial E}{\partial w^{(L)}}, \qquad (6)$$

$$b^{(L)}_{i+1} = b^{(L)}_i - \eta \frac{\partial E}{\partial b^{(L)}}. \qquad (7)$$

The search for the loss function minimum is commonly performed by computing its gradient $\nabla E(p)$, which is the vector containing the partial derivatives of $E(p)$ [15]. Each partial derivative $\partial E / \partial p$ indicates how the function changes with a small change in one of the parameters. Therefore, the gradient vector points in the direction of the steepest increase in the function. As the learning algorithm's goal is to minimize the error, with this approach the parameters can be updated at each iteration $i$ by going in the opposite direction.
Using the chain rule, one can write these partial derivatives as Equations (8) and (9):

$$\frac{\partial E}{\partial w^{(L)}} = \left[\nabla_z E \circ h'\left(a^{(L)}\right)\right] \cdot \left(z^{(L-1)}\right)^T, \qquad (8)$$

$$\frac{\partial E}{\partial b^{(L)}} = \nabla_z E \circ h'\left(a^{(L)}\right), \qquad (9)$$

where the dot ($\cdot$) symbol stands for matrix multiplication and the circle ($\circ$) symbol for the Hadamard or element-wise product. At this stage, it is useful to introduce the following notation:

$$\delta^{(L)} = \nabla_z E \circ h'\left(a^{(L)}\right), \qquad (10)$$

where $\delta^{(L)}$ is a value known as delta and represents the error that the layer $L-1$ sees. Moving on to layer $L-1$, the calculations are as in Equations (11)–(13):

$$\delta^{(L-1)} = \left[\left(w^{(L)}\right)^T \cdot \delta^{(L)}\right] \circ h'\left(a^{(L-1)}\right), \qquad (11)$$

$$\frac{\partial E}{\partial w^{(L-1)}} = \delta^{(L-1)} \cdot \left(z^{(L-2)}\right)^T, \qquad (12)$$

$$\frac{\partial E}{\partial b^{(L-1)}} = \delta^{(L-1)}. \qquad (13)$$

From layer $L-1$ down to the first layer, it is possible to write the next deltas as Equation (14):

$$\delta^{(l)} = \left[\left(w^{(l+1)}\right)^T \cdot \delta^{(l+1)}\right] \circ h'\left(a^{(l)}\right), \qquad (14)$$

and, therefore, the partial derivatives can be obtained as Equations (15) and (16):

$$\frac{\partial E}{\partial w^{(l)}} = \delta^{(l)} \cdot \left(z^{(l-1)}\right)^T, \qquad (15)$$

$$\frac{\partial E}{\partial b^{(l)}} = \delta^{(l)}. \qquad (16)$$
Finally, the parameter updates can be performed using Equations (15) and (16) with Equations (6) and (7), respectively. The process presented above constitutes the backpropagation learning algorithm. With this approach, the algorithm goes through each layer in reverse, measuring the error contribution of each connection by means of the deltas and updating the parameters accordingly [26]. By computing the gradient in reverse, the backpropagation algorithm avoids unnecessary calculations as it reuses previous ones. This is the major reason for the method's higher computational effectiveness when compared with numerical methods such as finite differences [17] and one of the cornerstones of the FFNN's popularity.
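For a single hidden layer, the backpropagation updates above can be sketched as a stochastic gradient-descent loop. This is a NumPy toy example on a synthetic regression task; the network size, learning rate and target function are illustrative, not the configuration used in this work:

```python
import numpy as np

def train_step(x, t, weights, biases, lr):
    """One gradient-descent update via backpropagation for a 1-hidden-layer
    FFNN with tanh hidden units, identity output and squared-error loss."""
    W1, W2 = weights
    b1, b2 = biases
    # forward pass (Equations (1) and (3))
    a1 = W1 @ x + b1
    z1 = np.tanh(a1)
    y = W2 @ z1 + b2                       # identity output activation
    # output-layer delta: derivative of the squared error (Eq. (10))
    d2 = y - t
    # propagate the error backwards through the weights (Eq. (14))
    d1 = (W2.T @ d2) * (1.0 - z1 ** 2)     # tanh'(a) = 1 - tanh(a)^2
    # parameter updates (Eqs. (15) and (16) plugged into (6) and (7))
    W2 -= lr * np.outer(d2, z1)
    b2 -= lr * d2
    W1 -= lr * np.outer(d1, x)
    b1 -= lr * d1
    return 0.5 * float(np.sum((y - t) ** 2))

rng = np.random.default_rng(2)
weights = [0.5 * rng.normal(size=(8, 1)), 0.5 * rng.normal(size=(1, 8))]
biases = [np.zeros(8), np.zeros(1)]
xs = rng.uniform(-1, 1, size=(200, 1))
ts = np.sin(3 * xs)                        # toy regression target
losses = []
for _ in range(50):                        # epochs of stochastic updates
    epoch = [train_step(x, t, weights, biases, lr=0.05) for x, t in zip(xs, ts)]
    losses.append(sum(epoch) / len(epoch))
# The average loss drops as backpropagation adjusts weights and biases.
```

Note that the deltas reuse the activations stored during the forward pass, which is exactly the computation-sharing property that makes backpropagation cheaper than finite differences.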
To improve the accuracy of the forecast methodology proposed in the present work, the FFNN algorithm is executed twice. First, electric load measurements are used as inputs for the FFNN to obtain an initial load forecast. Afterwards, by comparing the forecasted values with the measured values, it is possible to compute the errors of the method. These errors are then used as inputs for a second FFNN predictor. Finally, the error forecasts are merged with the initial load forecasts to obtain the final load forecasts. This process is illustrated in Figure 2.
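The two-step procedure can be sketched as follows. For brevity, a least-squares autoregression stands in for the FFNN in both steps, and the function names and synthetic series are illustrative, not the actual data or models of this work:

```python
import numpy as np

def lagged_matrix(series, n_lags):
    """Inputs = n_lags past values; target = the next value."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

class LeastSquaresAR:
    """Stand-in one-step forecaster; the paper uses an FFNN in both steps."""
    def fit(self, X, y):
        self.w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self
    def predict(self, X):
        return X @ self.w

def two_step_forecast(series, make_model, n_lags=6):
    # Step 1: a model trained on historical load measurements.
    X, y = lagged_matrix(series, n_lags)
    initial = make_model().fit(X, y).predict(X)
    # Step 2: a second model trained on the errors of the initial forecast.
    errors = y - initial
    Xe, ye = lagged_matrix(errors, n_lags)
    err_model = make_model().fit(Xe, ye)
    # Final forecast: initial forecast adjusted by the predicted error.
    return initial[n_lags:] + err_model.predict(Xe), y[n_lags:], initial[n_lags:]

rng = np.random.default_rng(3)
t = np.linspace(0, 40, 800)
series = np.sin(t) + 0.3 * np.sin(3 * t) + 0.05 * rng.normal(size=t.size)
final, target, initial = two_step_forecast(series, LeastSquaresAR)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))
# The error-corrected forecast is at least as accurate as the initial one.
```

Whenever the error series still carries structure (e.g., autocorrelation, a non-zero mean), the second step captures part of it and the combined forecast improves on the initial one.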
4. Conclusions
The proposed methodology utilized an FFNN for both the initial and the error forecasts, as the error time series often contains useful information that can be used to improve the initial estimates. After a careful state-of-the-art review and an extensive preliminary data analysis, the models were created. A search for the best model inputs and parameter configurations was also conducted.
With regard to the electrical load demand forecasts, the data referred to the measured load demand in an industrial area connected to the medium voltage grid and were sampled every 10 min. The forecasts were made for three time horizons: 10 min, 1 h and 12 h ahead. The results demonstrated that the proposed initial models outperformed the linear regression model for the last two horizons, while, for the first, the results were worse than those achieved with a simple persistence model. By comparing the results for the three time scales, it was verified that the forecast accuracy decreased for longer time horizons, which highlights the greater difficulty of making longer-horizon forecasts due to the higher uncertainty of the data. Additionally, for very short time scales, the findings suggest that a simpler model can provide better estimates.
The analysis of the initial electrical load demand forecasting errors showed that the model might not have extracted all the information from the input data, as the error distribution was not centered at zero and there was a correlation between a given error and the previous ones. Indeed, the error forecasting models achieved good forecasting accuracy, with a lower RMSE and MAE than the initial models. Furthermore, by combining the predicted error with the initial forecasts, it was possible to improve the initial results significantly for all time scales. In particular, for the 10-min-ahead forecasts, the application of the proposed methodology resulted in a more accurate model than the persistence model.
The proposed methodology, including two forecast steps based on historical data and errors, achieved accurate results in all time horizons when compared with the baseline method. This is a good indicator of the applicability of the method in real scenarios where short-term forecasts are required. To further prove the effectiveness of the method, its application to larger datasets covering consumers with different profiles will be tested. Depending on the type of consumer (residential, industrial or service buildings), some adjustments can be included in the method, namely the use of more hidden layers in the FFNN. These hidden layers can increase the complexity of the method as well as the execution time. A balance between the effectiveness and efficiency of the method should be identified according to the specific application.