1. Introduction
The application of novel machine learning techniques to the prediction problem is a cornerstone for improving the penetration of Distributed Energy Resources (DERs) in modern flexible smart energy systems and grids [1,2]. Forecasting is becoming a fundamental building block of any energy management approach [3,4], including generation scheduling, planning and management [5], utility side management [6] and demand side management [7]. Goals such as resource conservation, cost minimization, widespread access to energy, market and production planning, and demand management can be achieved only by endowing expert systems with advanced forecasting capabilities. The development of future cooperative Virtual Power Plants (VPPs) [8] and the integration of Local Energy Markets (LEMs) in energy communities [9] will rely on accurate and reliable forecasts, particularly concerning the output of Renewable Energy Sources (RESs). Due to their intermittent and fluctuating nature, forecasting the power output of RESs can be considered one of the most challenging and timely prediction problems in the energy framework [10,11].
For practical reasons, forecasting in the energy context is usually carried out using univariate approaches, especially for time series retrieved from single independent physical quantities [12,13,14,15]. While this is a valuable approach, it does not consider how other data can be incorporated into the prediction scheme, with the drawback of tailoring the method to the specific time series being predicted. The most widely used models for this purpose are statistical regression [14,16] and computational intelligence techniques [12,17,18], based on different mathematical backgrounds. Fuzzy predictors are used in [19] to predict RES generation and load data in a microgrid framework, with the advantage of incorporating the uncertainty of future predictions; at the same time, however, the actual energy management remains bound to those predicted errors. Evolutionary-genetic algorithms are used in [20] for short-term wind speed forecasting, in the narrow vertical application of refining the model parameters, and do not provide a comprehensive view of the role of those techniques in the prediction realm. Artificial neural networks are used in [21] for short-term load forecasting in a microgrid environment, employing a very simple multilayer perceptron structure and basic clustering algorithms. Support Vector Machines (SVMs) are used in [2] for solar radiation forecasting, where a good review of other prediction methods can also be found.
In some contexts, observations of two or more related quantities are available at each time step. These multivariate data can pertain to a particular business sector [22], e.g., price, revenue, shares [23]. Such a multivariate approach calls for more complex models to describe the relationships inherent in the time series and, consequently, more complex predictors for the forecasting problem. In the broad energy context, multivariate prediction approaches using machine learning have been applied to load time series [24], solar irradiation [25,26], wind speed [27], and RESs [28,29]. All of these techniques have the drawback of being tailored to a single optimization task because of the specific nature of the time series being forecasted; in the present work, we want to contribute a general framework for multivariate prediction optimization.
In this article, the key intuition lies in how the information contained in the time series is exploited. The time series samples are combined into a data structure that resembles a sequence of video frames. This sequence, due to its structure, can be sent as input to a Convolutional Neural Network (CNN) [30]. Thus, the idea is to explore time correlations among different time series related to solar power plant production, analyzing how the sequences can be combined, using observations from different physical phenomena related to the same RES quantity (in our case, the output power of the plant). The main contribution of this article is the description of a deep learning scheme to solve the multivariate energy time series prediction problem. In this context, the main obstacles to a satisfactory prediction result are not to be sought in the dimensionality or the structure of the convolutional operator, but rather in the overall construction of the layers' architecture in relation to the time series' properties. The actual prediction is carried out by a Long Short-Term Memory (LSTM) network, a special kind of Recurrent Neural Network (RNN) able to efficiently manage long-term dependencies through the 'gates' located in each cell.
Considering the broader context of data analysis in general, and without limiting the literature to multivariate prediction problems, there are indeed several approaches relying on the combination of CNN and LSTM. These methods do not follow the multivariate paradigm but are worth mentioning because of the similarity of their network structure to the present work. In [31], the authors combine CNN, LSTM and Deep Neural Networks (DNNs) into one unified architecture, validating the performance on a variety of vocabulary tasks. CNNs mainly stem from image and video analysis; in fact, the authors in [32,33] combine them with LSTM networks to recognize human activity in video sequences. The CNN-LSTM approach is also used in biometric applications for both recognition [34] and anti-spoofing methods [35]. Regarding univariate forecasting problems, CNN and LSTM have recently been combined for stock-market analysis [36] and environmental time series prediction [37]. As for the relation between the number of time series and the CNN's ability to process low-dimension frames, there is evidence that the convolutional approach in time series analysis performs well regardless of the absolute size of the filter [38,39]. Almost counter-intuitively, it has been suggested [40] that, in certain situations, the smaller the single-layer filter size, the better the overall CNN accuracy. This supports our claim that the main obstacles to a satisfactory prediction result lie in the overall construction of the layers' architecture in relation to the time series, which is, indeed, the point of this work.
To the best of our knowledge, the pure multivariate prediction approach is scarcely studied in the literature. Only recently have some approaches using a combination of CNN and LSTM layers been proposed, such as the one in [41]. There, the authors first use a convolutional network to extract the interrelationship of different variables at the same time step (as well as to regularize the dimension of the input structure); then, a dual-attention mechanism combined with LSTM is used. While there are some similarities in the combination of CNN and LSTM in the deep structure, the main difference from our approach is that those authors apply the CNN to an input frame constructed as a combination of samples taken from different sequences at the same instant in time. Instead, we use the CNN not only to extract the spatial correlations among different sequences, but also to take into account the correlation in time among those sequences, similarly to a multivariate intrinsic embedding mechanism. Indeed, we construct the input frame in a different fashion, feeding the network with samples taken from different sequences at several time steps, as described in Section 2.
In a similar fashion, some works use a multivariate scheme based on LSTM in vertical applications for specific sectors: cybersecurity [42], time series classification [43], industrial anomaly detection [44], and point process analysis [45]. Following the approach proposed in this work, the novel adopted scheme can be viewed as a superposition of neural layers forming a stacked deep network, where each layer feeds its output to the input of the subsequent layer. This increase in the dimension of the NN towards a deeper architecture has been linked to a positive correlation with prediction accuracy [46].
The paper is organized as follows: in Section 2, the novel 2-D convolutional embedding process is explained; in Section 3, the architecture of the forecasting system is described; experiments are reported in Section 4; and conclusions are drawn in Section 5.
2. 2-D Convolutional Embedding
It is widely known that LSTM networks are a staple technique for prediction purposes. In this work, our aim is to enhance the LSTM structure to achieve a deep NN model in which feature extraction (i.e., data representation) can be obtained in a robust and automatic fashion, considering a higher abstraction level [47].
In most use cases, the known past samples of the time series subject to forecasting are input into the LSTM model, yielding a univariate prediction method. In the present article, the aim is to extend this approach to a multivariate framework: the predictor's input will be built using several time series. We do this by employing a bidimensional CNN layer, which has the role of filtering all the known samples of the available time series to acquire the expanded set of generalized features. These features are then flattened and used as a multivariate (multivalued) time series to be processed by the actual LSTM.
Let $x_0[n]$, with $n \in \mathbb{N}$, be the scalar time series to be predicted. Let $x_j[n]$, $j = 1, \ldots, M-1$, be further $M-1$ scalar time series correlated with $x_0[n]$, i.e., the time series $x_j[n]$ contain additional information that can be used for forecasting $x_0[n]$. Our aim is to predict the sample $x_0[n+k]$, where $k$ represents the prediction distance, considering that all samples prior to time step $n$ (including the latter) are known. The input frames $\mathbf{X}[n]$ are sent into the CNN layer as if they were a sequence of images, where the single frame is given by
$$\mathbf{X}[n] = \begin{bmatrix} x_0[n-D+1] & \cdots & x_0[n] \\ \vdots & \ddots & \vdots \\ x_{M-1}[n-D+1] & \cdots & x_{M-1}[n] \end{bmatrix},$$
which is a one-channel $M \times D$ frame, where $M$ represents the number of considered time series and $D$ the number of past samples (for each time series), both used to forecast $x_0[n+k]$. This operation represents the main novelty of the proposed approach. In fact, the input frame structure allows the network to efficiently recognize the temporal correlation among the considered sequences. Furthermore, in the training phase, the back-propagation learning algorithm is able to infer different temporal correlation structures through the use of several convolutional filters of size $H$. The result is stored in data structures called feature maps, as illustrated in Figure 1.
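As an illustrative sketch (not the authors' original MATLAB implementation), the frame construction can be written as follows, assuming the M time series are stored as rows of a NumPy array:

```python
import numpy as np

def build_frames(series, D):
    """Build the sequence of one-channel M x D input frames.

    series : (M, N) array, one row per time series (row 0 is the target).
    D      : number of past samples per series in each frame.
    Returns an array of shape (N - D + 1, M, D): frame t collects the D
    most recent samples of every series up to time step t.
    """
    M, N = series.shape
    return np.stack([series[:, t - D + 1:t + 1] for t in range(D - 1, N)])

# Toy example: M = 3 series, N = 6 samples, D = 4 past samples per frame.
x = np.arange(18).reshape(3, 6)
f = build_frames(x, D=4)
# f.shape == (3, 3, 4): three frames, each a one-channel 3 x 4 "image".
```

Each frame is then processed by the convolutional filters exactly as a single-channel image would be.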
Usually, in the context of time series, forecasting is carried out using an embedding procedure as a preprocessing step, in order to select the meaningful past samples of the time series through statistical or heuristic techniques [48]. In a univariate context, the estimated output is obtained as follows:
$$\hat{x}[n+k] = f\left(x[n],\, x[n-T],\, \ldots,\, x[n-(D-1)T]\right),$$
where $T$ is called 'time lag' and $D$ 'embedding dimension' [49].
The aforementioned inference process of the data structures in the input matrix $\mathbf{X}[n]$ can be considered as an 'extended version' of the classic embedding. This is because, in the proposed approach, we use different time series and autonomously select the past samples. The sample $x_0[n+k]$ is estimated by the deep neural network through a general non-linear, dynamical (recurrent) model $g(\cdot)$ as
$$\hat{x}_0[n+k] = g\left(\mathbf{X}[n]\right),$$
where the two embedding approaches are the same when $M = 1$ and $T = 1$.
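For concreteness, here is a minimal sketch (our notation, not the authors' code) of the classic univariate delay embedding described above, with time lag T and embedding dimension D as in the text:

```python
import numpy as np

def embed(x, D, T):
    """Return the delay-embedding vector [x[n], x[n-T], ..., x[n-(D-1)T]]
    for the latest available sample, n = len(x) - 1."""
    n = len(x) - 1
    return np.asarray([x[n - j * T] for j in range(D)])

x = np.arange(10.0)      # toy series with x[n] = n
v = embed(x, D=3, T=2)   # -> [9., 7., 5.]
```

With T = 1 this reduces to the last D consecutive samples, i.e., exactly one row of the frame defined above; with M = 1 the frame degenerates to this single row, which is the stated equivalence between the two embeddings.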
4. Experimental Results
We evaluate the performance of the proposed forecasting approach considering a real photovoltaic power production plant, identified geographically by the following coordinates: N, W, elevation 1855 m. The photovoltaic plant is named 'NWTC' and is located in Denver, CO, USA. Irradiance data, together with those relating to other meteorological factors, are retrieved through the Measurement and Instrumentation Data Center (MIDC) database. The output power to be predicted, expressed in kW, is calculated from the irradiance by applying a system balance factor and an inverter curve with MPPT.
In addition to the sequence to be predicted, we also consider time series relative to temperature, wind speed and direction, turbulence, humidity and pressure levels. Some of them are taken both at ground level (2 m above ground level) and at a higher altitude (80 m above ground level). The used sequences and their attributes are summarized in Table 1. All time series are collected at the same plant and sampled every hour (i.e., 24 samples a day). Each time series refers to the years 2017 and 2018 and is normalized between −1 and 1, for the sake of regularization, before applying the learning procedures. The extremes of normalization are given by the physical functioning of the plant (also reported in Table 1).
To assess the validity of the proposed approach, we have carried out some experiments with classic and widely used benchmark architectures. In particular, we used the standard LSTM as the baseline model. Given that the goal of this work is to provide a reliable and sound solution to the multivariate prediction problem, and knowing the vast superiority of the LSTM approach in time series forecasting with respect to other algorithms (e.g., shallow predictors), it is sufficient to prove that our algorithm outperforms a multivariate implementation of the LSTM. Here, we will briefly indicate the methods considered for the tests:
V-LSTM: this is the elementary multivariate implementation of the classic LSTM model, in which the multivariate sequences are fed directly to the LSTM layer, with a vector input of dimension equal to the number of considered sequences. When the number of adopted sequences is reduced to 1 (first row of the numerical results reported in the tables), this case coincides with the elementary univariate LSTM;
2D-CNN: this is the proposed method, where the convolutional layer is stacked before the LSTM predictor, as previously described. Similarly, for the 2D-CNN, the univariate case is considered as a benchmarking level to evaluate the multivariate performance.
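To make the difference between the two inputs concrete, the following NumPy sketch (names and toy values are ours, not from the paper) contrasts the tensor shapes: the V-LSTM consumes a length-M vector per time step, while the 2D-CNN consumes a one-channel M × D frame per time step.

```python
import numpy as np

M, D, N = 8, 10, 100                 # series, past samples, time steps (toy values)
rng = np.random.default_rng(0)
series = rng.standard_normal((M, N))

# V-LSTM: one M-dimensional vector per time step.
vlstm_input = series.T               # shape (N, M)

# 2D-CNN: one M x D frame per time step (for steps with a full history).
cnn_input = np.stack([series[:, t - D + 1:t + 1]
                      for t in range(D - 1, N)])   # shape (N - D + 1, M, D)

assert vlstm_input.shape == (N, M)
assert cnn_input.shape == (N - D + 1, M, D)
```

The extra D axis is what lets the convolutional filters capture correlations in time as well as across sequences, before the result is flattened and passed to the LSTM.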
A training set of 3 months (i.e., 90 consecutive days) is used for the experiments; it is associated with test sets whose lengths are 3 days and 7 days after the last available sample of the training set, respectively (the training set contains the known samples that are used to forecast the future ones). In all of the experiments, the embedding dimension, the number of filters in the CNN layer and the number of hidden units in the LSTM layer were determined by a grid search procedure on the training data in order to avoid overfitting. We used the ADAM algorithm [51] to train the deep neural network; the initial learning rate was reduced by 50% every 30 iterations, the gradient decay factor was set to 0.9 and the maximum number of iterations to 300. All of the experiments were performed using Matlab® R2019a on a machine equipped with an Intel® Core™ i7-3770K 64-bit CPU at 3.50 GHz and 32 GB of RAM.
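The piecewise learning-rate schedule described above (a 50% reduction every 30 iterations) can be sketched as follows; the initial rate lr0 is a hypothetical placeholder, since its exact value is not reported in the text reproduced here:

```python
def learning_rate(iteration, lr0, drop=0.5, period=30):
    """Piecewise-constant schedule: multiply lr0 by `drop`
    every `period` iterations (as used with ADAM in the experiments)."""
    return lr0 * drop ** (iteration // period)

# With a hypothetical lr0 = 0.01:
# iterations 0-29 -> 0.01, 30-59 -> 0.005, 60-89 -> 0.0025, ...
```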
For the sake of illustration, we considered several different situations to incorporate all seasonal effects in the analysis, relative to eight months of the year 2018: each test is composed of a single day, either the 15th day of the month (for March, June, September and December) or the 30th (for January, April, July and October), plus the 2 or 6 days following it, resulting in the 3-day and 7-day test sets introduced earlier. Ordinarily, in real-world applications, NN algorithms that are seasonally dependent must be retrained when environmental changes affect performance. For our solution, since training times are much shorter than the sampling intervals of the time series, the network can be retrained at every iteration. For this reason, we care about the difference in seasonal performance and test various times of the year, retraining the network each time, to ensure the stability and consistency of the results. In order to evaluate the possible improvements brought about by the proposed multivariate forecasting approach, we considered several options, testing different combinations of data to give an exhaustive study of how the relationships among data influence the forecasting procedure. Every network is trained on the same dataset considering 10 different runs. For each run, a different (random) initialization of the network parameters is performed. The performance is then evaluated considering the average RMSE over the 10 training sessions. The general optimization procedure is summarized in Algorithm 1.
The numerical results are reported in Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9, one per considered month of 2018. This was done to explicitly evaluate the seasonality effect in different periods with different irradiation/meteorological conditions, along with the comparison between the 3-day and 7-day test sets. Additionally, as stated previously, the first row of each table gives an insight into how well the univariate approach performs; this makes the benchmarking more valuable and puts all the numerical results in the right perspective. As the reader may notice, we tested a wide range of input data combinations, to better understand the information carried by each sequence in relation to the others. In particular, we tested the output power by itself and coupled with each and every other sequence. Then, we also tested the performance when constructing the input data frame using the sequences grouped by their physical meaning (all the time series relative to temperature, all relative to wind, all relative to the same altitude). Specifically, we reported the RMSE for these combinations of more than two sequences incorporated in the input data frame. A benchmarking test has also been carried out using all the considered sequences.
Algorithm 1: Pseudocode of the validation scheme used for all the considered DNN architectures
- Input: the observed time series of 2017 and 2018; a generic model N to be trained and optimized, linked with one of the architectures (i.e., either V-LSTM or 2D-CNN). For brevity, we refer solely to the multivariate case; the algorithm is the same for the univariate case, holding the same steps but considering a single sequence.
- 1: data preparation: normalize the time series, fill missing data, general cleaning, etc.
- 2: experiment setup: define a number of sets of hyperparameters of the training algorithms (i.e., regularization, learning rate, number of epochs, etc.) and of the architecture (number of states, dimension of filters, etc.). Then, characterize a grid search space where each point p is a combination of the parameters just described. Perform experiments with variation in input data and test conditions: the superscript 's' denotes one of the multivariate approaches, the superscript 'm' the test month (January 2018, March 2018, etc.), and the superscript 'd' the chosen test set length (3 or 7 days).
- 3: loop {for each s, m, d}
- 4: training/test setup: for each sequence, organize the training and test sets;
- 5: training/validation setup: extract a validation set from the training set to evaluate the performance during the training phase; the rest of the data is used for actual model training. In detail, we use the sequence of the latest 3 or 7 days as the target and the preceding samples as inputs. In both test cases, the training ends one day before.
- 6: end loop
- 7: loop {for each grid point p}
- 8: loop {for each s, m, d}
- 9: network training: actual training of the chosen model, carried out with the set of hyperparameters chosen in p;
- 10: network validation: validate the trained network by assessing its accuracy (RMSE) on the validation set.
- 11: end loop
- 12: model performance: evaluate the model performance by averaging the errors of all trained networks over the different datasets, obtaining the mean performance indicator.
- 13: end loop
- 14: network optimization: select the optimal set of hyperparameters, i.e., the grid point with the best mean performance, to be used for the actual training in the next final stage.
- 15: loop {for each s, m, d}
- 16: final training: obtain the final network model via training with the optimal setup and the complete training set;
- 17: final inference: compute the final network error on the specific test set.
- 18: end loop
- Output: the final network errors for all considered configurations.
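The validation scheme of Algorithm 1 can be condensed into the following plain-Python skeleton (function names are illustrative placeholders, not the authors' code): the outer loop scans the hyperparameter grid, the inner loop averages validation RMSE over all (s, m, d) configurations, and the best point is the one used for the final training stage.

```python
import math

def grid_search(grid, configs, train_and_validate):
    """Select the hyperparameter point with the lowest mean validation RMSE.

    grid               : iterable of hyperparameter points p (steps 7-13).
    configs            : list of (s, m, d) test configurations.
    train_and_validate : callable(p, cfg) -> validation RMSE for that run.
    """
    best_p, best_err = None, math.inf
    for p in grid:
        # Average the validation error over every configuration (step 12).
        mean_err = sum(train_and_validate(p, cfg) for cfg in configs) / len(configs)
        if mean_err < best_err:
            best_p, best_err = p, mean_err
    return best_p, best_err   # step 14: optimal hyperparameters

# Toy usage: the "RMSE" is minimized by the grid point closest to 0.3.
grid = [0.1, 0.3, 0.9]
configs = [("2D-CNN", "June 2018", 3), ("2D-CNN", "June 2018", 7)]
best, err = grid_search(grid, configs, lambda p, cfg: abs(p - 0.3))
# best == 0.3, err == 0.0
```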
In the January results reported in Table 2, the RMSE is quite stable and there are no large variations in terms of performance. The univariate benchmark is satisfactory and accurate, while the multivariate approach and the proposed 2D-CNN model are better or on par. These considerations can be linked to the stability of the January set in terms of irradiation, given that the irradiation itself is quite low in winter.
For the results of mid-March, which can be seen in Table 3, similar considerations can be drawn relative to the differences between the V-LSTM and 2D-CNN approaches. Additionally, the accuracy of the multivariate solution is considerably better than the univariate one.
In the April tests shown in Table 4, since the irradiation starts to grow due to the increasing number of sun hours in spring in the Northern Hemisphere, the accuracy starts to degrade slightly, still retaining the balance in terms of RMSE highlighted in the previous comments. This consistency is expected and proves that our solution is robust.
The June results illustrated in Table 5 are very interesting, since they prove the clear superiority of the proposed approach with respect to both the univariate and multivariate LSTM adoption. In fact, when every time series is considered in a fully multivariate solution, the RMSE of the 2D-CNN is far below almost all the results of the V-LSTM.
The representative behavior of the 2D-CNN in June is confirmed by the figures shown for both the 3-day and 7-day tests: for 3-day tests, the univariate case is reported in Figure 3, the best case in Figure 4, and the multivariate case using all data in Figure 5; accordingly, for 7-day tests, the univariate case is shown in Figure 6, the best one in Figure 7, and the fully multivariate case in Figure 8. Once again, we remark that some predictions can be influenced by the meteorological variations in the training set, which can worsen the prediction for several consecutive hours (e.g., the hours of the first day in Figure 5). Furthermore, it has to be noted that the 7-day test's accuracy is worse than the 3-day one, while remaining quite acceptable and stable, given the different meteorological implications of predicting longer sequences.
The results for the July tests, reported in Table 6, show a similar behavior, given the small difference in terms of irradiation and physical phenomena with respect to June. Additionally, the increased stability of the weather conditions has a prominent effect that can be seen in the similarity of the RMSE values in all cases. The results for the September tests in Table 7 are definitely on par with the aforementioned cases. Here, the difference between the V-LSTM and the 2D-CNN is evident, especially for the 3-day test set. The same analysis can be drawn for the October results shown in Table 8. The only noticeable difference is the generally higher error due to the worsening of weather conditions, resulting in a higher variability in the associated physical quantities.
Lastly, some interesting remarks can be drawn from the analysis of the December results reported in Table 9. In fact, along a similar line to the analysis of the June test set, the December case proves the high stability and accuracy of our proposed method. The V-LSTM model and both univariate methods are always worse than the proposed 2D-CNN method applied to a multivariate dataset. Furthermore, for the December tests, we show the graphical results of the 2D-CNN for both the 3-day and 7-day tests. For 3-day tests, the univariate case is reported in Figure 9, while the best case coincides, in this instance, with the multivariate one using all data, shown in Figure 10; for 7-day tests, the univariate case is shown in Figure 11, and the best one is associated with the fully multivariate case in Figure 12. These figures are in accordance with the numerical results and highlight that the variability is scarce in all cases. While the error is already low, due to the lowest irradiance of the winter season, the accuracy of the multivariate 2D-CNN is consistently better.
Summarizing the results with a general outlook, the most significant improvement with respect to the univariate approach is obtained when considering the time series relative to wind and the ones relative to a higher altitude. Overall, this is in accordance with [27], because irradiance is strictly linked with cloud movement, which is predictable using wind information at high altitudes. In fact, in December, when the variation in weather is more frequent, the prediction performance is evidently improved by the multivariate approach. This improvement is less evident in more stable months, such as June.
As expected, the results of the 7-day case are slightly worse than those of the 3-day case. While this is obviously due to the longer forecasting horizon, the difference is not large; the proposed method can be considered stable and consistent across the tested methodologies. It is also evident that the proposed convolutional structure gives advantages over the pure LSTM, despite the low dimensionality of the frame processed by the convolutional filter. In fact, all the numerical results show the same trend, despite the heavy changes across different seasons.
An additional remark can be drawn from the visual analysis of the results: there is quite a difference in the absolute value of the analyzed time series. This is the result of a different general dynamic in the physical system, which is quite complicated. For this reason, the complex multivariate analysis, along with the higher variability, imposes changes in the training phase, and thus the different dynamics are reflected in the network tuning. In fact, we suggest that the network be retrained when such drastic changes occur in actual operational applications.