Short-Term Load Forecasting Based on VMD and Deep TCN-Based Hybrid Model with Self-Attention Mechanism

Xiong, Qingliang; Liu, Mingping; Li, Yuqin; Zheng, Chaodan; Deng, Suhui

doi:10.3390/app132212479


¹ School of Information Engineering, Nanchang University, Nanchang 330031, China

² Jiangxi Provincial Key Laboratory of Interdisciplinary Science, Nanchang University, Nanchang 330031, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(22), 12479; https://doi.org/10.3390/app132212479

Submission received: 23 September 2023 / Revised: 3 November 2023 / Accepted: 15 November 2023 / Published: 18 November 2023

(This article belongs to the Section Energy Science and Technology)


Abstract:

Due to difficulties with electric energy storage, balancing the supply and demand of the power grid is crucial for the stable operation of power systems. Short-term load forecasting can provide utilities with an early warning of excessive power consumption, allowing the generation, transmission and distribution of electric energy to be planned in advance. However, the nonlinear patterns and dynamics of load data still make accurate load forecasting a challenging task. To address this issue, a deep temporal convolutional network (TCN)-based hybrid model combined with variational mode decomposition (VMD) and a self-attention mechanism (SAM) is proposed in this study. Firstly, VMD is used to decompose the original load data into a series of intrinsic mode components that are used to reconstruct a feature matrix combined with other external factors. Secondly, a three-layer convolutional neural network is used as a deep network to extract in-depth features between adjacent time points from the feature matrix, and the TCN then captures the long-term temporal dependencies from the output matrix. Thirdly, long short-term memory (LSTM) is utilized to enhance the extraction of temporal features, and the correlation weights of spatiotemporal features are further adjusted dynamically using SAM to retain important features during model training. Finally, the load forecasting results are obtained from the fully connected layer. The effectiveness and generalization of the proposed model were validated on two real-world public datasets, ISO-NE and GEFCom2012. Experimental results indicate that the proposed model significantly improves the prediction accuracy in terms of evaluation metrics, compared with the other contrast models.

Keywords:

short-term load forecasting; variational mode decomposition; temporal convolutional network; long short-term memory; self-attention mechanism

1. Introduction

Power load forecasting is the foundation of the operation and planning of a power system. The generation, transmission, distribution and consumption of electricity occur almost simultaneously. It is well known that massive storage of electric power is still difficult for current technologies. Thus, the power supply and demand sides should be balanced dynamically to ensure the safety and reliability of the power grid [1]. However, the high penetration of renewable energy sources and electric devices in modern power systems results in significant uncertainties in the load demand. For this reason, power load forecasting has become increasingly important in modern power systems.

Power load forecasting involves forecasting the load demand for a time span in the future, using historical load data and other external factors such as meteorological conditions, seasonal effects, economic factors and social activities. Accurate load forecasting can help power companies to make efficient decisions, avoid resource waste and improve grid stability and reliability [2]. Therefore, load forecasting has attracted increasing attention from researchers all over the world. Short-term load forecasting (STLF) is an approach that predicts power load consumption within an interval of an hour to a week, based on historical load data and other external factors [3]. STLF provides a basis for generating units to schedule their start-up and shut-down times, prepares the rotating reserve and carries out in-depth analyses of the limitations in the transmission system [4]. It has been demonstrated that a 1% increase in STLF error increases the operating cost by GBP 17.7 million [5] or EUR 4.55 to 9.1 million [6]. In addition, the use of distributed generation and intelligent devices generates load data with more nonlinear patterns and dynamics [7]. Furthermore, the scale of the load data also sharply increases with the widespread use of smart meters [8]. Therefore, an accurate STLF model is a premise for the safe, stable and economic operation of an electric grid.

In the early years, a variety of prediction models were developed based on statistical methods and artificial intelligence methods. The statistical methods usually include autoregressive moving average [9], autoregressive integrated moving average [10], exponential smoothing [11], etc. Although these methods are easy to implement without additional feature inputs, they are not suitable for handling load data with nonlinear patterns and dynamics [12,13]. In recent years, artificial intelligence (AI) methods have been widely used in power load forecasting, along with progressive computer technologies. As a typical AI algorithm, artificial neural networks (ANNs) have strong nonlinear modeling capabilities, and can obtain any nonlinear function without having information of the relationships between the training model and data in advance [14]. However, ANNs set hyperparameters manually, and are thus unable to effectively tackle large-scale load data. Thus, it is difficult to obtain greater prediction accuracies of load data in modern power systems [15]. Moreover, the output of the training model depends not only on the current input, but also on the previous input [16]. Therefore, traditional artificial intelligence methods cannot fully extract the features of time series to achieve greater prediction accuracies.

Recently, deep learning methods have become increasingly popular in the field of time series. Long short-term memory (LSTM) developed from a recurrent neural network (RNN) can deal with long-term dependencies of time series, avoiding gradient vanishing and exploding of the RNN [17]. A convolutional neural network (CNN) can effectively capture the local relationships among adjacent time points [18] and is also widely used in many fields, such as image recognition [19], renewable energy forecasting [20], as well as load forecasting [21]. However, a large number of hyperparameters of CNN can easily result in overfitting of the training model. It should be pointed out that one-dimensional (1D) CNNs can overcome the overfitting problem due to there being fewer learnable parameters. Furthermore, the temporal convolutional network (TCN) developed from CNN has a unique dilated convolutional module to obtain a large receptive field with fewer layers [22], which is conducive to extracting nonlinear features of time series. Although the TCN has better predictive performance in time series compared with LSTM and CNN [23,24], it cannot learn the dependency of long-range positions inside the sequence and extract internal correlations of input. In addition, self-attention mechanism (SAM) focuses on capturing internal correlations of input features and can effectively deal with long time series, which helps the training model to identify key features. For instance, a novel model based on bidirectional LSTM, XGBoost and SAM was proposed for power load forecasting in [25]. An improved LSTM optimized with SAM was used to predict the concentration of air pollutants in [26]. A new model based on SAM and multi-task learning was developed to predict ultra-short-term photovoltaic power generation in [27]. These experimental results mentioned above indicate that SAM has the capability to select key information from complex features and avoid irrelevant information during model training.

Although these deep learning methods improve the prediction accuracy, single models cannot fully extract the input features, especially in-depth features. Therefore, scholars are increasingly focusing on hybrid models, which combine the advantages of each single model [28]. For example, a hybrid approach combining singular spectrum analysis-based decomposition and ANN for day-ahead hourly load forecasting was proposed in [29]. A TCN–LSTM hybrid model was used to forecast weather from meteorological data in [30]. A day-ahead load forecasting model based on CNN and TCN was developed to achieve superior performance in [31]. A hybrid model based on TCN combined with an attention mechanism was proposed to fully exploit the nonlinear relationships between load data and external factors in [32]. Although the above hybrid models improved the prediction performance in various aspects, the raw load data were fed directly into the training models without being smoothed and denoised in advance, which is not conducive to further enhancing the accuracy of STLF.

In order to avoid the effects of nonlinear patterns and dynamics of load data on the forecasting accuracy, decomposition methods have been used to decompose the load data into multiple smooth and stable sub-series at different frequency bands. For instance, Wang et al. [33] decomposed the load data into a series of sub-signals using wavelet decomposition (WD). The sub-signals were predicted by different models according to their frequencies. Liang et al. [34] used empirical mode decomposition (EMD) to decompose load signals into multiple intrinsic mode functions (IMFs) to weaken the nonlinearity and dynamics of the load data. Zhu et al. [35] proposed a hybrid model for carbon price forecasting based on EMD and evolutionary least squares support vector regression. An ensemble empirical mode decomposition (EEMD) was used to decompose load data into different frequency components in [36]. The low- and high-frequency components were predicted via multivariable linear regression and LSTM. However, WD cannot meet the requirements of complex and dynamic load series in time-frequency analysis [33]. In addition, EMD and EEMD suffer from mode aliasing and end effects, and lack the support of mathematical theory [37]. As an alternative, the variational mode decomposition (VMD) algorithm, which has a solid mathematical model, was introduced to overcome the drawbacks mentioned above. VMD has good data decomposition accuracy, and obtains a group of stable sub-signals without noise interference [38]. A hybrid model based on VMD and LSTM for short-term load forecasting was proposed in [39]. A CNN and TCN hybrid model combined with VMD-based data processing was developed for power load forecasting in [40]. A hybrid prediction model based on VMD, TCN and an error correction strategy for electricity load forecasting was presented in [41]. A hybrid model based on TCN and VMD for short-term wind power forecasting was proposed in [42]. A hybrid model based on GRU and TCN combined with VMD decomposition was developed for load forecasting in [43].

Although these studies mentioned above verified that deep learning methods based on VMD techniques can improve the prediction accuracy of load demand, some fields still require further study and improvement. For example, many studies often adopted deep learning models with shallow networks to improve the training efficiency, and they rarely considered using the models with deep networks to learn full features from the decomposed sub-series. Moreover, most studies scarcely focused on the correlations of sub-series and external factors, such as temperature, seasons, holidays, etc. Deep learning models with multiple hidden layers can indeed improve their feature extraction capabilities, but it is difficult to train these models with some sub-series and external factors. Therefore, it is necessary to construct a deep learning model with shallow networks to extract in-depth features from several sub-series and external factors. As a result, a novel STLF model based on VMD and a deep TCN-based hybrid method with SAM is proposed to fully capture the in-depth features of multiple sub-series and external factors. The main advantages of this research are as follows:

(1): The raw load data are decomposed into multiple stable sub-series using VMD. Furthermore, the external factors are also considered as input variables and reconstructed as a feature matrix, along with the sub-series for the training model.
(2): A three-layer 1D-CNN network is constructed as a deep network to eliminate overfitting of the training model due to fewer hyperparameters, and reshapes the extracted features into time series for the TCN module. The TCN extracts the long-term temporal dependencies of the input matrix. Moreover, LSTM is used to further enhance the extraction of temporal features.
(3): SAM can amplify important information and weaken irrelevant information in the feature matrix. Thus, SAM dynamically adjusts the correlation weights of complex features to obtain important information from the input data.
(4): Compared with traditional models or other benchmarking models, the novel hybrid model performs more effectively and achieves greater prediction accuracies.

The rest of this study is organized as follows. Section 2 introduces the basic methodologies of VMD, 1D-CNN, TCN, LSTM and SAM, as well as the framework of the proposed hybrid model. Section 3 describes the data preprocessing, evaluation criteria and experimental analysis. Finally, conclusions and future research directions are presented in Section 4.

2. Methodologies

In this section, an STLF integration framework will be introduced in detail, including VMD, TCN, 1D-CNN, LSTM and SAM. The following sections briefly describe each of the modules used in this study. Finally, the architecture of the proposed hybrid model will be presented, and its main hyperparameters are also listed.

2.1. VMD

VMD is an adaptive and fully non-recursive method for modal decomposition and signal processing [44], which decomposes time series data into multiple IMFs. It not only retains the inherent information of the original signals, but also avoids the overlap of variable information. Moreover, VMD has superior performance in sampling and denoising compared to WD, EMD and EEMD [45]. The process of decomposition using VMD can be described as follows:

Step 1: For a given mode, the signal $f(t)$ is transformed by the Hilbert transform to obtain the analytic signal, which has a unilateral frequency spectrum, as follows:

$$\mathrm{Hilbert}[f(t)] = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{f(v)}{t - v} \, dv \tag{1}$$

Step 2: Each mode is multiplied by an exponential term, so that its frequency spectrum is shifted to the baseband as follows:

$$\left[ \left( \delta(t) + \frac{j}{\pi t} \right) u_k(t) \right] e^{-j \omega_k t} \tag{2}$$

where $\delta(t)$ is the Dirac distribution, $u_k$ is the $k$th mode, $\omega$ is the angular frequency and $j$ is the imaginary unit.

Step 3: The Gaussian smoothness, i.e., the squared $L^2$-norm of the gradient, is used to estimate the bandwidth of the signal. Then, the constrained variational problem can be written as follows:

$$\min_{\{u_k\}, \{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 \right\}, \quad \text{s.t.} \ \sum_k u_k = f \tag{3}$$

where $\omega_k$ is the center frequency of the $k$th mode.

Step 4: Lagrange multiplier theory is an effective method to enforce constraints. For the convenience of analysis, the constrained variational problem needs to be transformed into an unconstrained one, as shown below:

$$L(\{u_k\}, \{\omega_k\}, \lambda) := \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t), f(t) - \sum_k u_k(t) \right\rangle \tag{4}$$

where α is a quadratic penalty factor and λ is the Lagrange multiplier.

Step 5: The saddle point of Equation (4) can be found using the alternating direction method of multipliers, which updates each mode in the frequency domain. The expressions for the modal component and the center frequency are given as follows:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2 \alpha (\omega - \omega_k)^2} \tag{5}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \, |\hat{u}_k(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{u}_k(\omega)|^2 \, d\omega} \tag{6}$$

where $n$ represents the number of iterations. Additionally, the Lagrange multiplier $\lambda$ is updated according to the following expression:

$$\hat{\lambda}^{n+1}(\omega) \leftarrow \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_k \hat{u}_k^{n+1}(\omega) \right) \tag{7}$$

where $\tau$ is the updating step size.

Step 6: Repeat Step 5, i.e., solve for the saddle point of Equation (4), until the modal components satisfy the following convergence criterion:

$$\sum_k \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \tag{8}$$

where $\varepsilon$ is the convergence tolerance that determines the accuracy and the number of iterations.
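As a rough illustration of the frequency-domain updates in Equations (5)–(7), the following pure-Python sketch applies them to a discretised half-spectrum. The function names and the toy five-bin spectrum are assumptions made for illustration only; this is not the implementation used in this study, and a full VMD would iterate these updates over all modes until Equation (8) is satisfied.

```python
def update_mode(f_hat, other_modes_sum, lam_hat, omegas, omega_k, alpha):
    """Eq. (5): Wiener-filter-like update of one mode spectrum."""
    return [
        (f - s + l / 2) / (1 + 2 * alpha * (w - omega_k) ** 2)
        for f, s, l, w in zip(f_hat, other_modes_sum, lam_hat, omegas)
    ]


def update_center_frequency(u_hat, omegas):
    """Eq. (6): power-weighted mean frequency of one mode."""
    power = [abs(u) ** 2 for u in u_hat]
    return sum(w * p for w, p in zip(omegas, power)) / sum(power)


def update_multiplier(lam_hat, f_hat, modes_sum, tau):
    """Eq. (7): dual ascent step on the Lagrange multiplier."""
    return [l + tau * (f - s) for l, f, s in zip(lam_hat, f_hat, modes_sum)]


# Toy half-spectrum with energy at two frequencies (hypothetical values).
omegas = [0.0, 0.1, 0.2, 0.3, 0.4]
f_hat = [0.0, 4.0, 0.0, 0.0, 2.0]
zeros = [0.0] * len(omegas)

# One update of a mode whose current center frequency is 0.1.
u1 = update_mode(f_hat, zeros, zeros, omegas, omega_k=0.1, alpha=419)
omega_1 = update_center_frequency(u1, omegas)
lam_next = update_multiplier(zeros, f_hat, u1, tau=0.19)
```

With a large penalty factor, the denominator of Equation (5) strongly attenuates spectral content far from the current center frequency, so the updated mode stays concentrated around it.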

2.2. 1D-CNN

The 1D-CNN is often used in signal processing and time series models [46]. It offers translation and scaling invariance owing to local connections and weight sharing. Compared with multi-dimensional CNNs, the 1D-CNN can change the number of channels while retaining the feature size to realize dimensionality reduction. Furthermore, it deepens the network structure and introduces more nonlinear calculations without increasing the receptive field. The 1D-CNN is calculated as follows:

$$y_t = \sum_{k=1}^{K} Z_k X_{t-k+1} \tag{9}$$

where $Z_k$ is the convolution kernel and $X_{t-k+1}$ is the time series.
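Equation (9) can be sketched in a few lines of plain Python. The zero padding at the start of the series and the function name are assumptions made for this illustration.

```python
def conv1d_causal(x, kernel):
    """Eq. (9): y_t = sum_{k=1..K} Z_k * X_{t-k+1}.

    Indices that fall before the start of the series are treated as zero
    (an assumption; other padding schemes are possible).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, z in enumerate(kernel, start=1):
            idx = t - k + 1
            if idx >= 0:
                acc += z * x[idx]
        y.append(acc)
    return y


# A two-tap averaging kernel applied to a short ramp.
y = conv1d_causal([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
```

Each output uses only the current and earlier inputs, and the same kernel weights are reused at every time step, which is the weight sharing mentioned above.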

2.3. TCN

The TCN has been developed on the basis of the 1D-CNN, and has shown excellent performance in many fields of time series processing [47]. It includes causal convolution, dilated convolution and residual block, features which are briefly introduced in the following.

2.3.1. Causal Convolution

Causal convolution prevents the leakage of future information. This means that an output $y_t$ at moment $t$ depends only on the inputs at the current and previous moments, i.e., ($x_t$, $x_{t-1}$, $x_{t-2}$, $x_{t-3}$, …). This reflects the strong causal relationship of time series [48]. The calculation process of causal convolution is presented in Figure 1.

2.3.2. Dilated Convolution

The expansion of the receptive field leads to an increase in the number of parameters, which can cause vanishing and exploding gradients. Dilated convolution can address this problem [49]. The formula for the dilated convolution illustrated in Figure 2 is given below.

For an input series $X = [x_0, \dots, x_N]$ and a filter $f$ with size $k$, the dilated convolution operation $F$ on the sequence element $N$ can be written as follows:

$$F(N) = (X *_d f)(N) = \sum_{i=0}^{k-1} f(i) \cdot x_{N - d \cdot i} \tag{10}$$

where $k$ is the filter size, $d$ is the dilation factor and $d \cdot i$ indicates the orientation of the convolution, i.e., how far back in the sequence each filter tap reaches. An increase in the dilation factor $d$ enables the expansion of the receptive field without increasing the computational cost, which significantly improves the training efficiency.
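The dilated causal convolution of Equation (10) and the growth of the receptive field can be sketched as follows. The helper names are hypothetical, and the receptive-field formula assumes one dilated convolution per level; with two convolutions per residual block the span would be larger.

```python
def dilated_causal_conv(x, f, d):
    """Eq. (10): F(N) = sum_{i=0..k-1} f(i) * x_{N - d*i}.

    Taps that reach before the start of the series read as zero (assumption).
    """
    out = []
    for n in range(len(x)):
        acc = 0.0
        for i, w in enumerate(f):
            j = n - d * i
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack with one dilated convolution per level."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# With f = [1, 1] and d = 2, each output is x[n] + x[n - 2].
y = dilated_causal_conv(x, f=[1.0, 1.0], d=2)
# Kernel size and dilation factors as listed for the TCN in Table 1.
rf = receptive_field(2, [1, 2, 4, 8, 16])
```

Doubling the dilation factor at each level lets the receptive field grow exponentially with depth while the number of filter weights per level stays fixed.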

2.3.3. Residual Block

Figure 3 illustrates the structure of the residual block [50]. The residual block includes two layers of dilated causal convolution and nonlinear units. The rectified linear unit (ReLU) is used as the activation function, and regularization is applied after each dilated convolution. A $1 \times 1$ convolution is applied on the shortcut from the input so that the input and output are compatible. The stacked residual blocks not only improve the training efficiency of the network, but also avoid vanishing gradients. Moreover, they continuously update the residual characteristics to improve the transmission efficiency of the feature information.

2.4. LSTM

LSTM has superior performance in extracting the long-term dependencies of historical and future information for sequence data [51]. Figure 4 presents the structure of LSTM, in which there are three gates: the input gate $i_t$, the forget gate $f_t$ and the output gate $o_t$. This gate system is introduced in the LSTM model to control the information flow. Firstly, the input gate $i_t$ decides how much the memory cell state will be updated by the block input. Secondly, the forget gate $f_t$ decides what information from the previous cell state should be forgotten. Finally, the output gate $o_t$ controls which part of the current memory cell state should be output.

The formulas of the control parameters can be written as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{11}$$

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{12}$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{13}$$

$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{14}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{15}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{16}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{17}$$

$$h_t = o_t \odot \tanh(c_t) \tag{18}$$

where $x$ in Equations (11) and (15) is the input variable of the activation function, $x_t$ and $h_t$ are the input and output at time step $t$, respectively, $W$ and $b$ are weight matrices and biases, respectively, $\odot$ denotes element-wise multiplication, $\sigma(\cdot)$ and $\tanh(\cdot)$ are two activation functions for the update of the cell state and the selective output, respectively, and $c_t$ and $\tilde{c}_t$ denote the current values and new candidate values for the cell state, respectively.
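To make the gate equations concrete, the sketch below performs a single LSTM step in plain Python following Equations (12)–(18). The `params` layout, the helper names and the all-zero weights in the example are assumptions chosen so that the result is easy to check by hand; a deep learning framework would normally be used in practice.

```python
import math


def sigmoid(x):
    """Eq. (11)."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W v + b for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to its (W, b) pair."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]        # Eq. (12)
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]        # Eq. (13)
    c_tilde = [math.tanh(a) for a in affine(*params["c"], z)]  # Eq. (14)
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]        # Eq. (17)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]  # Eq. (16)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]         # Eq. (18)
    return h_t, c_t


# Hidden size 2, input size 1, all weights and biases zero (hypothetical).
zero_gate = ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0])
params = {name: zero_gate for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
```

With zero weights every gate equals 0.5 and the candidate state is zero, so the cell state is simply halved; this isolates the role of the forget gate in Equation (16).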

2.5. SAM

SAM pays attention to the internal correlations of input features and assigns different weights according to their importance [52]. Firstly, the input sequence $X$ is linearly transformed into the query ($Q$), key ($K$) and value ($V$) using three different weight matrices $W^Q$, $W^K$ and $W^V$, respectively. Secondly, the similarity between $Q$ and $K$ is calculated and normalized by a softmax function to obtain the self-attention matrix ($W$). Finally, the output matrix $H$ is obtained by multiplying $V$ by the matrix $W$. SAM can dynamically adjust the weights of different features and obtain the long-range dependencies of the sequences. Figure 5 depicts the structure of SAM, and the process of SAM is formulated as follows:

$$\begin{cases} Q = W^Q X \\ K = W^K X \\ V = W^V X \end{cases} \tag{19}$$

$$W = \mathrm{Softmax}\left( \frac{K^T Q}{\sqrt{D_K}} \right) \tag{20}$$

$$H = \mathrm{Attention}(Q, K, V) = V W = V \, \mathrm{Softmax}\left( \frac{K^T Q}{\sqrt{D_K}} \right) \tag{21}$$

where $D_K$ is the dimension of the key $K$ and $\mathrm{Softmax}(\cdot)$ is the normalization function applied by column.
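The following plain-Python sketch follows the column-oriented form $H = V \, \mathrm{softmax}(K^T Q / \sqrt{D_K})$ of Equation (21), in which each column of $X$ is one time step and the softmax normalizes by column. The helper names, the identity weight matrices and the toy input are assumptions for illustration.

```python
import math


def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def softmax_columns(S):
    """Normalize every column of S so that it sums to one."""
    cols = []
    for col in zip(*S):
        m = max(col)  # subtract the maximum for numerical stability
        e = [math.exp(v - m) for v in col]
        total = sum(e)
        cols.append([v / total for v in e])
    return transpose(cols)


def self_attention(X, Wq, Wk, Wv):
    """Eqs. (19) and (21) with X of shape (features, time steps)."""
    Q, K, V = matmul(Wq, X), matmul(Wk, X), matmul(Wv, X)
    d_k = len(K)  # key dimension D_K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(transpose(K), Q)]
    W = softmax_columns(scores)
    return matmul(V, W), W


identity = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]  # 2 features, 3 time steps (toy values)
H, W = self_attention(X, identity, identity, identity)
```

Each column of `H` is a weighted average of the value vectors, with weights given by the matching column of `W`, which is how the mechanism re-weights features across time steps.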

2.6. Proposed Model

The framework of the proposed model is illustrated in Figure 6, and mainly includes three parts, i.e., feature engineering, feature extraction and load forecasting. In the feature engineering stage, the load data are decomposed into multiple sub-series by VMD. A new feature matrix is constructed with these sub-series and external factors. In the feature extraction stage, the 1D-CNN network is used as a deep network to extract in-depth spatial features, and the TCN extracts the long-term temporal dependencies of the input matrix. Furthermore, the temporal features hidden in load series are further extracted using LSTM. Then, SAM can dynamically adjust the weight of spatiotemporal features to obtain an important feature matrix. Finally, load forecasting can be obtained through the fully connected layer. Table 1 lists the hyperparameters of each algorithm used in the proposed model. Moreover, the steps of the proposed model are described in the following:

Step 1: VMD is employed to decompose the raw load data into eight IMFs. These IMFs are normalized along with temperature, while seasons, holidays and weekends are processed by using a one-hot encoder. The feature matrix can be reconstructed by combining IMFs and external factors in parallel.

Step 2: The three-layer 1D-CNN is used as a deep network to extract deep spatial features between adjacent time points, and then the TCN is utilized to globally capture temporal features from the load data.

Step 3: LSTM can further enhance the extraction of long-term dependencies. Furthermore, the SAM dynamically adjusts the correlations of different features, strengthens long-range dependencies and obtains the important spatiotemporal features.

Step 4: The feature matrix processed from SAM is reshaped into time series that are used to predict load data through the fully connected layer.
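The feature construction in Step 1 can be sketched as below. The column order, the two-column one-hot encoding of the holiday and weekend flags, and the function names are assumptions for illustration; the text does not specify the exact layout of the feature matrix.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_feature_row(imfs, temperature, quarter, is_holiday, is_weekend):
    """Concatenate normalized IMF values and temperature with one-hot calendar flags.

    imfs        : the eight normalized IMF values at one time step
    temperature : normalized temperature at the same time step
    quarter     : season index in 1..4
    """
    return (
        list(imfs)
        + [temperature]
        + one_hot(quarter - 1, 4)
        + one_hot(int(is_holiday), 2)
        + one_hot(int(is_weekend), 2)
    )


# One hypothetical time step: third quarter, not a holiday, a weekend day.
row = build_feature_row([0.1] * 8, 0.42, quarter=3, is_holiday=False, is_weekend=True)
```

Stacking such rows over consecutive time steps gives a matrix with one column per feature, which is the kind of input the 1D-CNN in Step 2 operates on.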

Table 1. The hyperparameters of each algorithm used in the proposed model.

Layer	Dropout	Convolutional Kernel	Dilation	Output Shape
1st 1D-CNN	-	1	-	[128, 72, 17]
2nd 1D-CNN	-	4	-	[128, 84, 14]
3rd 1D-CNN	-	5	-	[128, 42, 10]
TCN	0.2	2	[1, 2, 4, 8, 16]	[128, 20, 10]
LSTM	0.2	-	-	[128, 20, 16]
SAM	-	-	-	[128, 20, 16]
Reshape	-	-	-	[128, 256]
Linear	-	-	-	[128, 160]
Linear1	-	-	-	[128, 1]

3. Experiments and Results Analysis

In this section, two real-world datasets collected from different regions were adopted in this study to verify the effectiveness of the proposed model. The two datasets are described in the following.

The first dataset is the ISO-NE (New England) public dataset [53], which includes load data, temperatures, as well as day types from 1 March 2003 to 31 December 2014. The dataset was collected every hour, and the data from 1 March 2003 to 8 October 2007 were selected for this study. The second selected dataset is the GEFCom2012 public dataset [54], including power load and temperature. The dataset was collected every hour, and a total of 38,065 sets of data from 1 January 2004 to 29 June 2008 were selected as the study sample. The two datasets were divided into a training set, a validation set and a test set at a ratio of 8:1:1.
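An 8:1:1 split can be sketched with integer arithmetic as follows. Splitting chronologically (training data first, test data last) is an assumption commonly made for time series; the text does not state how the samples were assigned, and the function name is hypothetical.

```python
def chronological_split(n_samples, ratios=(8, 1, 1)):
    """Return index ranges for the training, validation and test sets."""
    total = sum(ratios)
    train_end = n_samples * ratios[0] // total
    val_end = n_samples * (ratios[0] + ratios[1]) // total
    return (
        range(0, train_end),
        range(train_end, val_end),
        range(val_end, n_samples),
    )


# Sample count reported for the GEFCom2012 dataset.
train, val, test = chronological_split(38065)
```

Under these assumptions the three ranges cover every sample exactly once and preserve temporal order, so no future observations leak into the training set.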

Experiments on the two datasets were performed on a PC with an AMD-4800H CPU and an NVIDIA GeForce RTX GPU with 16 GB of memory. The software platform used was Python 3.7 with PyTorch 1.7.

3.1. Data Preprocessing

In this subsection, the ISO-NE dataset is chosen as a typical case to elaborate the characteristics of the load data in detail. The GEFCom2012 dataset is similar to the ISO-NE dataset and is therefore not discussed separately.

The first step was to determine the number of decomposition modes $k$, the penalty factor $\alpha$ and the updating step size $\tau$ using the central frequency observation and the residual index minimization method [55]. In this research, the parameters obtained were $k = 8$, $\alpha = 419$ and $\tau = 0.19$. Figure 7 shows the decomposition results of the load data using VMD. Firstly, the trend of the low-frequency component is smoother than that of the original signal, and is roughly consistent with the latter. This indicates that noise interference is effectively removed by VMD. Secondly, the influence of seasonal factors on load forecasting needed to be considered in the data preprocessing due to the smoothness and stability of the low-frequency component. Finally, all IMF components showed good periodicity and stability without irregular information, which facilitated the extraction of features in the subsequent stage.

Changing trends in electric demand usually present continuous and periodic characteristics without abrupt breaks. However, power consumption is easily influenced by external factors such as seasons, temperature, day types, etc. This results in complex patterns and dynamics in the load data [56]. The relationship between power load and temperature is illustrated in Figure 8, which shows the distributions of power load and temperature from 1 March 2003 to 8 October 2007. One can clearly see that the power load demand reaches a peak when the temperature becomes very high or very low. Thus, temperature has a strong correlation with power load data. Moreover, Figure 9 presents the evolution of electric consumption over different time horizons. The time horizons of Figure 9a–c are one year from 1 January 2004 to 1 January 2005, one week from 7 to 13 April 2003, and the Christmas period from 22 to 28 December 2003, respectively. One can see from Figure 9a that the power consumption is higher in January, June, July, August and December due to the summer and winter climates. Additionally, the power consumption on weekdays fluctuates slightly and is higher than that on weekends, as shown in Figure 9b. Figure 9c shows that the power consumption during Christmas, an important holiday, is significantly lower than at other times. Therefore, day types such as weekdays and holidays also have an important influence on the prediction accuracy of the power load.

According to the discussion above, this study selected power demand, holidays, quarters, temperature and weekends as the input variables for load forecasting. One-hot encoding was performed on quarters, holidays, weekends and weekdays. In contrast, min–max normalization was used to scale the power load and temperature to the range [0, 1]. The formula of min–max normalization is as follows:

$$\hat{x} = \frac{x - \min(X)}{\max(X) - \min(X)} \tag{22}$$

where $X$ is the input series, $x$ indicates the data to be normalized, $\hat{x}$ is the normalized data, and $\max(\cdot)$ and $\min(\cdot)$ represent the maximum and minimum of the input series, respectively.
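A minimal sketch of Equation (22), together with its inverse for mapping predictions back to physical units, is given below. The function names and the sample values are hypothetical.

```python
def min_max_normalize(series):
    """Eq. (22): scale a series so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return [(x - lo) / (hi - lo) for x in series]


def min_max_denormalize(scaled, lo, hi):
    """Invert Eq. (22) given the minimum and maximum of the original series."""
    return [s * (hi - lo) + lo for s in scaled]


load = [10.0, 15.0, 20.0]  # toy load values
scaled = min_max_normalize(load)
restored = min_max_denormalize(scaled, min(load), max(load))
```

In a forecasting setting the minimum and maximum would usually be taken from the training set only and reused for the validation and test sets, so that no information from later data enters the scaling.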

3.2. Evaluation Criteria

To evaluate the performance of the proposed model, the mean absolute percentage error (MAPE), root mean square error (RMSE) and R-squared (R²) were selected as evaluation indices. These statistical metrics are defined as follows:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \tag{23}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{N} (y_t - \hat{y}_t)^2}{N}} \tag{24}$$

$$R^2 = 1 - \frac{\sum_{t=1}^{N} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{N} (y_t - \bar{y})^2} \tag{25}$$

where $N$ is the number of samples, $y_t$ and $\hat{y}_t$ represent the actual value and the predicted value, respectively, and $\bar{y}$ is the mean of the actual values.
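The three metrics in Equations (23)–(25) can be computed as in the sketch below, where the denominator of R² is taken as the sum of squared deviations of the actual values from their mean. The function names and the three-point example are hypothetical and are not results from this study.

```python
import math


def mape(y, y_hat):
    """Eq. (23): mean absolute percentage error, returned as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Eq. (24): root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def r_squared(y, y_hat):
    """Eq. (25): coefficient of determination."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 200.0, 300.0]     # toy values
predicted = [110.0, 190.0, 300.0]  # toy values
scores = (mape(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

Multiplying the MAPE by 100 expresses it as a percentage. Lower MAPE and RMSE and an R² closer to 1 indicate a better fit.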

3.3. Comparative Analysis of Experimental Results

Generally, the performances of single models, such as TCN, LSTM and CNN, are inferior to those of hybrid models. Moreover, the TCN-based hybrid models in STLF are comparatively inferior to other hybrid models. Thus, the experimental results of the proposed model were mainly compared with other TCN-based hybrid models with VMD decomposition that include VMD–CNN–TCN–SAM (VCTA), VMD–TCN–LSTM–SAM (VTLA), VMD–CNN–LSTM–SAM (VCLA), VMD–CNN–TCN–LSTM (VCTL) and VMD–CNN–TCN–GRU–SAM (VCTGA). It must be stressed that the CNN in these hybrid models means a three-layer 1D-CNN network. Furthermore, in order to overcome the randomness of the AI algorithms, all of the models mentioned above were trained and tested enough times until the results became stable and reliable.

3.3.1. ISO-NE Dataset

In order to further elaborate its performance for load forecasting, the proposed model was mainly compared to the hybrid models with VMD decomposition. Table 2, Table 3 and Table 4 show the maximum, minimum and average values of the MAPE, RMSE and R², respectively. Compared with the VCLA, the VTLA reduces the MAPE by 13.6% and the RMSE by 10.4%, due to the large receptive field of the TCN. The VCTA model is conducive to the extraction of long-term in-depth features, and improves the performance of load forecasting. For example, the VCTA decreases the MAPE by 48.6% and the RMSE by 48% compared to the VTLA. When the VCTA is further stacked with GRU, i.e., VCTGA, its MAPE and RMSE are further reduced by 20.8% and 30.8%, respectively. This means that the GRU can enhance the extraction of long-term dependencies. However, the proposed model outperforms even the VCTGA by 17.5% and 7% in terms of the MAPE and RMSE, respectively. This indicates that LSTM has a stronger capability to capture temporal dependencies than GRU. Compared with the VCTL, the proposed model decreases the MAPE by 39% and the RMSE by 49%. Therefore, SAM is extremely important for globally adjusting and retaining key features to improve the performance of the proposed model. Additionally, the R² of the proposed model is 0.04% to 1.02% higher than that of the contrast models and reaches up to 99.9%, which implies that VMD decomposition is indeed effective in improving prediction accuracy, and that the results of the proposed model can be trusted.

To examine the deviations between each model and the actual load data, Figure 10 shows the predicted load over 24 h together with the actual load for the runs in which the MAPE is at its maximum and at its minimum, respectively. The VMD–CNN–TCN (VCT)-based hybrid models approximately capture the changing trends in the actual load, whereas the VCLA and VTLA models clearly deviate from it. This means that the stacked model based on CNN and TCN plays a critical role in feature extraction for the training model. For all of the models, accurate prediction around the peaks and valleys of the load remains difficult. However, it can be seen from the subplots that the hybrid model proposed in this article achieves the smallest deviation compared with the other VCT-based hybrid models, especially around the turning points. Furthermore, Figure 11 shows bar charts of the RMSE for each model. One can clearly see that the VCTA-based hybrid models are more stable than the other models, and that the proposed model has the best stability among all of these models. At the same time, SAM is also important for improving the stability of the VCT-based models.

3.3.2. GEFCom2012 Dataset

To further validate the generalization of the proposed model, all of the contrast experiments mentioned above were repeated using the GEFCom2012 dataset. Table 5, Table 6 and Table 7 show the maximum, minimum and average values of the MAPE, RMSE and R², respectively. Compared with the VCLA, the VCTA model significantly reduces the MAPE by 45.7% and the RMSE by 30%. This again demonstrates that the VCT-based models effectively extract the in-depth features from the load series. Furthermore, the proposed model, which stacks LSTM on the VCTA, outperforms the latter by 12.7% and 14.5% in terms of the MAPE and RMSE, respectively, which further indicates that LSTM can enhance the capture of temporal dependencies. Additionally, the influence of SAM on the extraction of important features from the load data can also be demonstrated: the proposed model further decreases the MAPE by 15.2% and the RMSE by 12.2% compared with the VCTL. At the same time, the R² results of all of the models are over 99%, which demonstrates the effectiveness of VMD decomposition; moreover, the R² value of the proposed model is the highest among these models. Therefore, these analyses show that the proposed model has good generalization capabilities in power load forecasting.
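For reference, the three indices used in Tables 2 to 7 can be sketched as follows, assuming the standard definitions of MAPE, RMSE and the coefficient of determination, with R² expressed as a percentage to match the tables. The function names and the sample values are hypothetical and are not taken from either dataset.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean square error, in the unit of the load (e.g., MW)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r2_percent(actual, predicted):
    """Coefficient of determination, expressed as a percentage."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 100.0 * (1.0 - ss_res / ss_tot)


# Made-up hourly loads (MW), for illustration only.
actual_load = [100.0, 200.0, 300.0]
predicted_load = [110.0, 190.0, 300.0]

print(mape(actual_load, predicted_load))
print(rmse(actual_load, predicted_load))
print(r2_percent(actual_load, predicted_load))
```

Lower MAPE and RMSE and higher R² indicate a better fit; MAPE is scale-free, whereas RMSE depends on the magnitude of the load and is therefore not directly comparable across datasets.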

Figure 12 shows the predicted load over 24 h together with the actual load for the runs in which the MAPE is at its maximum and at its minimum, respectively. As shown in Figure 12, all of the models follow the trends in the actual load during the rising and falling stages. However, the proposed model tracks the actual load more closely than the other models, especially around the turning points. Furthermore, a comparison of Figure 10 and Figure 12 shows that the load curves of the GEFCom2012 dataset have more peaks and valleys, which indicates that the GEFCom2012 dataset is more complex and volatile than the ISO-NE dataset. Thus, the error indices (MAPE and RMSE) of all of the models on the GEFCom2012 dataset are higher than those of the corresponding models on the ISO-NE dataset. In addition, Figure 13 shows the maximum, minimum and average values of the RMSE for all of the models on the GEFCom2012 dataset. It can be seen that although the GEFCom2012 dataset has obviously nonlinear and volatile characteristics, the proposed model still achieves the best reliability and the highest prediction accuracy and stability.

3.3.3. Discussion

After VMD decomposition of the two datasets, the predictive results of all of the VMD-based models in this research became stable, which implies that smoothing and stabilizing the load data is important for accurate prediction. Moreover, the combination of TCN and LSTM helps to fully extract the long-term temporal dependencies and local dependencies of the load data. Because multiple single modules are stacked, a large amount of irrelevant feature information is also extracted from the load data. SAM is therefore used to dynamically adjust the correlation weights of the different features in order to retain the key information. Consequently, the novel architecture of the proposed model is very beneficial for feature extraction from large-scale load data with nonlinear patterns and dynamics. It should be pointed out that the computational cost of the proposed model increases greatly compared with that of single models, due to its complicated structure. However, this cost can be mitigated by the rapid development of computing power. In addition, all of the models were trained and tested several times in this study to obtain a series of stable results. Figure 14 shows a boxplot of the MAPE for all of the models based on the ISO-NE and GEFCom2012 datasets. One can see that the proposed model not only achieves the lowest MAPE value among all of the models for the two datasets, but also has the smallest variation range in its MAPE value. This indicates that the novel model proposed in this study has excellent generalization and reliable prediction accuracy.
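The way SAM reweights features can be illustrated with a minimal sketch. It assumes the scaled dot-product form of self-attention and, for brevity, uses the input matrix itself as query, key and value, so the learned projection matrices of a trained model are omitted. The feature matrix is hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax of one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of time steps, each a list of d features.
    Returns the attention weights and the reweighted features.
    """
    d = len(x[0])
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
              for q in x]
    weights = [softmax(row) for row in scores]
    output = [[sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
              for row in weights]
    return weights, output


# Hypothetical feature matrix: 3 time steps x 2 features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, attended = self_attention(features)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to one, so every output time step is a convex combination of all time steps, with larger weights assigned to the steps whose features correlate most strongly with it.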

3.3.4. Comparison with State-of-the-Art Models

It is necessary to compare the proposed model with other state-of-the-art hybrid models, all of which are TCN-based or data decomposition models published in recent years. Table 8 lists the comparative results of the proposed model and the other baseline models. The proposed model clearly outperforms the state-of-the-art models in terms of the MAPE. Even though the same datasets are used in [16,57] for one-hour-ahead forecasting, the MAPE values of the proposed model are lower by 37% and 36%, respectively. It should be emphasized that [16] used an EMD decomposition method and achieved high prediction accuracy. Nevertheless, compared with the results of [16], the proposed model in this research reduces the MAPE by 37% and the RMSE by 39%. This suggests that VMD decomposition outperforms EMD decomposition in terms of data smoothing and linearization. Furthermore, the hybrid models with data decomposition all achieved high accuracies in load forecasting, which indicates that smoothing and stabilizing the load data is a very important step in load forecasting. It should be noted that the RMSE values of some comparative models are lower than that of the proposed model due to their simpler structures and fewer hyperparameters. Overall, the proposed model can extract in-depth spatiotemporal features and then globally enhance the key features to achieve reliable load forecasting with greater accuracy.

4. Conclusions

Load data exhibit nonlinear patterns and dynamics, so a model that globally extracts in-depth features is needed to achieve greater accuracy in load forecasting and to ensure the reliable and economic operation of an electric grid. This study proposed a new hybrid model based on VMD and deep TCN-based networks with SAM for STLF. The load data were decomposed into multiple sub-series that were reshaped into a feature matrix, along with external factors. A three-layer 1D-CNN network was used as a deep network to extract in-depth features, thus avoiding the over-fitting and gradient vanishing of the training model. The TCN extracted the long-term temporal dependencies, and LSTM further enhanced the extraction of temporal features. These features were globally adjusted via SAM to obtain important features. The load forecast was eventually obtained from the fully connected layer. The effectiveness and generalization of the proposed model were validated on two real-world datasets, ISO-NE and GEFCom2012. Compared with other benchmarking models, the experimental results demonstrated that the proposed model improved the prediction accuracy by 17% to 71% in terms of the MAPE and 7% to 70% in terms of the RMSE for the ISO-NE dataset, or by 15% to 53% in terms of the MAPE and 12% to 43% in terms of the RMSE for the GEFCom2012 dataset. Moreover, the R² of the proposed model reached 99.9% for the ISO-NE dataset and 99.83% for the GEFCom2012 dataset. Therefore, the proposed model can be effective in globally extracting the in-depth features from load data and external factors to obtain greater accuracy for load forecasting.
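The role of the TCN summarized above rests on dilated causal convolution, which can be sketched as follows. The kernel weights, dilation factors and function names are hypothetical, and the receptive-field formula assumes one convolution per layer; this is an illustration of the general technique, not the configuration used in this study.

```python
def causal_dilated_conv(series, kernel, dilation=1):
    """1D causal convolution.

    output[t] combines series[t], series[t - d], series[t - 2d], ...
    and never uses values later than t. Positions before the start of
    the series are treated as zeros (left zero-padding).
    """
    out = []
    for t in range(len(series)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * series[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal layers, one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical load values and a two-tap kernel with dilation 2.
load = [1.0, 2.0, 3.0, 4.0]
print(causal_dilated_conv(load, kernel=[1.0, 1.0], dilation=2))

# With kernel size 3 and dilations doubling per layer, four layers
# already cover 31 past time steps.
print(receptive_field(3, [1, 2, 4, 8]))
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth, which is why a TCN can cover long histories with few layers.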

In future research, we would like to further optimize the deep learning techniques to reduce their computational costs and achieve greater prediction accuracies. Moreover, future research could improve the TCN-based models to better adapt to complex time series, and then apply these models to other areas such as wind speed, photovoltaic power and integrated energy systems.

Author Contributions

Conceptualization, Q.X. and M.L.; methodology, Q.X. and M.L.; software, Q.X. and Y.L.; validation, C.Z. and S.D.; formula analysis, M.L. and Q.X.; writing—original draft preparation, M.L. and Q.X.; writing—review and editing, M.L. and Y.L.; data curation, Q.X. and Y.L.; supervision, C.Z. and S.D.; funding acquisition, M.L. and S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under grant 62065012, and in part by the Natural Science Foundation of Jiangxi Province of China under grant 20212BAB202031, and in part by the Interdisciplinary Innovation Fund of Natural Science, Nanchang University under grant 9167-28220007-YB2111, and in part by the Innovation Fund Designated for Graduate Students of Jiangxi Province of China under grant YC2022-S132.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The datasets can be found here: https://www.iso-ne.com/isoexpress/web/reports/load-and-demand (accessed on 10 November 2020), and https://www.sciencedirect.com/science/article/pii/S0169207013000745#s000065 (accessed on 8 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ekonomou, L.; Christodoulou, C.A.; Mladenov, V. A short-term load forecasting method using artificial neural networks and wavelet analysis. Int. J. Power Syst. 2016, 1, 64–68.
2. Hao, J.; Sun, X.; Feng, Q. A novel ensemble approach for the forecasting of energy demand based on the artificial bee colony algorithm. Energy 2020, 13, 550.
3. Wang, R.; Wang, J.; Xu, Y. A novel combined model based on hybrid optimization algorithm for electrical load forecasting. Appl. Soft Comput. 2019, 82, 105548.
4. Pavlatos, C.; Makris, E.; Fotis, G.; Vita, V.; Mladenov, V. Utilization of artificial neural networks for precise electrical load prediction. Technol. 2023, 11, 70.
5. Guo, Z.; Zhou, K.; Zhang, X.; Yang, S. A deep learning model for short-term power load and probability density forecasting. Energy 2018, 160, 1186–1200.
6. Song, Z.; Niu, D.; Qiu, J.; Xiao, X.; Ma, T. Improved short-term load forecasting based on EEMD, guassian disturbance firefly algorithm and support vector machine. Intell. Fuzzy Syst. 2016, 31, 1709–1719.
7. Da Silva, P.G.; Ilić, D.; Karnouskos, S. The impact of smart grid prosumer grouping on forecasting accuracy and its benefits for local electricity market trading. IEEE Trans. Smart Grid 2014, 5, 402–410.
8. Zheng, Z.; Chen, H.; Luo, X. A kalman filter-based bottom-up approach for household short-term load forecast. Appl. Energy 2019, 250, 882–894.
9. Vu, D.H.; Muttaqi, K.M.; Agalgaonkar, A.P.; Bouzerdoum, A.J. Short-term electricity demand forecasting using autoregressive based time varying model incorporating representative data adjustment. Appl. Energy 2017, 205, 790–801.
10. Alberg, D.; Last, M. Short-term load forecasting in smart meters with sliding window-based ARIMA algorithms. Open Access 2018, 5, 241–249.
11. Liu, M.; Qin, H.; Cao, R.; Deng, S. Short-term load forecasting based on improved TCN and DenseNet. IEEE Access 2022, 10, 115945–115957.
12. Cai, M.; Pipattanasomporn, M.; Rahman, S. Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques. Appl. Energy 2019, 236, 1078–1088.
13. Lydia, M.; Kumar, S.S.; Selvakumar, A.I.; Kumar, G. Linear and non-linear autoregressive models for short-term wind speed forecasting. Energy Convers. Manag. 2016, 112, 115–124.
14. Zhang, X.; Wang, J.; Gao, Y. A hybrid short-term electricity price forecasting framework: Cuckoo search-based feature selection with singular spectrum analysis and SVM. Energy Econ. 2019, 81, 899–913.
15. Aly, H. A proposed intelligent short-term load forecasting hybrid models of ANN, WNN and KF based on clustering techniques for smart grid. Electr. Power Syst. Res. 2020, 182, 106191.
16. Liu, M.; Sun, X.; Wang, Q.; Deng, S. Short-term load forecasting using EMD with feature selection and TCN-based deep learning model. Energy 2022, 15, 7170.
17. Ciechulski, T.; Osowski, S. High precision LSTM model for short-time load forecasting in power systems. Energy 2021, 14, 2983.
18. Liang, Y.; Cao, Z.; Yang, X. Deepcloud: Ground-based cloud image categorization using deep convolutional features. IEEE Trans. Green Commun. Netw. 2017, 55, 5729–5740.
19. Lee, S.; Kim, H.; Lieu, Q.X.; Lee, J. CNN-based image recognition for topology optimization. Knowl.-Based Syst. 2020, 198, 105887.
20. Mbae, M.; Nwulu, N.I. Day-ahead load forecasting using improved grey verhulst model. J. Eng. Des. Technol. 2020, 18, 1335–1348.
21. Huang, Q.; Li, J.; Zhu, M. An improved convolutional neural network with load range discretization for probabilistic load forecasting. Energy 2020, 203, 117902.
22. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
23. Zhu, R.; Liao, W.; Wang, Y. Short-term prediction for wind power based on temporal convolutional network. Energy Rep. 2020, 6, 424–429.
24. Zhao, W.; Gao, Y.; Ji, T.; Wan, X.; Ye, F.; Bai, G. Deep temporal convolutional networks for short-term traffic flow forecasting. IEEE Xplore 2019, 7, 114496–114507.
25. Dai, Y.; Zhou, Q.; Leng, M.; Yang, X.; Wang, Y. Improving the Bi-LSTM model with XGBoost and attention mechanism: A combined approach for short-term power load prediction. Appl. Soft Comput. 2022, 130, 109632.
26. Liu, H. Prediction of air pollutant concentration based on self-attention mechanism LSTM model. In Proceedings of the International Conference on High Performance Computing and Communication (HPCCE 2021), Guangzhou, China, 18 February 2022.
27. Ju, Y.; Li, J.; Sun, G. Ultra-short-term photovoltaic power prediction based on self-attention mechanism and multi-task learning. IEEE Access 2020, 8, 44821–44829.
28. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2017, 10, 841–851.
29. Stratigakos, A.; Bachoumis, A.; Vita, V.; Zafiropoulos, E. Short-term net load forecasting with singular spectrum analysis and LSTM neural networks. Energy 2021, 14, 4107.
30. Hewage, P.; Behera, A.; Trovati, M.; Pereira, E.; Liu, Y. Temporal convolutional neural (TCN) network for an effective weather forecasting using time-series data from the local weather station. Soft Comput. 2020, 24, 16453–16482.
31. He, S.; Li, C.; Liu, X.; Chen, X.; Shahidehpour, M.; Chen, T.; Zhou, B.; Wu, Q. A per-unit curve rotated decoupling method for CNN-TCN based day-ahead load forecasting. IET Gener. Transm. Distrib. 2021, 15, 2773–2786.
32. Tang, X.; Chen, H.; Xiang, W.; Yang, J.; Zou, M. Short-term load forecasting using channel and temporal attention based temporal convolutional network. Electr. Power Syst. Res. 2022, 205, 107761.
33. Wang, H.; Ouyang, M.; Wang, Z.; Liang, R.; Zhou, X. The power load’s signal analysis and short-term prediction based on wavelet decomposition. Clust. Comput. 2019, 22, 11129–11141.
34. Liang, Y.; Niu, D.; Hong, W. Short term load forecasting based on feature extraction and improved general regression neural network model. Energy 2018, 166, 653–663.
35. Zhu, B.; Han, D.; Wang, P.; Wu, Z.; Zhang, T.; Wei, Y. Forecasting carbon price using empirical mode decomposition and evolutionary least squares support vector regression. Appl. Energy 2017, 191, 521–530.
36. Deng, D.; Li, J.; Zhao, J.; Zhang, Z.; Huang, Q. A novel hybrid short-term load forecasting method of smart grid using MLR and LSTM neural network. IEEE Trans. Ind. Inform. 2021, 17, 2443–2452.
37. Li, W.; Quan, C.; Wang, X.; Zhang, S. Short-term power load forecasting based on a combination of VMD and ELM. Pol. J. Environ. Stud. 2018, 27, 2143–2154.
38. Li, W.; Shi, Q.; Sibtain, M.; Li, D.; Mbanze, D.E. A hybrid forecasting model for short-term power load based on sample entropy, two-phase decomposition and whale algorithm optimized support vector regression. IEEE Access 2020, 8, 166907–166921.
39. Yu, P.; Fang, J.; Xu, Y.; Shi, Q. Application of variational mode decomposition and deep learning in short-term power load forecasting. J. Phys. Conf. Ser. 2021, 1883, 012128.
40. Shen, Y.; Ma, Y.; Deng, S.; Huang, C.; Kuo, P. An ensemble model based on deep learning and data preprocessing for short-term electrical load forecasting. Sustainability 2021, 13, 1694.
41. Zhou, F.; Zhou, H.; Li, Z.; Zhao, K. Multi-step ahead short-term electricity load forecasting using VMD-TCN and error correction strategy. Energy 2022, 15, 5375.
42. Tang, J.; Chien, Y.R. Research on wind power short-term forecasting method based on temporal convolutional neural network and variational modal decomposition. Sensors 2022, 22, 7414.
43. Cai, C.; Li, Y.; Su, Z.; Zhu, T.; He, Y. Short-term electrical load forecasting based on VMD and GRU-TCN hybrid network. Appl. Sci. 2022, 12, 6647.
44. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544.
45. Xiao, D.; Ding, J.; Li, X.; Huang, L. Gear fault diagnosis based on kurtosis criterion VMD and SOM neural network. Appl. Sci. 2019, 9, 5424.
46. Rizvi, S. Time series deep learning for robust steady-state load parameter estimation using 1-CNN. Arab. J. Sci. Eng. 2021, 47, 2731–2744.
47. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
48. Robinson, J.; Kuzdeba, S.; Stankowicz, J.; Carmack, J.M. Dilated causal convolutional model for RF fingerprinting. In Proceedings of the 2020 10th Annual Computing and Communication Workshop and Conference (Ccwc), Las Vegas, NV, USA, 6–8 January 2020.
49. Ibrahim, E.A.; Geilen, M.; Huisken, J.; Li, M.; Gyvez, J.P. Low complexity multi-directional in-air ultrasonic gesture recognition using a TCN. In Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (Date 2020), Grenoble, France, 9–13 March 2020.
50. Ding, A.; Liu, T.; Zou, X. Integration of ensemble googlenet and modified deep residual networks for short-term load forecasting. Electronics 2021, 10, 2455.
51. Li, C.; Xie, C.; Zhang, B.; Chen, C.; Han, J. Deep fisher discriminant learning for mobile hand gesture recognition. Pattern Recognit. 2018, 77, 276–288.
52. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
53. ISO-NE Data Set. Available online: https://www.iso-ne.com/isoexpress/web/reports/load-and-demand (accessed on 10 November 2020).
54. GEFCom2012 Data Set. Available online: https://www.sciencedirect.com/science/article/pii/S0169207013000745#s000065 (accessed on 8 June 2022).
55. Huang, N.; Wu, Y.; Cai, G.; Zhu, H.; Xing, E. Short-term wind speed forecast with low loss of information based on feature generation of OSVD. IEEE Access 2019, 7, 81027–81046.
56. Li, H.; Tan, J.; Han, J.; Ge, Y.; Guo, D. Sensitivity analysis and forecast of power load characteristics based on meteorological feature information. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Dalian, China, 2020; Volume 558, p. 052060.
57. He, W. Load forecasting via deep neural networks. In Proceedings of the 5th International Conference on Information Technology and Quantitative Management, Washington, DC, USA, 23–26 April 2017.
58. Xie, J.; Chen, Y.; Hong, T.; Laing, T.D. Relative humidity for load forecasting models. IEEE Trans. Smart Grid 2018, 9, 191–198.
59. Nowotarski, J.; Liu, B.; Weron, R.; Hong, T. Improving short term load forecast accuracy via combining sister forecasts. Energy 2016, 98, 40–49.

Figure 1. The structure of causal convolution.

Figure 2. Structure of dilated convolution.

Figure 3. Structure of the residual block.

Figure 4. Structure of LSTM.

Figure 5. The structure of SAM.

Figure 6. Flow chart of the proposed model.

Figure 7. Decomposition results of the load data using VMD for the ISO-NE dataset.

Figure 8. Evolutions in power load and temperature from 1 March 2003 to 8 October 2007 for ISO-NE dataset.

Figure 9. Evolutions in power load in different time horizons for ISO-NE dataset. (a) One year, (b) one week, (c) from 22 to 28 December 2003.

Figure 10. Load forecasting profiles over 24 h based on ISO-NE dataset when the MAPE is a maximum (a) and a minimum (b).

Figure 11. Maximum, minimum and average values of RMSE for each model based on ISO-NE dataset.

Figure 12. Load forecasting profiles over 24 h based on GEFCom2012 dataset when the MAPE is a maximum (a) and a minimum (b).

Figure 13. Maximum, minimum and average values of RMSE for each model based on GEFCom2012 dataset.

Figure 14. Boxplots of MAPE for each model based on (a) ISO-NE dataset and (b) GEFCom2012 dataset.

Table 2. Maximum, minimum and average values of MAPE for ISO-NE dataset.

Models	VCLA	VTLA	VCTA	VCTL	VCTGA	Proposed
Max MAPE (%)	2.08	1.97	1.11	1.28	0.71	0.56
Min MAPE (%)	1.23	1.07	0.53	0.56	0.45	0.41
Average MAPE (%)	1.62	1.40	0.72	0.77	0.57	0.47

Table 3. Maximum, minimum and average values of RMSE for ISO-NE dataset.

Models	VCLA	VTLA	VCTA	VCTL	VCTGA	Proposed
Max RMSE (MW)	387.17	398.52	210.26	303.41	136.25	116.51
Min RMSE (MW)	305.54	213.04	118.88	122.25	88.22	80.46
Average RMSE (MW)	344.87	308.32	159.65	201.29	110.17	102.23

Table 4. Maximum, minimum and average values of R² for ISO-NE dataset.

Models	VCLA	VTLA	VCTA	VCTL	VCTGA	Proposed
Max R² (%)	99.31	99.53	99.88	99.87	99.91	99.93
Min R² (%)	97.90	98.27	99.51	99.32	99.80	99.87
Average R² (%)	98.88	99.20	99.75	99.73	99.86	99.90

Table 5. Maximum, minimum and average values of MAPE for GEFCom2012 dataset.

Models	VCLA	VTLA	VCTA	VCTL	VCTGA	Proposed
Max MAPE (%)	1.93	1.33	1.11	1.13	1.03	0.91
Min MAPE (%)	1.12	1.02	0.93	0.98	0.91	0.85
Average MAPE (%)	1.50	1.18	1.02	1.05	0.96	0.89

Table 6. Maximum, minimum and average values of RMSE for GEFCom2012 dataset.

Models	VCLA	VTLA	VCTA	VCTL	VCTGA	Proposed
Max RMSE (MW)	412.50	437.96	278.36	273.77	257.04	236.49
Min RMSE (MW)	348.79	253.50	261.19	243.01	229.45	217.92
Average RMSE (MW)	378.57	325.29	264.35	256.76	241.00	225.32

Table 7. Maximum, minimum and average values of R² for GEFCom2012 dataset.

Models	VCLA	VTLA	VCTA	VCTL	VCTGA	Proposed
Max R² (%)	99.73	99.76	99.83	99.77	99.85	99.85
Min R² (%)	99.14	99.59	99.75	99.51	99.78	99.82
Average R² (%)	99.42	99.69	99.79	99.71	99.81	99.83

Table 8. Comparison of the proposed model with state-of-the-art models.

Study	Year	Method	Dataset	Forecast Horizon	MAPE (%)	RMSE (MW)
He et al. [31]	2021	CNN–TCN	Guangzhou, China	24 h	2.39	378.29
Tang et al. [32]	2022	SAM–TCN	Chongqing, China	1 h	5.35	52.14
Liu et al. [16]	2022	EMD–CNN–TCN–SAM–LSTM	ISO-NE	1 h	0.75	168.19
He et al. [57]	2017	Parallel CNN–RNN	GEFCom2012	1 h	1.405	-
Xie et al. [58]	2018	RH–ANN	GEFCom2012	24 h	4.90	-
Nowotarski et al. [59]	2016	Sister model	ISO-NE GEFCom2012	1 h	(2.10–5.44)	-
Proposed model	-	VCTLA	ISO-NE	1 h	0.47	102.23
Proposed model	-	VCTLA	GEFCom2012	1 h	0.89	225.32

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
