Short-Term Load Forecasting Based on Outlier Correction, Decomposition, and Ensemble Reinforcement Learning

Abstract: Short-term load forecasting is critical to ensuring the safe and stable operation of the power system. To this end, this study proposes a load power prediction model that utilizes outlier correction, decomposition, and ensemble reinforcement learning. The novelty of this study is as follows: firstly, the Hampel identifier (HI) is employed to correct outliers in the original data; secondly, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is used to fully extract the waveform characteristics of the data; and, finally, the temporal convolutional network, extreme learning machine, and gate recurrent unit are selected as the base learners for forecasting load power data. An ensemble reinforcement learning algorithm based on Q-learning is adopted to generate optimal ensemble weights, and the predictive results of the three base learners are combined. The experimental results of the models for three real load power datasets show that: (a) the utilization of HI improves the model's forecasting result; (b) CEEMDAN is superior to other decomposition algorithms in forecasting performance; and (c) the proposed ensemble method, based on the Q-learning algorithm, outperforms three single models in accuracy and achieves smaller prediction errors.


Introduction
Electric load forecasting is an important aspect of modern power system management and a key research focus of power companies [1]. It comprises long-term, medium-term, and short-term forecasting, depending on the specific goals [2]. Notably, short-term load forecasting plays an important role in power generation planning and enables relevant departments to establish appropriate power dispatching plans [3,4], which is crucial for maintaining the safe and stable operation of the power system and enhancing its social benefits [5]. In addition, it facilitates the growth of the power market and boosts economic benefits [6]. Therefore, devising an effective and precise method for short-term load forecasting is of significant importance.
With the need for accurate energy forecasting in mind, various forecasting methods have been developed. Early studies produced several models for short-term power load forecasting, including the Auto-Regressive (AR), Auto-Regressive Moving Average (ARMA), and Auto-Regression Integrated Moving Average (ARIMA) models. A case in point is the work of Chen et al. [7], who employed the ARMA model for short-term power load forecasting. This method utilizes observed data as the initial input, and its fast algorithm produces predicted load values that are in line with the trend in load variation. However, it falls short in terms of accounting for the factors that affect such variation, thus leaving room for enhancement in prediction accuracy.
In recent years, scholars have turned to machine learning [8] and deep learning [9] to improve electric load forecasting accuracy and uncover complex data patterns. However, the combination weights of existing load power ensemble prediction models lack diversity; different weight distribution strategies should be adopted for the prediction results generated by different base learners. The literature shows that weight ensembles based on reinforcement learning can offer advantages in wind speed prediction [33,34].
To address the aforementioned research gaps, this paper presents a short-term load forecasting model (HI-CEEMDAN-Q-TEG) based on outlier correction, decomposition, and ensemble reinforcement learning. The contributions and novelty of this paper are summarized as follows:
• This paper employs an outlier detection method to correct outliers in the original load power data. Such outliers may arise from human error or other causes, and directly inputting the unprocessed data into the model could hinder training. To identify and correct outliers in the data, this paper utilizes the Hampel identifier (HI) algorithm. This step is crucial, as it supplies the forecasting model with the nonlinear information in the data;
• This paper utilizes a decomposition method to fully extract the waveform characteristics of the data. Specifically, the CEEMDAN method is used in this study to decompose the raw non-stationary load power data. By decomposing the load power data into multiple sub-sequences through CEEMDAN, the waveform characteristics of the data can be extracted thoroughly, ultimately enhancing the performance of the predictor;
• This paper introduces an ensemble learning algorithm based on reinforcement learning. Varying weights must be considered when combining preliminary predictions from different base learners. This study employs three single models to predict the processed load power data, followed by the Q-learning method to obtain ensemble weights suitable for the combined forecast. Compared to other ensemble learning algorithms, the Q-learning method deploys agents that learn in the environment through trial and error, resulting in an innovative and superior method.

Framework of the Proposed Model
This study presents a novel forecasting model, namely the HI-CEEMDAN-Q-TEG, for predicting load power. The model framework, as depicted in Figure 1, consists of three distinct steps:
Step 1: Using HI to detect and correct outliers. The original load power data is characterized by fluctuations, randomness, and nonlinearity; therefore, outliers can arise from either equipment or human factors. By using HI, outliers can be identified and corrected in the training set, which eliminates their interference with model training. This approach serves as a valuable tool for enhancing the precision of load power prediction;
Step 2: Applying CEEMDAN to decompose the original data into subseries. Given its prominent cyclical characteristics, the load power data can be perceived, from a frequency domain perspective, as a composite of several components with varying frequencies. The CEEMDAN method can adaptively decompose this data into multiple subseries, thereby reducing its non-stationarity and enhancing the predictor's modeling efficiency and capacity;
Step 3: Using the Q-learning ensemble method for prediction. The load power data prediction is achieved by employing three base learners: the temporal convolutional network (TCN), gate recurrent unit (GRU), and extreme learning machine (ELM), collectively referred to as TEG. After correcting for outliers, the TEG is used to make preliminary predictions. Ensemble weights for the different single models are determined using the Q-learning method. This algorithm updates the weights repeatedly through trial-and-error learning, thereby optimizing the diversity and appropriateness of the ensemble weights.

Hampel Identifier
HI is a widely used method for detecting and correcting outliers [35]. Due to its excellent effectiveness, many researchers employ this method. To apply the HI algorithm to input data A = [a_1, a_2, . . . , a_k], set the sliding window length as w = 2n + 1. For each sample a_i, obtain the median m_i, as well as the median absolute deviation (MAD), from the n samples on either side of the center point. Set the evaluation parameter as α = 0.6745, and calculate the standard deviation σ_i using MAD and α [36]. The formulas for calculating m_i, MAD, and σ_i are as follows [32]:

m_i = median(a_{i−n}, . . . , a_{i+n})
MAD_i = median(|a_{i−n} − m_i|, . . . , |a_{i+n} − m_i|)
σ_i = MAD_i / α

Based on the 3σ statistical rule, if the difference between a sample value and the window median exceeds three standard deviations, the window median replaces the sample value [37]:

a_i ← m_i, if |a_i − m_i| > 3σ_i

The use of HI allows the outliers in the raw data to be corrected; left untreated, they could disrupt the model training process. The incorporation of HI into data preprocessing leads to an enhanced nonlinear fitting performance of the data.
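As an illustration, the Hampel identifier described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name, the boundary handling (windows are simply truncated at the series ends), and the example series are our own choices, while the median, MAD, σ = MAD/α, and 3σ replacement rule follow the text.

```python
import statistics

def hampel_correct(data, n=3, alpha=0.6745, t=3.0):
    """Hampel identifier: replace a_i by the window median m_i whenever it
    deviates from m_i by more than t estimated standard deviations,
    with sigma_i = MAD_i / alpha."""
    corrected = list(data)
    for i in range(len(data)):
        lo, hi = max(0, i - n), min(len(data), i + n + 1)
        window = data[lo:hi]
        m_i = statistics.median(window)
        mad = statistics.median([abs(a - m_i) for a in window])
        sigma = mad / alpha
        if abs(data[i] - m_i) > t * sigma:
            corrected[i] = m_i
    return corrected

# The isolated spike at index 3 is pulled back to the local window median.
series = [1.0, 1.1, 0.9, 9.0, 1.0, 1.05, 0.95]
corrected = hampel_correct(series)
```

Note that when the window is locally constant, MAD (and hence σ) is zero, so any deviation from the median is flagged; practical implementations sometimes add a small floor to σ for that case.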

Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CEEMDAN is a decomposition algorithm used to analyze time series data for nonlinearity and non-stationarity [38]. By smoothing the overall data and extracting information about multiple frequencies from the original data, CEEMDAN can decompose the data into sub-sequences with varying frequency and time information. The CEEMDAN algorithm is adaptive, meaning it can automatically select the appropriate noise level based on the unique characteristics of a given signal. This adaptability and robustness make the CEEMDAN algorithm ideal for processing nonlinear and non-stationary signals [39].
Based on the EMD algorithm, the CEEMDAN algorithm makes the signal more stable and accurate in the decomposition process by introducing a noise signal. Meanwhile, it adopts multiple decompositions and average methods to improve the accuracy and stability of signal decomposition [40].
The CEEMDAN algorithm has the advantage of solving mutual interference and noise interference problems between intrinsic mode functions (IMFs). This leads to improved accuracy and stability of signal decomposition.
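The noise-assisted averaging principle behind CEEMDAN can be illustrated with a toy sketch. To be clear about what is and is not shown: a real CEEMDAN implementation extracts IMFs by EMD sifting, which is beyond this sketch; here a simple moving-average split stands in for the sifting step, purely to show how decomposing many independently noise-perturbed copies and averaging the results makes the added noise cancel out.

```python
import math
import random

def toy_decompose(signal, window=5):
    """Stand-in for EMD sifting: split a series into a moving-average
    'low-frequency' trend and the 'high-frequency' residual around it."""
    n = len(signal)
    trend = []
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        trend.append(sum(signal[lo:hi]) / (hi - lo))
    residual = [s - t for s, t in zip(signal, trend)]
    return residual, trend

def noise_assisted_decompose(signal, n_trials=50, noise_std=0.1, seed=0):
    """Noise-assisted averaging, the core idea behind CEEMDAN: decompose
    many independently noise-perturbed copies of the signal and average
    the components, so the added noise cancels out across trials."""
    rng = random.Random(seed)
    n = len(signal)
    acc_high, acc_low = [0.0] * n, [0.0] * n
    for _ in range(n_trials):
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        high, low = toy_decompose(noisy)
        acc_high = [a + h for a, h in zip(acc_high, high)]
        acc_low = [a + l for a, l in zip(acc_low, low)]
    return ([a / n_trials for a in acc_high],
            [a / n_trials for a in acc_low])

# Sine plus slow drift: the averaged components reconstruct the input closely.
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(60)]
high, low = noise_assisted_decompose(signal)
```

The averaged components still sum back to (approximately) the original signal, which mirrors the completeness property that gives CEEMDAN its name.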

Temporal Convolutional Network
The TCN algorithm is a commonly used convolutional network in time series prediction [41]. Because of the causal relationship between load data over time, the prediction at time t depends on previous times, and the TCN network effectively maintains this temporal order and causality. TCN consists of three parts: causal convolution, dilated (expansion) convolution, and residual connections.
In TCN, causal convolution ensures that the output of the upper layers of the network at time t depends only on the input of the lower layers up to time t. Dilated convolution involves setting the dilation factor hyperparameter to adjust the convolutional interval. To reduce the limitations of downward transmission after nonlinear transformation in the original network structure, TCN adds multiple direct channels, allowing the input information to be transmitted directly to later layers.
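The causal and dilated convolutions described above can be made concrete with a short sketch. This is illustrative only (a real TCN stacks such layers with learned weights, nonlinearities, and residual connections); the function name and the zero-padding choice are our own.

```python
def causal_dilated_conv(x, weights, dilation=1):
    """1-D causal convolution: y[t] uses only x[t], x[t-d], x[t-2d], ...
    so no future value leaks into the output. Positions before the start
    of the series contribute zero (zero padding)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

# Kernel [1, 1] with dilation 2 sums each point with the one two steps back.
out = causal_dilated_conv([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], dilation=2)
# out == [1.0, 2.0, 4.0, 6.0, 8.0]
```

Increasing the dilation factor layer by layer is what lets a TCN cover a long history with only a few layers, while the causality constraint keeps the temporal order intact.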

Extreme Learning Machine
ELM is an efficient artificial neural network whose principle is based on fully random projections and the least squares method [42]. Fully random projection refers to the projection of input data into a high-dimensional space, which increases the separability of the data in the feature space [43]. Through random initialization of the weights of the input and hidden layers, the ELM algorithm can minimize training errors very quickly, facilitating rapid learning and prediction. ELM can be expressed mathematically as follows [32]:

f(x) = β g(Wx + b)

where β represents the output weight matrix, g(x) represents the activation function, W represents the input weight matrix, and b represents the bias vector.
With H representing the hidden-layer output matrix and Y representing the true value matrix, the matrix expression for ELM is as follows:

Hβ = Y, β = H†Y

where H is a matrix whose rows represent the output of the hidden layer for each input sample, β is the matrix of output weights, and H† denotes the Moore–Penrose pseudoinverse of H.
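A minimal ELM can be written directly from these equations: the input weights W and biases b are random and fixed, and only β is fitted by least squares on the hidden-layer outputs H. This is a sketch under our own assumptions: the hidden size, the tanh activation, and the small ridge term eps (added for numerical stability, not part of the basic formulation) are illustrative choices.

```python
import math
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def elm_train(X, y, hidden=10, seed=0):
    """ELM: W and b are random and stay fixed; only the output weights
    beta are fitted, by least squares on the hidden-layer outputs H."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]
    H = [[math.tanh(sum(W[j][k] * x[k] for k in range(d)) + b[j])
          for j in range(hidden)] for x in X]
    # Normal equations (H^T H + eps*I) beta = H^T y; eps aids stability.
    eps = 1e-6
    HtH = [[sum(H[i][a] * H[i][c] for i in range(len(H)))
            + (eps if a == c else 0.0) for c in range(hidden)]
           for a in range(hidden)]
    Hty = [sum(H[i][a] * y[i] for i in range(len(H))) for a in range(hidden)]
    return W, b, solve(HtH, Hty)

def elm_predict(model, x):
    W, b, beta = model
    return sum(beta[j] * math.tanh(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
               for j in range(len(beta)))

# Fit a simple 1-D target; the least-squares fit should be tight here.
X = [[i / 10] for i in range(20)]
y = [i / 10 for i in range(20)]
model = elm_train(X, y)
```

Because training reduces to one linear solve, no gradient iterations are needed, which is the source of ELM's speed advantage noted above.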

Gate Recurrent Unit
In 2014, Cho proposed the Gated Recurrent Unit (GRU) as an improvement on Long Short-Term Memory (LSTM) [44]. The GRU has two gates, the reset gate and the update gate, which, respectively, determine whether historical information is added to the current state and how relevant that historical information is. Compared to the LSTM, the GRU uses fewer parameters while preserving the important features, resulting in faster running speeds.
The formulas for the update gate and reset gate calculations are as follows:

z_t = σ(W_z · [h_{t−1}, x_t])
r_t = σ(W_r · [h_{t−1}, x_t])

where x_t represents the current input value, h_{t−1} represents the previous hidden state, and W represents the weight matrix.
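For concreteness, a single GRU step for a scalar input and state can be written directly from these gate equations. This is a sketch: real implementations use weight matrices over vector states, and the candidate-state formula (which applies the reset gate to the previous state) is included here for completeness even though only the two gates are defined above; the weight-triple layout is our own.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step for a scalar input and state.
    Each weight triple W* = (w_x, w_h, bias)."""
    z = sigmoid(Wz[0] * x_t + Wz[1] * h_prev + Wz[2])                 # update gate
    r = sigmoid(Wr[0] * x_t + Wr[1] * h_prev + Wr[2])                 # reset gate
    h_tilde = math.tanh(Wh[0] * x_t + Wh[1] * (r * h_prev) + Wh[2])   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                           # new hidden state
```

With all-zero weights, both gates evaluate to 0.5 and the candidate state is 0, so the new state is simply half the previous one, which makes the gating arithmetic easy to check by hand.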

Ensemble Reinforcement Learning Method
As a distinct machine learning method, reinforcement learning differs from supervised and unsupervised learning in that an agent continuously interacts with the environment, which guides subsequent actions by providing reward feedback, with the aim of maximizing the cumulative reward [45]. The Q-learning method is a value-based reinforcement learning algorithm [46]. Q-learning generates a Q-value table that captures the relationship between each state and the actions taken in it; each value in this table represents the reward obtained for taking a given action in a given state.
The Q-table approach selects the action with the highest potential reward and uses a penalty and reward mechanism to keep updating the Q-table until the optimal result is achieved, that is, until a specific stopping condition is met, signifying that the algorithm has found the optimal action for each state [47]. In this study, we employ the Q-learning method to combine the forecasting outcomes of TCN, ELM, and GRU. As a result, different ensemble weights are generated for each base learner, effectively addressing the weak robustness associated with a single weight as well as a single model.
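The weight-combination idea can be sketched as a single-state Q-learning loop over a discrete grid of candidate ensemble weights. This is a simplified illustration, not the paper's exact formulation: the state/action design (one state, actions = weight vectors), the grid step, the learning rate, and the deterministic round-robin exploration are all our own choices; the reward is the negative MAE of the weighted combination.

```python
import itertools

def q_learning_weights(preds, actual, step=0.25, episodes=200, lr=0.5):
    """Single-state Q-learning over a discrete grid of ensemble weights.
    Each action is a candidate weight vector (summing to 1) over the base
    learners; its reward is the negative MAE of the weighted combination."""
    n_models = len(preds)
    levels = [i * step for i in range(int(round(1 / step)) + 1)]
    grid = [w for w in itertools.product(levels, repeat=n_models)
            if abs(sum(w) - 1.0) < 1e-9]
    q = [0.0] * len(grid)
    for ep in range(episodes):
        a = ep % len(grid)  # round-robin exploration keeps the sketch deterministic
        w = grid[a]
        combo = [sum(w[m] * preds[m][t] for m in range(n_models))
                 for t in range(len(actual))]
        mae = sum(abs(c - y) for c, y in zip(combo, actual)) / len(actual)
        q[a] += lr * (-mae - q[a])  # one-state Q-update: no successor-state term
    return grid[max(range(len(grid)), key=lambda a: q[a])]

# Toy usage: three constant base forecasts, truth equal to the second one.
preds = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
best_w = q_learning_weights(preds, [2.0, 2.0, 2.0])
```

After enough episodes, each Q-value converges toward that action's (deterministic) reward, so the greedy action is the weight vector with the lowest validation MAE.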

Data Description
To verify the practicality of the proposed model, three sets of load power data from the Pecan Street datasets were utilized in this study [48]. The Pecan Street datasets contain the load power data of 25 households in the Austin area of the United States, recorded at a sampling interval of 15 min in 2018. Figure 2 showcases load power datasets #1, #2, and #3, collected from the 1st to the 15th of January, April, and September, respectively, in the Austin area. Each dataset comprises 1440 samples, divided into two parts: 1240 training set samples and 200 test set samples. The training sets are utilized to train the single models and the Q-learning ensemble method, while the testing set is utilized to evaluate the performance of all the models discussed in this paper. Table 1 lists the statistical characteristics of the three load power datasets. As observed from Figure 2 and Table 1, these three sets of load power data possess distinct statistical characteristics; however, they all exhibit non-stationarity and volatility.

Performance Evaluation Indexes
To provide a comprehensive evaluation of the forecasting performance of the models, three statistical indexes are employed in this study: mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The smaller the values of these indexes, the higher the model's prediction accuracy. The definitions of these indexes are as follows:

MAE = (1/T) Σ_{t=1}^{T} |y(t) − ŷ(t)|
RMSE = sqrt( (1/T) Σ_{t=1}^{T} (y(t) − ŷ(t))² )
MAPE = (100%/T) Σ_{t=1}^{T} |y(t) − ŷ(t)| / |y(t)|

where y(t) is the original load power data at time t, ŷ(t) is the forecasted load power data at time t, and T is the number of samples in y(t).
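The three indexes translate directly into code. A straightforward sketch (the MAPE form assumes the actual series contains no zero values, which holds for positive load data):

```python
import math

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Mean absolute percentage error; assumes no zeros in y."""
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which is why the three are usually reported together.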

Forecasting Results and Analysis
The experiments aimed to compare the proposed hybrid HI-CEEMDAN-Q-TEG model with other relevant models. The main experimental parameters of our hybrid model are given in Appendix A. The experiments were divided into three parts: In Part I, the models with HI were compared to those without HI to demonstrate the potential efficacy of HI and the performance improvements attainable by using HI in load power forecasting; Part II compared four commonly used intelligent models running with HI (namely, HI-TCN, HI-ELM, HI-GRU, and HI-BPNN) to demonstrate the superiority of HI-TCN, HI-ELM, and HI-GRU in different datasets. Furthermore, HI-Q-TEG was compared with HI-TCN, HI-ELM, and HI-GRU to demonstrate the effectiveness of the Q-Learning ensemble method; Part III aimed to verify the advantages of the decomposition method by comparing the results of the HI-Q-TE method with those obtained using the HI-CEEMDAN-Q-TE decomposition algorithm. In addition, different decomposition algorithms were compared to show the superiority of the CEEMDAN decomposition algorithm proposed in this study.

Experimental Results of Part I
In this part, we investigate the impact of employing HI in load power forecasting. Figure 3 depicts the outlier points and the dissimilarity between the original power load data and the data after HI. Table 2 displays the sample entropy (SampEn) values for both the original load power data and the data after HI application. To further investigate the potential gains from HI, the accuracy of HI-based models is compared to that of models without HI, and the percentage enhancements in all three performance evaluation indexes are presented in Table 3. Based on the results presented in Figure 3 and Tables 2 and 3, this study draws the following conclusions:

• The application of the HI model leads to the identification and correction of outlier points, which improves the overall quality of the dataset. Figure 2 depicts the presence of outlier points in the original power load data, which can interfere with model training and negatively impact forecasting accuracy;
• The HI model effectively reduces the complexity of the original data, as evidenced by a lowered value of SampEn. SampEn is a statistical measure that quantifies the complexity of a time series. A lower value of SampEn indicates a higher degree of self-similarity in the sequence, whereas a higher value implies greater complexity. Table 2 indicates that, for all three datasets, the values of SampEn were lower in the data processed with the HI model than in the original load power data;
• The HI model improves forecasting accuracy compared to models without it. The comparative analysis of HI-CEEMDAN-Q-TEG with CEEMDAN-Q-TEG shows an improvement in MAPE accuracy of 2.6104%, 3.7628%, and 3.2095% for datasets #1, #2, and #3, respectively, as listed in Table 3. The improvement is due to the correction of outliers. The findings demonstrate that the implementation of the HI model reduces the load power prediction error in all three series.

Experimental Results of Part II
This part of the experiment compares four commonly utilized single intelligent models (HI-TCN, HI-ELM, HI-GRU, and HI-BPNN) with the HI-Q-TEG method. The MAE values for the four single intelligent models across the three datasets are displayed in Figure 4, while Table 4 presents the performance evaluation indexes for all four models. In addition, Figures 5-7 provide the forecasting results and errors of HI-Q-TEG, HI-TCN, HI-ELM, and HI-GRU across the three datasets. The effectiveness of the Q-learning ensemble method is presented in Table 5, which highlights the improvement percentages of each method. In Tables 4 and 5, the values in bold represent the model evaluation results with the lowest forecasting error for the respective dataset.
The findings from Figures 4-7 and Tables 4 and 5 support the following conclusions:
• The prediction performance of the same single models varied across different datasets due to varying volatility and nonlinearity, as evidenced by the differing precision orders for the same dataset across different performance evaluation indexes. Overall, however, HI-TCN, HI-ELM, and HI-GRU exhibited the best prediction accuracy, with HI-TCN producing the most accurate predictions for Dataset #1, HI-ELM for Dataset #2, and HI-GRU for Dataset #3. Thus, incorporating these three single models as base learners for the ensemble method is recommended;
• The Q-learning ensemble algorithm yielded improved forecasting accuracy for load power compared to the single intelligent models.

Experimental Results of Part III
This part of the experiment compares four decomposition algorithms (WPD, EMD, EEMD, and CEEMDAN) by showcasing their improvement percentages for the three performance evaluation indexes across different datasets in Table 6. Additionally, Figures 8-10 depict scatter diagram comparisons between the HI-CEEMDAN-Q-TEG method and the other decomposition models. The closer the scatter plot points are to the diagonal line, the better the prediction effect of the corresponding model. From Table 6 and Figures 8-10, the following conclusions can be drawn:
• When comparing models that utilize decomposition algorithms to those that do not, consistent percentage improvements can be observed;
• The proposed decomposition model based on the CEEMDAN algorithm provides better forecasting outcomes than the other decomposition algorithms. For Dataset #2, the improvement percentage of MAE for HI-WPD-Q-TEG, HI-EMD-Q-TEG, HI-EEMD-Q-TEG, and HI-CEEMDAN-Q-TEG is 25.8923%, 20.9483%, 19.9478%, and 38.3934%, respectively. The CEEMDAN algorithm is highly effective at decomposing both high- and low-frequency data, allowing it to better handle the high volatility of the raw data. This results in optimal forecasting performance.

Conclusions
Load forecasting is crucial for maintaining the stable operation of the power grid. This paper proposes an outlier correction, decomposition, and ensemble reinforcement learning model for load power prediction. The HI-CEEMDAN-Q-TEG model uses the HI outlier correction method to eliminate outliers. The CEEMDAN decomposition method is employed to break down the raw load power data into various subseries to reduce volatility. Furthermore, the commonly used reinforcement learning method Q-learning is utilized to generate optimal weights for combining the forecasting results of three single models: TCN, ELM, and GRU. Based on the aforementioned experiments, the following conclusions can be drawn:
1. The utilization of HI significantly improves prediction accuracy. HI detects and eliminates outliers in the original data, reducing their interference in model training, improving the model's data fitting ability, and ultimately enhancing its forecasting performance;
2. Using TCN, ELM, and GRU as the base learners confers significant advantages, and the ensemble model employing the Q-learning method yields superior forecasting performance compared to the individual base learners. As a type of reinforcement learning method, Q-learning optimizes the weights of the base learners via trial and error within the given environment;
3. Out of the four decomposition algorithms examined in this study, CEEMDAN exhibited superior forecasting performance. Unlike the other algorithms, CEEMDAN effectively handles non-stationary data and mitigates the impact of unsteady components.

Appendix A
The main experimental parameters of our hybrid model are given in Table A1.