Electrical Energy Demand Forecasting Model Development and Evaluation with Maximum Overlap Discrete Wavelet Transform-Online Sequential Extreme Learning Machines Algorithms

: To support regional electricity markets, accurate and reliable energy demand ( G ) forecast models are vital stratagems for stakeholders in this sector. An online sequential extreme learning machine (OS-ELM) model integrated with a maximum overlap discrete wavelet transform (MODWT) algorithm was developed using daily G data obtained from three regional campuses (i.e., Toowoomba, Ipswich, and Springﬁeld) at the University of Southern Queensland, Australia. In training the objective and benchmark models, the partial autocorrelation function (PACF) was ﬁrst employed to select the most signiﬁcant lagged input variables that captured historical ﬂuctuations in the G time-series data. To address the challenges of non-stationarities associated with the model development datasets, a MODWT technique was adopted to decompose the potential model inputs into their wavelet and scaling coe ﬃ cients before executing the OS-ELM model. The MODWT-PACF-OS-ELM (MPOE) performance was tested and compared with the non-wavelet equivalent based on the PACF-OS-ELM (POE) model using a range of statistical metrics, including, but not limited to, the mean absolute percentage error ( MAPE% ). For all of the three datasets, a signiﬁcantly greater accuracy was achieved with the MPOE model relative to the POE model resulting in an MAPE = 4.31% vs. MAPE = 11.31%, respectively, for the case of the Toowoomba dataset, and a similarly high performance for the other two campuses. Therefore, considering the high e ﬃ cacy of the proposed methodology, the study claims that the OS-ELM model performance can be improved quite signiﬁcantly by integrating the model with the MODWT algorithm.


Introduction
To promote the application of appropriate strategic measures and provide accurate scheduling of electrical power in energy security platforms, a forecasting model that can reliably and precisely forecast the electricity energy demand (G), is required. Arising from a shift in consumer energy usage, the large fluctuations evident in G data records leave traditional machine learning models, for example, artificial neural network (ANN) [1], multivariate adaptive regression spline (MARS) [1][2][3], support vector regression (SVR) [2,3], M5 model tree [3], online sequential extreme learning machine (OS-ELM) [4], and multiple linear regression (MLR) [1] needing to improve their capability to accurately forecast the G data. To achieve this task, a data pre-processing method, to be implemented before running To outline how these goals were achieved, this paper is organized as follows. The theory of the OS-ELM and MODWT algorithms are presented in Section 2. Section 3 describes the study area, data, and methods, while model evaluation criteria, results, and discussions are shown in Section 4. Finally, the study limitations showing future work opportunities, and conclusions are summarized in Sections 5 and 6, respectively.

Online Sequential Extreme Learning Machine Model (OS-ELM)
In this paper, the machine learning data intelligent ELM-based model architecture was employed to design a single-layer feed-forward neural network (SLFN), expressed as [4,24]: where k = 1, 2, . . . ; N, M is the hidden nodes of N inputs (lagged variables generated from partial autocorrelation function (PACF) for G data or the decomposition of the lagged data resulting from MODWT) x k (t) = {x k } N k=1 ∈ R N , and output (G forecasted) y k (x) = y k N k=1 ∈ R in the training period). A high-level flowchart is presented in Figure 1 to show how these input variables were used to feed the OS-ELM model. f (.) is the activation function, c i ∈ Γ denotes the threshold of the ith hidden node, the weight vectors that connect the ith hidden node with the input and output nodes are w i = [w i1 , w i2 , . . . , w in ] T and ρ i = [ρ i1 , ρ i2 , . . . , ρ im ] T , respectively, and the term w i .x k refers to the inner product of w i and x i .
According to [24], Equation (1) can be simplified to the form below: where H = is the hidden layer output matrix of the neural The following output weight is calculated by applying the least square solution of the linear systems as follows: where H * denotes the inverse matrix of H.
With the classical ELM model, all N samples of data are used during the learning process, making the model relatively time consuming [4]; however, the OS-ELM model, such as the one developed in the present study, addresses this issue: data are only used once within the two learning stages of initialization and sequential learning [4]. The hidden layer output matrix is designed in the initialization step by allocating the input node (w i ) and the threshold (c i ) to a small piece of initial training data, while the second step of sequential learning is then launched on a one-by-one basis to stop reusing training data [4,22,23,25].
where is an input time-series vector with values; = 1,2, … , where is the decomposition level at the time ; ℎ , and , are the th level wavelet ( , ) and scaling ( , ) filters of MODWT, respectively, and denotes the width of the th level filters.

Maximum Overlap Discrete Wavelet Transform (MODWT)
Serving as a pre-processing method, the MODWT algorithm [26] was implemented before running the model to address non-stationarity issues in time-series datasets by decomposing the input data into high and low-pass filters, resulting in MODWT wavelet and scaling coefficients, respectively ( Figure 1). Basically, those components are defined as follows [6,19,26]: where X is an input time-series vector with N values; j = 1, 2, . . . J, where J is the decomposition level at the time i; h j,l and g j,l are the jth level wavelet (W j,i ) and scaling V j,i filters of MODWT, respectively, and L j denotes the width of the jth level filters.

Study Area and Data
In the present case study, the ability of the MPOE model to forecast daily electricity demand (G) was tested using electricity use data from three regional university campuses (Toowoomba, Ipswich, and Springfield) of the University of Southern Queensland (USQ), Australia. The historical data were provided by the university campus services for the periods of 1 January 2013 to 31 December 2014 for the main feed of Toowoomba, 1 September 2015 to 31 August 2016 for the main feed and Building A block of Ipswich and Springfield campuses, respectively. Data were recorded every 15-min (96 times per day) in kilowatts (kW), with a total of 70,080 values including 60 zeros for Toowoomba, 35,136 points each for Ipswich and Springfield including 30 zeros and non-zeros, respectively. Zeros were filled in by taking the average values for the points at the same time of day, across the previous month. Daily data were then obtained by summing each set of the 96 values, resulting in 730 points (days) for Toowoomba and 366 points (days) each for Ipswich and Springfield.
Descriptive statistics for the daily time-series datasets are given in Table 1, while plots of the series datasets for the three university campuses are shown in Figure 2 to demonstrate the electricity demand values recorded for each day. The current G data clearly showed large fluctuations in G values, resulting in the need to implement wavelet transformation through MODWT to address non-stationary issues.

Study Area and Data
In the present case study, the ability of the MPOE model to forecast daily electricity demand (G) was tested using electricity use data from three regional university campuses (Toowoomba, Ipswich, and Springfield) of the University of Southern Queensland (USQ), Australia. The historical data were provided by the university campus services for the periods of 1 January 2013 to 31 December 2014 for the main feed of Toowoomba, 1 September 2015 to 31 August 2016 for the main feed and Building A block of Ipswich and Springfield campuses, respectively. Data were recorded every 15-min (96 times per day) in kilowatts (kW), with a total of 70,080 values including 60 zeros for Toowoomba, 35,136 points each for Ipswich and Springfield including 30 zeros and non-zeros, respectively. Zeros were filled in by taking the average values for the points at the same time of day, across the previous month. Daily data were then obtained by summing each set of the 96 values, resulting in 730 points (days) for Toowoomba and 366 points (days) each for Ipswich and Springfield.
Descriptive statistics for the daily time-series datasets are given in Table 1, while plots of the series datasets for the three university campuses are shown in Figure 2 to demonstrate the electricity demand values recorded for each day. The current G data clearly showed large fluctuations in G values, resulting in the need to implement wavelet transformation through MODWT to address nonstationary issues.

Forecast Model Development and Validation
In this study, the proposed MPOE model and its traditional non-WT equivalent POE were developed under the MATLAB environment running on an Intel i7 processor at 3.60 GHz. The original (non-wavelet) dataset with its statistically significant lagged variables, identified using the partial autocorrelation function (PACF) operating in a 95% confidence interval, was used as an input to develop the classical POE model. Figure 3 illustrates the number of those lags used to build the model where the first two significant lags were selected using the data from all three study sites.

Forecast Model Development and Validation
In this study, the proposed MPOE model and its traditional non-WT equivalent POE were developed under the MATLAB environment running on an Intel i7 processor at 3.60 GHz. The original (non-wavelet) dataset with its statistically significant lagged variables, identified using the partial autocorrelation function (PACF) operating in a 95% confidence interval, was used as an input to develop the classical POE model. Figure 3 illustrates the number of those lags used to build the model where the first two significant lags were selected using the data from all three study sites. On the other hand, to construct the MPOE model, wavelet transformation through MODWT was employed on the individual PACF lagged variables, and the wavelet outputs (wavelet and scaling coefficients) were used along with the PACF lagged components as model inputs. The critical task in achieving a robust model with wavelet transformation is to identify the type of wavelet scaling (ℎ , / , ) filter and the decomposition level ( ). As no single technique to select the best filter and the level of decomposition can be confirmed in the literature [6,27], a trial-and-error method was employed in the present case. Defining a total of 30 wavelet filters, four different widely tested wavelet families (e.g., [5,6,12,17,19]) were used: Daubechies ( , = 1,2, ⋯ ,10), where 1 is the same as the Haar wavelet (haar); Fejer-Korovkin ( , = 4, 6,8,14,18,22) ; Coiflets ( , = 1,2, ⋯ ,5); and Symlets ( , = 2,3, ⋯ ,10) . The maximum level of decomposition ( ) was computed using Equation (6) [6,17,28]: where is the number of daily data points in this work, and () is the function that returns the nearest integer. For example, for the Toowoomba campus data, a value of = 9 was computed, so all possible levels of decomposition ( = 1,2, … 9) were tested. More details about the MODWT filter and the decomposition level are shown in Table 2. Figure 4 shows the two MODWT wavelet coefficients (WC1 and WC2) and the MODWT scaling coefficient (SC) using lag 1 data from the Toowoomba campus with the best wavelet filter ( 8 ) and decomposition level (2). The results of MODWT (wavelet and scaling coefficients) with 2 and 3 were found to be the same as those with 2 and 3 , for all study datasets, respectively. While forecasting models must go through training, validation, and testing datasets, there is no single agreed-upon scenario for data splitting [2,3,5]. Accordingly, these data were divided into On the other hand, to construct the MPOE model, wavelet transformation through MODWT was employed on the individual PACF lagged variables, and the wavelet outputs (wavelet and scaling coefficients) were used along with the PACF lagged components as model inputs. The critical task in achieving a robust model with wavelet transformation is to identify the type of wavelet scaling (h j,l /g j,l ) filter and the decomposition level (J). As no single technique to select the best filter and the level of decomposition can be confirmed in the literature [6,27], a trial-and-error method was employed in the present case. Defining a total of 30 wavelet filters, four different widely tested wavelet families (e.g., [5,6,12,17,19]) were used: Daubechies (db i , i = 1, 2, · · · , 10), where db 1 is the same as the Haar wavelet (haar); Fejer-Korovkin ( f k i , i = 4, 6,8,14,18,22); Coiflets (coi f i , i = 1, 2, · · · , 5);. and Symlets (sym i , i = 2, 3, · · · , 10). The maximum level of decomposition (J) was computed using Equation (6) [6,17,28]: where N is the number of daily data points in this work, and int(). is the function that returns the nearest integer. For example, for the Toowoomba campus data, a value of J. = 9 was computed, so all possible levels of decomposition (J = 1, 2, . . . 9) were tested. More details about the MODWT filter and the decomposition level are shown in Table 2. Figure 4 shows the two MODWT wavelet coefficients (WC1 and WC2) and the MODWT scaling coefficient (SC) using lag 1 data from the Toowoomba campus with the best wavelet filter ( f k 8 ) and decomposition level (2). The results of MODWT (wavelet and scaling coefficients) with db 2 . and db 3 were found to be the same as those with sym 2 and sym 3 , for all study datasets, respectively.   While forecasting models must go through training, validation, and testing datasets, there is no single agreed-upon scenario for data splitting [2,3,5]. Accordingly, these data were divided into 70:15:15 for training: validation: testing (Table 1). Data normalization, a very common practice in machine learning, was applied using Equation (7) to scale values down to a range of (0 1), thereby avoiding large numbers in the predictor values of datasets [29]. De-normalization was then applied on predicted data to scale those data back to their original range before models were evaluated.
The MATLAB-based OS-ELM function [23], was used to build the present OS-ELM models in this paper. The most important step in developing an OS-ELM model is the selection of the activation function ( f (.)) and the hidden neuron size (M; Equation (1)). The radial basis function (RBF) was employed as the activation function in developing the present model, while values of hidden neuron size from 1 to 100 were tested, resulting in 100 POE models for each of the three stations. Additionally, many MPOE models were developed as a result of the numbers of hidden neuron size (M), wavelet filters and decomposition levels (J); for example 100 (M) × 30 (wavelet filters) ×9(J) = 27000 models for the Toowoomba site alone. Table 2 summarizes the details of model development including those factors tested in the training period.
The model accuracy statistics of Pearson correlation coefficient (r) and root-mean square error (RMSE; kW) were used to assess the performance of the POE and MPOE models in the training and validation periods, and thereby to identify the best wavelet and model parameters ( Table 2).
where n is the total number of values of forecast (or observed) G, G f or i and G obs i are the ith forecasted and observed values, while G f or and G obs are the means of the forecasted and observed values, respectively, in the training period.
The statistical metrics of r and RMSE indicate the greatest model accuracy when they approach 1 and 0, respectively. For all the three sites, MPOE models outperformed POE models. For example, for the Toowoomba campus, the MPOE training/validation model accuracy statistics were r = 0.96/0.94 and RMSE = 7260.42/8026.74 kW with f k 8 , 2 and 90 as the best wavelet filter, decomposition level, and hidden neuron size, respectively. Comparatively, the POE training/validation model accuracy was poorer: r = 0.70/0.74 and RMSE = 17,715.42/16,284.72 kW for the best hidden neuron size of 9.
Further model accuracy indexes include the Willmott's Index (WI), Nash-Sutcliffe model efficiency coefficient (E NS ) and Legates and McCabe's Index (LM) below where values closest to 1 indicate the best performance.
, and 0 ≤ WI ≤ 1 (13) Two statistical tests were also used to show that the MPOE model performs better than the POE model. Those were Wilcoxon Signed-Rank test [42][43][44] and T test [2]. They were performed with 0.05 significance level and 2-tailed hypothesis.

Results and Discussion
Using a plethora of model accuracy statistics (i.e., Equations (9)-(15)), the capability and accuracy of the MPOE model in forecasting daily electricity demand (G) was evaluated and compared to that of a traditional POE model, both drawing on testing datasets obtained from three USQ campuses (Toowoomba, Ipswich, and Springfield). While several models were developed and evaluated in this study, only results from optimum models, selected from these several trained models, are shown in Table 3 (Table 3) demonstrated the MPOE model to have yielded a better performance than the non-wavelet POE model. Table 3. Optimum model performance in the testing phase for daily forecast horizon based on Willmott's Index (WI), Nash-Sutcliffe model efficiency coefficient (E NS ), root-mean square error (RMSE; kW), mean absolute error (MAE; kW), mean absolute percentage error (MAPE%), relative root-mean square error (RRMSE %), as well as Legates and McCabes Index (LM) for the three stations. The models in boldface are the optimal (best performing) models. Additionally, to ensure the superiority of the proposed approach and support the results introduced in Table 3, Wilcoxon Signed-Rank test and T test have been presented in Table 4 for the forecasted error statistic |FE| = G f or i − G obs i generated by the MPOE model against the |FE| generated by the POE model. With 0.05 significance level and 2-tailed hypothesis, significant results were shown in both tests (p value < 0.05). These results clearly indicate that the MOPE model receives the significance than the POE model. To further examine the success of the MPOE model over the POE model for G forecasting in the testing period, observed and forecasted values were plotted as ordinate and abscissa for each model (Figures 5 and 6), and models' absolute forecasted errors |FE| (Figure 7). Time-series plots in Figure 5 show that the wavelet models have achieved greater accuracy (closer to observed) than the wavelet-free models for all sites.
Scatterplots ( Figure 6) show the coefficient of determination R 2 and the linear regression line (G f or = aG obs + b where a is the slope and b is the ordinate intercept) between the observed and forecasted values. The greater R 2 and values of a and b closer to 1 and 0, respectively, showed that the MPOE model outperformed the POE model when using each of the campus datasets. For the Ipswich campus dataset, the MPOE model yielded R 2 = 0.93, a = 0.99 and b = 993.15, in contrast to R 2 = 0.42, a = 0.47 and b = 24601 for the POE model.  Using boxplots (Figure 7) and all station datasets, the MPOE and POE models were compared based on their 25%, 50% and 75% quartiles (lower, middle, upper line of box) as well as maximum and minimum values of the forecasted error statistic |FE| = G f or i − G obs i . Consequently, the statistical error criteria generated for the MPOE model were significantly lower than those for POE.
Overall, from the results in Tables 3 and 4, as well as Figures 5-7, we can conclude that the MPOE model has achieved better forecasting performance than the POE model by generating lower values from MAE, MAPE%, RMSE, and RRMSE% (Table 3), larger values from WI, ENS and LM (Table 3), significant p values (Table 4), closer forecasted values to the observed values ( Figure 5), better values from R 2 , a and b ( Figure 6) and lower |FE| (Figure 7). The reason behind this is that the MODWT method has successfully addressed the non-stationarity issues in time-series datasets before running the POE model to enhance the forecasting accuracy.  Using boxplots (Figure 7) and all station datasets, the MPOE and POE models were compared based on their 25%, 50% and 75% quartiles (lower, middle, upper line of box) as well as maximum and minimum values of the forecasted error statistic | | = | − | . Consequently, the statistical error criteria generated for the MPOE model were significantly lower than those for POE.  Overall, from the results in Tables 3 and 4, as well as Figures 5-7, we can conclude that the MPOE model has achieved better forecasting performance than the POE model by generating lower values from MAE, MAPE%, RMSE, and RRMSE% (Table 3), larger values from WI, ENS and LM (Table 3), significant p values (Table 4), closer forecasted values to the observed values ( Figure 5), better values from 2 , and ( Figure 6) and lower |FE| (Figure 7). The reason behind this is that the MODWT method has successfully addressed the non-stationarity issues in time-series datasets before running the POE model to enhance the forecasting accuracy.

Challenges and Future Work
While this study was the first to apply the best suitable wavelet transforms to energy forecasting datasets, thereby achieving a high performance MPOE model, some limitations should be addressed in upcoming works, in particular, the incorporation of external datasets, such as climate variables, which can be downloaded from different sources (e.g., SILO [45], the European Centre for Medium Range Weather Forecasts (ECMWF) and global numerical weather prediction models [1,46] and NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) [47][48][49][50]). These can be decomposed together with lag values using MODWT. The OS-ELM model will be then fed by the wavelet and scaling components resulting from the climate and lag variables, to develop a very highdimensional model. Although this study has used a good amount of data (730 days and 366 days), a larger power grid model should be built and evaluated using larger datasets from electricity demand (G) to support national electricity markets. This could be achieved by testing the proposed method of this work with a larger study area or incorporating new datasets from the University of Southern Queensland (study area) when these data are available. However, given the large number of input variables that would be generated by MODWT, a method to select and narrow down the best input variables or a very fast model would be necessary to speed up the development step. Accordingly,  Figure 7. Boxplots of the absolute forecasted error |FE| in the testing dataset for the three study sites with the optimal models of POE and MPOE.

Challenges and Future Work
While this study was the first to apply the best suitable wavelet transforms to energy forecasting datasets, thereby achieving a high performance MPOE model, some limitations should be addressed in upcoming works, in particular, the incorporation of external datasets, such as climate variables, which can be downloaded from different sources (e.g., SILO [45], the European Centre for Medium Range Weather Forecasts (ECMWF) and global numerical weather prediction models [1,46] and NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) [47][48][49][50]). These can be decomposed together with lag values using MODWT. The OS-ELM model will be then fed by the wavelet and scaling components resulting from the climate and lag variables, to develop a very high-dimensional model. Although this study has used a good amount of data (730 days and 366 days), a larger power grid model should be built and evaluated using larger datasets from electricity demand (G) to support national electricity markets. This could be achieved by testing the proposed method of this work with a larger study area or incorporating new datasets from the University of Southern Queensland (study area) when these data are available. However, given the large number of input variables that would be generated by MODWT, a method to select and narrow down the best input variables or a very fast model would be necessary to speed up the development step. Accordingly, different pre-processing techniques (e.g., iterative input selection (IIS) [51], grouping genetic algorithm (GGA) [52] or coral reef optimization (CRO) [53,54]), along with a fast forecasting method (e.g., deep learning strategy or long short term memory network [55]) should be integrated with MODWT to improve G data forecasting accuracy.

Conclusions
This study has developed a new energy forecasting model by integrating wavelet transformation based on MODWT with the PACF-OS-ELM model to improve the forecasting accuracy of electricity demand (G) data using the datasets from three regional campuses (Toowoomba, Ipswich, and Springfield) from the University of Southern Queensland (USQ). The MPOE model's testing phase accuracy of prediction was then evaluated and compared to that of its classical non-wavelet model equivalent (i.e., POE) using several statistical criteria including correlation coefficient (r), root-mean square error (RMSE), mean absolute error (MAE), relative root-mean square error (RRMSE%), and relative mean absolute error (MAPE%), Willmott's Index (WI), Nash-Sutcliffe efficiency coefficient (E NS ) and Legates and McCabe's Index (LM) as well as two statistical tests of Wilcoxon Signed-Rank test and T test. The MPOE model outperformed the POE model for all campus datasets.
Although better accuracy was yielded by the MPOE model developed, than the basic POE model, future work is needed to address some limitations associated with the data and methods used in this work. External datasets, such as climate variables and a pre-processing technique used to select the best inputs from those variables, such as IIS, could be employed to further reduce forecasting errors.
To sum up, accurate and reliable G forecasting can be supplied by the MPOE model, and can therefore help regional electricity markets to improve their system by delivering more precise decisions. However, improved model performance can be provided by future works that could address the challenges above.