Hybrid ARIMAX quantile regression method for forecasting short-term electricity consumption in East Java

The need for energy supply, especially electricity, in Indonesia has been increasing in recent years. Furthermore, high electricity usage at different times of day leads to heteroscedasticity in the data. Estimating the electricity supply that can fulfill the community’s needs is very important, but heteroscedasticity often makes electricity forecasting difficult. An accurate forecast of electricity consumption is one of the key challenges for energy providers, who need it to plan resources and services better and to take control actions that balance electricity supply and demand. In this paper, a hybrid ARIMAX Quantile Regression (ARIMAX-QR) approach is proposed to predict short-term electricity consumption in East Java. The method is compared with time series regression using the RMSE, MAPE, and MdAPE criteria. The data used in this research are half-hourly electricity consumption data from September 2015 to April 2016. The results show that the proposed approach can be a competitive alternative for forecasting short-term electricity consumption in East Java. ARIMAX-QR using lag values and dummy variables as predictors yields more accurate predictions for both in-sample and out-sample data. Moreover, both time series regression and ARIMAX-QR with lag values added as predictors capture the patterns in the data accurately. Hence, they produce better predictions than the models that do not use additional lag variables.


Introduction
Quantile regression is an estimation method that can handle data whose spread is not uniform. Data with non-constant variance are called heteroscedastic; the term also describes a violation of the homoscedasticity assumption in regression analysis using the Ordinary Least Squares (OLS) method. This problem can be overcome by using quantile regression. Parameter estimation in OLS only provides a solution for the conditional mean, so Koenker and Bassett developed quantile regression as an alternative method [1]. Quantile regression extends the estimation of coefficients across various quantiles in order to provide a more complete picture of the data. The method is robust against outliers, so it is recommended for analyzing data that are asymmetric and have a non-homogeneous distribution. Forecasting with a quantile regression approach has been reported to produce better accuracy than the classical conditional mean method [2].
Combining two or more methods into a hybrid method tends to improve prediction accuracy [3], so many researchers now use and develop such methods. Hybrid methods have been widely used in various fields. The Quantile Regression Neural Network (QRNN) has been used to estimate the conditional density of multiperiod returns and compared with estimates from GARCH-based quantile regression, and the results show that QRNN is able to predict the conditional density well [4]. Another study applied a hybrid ARIMA-LR (Linear Regression) method to predict patient arrivals at an emergency room (ER) in China [5]. In 2015, Arunraj and Ahrens proposed a hybrid SARIMAX-Quantile Regression model and applied it to forecast daily food sales; their study shows that the hybrid SARIMAX-QR model provides better forecast accuracy for out-sample data than the individual methods and also provides better interval predictions [6].
This research studies the development of hybrid models for forecasting electricity consumption in East Java. The data are suspected to contain calendar variation effects, which produce patterns such as seasonal patterns, holiday effects, and special-day effects. These effects indicate heteroscedasticity in the electricity consumption data, so forecasting the data is not easy. Based on this condition, this study aims to develop a hybrid ARIMAX Quantile Regression (ARIMAX-QR) method for forecasting electricity consumption in East Java. The method is expected to yield a suitable forecasting model for these data and to minimize forecasting errors. Forecasting electricity consumption in East Java is very important because it can inform more efficient electricity distribution planning, so that electricity can be distributed to consumers in East Java optimally and their needs are fulfilled.

ARIMA
ARIMA is a time series analysis method composed of two models, namely the autoregressive (AR) model and the moving average (MA) model. An ARMA model combines an AR component of order p and an MA component of order q, while ARIMA(p, d, q) is an ARMA(p, q) model applied after differencing d times. Short-term electricity consumption data contain two seasonal patterns, a half-hourly pattern and a daily pattern, so they are said to follow a double seasonal multiplicative pattern, or Double Seasonal ARIMA [7]. The Double Seasonal ARIMA model is written as ARIMA$(p,d,q)(P_1,D_1,Q_1)^{s_1}(P_2,D_2,Q_2)^{s_2}$, which has the following common form,
$$\phi_p(B)\,\Phi_{P_1}(B^{s_1})\,\Phi_{P_2}(B^{s_2})\,(1-B)^{d}(1-B^{s_1})^{D_1}(1-B^{s_2})^{D_2}\,Y_t = \theta_q(B)\,\Theta_{Q_1}(B^{s_1})\,\Theta_{Q_2}(B^{s_2})\,a_t,$$
where $B$ is the backshift operator and $a_t$ is a white noise error. Building an ARIMA model involves several steps: model identification, parameter estimation and testing, model diagnostics, model selection, and forecasting [8].

Time Series Regression
The time series regression models for electricity forecasting involve trend, seasonal patterns, special days, and autocorrelated errors [7]. The effect of trend is expressed by t. The weekly seasonal pattern consists of 7 dummy variables, one for each day of the week (Monday through Sunday). In total, the time series regression equation contains 64 dummy-variable predictors. If the errors do not fulfill the white noise assumption, lags of the series are used as additional predictor variables. The lags can be selected based on the ACF and PACF plots [9]. The steps of model building in time series regression are as follows.

a. Parameter Estimation
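Parameter estimation using Ordinary Least Squares (OLS) minimizes the sum of squared deviations, or errors, of the observed values. The linear regression model can be written as [10] $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, with $\boldsymbol{\beta}$ being the vector of parameters to be estimated from the model. The sum of squared errors is then differentiated with respect to each parameter and set to zero, and the resulting estimate can be written as $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, where $\mathbf{X}$ is the matrix of predictor values and $\mathbf{Y}$ is the vector of observed responses.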

b. Parameter Significance Test
The parameter significance test is done partially. Partial testing is carried out once for each parameter in the regression model [10]. The hypotheses can be written as $H_0: \beta_j = 0$ versus $H_1: \beta_j \neq 0$. The t statistic is used to test the significance of each parameter, $t = \hat{\beta}_j / SE(\hat{\beta}_j)$.
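The two steps above can be illustrated for the simplest case of one predictor plus an intercept, for example a regression on the trend t alone. This is a minimal sketch using the textbook closed-form OLS estimates and their standard errors; the function name `ols_simple` and the toy numbers are made up for illustration and are not taken from the paper.

```python
import math

def ols_simple(x, y):
    """OLS fit of y = b0 + b1 * x with t statistics for both coefficients."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)   # residual variance
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return {"b0": b0, "b1": b1, "t_b0": b0 / se_b0, "t_b1": b1 / se_b1}

# Toy data (illustrative only): a noisy upward trend.
fit = ols_simple([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(fit)
```

A large absolute t value for a coefficient leads to rejecting the hypothesis that the coefficient is zero. With many dummy predictors the same idea applies in matrix form.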

Quantile Regression
Quantiles divide a group of data into several equal parts after the data are sorted from smallest to largest [11]. Quantile regression estimation is more robust to outliers and heteroscedasticity. It also does not require any assumption testing. Heteroscedasticity, or unequal variance, implies that more than one slope describes the relationship between the response and a predictor measured on a subset of these factors. Quantile regression estimates the slopes from the minimum to the maximum response, giving a more complete explanation of the relationships between variables, which are sometimes missed by other regressions [12]. The quantile regression approach models the various quantile functions of the distribution of Y as functions of X. The method is applied by separating the data into several groups that are suspected to have different estimated values across quantiles. The response is estimated by the $\tau$-th quantile of Y for a given X. The quantile regression model can be written as $Q_{\tau}(Y \mid \mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}(\tau)$, where $0 < \tau < 1$ is the quantile level. Quantile regression weights positive errors by $\tau$ and negative errors by $(1-\tau)$, so the parameter estimate in quantile regression [11] can be expressed in the following form,
$$\hat{\boldsymbol{\beta}}(\tau) = \arg\min_{\boldsymbol{\beta}} \left[ \sum_{i:\, y_i \ge \mathbf{x}_i'\boldsymbol{\beta}} \tau\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta} \rvert + \sum_{i:\, y_i < \mathbf{x}_i'\boldsymbol{\beta}} (1-\tau)\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta} \rvert \right],$$
where $\hat{\boldsymbol{\beta}}(\tau)$ is the vector of regression coefficients at quantile $\tau$.
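The asymmetric weighting described above can be made concrete with a small sketch. It fits only a constant, with no predictors, so the minimizer of the weighted loss is simply a sample quantile. The function names `check_loss`, `total_loss`, and `best_constant` and the toy data are assumptions for illustration, not part of the paper.

```python
def check_loss(residual, tau):
    """Quantile (check) loss: weight tau for positive errors, (1 - tau) for negative."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def total_loss(y, fitted, tau):
    """Sum of check losses over all observations."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))

def best_constant(y, tau):
    """Search the observed values for the constant that minimizes the check loss."""
    return min(sorted(y), key=lambda c: total_loss(y, [c] * len(y), tau))

# Toy data with one outlier (illustrative only).
y = [1, 2, 3, 4, 100]
print(best_constant(y, 0.5))   # median-type fit, unaffected by the outlier
print(best_constant(y, 0.9))   # upper quantile
print(sum(y) / len(y))         # the mean is pulled toward the outlier
```

At tau = 0.5 the loss is proportional to the absolute error, which is why the median fit stays at 3 while the mean moves to 22. This is the robustness property the text refers to.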

ARIMAX Quantile Regression
This method combines univariate time series methods with the quantile regression approach. Forecasts are produced at quantile 0.50, which is often called the median estimate. Two hybrid ARIMAX Quantile Regression models are used in this research.

ARIMAX Quantile Regression Model 1
This ARIMAX-QR model uses the dummy variables and lags of the data as predictor variables in the quantile regression. The ACF and PACF plots of the stationary errors can be used to determine the lags. The ARIMAX-QR model with additional lag variables is therefore a quantile regression of $Y_t$ on the dummy variables and the selected lags. In the second model, $\hat{Z}_t$ is the predicted value of $Y_t$ obtained from the ARIMA model.

Model Evaluation
One of the purposes of forecasting research is to improve forecast accuracy and minimize forecast error. This research uses a cross-validation approach to evaluate the models, with goodness-of-fit criteria based on Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Median Absolute Percentage Error (MdAPE), computed with the formulas given in [13].
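The three criteria named above can be sketched with their usual definitions: RMSE is the square root of the mean squared error, MAPE is the mean of the absolute percentage errors, and MdAPE is their median. The function names and the sample values below are illustrative assumptions; the exact notation of [13] is not reproduced here.

```python
import math
from statistics import median

def rmse(actual, forecast):
    """Root Mean Square Error, in the units of the data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def ape(actual, forecast):
    """Absolute percentage error of each forecast (actual values must be nonzero)."""
    return [abs((a - f) / a) * 100 for a, f in zip(actual, forecast)]

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    errors = ape(actual, forecast)
    return sum(errors) / len(errors)

def mdape(actual, forecast):
    """Median Absolute Percentage Error, in percent."""
    return median(ape(actual, forecast))

# Made-up load values in MW, for illustration only.
actual = [4000, 4100, 3900, 4200]
forecast = [3960, 4141, 3900, 4410]
print(rmse(actual, forecast), mape(actual, forecast), mdape(actual, forecast))
```

For these made-up values the MAPE is 1.75% and the MdAPE is 1.0%, with an RMSE of about 108.8 MW. The single large miss raises the MAPE and the RMSE but leaves the MdAPE unchanged, which is why the median-based criterion is a useful complement.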

Data and Variable
The data used in this research are half-hourly electricity consumption data for East Java. They are secondary data obtained from PLN P2B (Distribution and Control Load Center) for the East Java region, a subsidiary of PT. PLN (Persero). The data cover the period from September 1, 2015 to April 30, 2016. For the analysis, the data were divided into two parts: in-sample data used for modeling, and out-sample data used to validate the method. The in-sample data run from September 1, 2015 to February 29, 2016, and the out-sample data run from March 1, 2016 to April 30, 2016.
The dummy variables used in this study include a trend variable, half-hour dummies, day dummies, and special-day dummies (for special days that fall on days other than Sunday); they are presented in Table 1. The special days used as dummy variables ($L_{i,t}$) are the special days or national holidays in the observation period of September 2015 to April 2016 that do not fall on a Sunday. The list of special days used in this research is presented in Table 2.
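The predictor layout described above can be sketched in Python. This is a minimal illustration rather than the authors' code: the function name `build_dummies`, the column names `D1` to `D7` and `H1` to `H48`, the single special-day indicator `L`, and the convention that the first reading of each day is stamped 00:00 are all assumptions made here.

```python
from datetime import datetime, timedelta

def build_dummies(start, n_obs, special_dates=()):
    """Return one dict of predictors per half-hourly observation."""
    rows = []
    for t in range(1, n_obs + 1):
        stamp = start + timedelta(minutes=30 * (t - 1))
        row = {"t": t}  # linear trend
        # Day-of-week dummies: D1 = Monday, ..., D7 = Sunday.
        for d in range(7):
            row[f"D{d + 1}"] = int(stamp.weekday() == d)
        # Half-hour dummies: H1 = 00:00, ..., H48 = 23:30 (assumed stamping).
        slot = stamp.hour * 2 + stamp.minute // 30
        for h in range(48):
            row[f"H{h + 1}"] = int(slot == h)
        # One pooled special-day indicator, used here for brevity;
        # the paper indexes its special-day dummies as L_{i,t}.
        row["L"] = int(stamp.date() in special_dates)
        rows.append(row)
    return rows

# Two days of half-hourly rows starting on the first day of the data.
rows = build_dummies(datetime(2015, 9, 1), n_obs=96)
print(rows[0]["D2"], rows[0]["H1"], rows[48]["D3"])
```

Each row has exactly one active day dummy and one active half-hour dummy. The actual special dates come from Table 2 and would be passed in through `special_dates`.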

Exploration of Electricity Consumption Characteristics
The amount of electricity consumed is influenced by the needs of each customer, and diverse customer activity causes electricity consumption in East Java to fluctuate. The electricity consumption data used, from September 1, 2015 until September 30, 2016, consist of 19,008 observations. The average electricity consumption in East Java is 4,064.7 MW with a standard deviation of 484.9 MW. Consumption is lowest at 07:00, at 3,530.0 MW. This happens because at 07:00 many people are just leaving for work, so few people are using electronic equipment such as televisions, lights, and computers. The highest consumption occurs at 18:30, at 4,743.2 MW, presumably because people have returned home after work and are using various electronic equipment.
The volatility of electricity consumption is influenced not only by the hour of use but also by the day. The highest average electricity consumption occurs on Tuesday, at 4,029.8 MW. Generally, electricity consumption on weekdays is higher than on the weekend (Saturday and Sunday). This happens because offices, factories, and government agencies use less electricity on Saturday and Sunday than on weekdays, when production machines, computers, lights, air conditioners, and so on are running. The lowest electricity consumption occurs on Sunday, with an average of 3,639.9 MW. This is presumably because on the weekend most people spend time on recreation outside the home and enjoy their time off, so electricity consumption is lower than on weekdays.

Figure 1. Time Series (a) and Mean Plot (b) of Electricity Consumption in East Java
The time series plot of electricity consumption in the East Java region is shown in Figure 1. It shows that electricity consumption increased from September 1, 2015 to April 30, 2016. Consumption on weekdays from Monday to Thursday is relatively similar, as shown in Figure 1b. However, on Monday from 00:30 to 05:00 and on Friday from 12:00 to 12:30, consumption tends to be lower than on other weekdays at the same hours. On Monday, consumption is low from 00:30 to 05:00 and then starts to increase markedly at 07:00 as most people begin their activities. On Friday, consumption from 12:00 to 12:30 is lower than on other working days because most people in East Java are attending Friday prayers.

Time Series Regression
Forecasting electricity consumption using time series regression was done by regressing electricity consumption, as the response, on predictors such as the trend, hour, day, and special-day dummies (non-weekly national holidays). The resulting model is shown in Table 3. The residuals of this model still do not fulfill the white noise assumption, presumably because of autocorrelated errors. Therefore, the data were also modeled with lags of the series added as predictors. The ACF of the data shows peaks at lags 48, 96, 144, ..., which indicate the daily seasonal period, and peaks at lags 336, 672, 1008, ..., which indicate the weekly seasonal period. The PACF decays slowly toward zero, so it is concluded that the data are not yet stationary. Differencing was therefore applied three times, at lags 1, 48, and 336, to make the data stationary. Based on this result, the lags used as predictors are lags 1, 48, and 336 ($Y_{t-1}, Y_{t-48}, Y_{t-336}$), which reflect the seasonal patterns present in the data. In addition, their multiplicative lags 49, 337, 384, and 385 ($Y_{t-49}, Y_{t-337}, Y_{t-384}, Y_{t-385}$) are also used. Both models were used to predict the out-sample data in order to check their performance when forecasting several periods ahead. The forecast values are compared with the actual (out-sample) data. The out-sample forecast plot of each model is shown in Figure 2. It shows that the forecasts from the time series regression model capture the electricity consumption data reasonably well. After adding lags of the data as predictors (Figure 2b), the forecasts capture the pattern of fluctuations better than those of the model without lag predictors (Figure 2a).
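The multiplicative lags listed above follow from expanding the product of the three differencing operators, (1 - B)(1 - B^48)(1 - B^336), where B is the backshift operator. The short sketch below carries out that expansion; the helper names `multiply` and `differencing_lags` are chosen here for illustration only.

```python
def multiply(poly_a, poly_b):
    """Multiply two polynomials in B, each stored as {power: coefficient}."""
    product = {}
    for pa, ca in poly_a.items():
        for pb, cb in poly_b.items():
            product[pa + pb] = product.get(pa + pb, 0) + ca * cb
    return product

def differencing_lags(periods):
    """Expand prod(1 - B^s) and return the nonzero lag terms as {lag: coefficient}."""
    poly = {0: 1}
    for s in periods:
        poly = multiply(poly, {0: 1, s: -1})  # one factor (1 - B^s)
    return {power: coef for power, coef in sorted(poly.items())
            if power > 0 and coef != 0}

lags = differencing_lags([1, 48, 336])
print(list(lags))
```

The expansion contains exactly the lags 1, 48, 49, 336, 337, 384, and 385: the three differencing lags plus their pairwise and triple sums (1 + 48, 1 + 336, 48 + 336, and 1 + 48 + 336).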

ARIMAX Quantile Regression
As described earlier, two ARIMAX Quantile Regression (ARIMAX-QR) models are applied in this research. ARIMAX-QR model 1 uses lags and dummy variables as predictors. The lags are the same as those in the time series regression model with lags (subsection 4.2): lags 1, 48, and 336 ($Y_{t-1}, Y_{t-48}, Y_{t-336}$) and their multiplicative lags 49, 337, 384, and 385 ($Y_{t-49}, Y_{t-337}, Y_{t-384}, Y_{t-385}$). The second ARIMAX-QR model uses dummy variables and the predicted values from an ARIMA model of the data. The electricity consumption data contain two seasonal patterns, half-hourly and weekly, so a Double Seasonal ARIMA model needs to be formed. For convenience, this research uses the ARIMA$(0,1,1)(0,1,1)^{48}(0,1,1)^{336}$ model as $\hat{Z}_t$. The resulting models are shown in Table 4.
Each ARIMAX-QR model is then used to forecast the out-sample data, and the forecasts are compared with the actual values as shown in Figure 3. The predicted values generated by ARIMAX-QR model 1 (Figure 3a) show good results, with forecasts that tend to capture the patterns in the data. The use of ARIMA predictions in ARIMAX-QR model 2 aims to handle patterns that could not be captured by the decomposition model. The prediction results of ARIMAX-QR model 2 (Figure 3b) show the disadvantage of the ARIMA method, which is suitable only for short-term forecasting. For long-term forecasting, ARIMA tends to produce predictions that drift farther from the actual data. Hence, the longer the forecast period, the farther the results of this model tend to be from the actual data. This implies that model 2 yields more accurate forecasts at short horizons.
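Written out with the backshift operator $B$, a double seasonal model with these orders and seasonal periods 48 and 336 would take the form below. This assumes the usual sign convention for the moving average terms; the coefficient symbols $\theta_1$, $\Theta_1$, $\Theta_2$ and the white noise term $a_t$ are notation assumed here and are not taken from Table 4.

```latex
(1 - B)(1 - B^{48})(1 - B^{336})\,Y_t
  = (1 - \theta_1 B)(1 - \Theta_1 B^{48})(1 - \Theta_2 B^{336})\,a_t
```

The left-hand side applies one regular and two seasonal differences, and the right-hand side has one moving average coefficient for each of the three components.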

Comparison between Time Series Regression and ARIMAX Quantile Regression
Based on the results in the previous sections on forecasting electricity consumption in East Java using the Time Series Regression (TSR) and ARIMAX Quantile Regression (ARIMAX-QR) methods, Table 5 shows the goodness-of-fit values for the out-sample data, i.e. RMSE, MAPE, and MdAPE, for each model. The best TSR model is the one involving lags and dummy variables as predictors, rather than the one involving only dummy variables (Table 5). For the hybrid ARIMAX-QR method, the best model is model 1, which uses dummy variables and lags as predictors in the quantile regression. Comparing the best TSR model with the best ARIMAX-QR model shows that the ARIMAX-QR model yields predictions with smaller errors. To examine the forecast performance of the best TSR model and the best ARIMAX-QR model, an adaptive approach to the RMSE, MAPE, and MdAPE values of each model was used. The adaptive RMSE, MAPE, and MdAPE plots of each model for a forecast period of up to 7 days ahead are shown in Figure 4. They show that the ARIMAX Quantile Regression model is more accurate than the Time Series Regression model. This is indicated by its lower forecast error for the 1-day-ahead and 2-days-ahead forecasts, which remains low up to the 7-days-ahead forecast. PT PLN (Persero) sets a benchmark maximum forecast error of 2%. Both models produce adaptive MAPE and MdAPE values below the benchmark set by PLN, which means both models are suitable for forecasting short-term electricity consumption in East Java. Based on the MAPE criterion, the Time Series Regression model forecasts the electricity load in East Java well up to 6 days ahead; at 7 days ahead, its forecast error exceeds the 2% benchmark set by PLN.
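The horizon-by-horizon comparison against the 2% benchmark can be sketched as follows. The paper does not define its adaptive criteria in this section, so the sketch assumes one plausible reading: the MAPE is recomputed over the first k days of the forecast period for k = 1, 2, and so on. The function name `adaptive_mape`, its parameters, and the toy numbers are all hypothetical.

```python
def adaptive_mape(actual, forecast, per_day=48, benchmark=2.0):
    """MAPE over the first k forecast days, for each k, with a benchmark check.

    Assumed interpretation of the 'adaptive' criterion, not the paper's definition.
    Returns a list of (k, mape_percent, within_benchmark) tuples.
    """
    results = []
    n_days = len(actual) // per_day
    for k in range(1, n_days + 1):
        n = k * per_day
        errors = [abs((a - f) / a) * 100
                  for a, f in zip(actual[:n], forecast[:n])]
        value = sum(errors) / n
        results.append((k, value, value <= benchmark))
    return results

# Toy example with 2 readings per day so the arithmetic is easy to follow.
actual = [100, 100, 100, 100, 100, 100]
forecast = [99, 99, 98, 98, 95, 95]
for k, value, ok in adaptive_mape(actual, forecast, per_day=2):
    print(k, round(value, 3), ok)
```

In this toy case the forecast stays within the benchmark for the first two days and exceeds it on the third, which mirrors the kind of horizon limit reported for the time series regression model.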

Conclusion
The results show that electricity consumption in East Java depends on hours, days, and national holidays. Among the time series regression models, the one with lags added as predictors produces the smallest forecast errors. Likewise, the ARIMAX Quantile Regression model that involves lags and dummy variables as predictors yields more accurate forecasts than the model without lags and dummy variables as predictors. Generally, based on the evaluation criteria, it can be concluded that the ARIMAX Quantile Regression model yields better forecast performance than the time series regression model. Thus, the proposed method can be a potential alternative for forecasting short-term electricity consumption in East Java. Moreover, both time series regression and ARIMAX-QR with additional lag values as predictors capture the patterns in the data accurately and produce better predictions than the models that do not use lag variables.