Nonlinear ARIMAX model for long-term sectoral demand forecasting

Article history: Received: November 26, 2017; Received in revised format: January 31, 2018; Accepted: April 26, 2018; Available online: April 26, 2018

With the rapid increase of energy demand, it is becoming increasingly important to obtain accurate energy demand forecasts. To incorporate long-term causal relationships, autoregressive models with exogenous regression components have received increasing attention from many researchers in this field. These are linear models that combine time series and econometric methodology; however, some recent studies find evidence that nonlinear models outperform linear ones in long-term peak demand forecasting. This paper proposes a nonlinear Auto Regressive Integrated Moving Average with Exogenous Inputs (N-ARIMAX) model to forecast sectoral peak demand using a case study of Iran. The results indicate that the proposed models yield significant improvements in forecasting accuracy compared with the existing models. © 2018 by the authors; licensee Growing Science, Canada


Introduction
Power demand forecasting is an essential component of energy system planning. A demand forecast is developed for system generation and transmission planning and also allows planning engineers to determine the type and size of new power plants. Forecasts of power demand are generally grouped into three horizons: hourly (or day ahead), short-term (1-3 years), and long-term (20-30 years). Depending on the forecasting time horizon, either time series or econometric methodology is applied to develop the forecasting models. In the time series approach, demand is estimated by means of the lags of the series itself and of past lagged random errors. The related models disregard macroeconomic and microeconomic variables that might affect the dynamics of the demand for electricity. Mostly, they focus on the ideas of time series analysis and stochastic processes such as ARMA: Autoregressive Moving Average (an example of the most closely related studies is Magnano and Boland, 2007), SARIMA: Seasonal Autoregressive Integrated Moving Average (Soares & Souza, 2006), time-varying splines (Sigauke & Bere, 2017), exponential smoothing (Taylor, 2003), the chaotic time series method (Kuremoto et al., 2014), seasonal hybrid nonlinear procedures (Kazemi et al., 2011), Box-Jenkins (Nwobi-Okoye et al., 2015), and the Markov chain combined model (Gabriel et al., 2004; Wang et al., 2010).
In the econometric, or causal, approach, explanatory variables are used to forecast the expected amount of electricity that will be consumed over a given period of time. The demand is modeled as a function of econometric variables (e.g., GDP and population). The explanatory variables of the model are identified on the basis of a correlation analysis of each independent variable with the demand (dependent) variable. Linear regression (Bianco, 2009; Azadeh et al., 2012; Durán et al., 2017), log-linear regression, and Cobb-Douglas (Von Hirschhausen & Andres, 2000) are examples of models applied in this field. Castro and Ávila Montini (2014) and Newsham and Birt (2010) were pioneers in adopting hybrid forecasting models that combine econometric and time series approaches, namely ARX (Autoregressive with Exogenous Inputs) and ARIMAX (Auto-Regressive Integrated Moving Average with Exogenous Input), respectively, to forecast electricity consumption. The earliest methods of long-term forecasting were developed on the basis of these autoregressive models with exogenous regression components.
More recently, with the deregulation of energy markets, increasing attention has been paid to demand forecasts with longer time horizons; for capacity and expansion planning purposes, the focus is often on long-term forecasting (Glasnovic & Margeta, 2010; Porkar et al., 2010; Rastad & Nazarzadeh, 2006). To incorporate long-term causal relationships, autoregressive models with exogenous regression components have received increasing attention from many researchers in this field (e.g., Sharma et al., 2012). ARX and ARIMAX models are linear models that combine time series and econometric methodology; however, some recent studies have provided evidence that nonlinear models outperform linear ones in long-term peak demand forecasting (Shakouri et al., 2006; Toksarı, 2009; Pao, 2009; Battistelli et al., 2011). Several studies have demonstrated that artificial neural networks (ANNs) produce better results than other techniques (Neto et al., 2008). A major advantage of ANNs is their potential to model nonlinear data relationships, but because of their underlying simulation methodology, they are not able to handle long-term data relationships (Camara et al., 2016). Whereas ANNs are used for short- and mid-term forecasting (Sarduy et al., 2016), we propose a nonlinear ARIMAX model integrating econometric and time-series approaches for long-term peak demand forecasting in Iran.
The remainder of this paper is organized as follows. Section 1.1 presents Iran's power sectors. The justification for the technological progress indicator is described in Section 1.2. Section 2 explains the procedure of model development and the projection of future power demand. The concluding remarks of this study are given in the final section (Section 3).

Introduction to Iran's Power Sectors
The statistics of sectoral energy demand of a region are an important factor in accurately estimating the size and timing of generation expansion. In this section, key energy indicators of useful energy demand for the various sectors, i.e., industrial, residential, services, and agriculture, are explained.

Industrial Sector
The industrial sector, the largest consumer served by Iran's Power Ministry, is responsible for approximately 33.3 percent of the total electricity consumption in Iran. For the industrial sector, five exogenous variables are used to forecast useful energy demand: (a) the industrial electricity tariff in Rials/kWh, at April 1990 values; (b) the technological progress according to Section 1.2; (c) the industrial Value added Real Growth Rate (VGRI), at April 1990 values; (d) the number of industrial consumers; and (e) the electricity consumption in GWh (Al-Ghandoor et al., 2008; Amarawickrama & Hunt, 2008; Awan et al., 2012; Dilaver & Hunt, 2011; Elkarmi, 2008; Hahn et al., 2009; Yang, 2004).

Residential Sector
The residential sector accounts for approximately 33.1 percent of total electricity consumption and 81.9 percent of the number of consumers. Total energy use in the residential sector in Iran can be attributed to three systems: home appliances, air conditioning, and lighting.
The peak demand of the residential sector is influenced by numerous factors, ranging from sectoral items to socio-economic factors. According to the literature, eight explanatory variables were considered in the study: (a) the average household electricity cost in thousand Rials, at April 1990 values; (b) the residential electricity tariff in Rials/kWh, at April 1990 values; (c) the electricity consumption in GWh; (d) the Gross Domestic Product (GDP) in millions Rials, at April 1990 values; (e) the population; (f) the urbanization rate; (g) the technological progress; and (h) the number of residential customers (Barakat & Al-Rashed, 1992; Dilaver & Hunt, 2011; Ghanbari et al., 2013; Kucukali & Baris, 2010; Suganthi & Samuel, 2012; Wang et al., 2012; Yang, 2004).

Commercial and Public Services Sector
The services sector is expected to be one of the main drivers of energy demand in the future because commercial activities are gaining importance in developing countries such as Iran. The services sector accounts for about 18.7 percent of total electricity consumption and 16.4 percent of the number of consumers. An increase in population requires increased levels of services (e.g., healthcare, education, financial activity, legal matters, and government activity), and higher levels of economic activity lead to increased disposable income, which increases the demand for leisure and hence increases energy demand. Hence, six econometric and demographic variables, i.e., (a) GDP in millions Rials, at April 1990; (b) (Connor, 1987; Kani & Ershad, 2007; von Hirschhausen & Andres, 2000)

Justification for technological progress
The technological progress indicator is based on the idea that technological improvements are largely driven by increasing electricity prices and that this progress is irreversible. Simply speaking, price and income elasticities differ for rising and falling prices; for falling prices they are close to zero because of the irreversible effects of technical progress (Haas & Schipper, 1998).
No earlier work in the field of electricity demand forecasting has been reported that takes into account the issue of irreversible efficiency improvement. Hence, we propose a binary variable to be incorporated in the model as a technological progress indicator. We define the variable Tps (Technological Progress indicator) in terms of the price changes between periods m and m-1; according to the literature, j is set to 3, 4, and 15 in the residential, services, and industrial models, respectively (Daim and Oliver, 2008). In this way, two different response functions are obtained, representing parallel equations with different intercepts. This is interpreted as continual price increases leading to substitution toward more efficient devices and electricity-using appliances.

Methodology and empirical specification
In this section, the procedure of model development and the projection of future power demand are explained. This procedure has four phases, as follows.
Phase 1: Data Preprocessing
In econometrics, preprocessing the data by transformation removes the effect of nuisance components, isolates the temporal components of interest, and makes a series stationary. A log transformation or differencing can help linearize and stabilize the series. Trend stationary (deterministic mean trend) and difference stationary (stochastic mean trend) are two popular models for nonstationary series with a trending mean. A unit root test is a powerful tool for assessing the presence of a stochastic trend in a nonstationary series. The null hypothesis that the process is a unit root process with a trend (difference stationary) is tested against the alternative that there is no unit root (trend stationary) at significance level α. Some of the sectoral electricity demand time series and related exogenous regression components are nonstationary, with a clear upward trend; hence, log transformation and differencing are applied to the data until stationarity is achieved (see Fig. 1 as an example). The results of the unit root test for these time series and their selected transformation forms are provided in Table 2. As seen, the log transformation of the industrial, residential, and agriculture peak demand time series, and differencing the logarithm of the services peak demand time series, yield stationary processes. The first differences of a logged time series are approximately the rates of change of the series.
In the next phase, we aim to find the most parsimonious model that adequately describes the data. A simple model is easier to use for forecasting and interpretation. Specification tests, model comparisons in terms of complexity, and goodness-of-fit checks are some of the tasks of this phase. According to the literature, a load demand time series is a realization of a discrete-time, univariate model including exogenous variables, with the general structure of an Auto Regressive Integrated Moving Average with Exogenous Variables model, ARIMAX(R,M,Nx), of the following form:
$$y_t = C + \sum_{i=1}^{R} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{M} \theta_j\, \varepsilon_{t-j} + \sum_{k=1}^{N_x} \beta_k\, X(t,k)$$
where $y_t$ denotes the response series; $C$ stands for the constant term; R and M represent the degrees of the ARMA conditional mean model, where AR and MA stand for the R-element vector of autoregressive coefficients ($\phi_i$) and the M-element vector of moving average coefficients ($\theta_j$), respectively. Each column of X(t,k) is a time series, where t denotes the row (time observation), k denotes the column (predictor), and Nx represents the number of exogenous variables with regression coefficients $\beta_k$. In general, the specified model structure can be a combination of a conditional mean structure (e.g., ARIMAX) and a conditional variance structure such as the generalized autoregressive conditional heteroscedastic (GARCH) model. The general GARCH(P,Q) model is of the form:
$$\sigma_t^2 = \kappa + \sum_{i=1}^{P} G_i\, \sigma_{t-i}^2 + \sum_{j=1}^{Q} A_j\, \varepsilon_{t-j}^2$$
where $\varepsilon_t$ is called the innovation process and stands for a sequence of uncorrelated random variables from a probability distribution such as the Gaussian or Student's t with mean zero, and $\sigma_t^2$ is its conditional variance.
To find the most parsimonious model, all plausible model structures must be developed and examined in terms of complexity and goodness of fit (verifying that all model assumptions hold, for example that the residuals are normally distributed and uncorrelated). As mentioned, transforming the response series and their exogenous regression components leads to the nonlinear ARIMAX structure, namely NARIMAX(R,M,Nx). Hence, nonlinear optimization techniques (e.g., Gauss-Newton) are used to estimate the maximum likelihood parameters of a conditional mean model of NARIMAX form and a conditional variance model of GARCH form. In many nonlinear regression problems, it is more practical to use numerical methods to find the solution iteratively. The Gauss-Newton method is a nonlinear programming algorithm that uses a Taylor series expansion to approximate the nonlinear regression model. The method begins with initial or starting values for the regression parameters $\gamma_j$, j = 0, 1, 2, ..., p-1, where p is the number of parameters. The initial values are denoted by $g_j^{(k)}$, where the superscript in parentheses denotes the iteration number. These values may be obtained from theoretical expectations or from a least squares parameter estimation. Then, the mean responses $f(X_i, \gamma)$ for the n cases are approximated by the linear terms in the Taylor series expansion around $g^{(k)}$:
$$f(X_i, \gamma) \approx f(X_i, g^{(k)}) + \sum_{j=0}^{p-1} \left[ \frac{\partial f(X_i, \gamma)}{\partial \gamma_j} \right]_{\gamma = g^{(k)}} \left( \gamma_j - g_j^{(k)} \right)$$
Note that $g^{(k)}$ is the vector of the parameter starting values and the terms in brackets are the partial derivatives of the regression function, evaluated at $\gamma = g^{(k)}$.

  
To simplify the notation:
$$f_i^{(k)} = f(X_i, g^{(k)}), \qquad \beta_j^{(k)} = \gamma_j - g_j^{(k)}, \qquad D_{ij}^{(k)} = \left[ \frac{\partial f(X_i, \gamma)}{\partial \gamma_j} \right]_{\gamma = g^{(k)}}$$
and the mean response for the i-th case becomes, in this notation:
$$Y_i \approx f_i^{(k)} + \sum_{j=0}^{p-1} D_{ij}^{(k)} \beta_j^{(k)} + \varepsilon_i$$
where $\varepsilon_i$ is the error term in a regression model. By shifting the $f_i^{(k)}$ term to the left and denoting the difference $Y_i - f_i^{(k)}$ by $Y_i^{(k)}$, the following linear regression model approximation is obtained:
$$Y_i^{(k)} \approx \sum_{j=0}^{p-1} D_{ij}^{(k)} \beta_j^{(k)} + \varepsilon_i \qquad (8)$$
The approximation model (8) is precisely in the form of the general linear regression model; hence, the parameters $\beta^{(k)}$ may be estimated by ordinary least squares as follows (Nocedal & Wright, 2006):
$$b^{(k)} = \left( D^{(k)\top} D^{(k)} \right)^{-1} D^{(k)\top} Y^{(k)}$$
where $b^{(k)}$ is the vector of the least squares estimated regression coefficients and $D^{(k)}$ is the matrix of the partial derivatives of the mean response. Therefore, the revised estimated regression coefficients $g^{(k+1)}$ are obtained by means of:
$$g^{(k+1)} = g^{(k)} + b^{(k)}$$
where $g_j^{(k+1)}$ denotes the revised estimate of $\gamma_j$ at the end of the kth iteration. To examine whether the revised regression coefficients represent adjustments in the proper direction, the Sum of Squared Errors (SSE) is calculated in each iteration as follows:
$$SSE^{(k)} = \sum_{i=1}^{n} \left( Y_i - f(X_i, g^{(k)}) \right)^2$$
This iterative process is continued until the differences between successive coefficient estimates and/or the difference between successive least squares criterion measures become negligible. For further study, see Nocedal & Wright (2006). For instance, Table 3 shows the results of the model selection process for industrial power demand using the Gauss-Newton algorithm. The results of Table 3 are interpreted as follows: the conditional mean of log(Yind) depends on one past observation (Yind(t-1)), with a coefficient of -0.0461, and on three past innovations.
The Prediction Mean Square Error (PMSE) is computed over the validation data as
$$PMSE = \frac{1}{M} \sum_{i=1}^{M} \left( y_i - \hat{y}_i \right)^2 \qquad (12)$$
where M represents the number of validation data, denoted $y_i$, i = 1, 2, 3, ..., M, and $\hat{y}_i$ stands for the forecasts. The FPE statistic includes two terms: the sum of the residuals for the validation data set and a complexity penalty term that increases as the number of parameters in the model grows. The general FPE formula considering a Sum-of-squares Error (SSE) depends on n, the number of observations, and p, the number of estimated parameters. In this assessment, we used the first 21 observations (from 1990 to 2010) as training data to estimate the model, and then forecasted the next 3 periods (2010 to 2015) as validation data. The PMSE and FPE measures were calculated based on the forecasts and the actual data. The results of the predictive performance check based on PMSE and FPE for the selected models are provided in Table 4.
As seen, the most significant sector in terms of future peak demand is the household sector, followed by industry, the services sector, and then agriculture. The residential power demand increases from 14248.35 GW in 2012 to 34517.84 GW in 2025 and 73161.79 GW in 2040, reflecting mean annual growth of 15.21 percent. The fastest growth of energy demand is projected to occur in the residential sector. Total peak demand of Iran's agriculture sector reaches 11275.50 GW in 2025 and 18310.32 GW in 2040. For 2040, the residential sector is projected to increase its share of total demand to 38.99 percent and the industrial sector will grow to 36.90 percent, while the agriculture sector's share will drop to 9.76 percent of the total. Therefore, the residential segment will have the greatest electricity demand growth during the period analyzed, due to the growing number of consumers, income expansion, rising sales of electrical appliances, and greater mean consumption per family.
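The Gauss-Newton iteration described in this section (linearize around the current estimate, solve an ordinary least squares problem for the correction, update, and monitor the SSE) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gauss_newton`, `exp_model`, and `exp_jacobian`, the exponential test model, and the synthetic noise-free data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def gauss_newton(f, jac, x, y, g0, tol=1e-10, max_iter=50):
    """Plain (undamped) Gauss-Newton for least squares fitting of y ~ f(x, g).

    f(x, g)   returns the n mean responses for parameter vector g
    jac(x, g) returns the n x p matrix of partial derivatives of f w.r.t. g
    """
    g = np.asarray(g0, dtype=float)
    sse = float(np.sum((y - f(x, g)) ** 2))
    for _ in range(max_iter):
        resid = y - f(x, g)                 # shifted response: Y minus current fit
        D = jac(x, g)                       # derivative matrix at current estimate
        # Correction b solves the linearized OLS problem D b ~ resid
        b, *_ = np.linalg.lstsq(D, resid, rcond=None)
        g_new = g + b                       # revised coefficient estimates
        sse_new = float(np.sum((y - f(x, g_new)) ** 2))
        # Stop when the coefficient change or the SSE change is negligible
        converged = np.max(np.abs(b)) < tol or abs(sse - sse_new) < tol
        g, sse = g_new, sse_new
        if converged:
            break
    return g, sse

def exp_model(x, g):
    # Hypothetical nonlinear regression function: g[0] * exp(g[1] * x)
    return g[0] * np.exp(g[1] * x)

def exp_jacobian(x, g):
    # Partial derivatives of exp_model with respect to g[0] and g[1]
    return np.column_stack([np.exp(g[1] * x), g[0] * x * np.exp(g[1] * x)])

# Synthetic, noise-free data (illustrative only, not the paper's data)
x_obs = np.linspace(0.0, 2.0, 20)
y_obs = exp_model(x_obs, np.array([2.0, 0.5]))

g_hat, sse_hat = gauss_newton(exp_model, exp_jacobian, x_obs, y_obs, g0=[1.5, 0.4])
```

Because the sketch takes full steps without damping, it relies on reasonable starting values, which matches the remark above that the initial values may come from theoretical expectations or a preliminary least squares fit.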

Conclusion
Forecasts of long-term peak demand are a key requirement for informed energy planning and policy decisions to ensure energy security. This paper addressed this need using a nonlinear model, namely NARIMAX. The proposed model integrates econometric and time-series approaches as a nonlinear forecasting model. The advantage of the NARIMAX model is that it fits a nonlinear autoregressive model including relevant exogenous variables based on economic theory, in conjunction with an autoregressive model for the peak demand series itself. The authors used 23 annual observations, from April 1990 to April 2015, including econometric and peak demand historical data, in order to develop forecasts up to 2040 under business-as-usual conditions. The results of the performance check based on PMSE and FPE for the NARIMAX and ARIMAX models showed the superiority of the NARIMAX model in long-term peak demand forecasting. According to Table 5, the superiority of the proposed NARIMAX model over other possible approaches can be attributed to its distinctive features.
the urbanization rate; (c) the technological progress; (d) the services electricity tariff in Rials/kWh, at April 1990 values; (e) the number of services consumers; and (f) the service electricity consumption in GWh are considered as explanatory factors for peak demand forecasting. The public and transportation service sectors, representing 11.6 and 0.2 percent of total electricity consumption, respectively, are considered part of the services sector.

Agriculture Sector
About 13.1 percent of total electricity sales go to the agriculture sector. The explanatory variables for the electricity demand are (a) the agriculture value added in thousand Rials, at April 1990 values; (b) the rural population; (c) the agriculture electricity tariff in Rials/MWh, at April 1990 values; (d) the average annual rainfall (Kriging); and (e) the agriculture electricity consumption.

Fig. 1. Industrial peak demand time series (Yind) and its logarithm and differenced transformations from 1990 to 2012
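The log and differencing transformations illustrated in Fig. 1 can be sketched as follows. This is a minimal illustration, assuming a hypothetical demand series that grows 5 percent per period (not the paper's data); the helper names `log_transform` and `difference` are illustrative. It also shows why the first differences of a logged series approximate the period-to-period rates of change.

```python
import math

def log_transform(series):
    """Natural logarithm of each (strictly positive) observation."""
    return [math.log(v) for v in series]

def difference(series, order=1):
    """Apply first differencing `order` times; each pass shortens the series by one."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Hypothetical series with constant 5% growth per period (illustrative only)
demand = [100.0 * 1.05 ** t for t in range(6)]

# First difference of the logged series
dlog = difference(log_transform(demand))

# Exact period-to-period growth rates, for comparison
growth = [b / a - 1.0 for a, b in zip(demand, demand[1:])]
```

For this series every element of `dlog` equals ln(1.05), which is close to the exact growth rate of 0.05; the approximation is tighter for smaller growth rates.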

Table 1
Assumptions on socio-economic parameters, power demands, and development of key indicators by sector

Table 2
The results of the unit root test for the response time series and their selected transformation forms

Table 3
The results of model selection process of industrial peak demand

In the conditional mean of log(Yind), the three past innovations have coefficients of -0.2119, -0.0105, and -0.0127, and the five exogenous variables (Log Nco, Iec, Tps, Diff Iet, VGRI) have coefficients of 0.2445, 9.54e-006, -0.0067, -0.0008, and 0.00061. Moreover, the conditional variance of log(Yind) depends on two lagged squared innovations.
To select the best-fitting model, the predictive ability of all plausible models must be assessed. The final model is the result of the interaction between performance checking and the structure modification process. Cross validation is recommended to evaluate out-of-sample forecasting ability and so guard against overfitting. The cross-validation steps are: dividing the response series into training and validation sets, fitting a model based on the training data, and assessing the forecasts of the validation set in terms of the Prediction Mean Square Error (PMSE) and Akaike's Final Prediction Error (FPE) measures. PMSE measures the discrepancy between model forecasts and actual data according to Eq. (12).
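The two cross-validation measures can be sketched as follows. PMSE is taken here as the mean of the squared forecast errors over the validation set. The paper's exact FPE expression is not recoverable from this text, so the sketch assumes one common textbook form of Akaike's FPE, (SSE/n)(n+p)/(n-p); this form, the function names, and the numbers in the example are assumptions, not values from the study.

```python
def pmse(actual, forecast):
    """Prediction mean square error over the validation observations."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equally long")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, forecast)) / len(actual)

def fpe(sse, n, p):
    """Akaike's final prediction error, ASSUMED form: (SSE / n) * (n + p) / (n - p).

    n is the number of observations and p the number of estimated parameters;
    the penalty factor (n + p) / (n - p) grows as p grows.
    """
    if n <= p:
        raise ValueError("need more observations than estimated parameters")
    return (sse / n) * (n + p) / (n - p)

# Hypothetical validation data and forecasts (illustrative only)
y_valid = [10.0, 12.0, 15.0]
y_pred = [11.0, 12.0, 13.0]
pmse_value = pmse(y_valid, y_pred)       # (1 + 0 + 4) / 3

# Hypothetical fit summary: SSE of 2.0 from 21 observations and 5 parameters
fpe_value = fpe(sse=2.0, n=21, p=5)      # (2 / 21) * (26 / 16)
```

Under this assumed form, two models with the same SSE are ranked by parameter count, which reflects the preference for parsimonious models stated in the model selection phase.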

Table 4
The predictive ability of ARIMAX and NARIMAX models in terms of PMSE and FPE measures

Besides, to validate the selected model framework, the predictive ability of the models developed in the ARIMAX framework is compared with that of the models developed in the proposed NARIMAX framework in terms of the PMSE and FPE measures, according to Table 4. Using the model selected in the previous phase, forecasts can be developed over a future time horizon (2015 to 2040). Fig. 2 shows forecasts of Iran's industrial, residential, services, and agriculture peak demand up to 2040 under business-as-usual conditions.
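Developing forecasts over a future horizon from a fitted conditional mean with a constant, autoregressive terms, moving average terms, and exogenous regression terms can be sketched as a simple recursion. This is a minimal illustration under stated assumptions: future innovations are replaced by their expected value of zero, future values of the exogenous variables are supplied as scenario inputs, and the function name `arimax_forecast` and all coefficients and data below are hypothetical rather than the fitted values of the study.

```python
def arimax_forecast(c, phi, theta, beta, y_hist, eps_hist, x_future):
    """Recursive multi-step forecasts from an ARMA conditional mean with regressors.

    c        constant term
    phi      autoregressive coefficients (lag 1 first)
    theta    moving average coefficients (lag 1 first)
    beta     regression coefficients, one per exogenous variable
    y_hist   observed responses, most recent last
    eps_hist estimated innovations, most recent last
    x_future one list of exogenous values per forecast step
    """
    if len(y_hist) < len(phi) or len(eps_hist) < len(theta):
        raise ValueError("not enough history for the requested lags")
    y = list(y_hist)
    eps = list(eps_hist)
    forecasts = []
    for x_t in x_future:
        ar_part = sum(p * y[-i] for i, p in enumerate(phi, start=1))
        ma_part = sum(q * eps[-j] for j, q in enumerate(theta, start=1))
        ex_part = sum(b * x for b, x in zip(beta, x_t))
        y_hat = c + ar_part + ma_part + ex_part
        forecasts.append(y_hat)
        y.append(y_hat)     # the forecast feeds later autoregressive terms
        eps.append(0.0)     # future innovations are set to their mean, zero
    return forecasts

# Hypothetical one-lag model with a single exogenous variable (illustrative only)
path = arimax_forecast(
    c=1.0, phi=[0.5], theta=[0.2], beta=[0.1],
    y_hist=[4.0], eps_hist=[0.5],
    x_future=[[10.0], [20.0]],
)
```

If the model is fitted to a logged response, as for several of the sectoral series here, the recursion produces forecasts of the logged series, which would then be exponentiated to return to the original scale.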

Table 5
The features of proposed NARIMAX model versus existing models