Econometric and RBF Neural Network Models for Analyzing Automobile Demand in Iran

Identifying the factors that have an influential impact on automobile demand estimation has become a primary concern for the automobile industry. There has been substantial development in modeling automobile demand. Regression, time series, CHAID and RBF neural network modeling types are proposed. In this study, automobile demand is classified into three classes. The first issue addressed by this study is selecting the most appropriate modeling type in each class by comparing actual and estimated demand in terms of Root-Mean-Square Error (RMSE). The results indicate that the most appropriate modeling type can differ between classes of automobile demand. The second aim of this study is to identify the significant indicators for estimating automobile demand in each class of the Iranian market.


INTRODUCTION
Automobiles play a vital role in daily life, which makes them a subject of interest in many academic fields. Transportation planners and automobile manufacturers are interested in knowing how many and what types of automobiles households own, what criteria they apply when purchasing an automobile and how they use their vehicles.
The purpose of this study has two aspects. The first is to identify the modeling types that previous research has consistently used to estimate automobile demand. The second is to identify the indicators that are consistently encountered in these modeling types. Numerous modeling types have been constructed to estimate and forecast automobile demand, and their accuracy and usefulness in policy analysis have, in general, increased steadily over time. The literature in this area has expanded quickly, with a trend toward increasing realism as models attempt to provide more plausible explanations of consumer behavior.
Berkovec (1985) presented a simulation model of the American automobile market that combines a disaggregate model of household automobile number and type choice with an econometric model of used vehicles and simple models of new cars. Chu et al. (2004) studied expectation formation and forecasting of vehicle demand in Singapore, constructing an econometric model to test the presumed hypothesis. Varagouli et al. (2005) fitted multiple regression models to forecast travel demand in Greece; they identified and estimated the main variables that affect travel demand and developed models to predict it. Hülsmann et al. (2011) used the German and US-American automobile markets to evaluate forecast models; their methodology mainly consists of time series analysis and classical data mining algorithms such as decision trees. Dagsvik and Liu (2009) specified and estimated household demand for conventional gasoline automobiles and alternative fuel vehicles in Shanghai based on rank-ordered data, and Bonilla et al. (2012) estimated demand for mini cars and large cars in Japan, covering gasoline demand for three vehicle sizes and vehicle sales.

Another model that can be used for predicting demand is Chi-squared Automatic Interaction Detection (CHAID). Oh and Kim (2010) predicted the demand for health meteorological information and identified its significant predictors with a decision tree method. Bozkir and Sezer (2011) predicted the actual consumption demand for a specified menu on a selected date with three decision tree methods: CART, CHAID and Microsoft decision trees. Radial Basis Function (RBF) neural networks have also been used in many studies for estimation and forecasting. Wu and Liu (2012) proposed a predictive system for car fuel consumption using an RBF neural network, consisting of fuel consumption forecasting and performance evaluation. Gan et al. (2010) presented an approach to nonlinear time series modeling that uses an RBF network to approximate the functional coefficients of a state-dependent autoregressive model. Murat and Ceylan (2006) presented an Artificial Neural Network (ANN) approach based on a supervised neural network for transport energy demand forecasting using socio-economic and transport-related indicators.

To select the most accurate model, several studies have compared two or more modeling types. Wang et al. (2011) used stepwise regression to select the most influential variables and fed these variables and sales into an adaptive network-based fuzzy inference system to obtain the forecast; they then compared their model with the Autoregressive Integrated Moving Average (ARIMA) model and an ANN. Horner et al. (2010) used regression modeling and CHAID decision trees to analyze decisions about school discipline. Wang (2011) presented heuristic models and a Markov model for forecasting Taiwan's exports. Furthermore, McCarty and Hastak (2007) compared an RBF model with CHAID and logistic regression models.

The objective of this study is to compare comprehensive models of household automobile transportation that can provide a direct estimation and forecast of consumer demand for personal-use vehicles. Based on the reviewed literature, the modeling types used in this study are regression, time series (including ARIMA and exponential smoothing), RBF neural networks and CHAID.
The second purpose is to identify the indicators that are consistently encountered in previous modeling studies. The set of indicators that previous research found to influence automobile demand becomes the list of variables the model should incorporate, covering automobile characteristics, the consumer expenditure survey and macroeconomic variables. Based on research on automobile demand for strategic planning, asset management, forecasting and objective analysis, several significant indicators have been published by consulting firms such as Cambridge Systematics and Booz-Allen and Hamilton, and by Mannering and Train (1985). Table 1 presents the explanatory variables used in previous research on automobile demand.
Models of this kind, particularly discrete choice models, are common in the applied Industrial Organization literature. In discrete choice models, individuals choose from a set of mutually exclusive options to obtain the highest possible utility, and the utilities of the alternatives depend on both the consumers' demand and the characteristics of the alternatives. In this study, the most appropriate of the defined modeling types is selected by comparing actual and estimated demand. Some counterfactual analyses, such as the effects of population attributes and purchase conditions on demand, are also illustrated. The present study draws on these estimation methodologies to give a deeper and more accurate understanding of the Iranian automobile market. In summary, the study has two major objectives: first, to compare comprehensive applied models for estimating household automobile demand; and second, to identify the indicators that have an influential effect on customers' automobile demand in the Iranian market.

METHODOLOGY
In this section, four modeling types are proposed for analyzing the data: a regression model, a time series model, a neural network model and a CHAID model. Each model is described below together with the reason for using it. Indicators related to expected automobile demand are defined, abbreviated (prc, les, dol, inc, fel and cpi) and used as model inputs.

Regression:
The first model studied is regression. This model estimates the best-fitting linear equation for predicting the output field from the input fields. The regression equation represents a straight line or plane that minimizes the squared differences between predicted and actual output values. Regression models are used because they are relatively simple and give an easily interpreted mathematical formula for generating predictions. Because regression modeling is a long-established statistical procedure, the properties of these models are well understood, and regression models are typically very fast to train. Regression also provides methods for automatic field selection that eliminate non-significant input fields from the equation.

Time series:
The time series model is the second method used for analyzing the data. Time series modeling assesses exponential smoothing and Autoregressive Integrated Moving Average (ARIMA) models and produces forecasts from time series data. Exponential smoothing is a forecasting method that uses weighted values of previous observations of the series to predict future values. ARIMA models, on the other hand, provide more sophisticated methods than exponential smoothing for modeling trend and seasonal components; in particular, they allow independent variables to be included in the model. To optimize these two model families, this study uses an Expert Modeler, which automatically identifies and estimates the best-fitting ARIMA or exponential smoothing model for one or more target variables. The Expert Modeler considers all exponential smoothing models and all ARIMA models and picks the best one for each target field. The forms of exponential smoothing (1) and of the ARIMA model (2, 3) are given below. Simple exponential smoothing is

$$s_t = \alpha X_{t-1} + (1 - \alpha)\, s_{t-1}, \qquad 0 < \alpha < 1 \qquad (1)$$

where $\alpha$ is the smoothing factor and the smoothed statistic $s_t$ is a simple weighted average of the previous observation $X_{t-1}$ and the previous smoothed statistic $s_{t-1}$.
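To make Eq. (1) concrete, the following is a minimal sketch of simple exponential smoothing with a fixed smoothing factor. It does not reproduce the Expert Modeler's automatic model search; the function name and the initialization $s_0 = X_0$ are assumptions chosen for illustration.

```python
def exponential_smoothing(x, alpha, s0=None):
    """Simple exponential smoothing, Eq. (1): s_t = alpha*x_{t-1} + (1-alpha)*s_{t-1}.

    x holds the observations x_0..x_{n-1}; the function returns s_1..s_n, where
    s_t is the one-step-ahead forecast of x_t. The initial value s_0 defaults to
    the first observation (an assumed, commonly used initialization).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    s_prev = x[0] if s0 is None else s0
    smoothed = []
    for t in range(1, len(x) + 1):
        # Weighted average of the previous observation and the previous smoothed value.
        s_t = alpha * x[t - 1] + (1 - alpha) * s_prev
        smoothed.append(s_t)
        s_prev = s_t
    return smoothed


print(exponential_smoothing([10, 12, 14], alpha=0.5))  # [10.0, 11.0, 12.5]
```

A larger $\alpha$ makes the forecast react faster to recent observations; a smaller $\alpha$ smooths more heavily.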
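In this case, the ARIMA$(p, d, q)$ model can be viewed as a cascade of two models. The first is non-stationary:

$$Y_t = (1 - L)^d X_t \qquad (2)$$

The second is wide-sense stationary:

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t \qquad (3)$$

Here $X_t$ is a time series of real numbers indexed by the integer $t$, $L$ is the lag operator, $\phi_i$ are the parameters of the autoregressive part of the model, $\theta_i$ are the parameters of the moving average part and $\varepsilon_t$ are error terms.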
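The two-stage view of Eq. (2) and (3) can be illustrated with a short numpy sketch: difference the series $d$ times, then fit the stationary part. To keep it small, the sketch fits only the special case $p = 1$, $q = 0$ (an AR(1) without intercept) by least squares; it is not the estimation procedure of the Expert Modeler used in the study, and the function names are hypothetical.

```python
import numpy as np


def difference(x, d=1):
    """Non-stationary stage, Eq. (2): Y_t = (1 - L)^d X_t, i.e. d-fold first differencing."""
    y = np.asarray(x, dtype=float)
    for _ in range(d):
        y = np.diff(y)
    return y


def fit_ar1(y):
    """Least-squares estimate of phi_1 in the AR(1) special case of Eq. (3)
    (p = 1, q = 0, no intercept): Y_t = phi_1 * Y_{t-1} + eps_t."""
    y = np.asarray(y, dtype=float)
    y_lag, y_now = y[:-1], y[1:]
    return float(np.dot(y_lag, y_now) / np.dot(y_lag, y_lag))
```

For example, if `X` is a simulated ARIMA(1,1,0) series with $\phi_1 = 0.6$, then `fit_ar1(difference(X, d=1))` should return a value close to 0.6.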

Chi-squared Automatic Interaction Detection (CHAID):
The third model studied in this section is the CHAID model (Van Diepen and Franses, 2006). This model first examines the cross tabulations between each predictor variable and the outcome and tests for significance using a chi-square independence test. If more than one of these relations is statistically significant, CHAID selects the most significant predictor. If a predictor has more than two categories, these are compared, and categories that show no difference in the outcome are collapsed together by successively joining the pair of categories with the least significant difference. This category-merging process stops when all remaining categories differ at the specified testing level. CHAID models are used because they can generate non-binary trees, meaning that some splits have more than two branches; CHAID therefore tends to create a wider tree than binary growing methods. CHAID works for all types of predictors and accepts both case weights and frequency variables.
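The category-merging step described above can be sketched in plain Python. This is a minimal illustration only, assuming a nominal predictor (any pair of categories may be merged) and a binary outcome, so each pairwise Pearson chi-square test has one degree of freedom and its p-value is $\operatorname{erfc}(\sqrt{\chi^2/2})$. The category labels and counts are hypothetical, and a full CHAID implementation also uses Bonferroni-adjusted p-values and grows the tree recursively, which is omitted here.

```python
import math
from itertools import combinations


def pair_pvalue(counts_a, counts_b):
    """Pearson chi-square test (no continuity correction) on the 2 x 2 table whose
    rows are the outcome counts of two predictor categories; 1 degree of freedom."""
    table = [counts_a, counts_b]
    n = sum(counts_a) + sum(counts_b)
    row_totals = [sum(row) for row in table]
    col_totals = [counts_a[j] + counts_b[j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def merge_categories(counts, alpha=0.05):
    """Repeatedly join the pair of categories whose outcome distributions differ
    least (largest p-value) until every remaining pair differs at level alpha.
    counts maps a category label to [count_outcome_0, count_outcome_1]."""
    groups = {(label,): list(c) for label, c in counts.items()}
    while len(groups) > 1:
        best_pair, best_p = None, -1.0
        for a, b in combinations(groups, 2):
            p = pair_pvalue(groups[a], groups[b])
            if p > best_p:
                best_pair, best_p = (a, b), p
        if best_p <= alpha:
            break  # all remaining categories differ significantly
        a, b = best_pair
        groups[a + b] = [x + y for x, y in zip(groups.pop(a), groups.pop(b))]
    return groups


# Hypothetical counts of [low-demand, high-demand] months per region.
print(merge_categories({"north": [40, 10], "south": [38, 12], "east": [10, 40]}))
```

With these counts, "north" and "south" have almost identical outcome distributions and are merged, while "east" stays separate.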

Radial basis function neural network:
The last model used for analyzing the data is the neural network. Neural networks are simple models of the way the nervous system operates. The basic units are neurons, which are typically organized into layers, as shown in Fig. 1.
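For the radial basis function (RBF) variant named in this study, the layered structure can be sketched as one hidden layer of Gaussian units followed by a linear output layer. The study does not report how the centers, widths or output weights were determined in its software, so this numpy sketch assumes fixed, user-supplied centers, one shared Gaussian width and output weights fitted by least squares, which is a common textbook formulation rather than the authors' exact procedure. The class name is hypothetical.

```python
import numpy as np


class RBFNetwork:
    """Minimal RBF network: Gaussian hidden units plus a linear output layer."""

    def __init__(self, centers, width):
        self.centers = np.asarray(centers, dtype=float)  # shape (k, n_features)
        self.width = float(width)                         # shared Gaussian width
        self.weights = None

    def _hidden(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every input row to every center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-d2 / (2.0 * self.width ** 2))
        # Constant column so the output layer has a bias term.
        return np.hstack([phi, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        # Output-layer weights by linear least squares on the hidden activations.
        H = self._hidden(X)
        self.weights, *_ = np.linalg.lstsq(H, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.weights
```

Usage: `RBFNetwork(centers, width=0.7).fit(X_train, y_train).predict(X_new)`, where each row of `X_train` would hold the input indicators for one period. Fitting only the output layer by least squares keeps training a single linear solve, which is one reason RBF networks train quickly compared with fully back-propagated networks.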
A neural network is basically a simplified model of the way the human brain processes information (Park and Sandberg, 1993). It works by simulating a large number of interconnected simple processing units that resemble abstract versions of neurons. The processing units are arranged in layers. There are typically three parts in a neural network: an input layer, with units representing the input fields; one or more hidden layers; and an output layer, with a unit or units representing the output field(s). The units are connected with varying connection strengths (or weights). Input data are presented to the first layer, and values are propagated from each neuron to every neuron in the next layer. Eventually, the result is delivered from the output layer. The network learns by examining individual records, generating a prediction for each record and adjusting the weights whenever it makes an incorrect prediction. This process is repeated many times, and the network continues to improve its predictions.

Two datasets are used for the demand analysis. The first dataset, gathered by the authors, contains the characteristics of all automobile models in the Iranian market, including actual demand (the number of automobile plates registered in each period) and purchase price. The producers' websites were used to collect the necessary information on product characteristics. The second dataset consists of the consumer expenditure survey and macroeconomic variables published annually by the Central Bank of Iran, including per capita income, automobile fuel price, leasing transaction volume, the dollar vs. RLs exchange rate and the consumer price index.
The reasons for choosing these indicators are as follows. Automobile purchase price and per capita income have been widely used in the automobile demand literature. The dollar vs. RLs exchange rate, fuel efficiency and leasing transaction volume are cited less often in the literature; they are included here because of regional conditions, governmental policies, the duopoly nature of the automobile industry in Iran and high customs duties. The significance of each indicator is discussed below.
Leasing: A lease is a contractual arrangement that calls for the lessee to pay the lessor for the use of an asset. In other words, leasing is a process by which a firm can obtain the use of certain fixed assets in return for a series of contractual, periodic, tax-deductible payments. Leasing policy can be regarded as an incentive policy because of the advantages it provides to automobile buyers (Amanollahi and Muhammad, 2012). Leasing policy leads to a substantial increase in the effective income level of social strata and, simultaneously, to growth in automobile demand (Menger, 2007). Hence, the leasing transaction volume is considered an independent variable in this study.
Automobile fuel price: Given the attention paid to fuel efficiency in the literature, and the elimination of subsidies that has led to sharp rises in fuel prices in Iran in recent years, the automobile fuel price is expected to have an influential impact on automobile demand; therefore, it is also considered an independent variable in automobile demand estimation.

Dollar vs. RLs exchange rate:
Since most spare parts for A class automobiles are imported, and B and C class automobiles are imported as CBU (Completely Built Units) or CKD (Completely Knocked Down) kits, the dollar vs. RLs exchange rate is assumed to be a proper indicator for automobile demand estimation. The dollar vs. RLs exchange rate is therefore another independent variable in this study.

ANALYZING RESULTS
In this section, because price sensitivity is important to both the government and automobile consumers in Iran, automobile demand was divided into three price classes. The first class consists of automobiles priced below 100 million RLs, the second of automobiles priced between 100 and 200 million RLs, and the third of automobiles priced above 200 million RLs. These classes are named A, B and C, respectively. To take account of consumers' financial ability and government policies, automobile demand was analyzed separately in each class, and the influential indicators of each class were determined individually.
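The price thresholds above translate directly into a classification rule. In this minimal sketch, the function name is hypothetical, and because the text does not say how prices of exactly 100 or 200 million RLs are treated, assigning both boundary values to class B is an assumption.

```python
def price_class(price_million_rls):
    """Return 'A', 'B' or 'C' for a purchase price in million RLs.

    Thresholds follow the text (below 100, 100-200, above 200 million RLs);
    treating exactly 100 and exactly 200 as class B is an assumption.
    """
    if price_million_rls < 100:
        return "A"
    if price_million_rls <= 200:
        return "B"
    return "C"


print([price_class(p) for p in (85, 150, 250)])  # ['A', 'B', 'C']
```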
The defined indicators were input into Clementine® 12.0 software as 168 monthly periods from 1997 to 2010. To analyze the reviewed models, the actual demand was compared with the estimated demand of each model.
Then, the most appropriate model for each class was selected in terms of the Root-Mean-Square Error (RMSE) (Sa-ngasoongsong et al., 2012). Actual and estimated automobile demand are shown for the A, B and C classes; because of space limitations, the data are depicted annually.
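Depicting the 168 monthly periods annually amounts to collapsing them into 14 yearly values. The text does not state how the monthly values were aggregated; this sketch assumes they are summed, which suits count data such as plate registrations. The function name is hypothetical.

```python
def monthly_to_annual(monthly, start_year=1997):
    """Sum each block of 12 consecutive monthly values into an annual total.

    With 168 monthly values starting in 1997 this yields the years 1997-2010.
    Summing (rather than averaging) is an assumption suited to count data.
    """
    if len(monthly) % 12:
        raise ValueError("series length must be a multiple of 12")
    return {start_year + i // 12: sum(monthly[i:i + 12])
            for i in range(0, len(monthly), 12)}
```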
Actual demand for A class automobiles was obtained from the license plates registered with the Traffic Police of Iran from 1997 to 2010. To determine the most appropriate model for each class, the estimated demand was calculated with the neural network, regression, time series and CHAID models, as shown in Tables 2, 3 and 4. For a closer comparison of the applied models, the actual demand and the estimated demand of each model are plotted separately in Cartesian coordinates, with the actual demand drawn as a dashed curve and the estimated demand as a solid curve (Fig. 2-13).
The data collected for the B and C classes were processed with the same models, and the results are depicted in Fig. 6-13 in the same way as for the A class (Fig. 2-5).
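Furthermore, the errors between actual and estimated demand for each model were calculated with the RMSE to select the most appropriate model for analyzing automobile demand in each class. The RMSE is a frequently used measure of the differences between values predicted by a model and the values actually observed. The RMSE of an estimator $\hat{\theta}$ with respect to the estimated parameter $\theta$ is defined as

$$\operatorname{RMSE}(\hat{\theta}) = \sqrt{\operatorname{MSE}(\hat{\theta})} = \sqrt{\operatorname{E}\big[(\hat{\theta} - \theta)^2\big]}$$

For a demand series with actual values $y_t$ and estimated values $\hat{y}_t$, $t = 1, \dots, n$, this becomes $\operatorname{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (\hat{y}_t - y_t)^2}$. The RMSE of every model is shown in Table 5.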
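The selection rule described here (compute the RMSE of each model's estimated demand against the actual demand, then keep the model with the smallest value) can be sketched in a few lines. The function names and the demand figures in the usage example are hypothetical; they are not the values behind Table 5.

```python
import math


def rmse(actual, estimated):
    """Root-mean-square error between an actual and an estimated demand series."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((e - a) ** 2 for a, e in zip(actual, estimated)) / len(actual))


def best_model(actual, estimates):
    """Return (model_name, rmse) for the model with the smallest RMSE.

    estimates maps a model name to its estimated demand series.
    """
    scores = {name: rmse(actual, est) for name, est in estimates.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Hypothetical demand figures for one class.
print(best_model([100, 120, 140], {"Regression": [102, 118, 141],
                                   "Time series": [110, 110, 150]}))
```

Here the regression series misses by 2, 2 and 1 units (RMSE $= \sqrt{3} \approx 1.73$) and the time series by 10 units each (RMSE $= 10$), so regression is selected.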
Table 5 shows that, in the A and B classes, regression models have the lowest RMSE among the default models. All calculations were carried out in EViews, and the stability of the estimation pattern was tested. The leasing transaction volume, per capita income, automobile purchase price, dollar vs. RLs exchange rate and fuel price are considered independent variables in estimating automobile demand. The numerical results, together with the statistical tests that validate them, are presented below.
Variables expressed in monetary units have been deflated according to the annual consumer price index.After selection of the appropriate explanatory variables in each case, we first calculated the coefficients for each econometric model through regression analysis and then we conducted the appropriate statistical tests.
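Deflating a monetary variable by the consumer price index converts nominal values into constant-price values. This sketch assumes a CPI expressed with a base value of 100 and one CPI value per observation (for monthly data, the annual CPI of the corresponding year would simply be repeated); the function name is hypothetical.

```python
def deflate(nominal, cpi, base=100.0):
    """Convert nominal values to real values: real = nominal * base / cpi.

    Assumes one CPI value per observation and a CPI base of 100.
    """
    if len(nominal) != len(cpi):
        raise ValueError("nominal and cpi must have the same length")
    return [n * base / c for n, c in zip(nominal, cpi)]


print(deflate([200, 330], [100, 110]))  # [200.0, 300.0]
```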

Econometric model for A class automobile demand:
The analysis period spans 1997-2010 (168 months). The model's fit to the actual data is satisfactory, with a coefficient of determination (R²) of 97% for the regression method in the A class automobile demand. To reduce autocorrelation and multicollinearity among the input data, the automobile purchase price is divided by per capita income; this indicator can therefore be regarded as the share of individual income required to purchase an automobile. Among the four input indicators of the regression method, the prc/inc indicator and the leasing transaction volume are significant.
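The estimation step described here, an ordinary least squares regression of demand on the prc/inc ratio and other indicators with R² as the goodness-of-fit measure, can be sketched with numpy. The study used EViews, and this sketch does not reproduce that software's output or diagnostics; the function names are hypothetical, and the variable names simply follow the paper's abbreviations.

```python
import numpy as np


def price_income_ratio(prc, inc):
    """prc/inc indicator: automobile purchase price relative to per capita income."""
    return np.asarray(prc, dtype=float) / np.asarray(inc, dtype=float)


def ols_with_r2(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2).

    X holds one column per regressor (e.g. prc/inc and les); the first returned
    coefficient is the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return beta, float(r2)
```

Usage: `beta, r2 = ols_with_r2(np.column_stack([price_income_ratio(prc, inc), les]), demand)`, where `prc`, `inc`, `les` and `demand` are deflated monthly series.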
In statistics, the Phillips-Perron test is a unit root test used in time series analysis to test the null hypothesis that a time series is integrated of order 1. It builds on the Dickey-Fuller test of the null hypothesis $\delta = 0$ in $\Delta y_t = \delta y_{t-1} + u_t$, where $\Delta$ is the first difference operator. Like the augmented Dickey-Fuller test, the Phillips-Perron test addresses the issue that the process generating $y_t$ might have a higher order of autocorrelation than is admitted in the test equation, making $y_{t-1}$ endogenous and thus invalidating the Dickey-Fuller t-test. Cointegration analysis depends on the integration order of the time series, so the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests are used to examine the integration orders of the variables. According to the ADF and PP tests, all variables except the fuel price (fel) are integrated of order one, so these variables become stationary after first differencing.
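The Dickey-Fuller regression on which both tests build can be sketched directly: estimate $\delta$ in $\Delta y_t = \delta y_{t-1} + u_t$ by least squares and form its t-statistic. This sketch covers only the plain no-constant, no-trend form; the ADF test adds lagged differences and the Phillips-Perron test applies a nonparametric correction for serial correlation, and neither correction is implemented here. The statistic must be compared with tabulated Dickey-Fuller critical values (about -1.95 at the 5% level in large samples for this form), not with Student t values. The function name is hypothetical.

```python
import numpy as np


def dickey_fuller_t(y):
    """t-statistic of delta in the Dickey-Fuller regression dy_t = delta * y_{t-1} + u_t
    (no constant, no trend, no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy, y_lag = np.diff(y), y[:-1]
    delta = (y_lag @ dy) / (y_lag @ y_lag)
    resid = dy - delta * y_lag
    sigma2 = (resid @ resid) / (len(dy) - 1)  # one estimated parameter
    se = np.sqrt(sigma2 / (y_lag @ y_lag))
    return float(delta / se)
```

A strongly negative statistic (for example, for white noise) rejects the unit root, while a random walk typically gives a statistic close to zero.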
In this study, cointegration between the model variables is evaluated with the Cointegrating Regression Durbin-Watson (CRDW) test. The null hypothesis of this test is that the regression error terms $u_t$ follow a random walk and are therefore non-stationary, namely $u_t = u_{t-1} + v_t$, $v_t \sim \mathrm{IN}(0, \sigma^2)$, while the alternative hypothesis is that the error terms follow a stationary first-order process. According to the results, the D-W test statistic is greater than the coefficient of determination (R²), so the possibility of a spurious regression is rejected.

Abbreviations:
prc: Automobile purchase price in time period i
les: Leasing transaction volume in time period i
dol: Dollar vs. RLs exchange rate in time period i
inc: Per capita income in time period i
fel: Automobile fuel price in time period i
cpi: Consumer price index
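The comparison between the Durbin-Watson statistic of the regression residuals and R² described above can be sketched as follows. The formal CRDW test compares $d$ with its own critical values; this sketch only computes $d = \sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2$ and applies the rule of thumb used in the text (suspect a spurious regression when R² exceeds $d$). Function names are hypothetical.

```python
import numpy as np


def durbin_watson(resid):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


def spurious_regression_flag(resid, r2):
    """Rule of thumb: a regression whose R^2 exceeds its DW statistic is suspect."""
    return r2 > durbin_watson(resid)
```

Residuals that behave like white noise give $d \approx 2$, whereas random-walk residuals push $d$ toward 0, which is what the null hypothesis $u_t = u_{t-1} + v_t$ describes.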

Fig. 1: Structure of a neural network

Training stops when one or more of the stopping criteria have been met. Neural network models are used because they are powerful general function estimators: they usually perform prediction tasks at least as well as other techniques, sometimes significantly better, and they require minimal statistical or mathematical knowledge to train or apply. In this study, the radial basis function (RBF) neural network is used to estimate automobile demand.

Data:
In this study, automobiles are classified into three major classes, indicated by A, B and C. The price sensitivity of the government and of automobile consumers in Iran is the main reason for this classification; safety, quality, fuel efficiency and consumer affordability are further reasons. The A class consists mostly of domestic automobiles produced by Iranian manufacturers, whereas most automobiles in the B and C classes are either imported or assembled.

Fig. 2: Actual demand for A class with estimated demand by RBF network

Fig. 4: Actual demand for A class with estimated demand by time series

Fig. 7: Actual demand for B class with estimated demand by regression

Fig. 13: Actual demand for C class with estimated demand by CHAID

Econometric model for B class automobile demand:
The model's fit to the actual data is satisfactory, with a coefficient of determination (R²) of 97% for the regression method in the B class automobile demand. Based on these results, among the four input indicators of the regression method, the dollar vs. RLs exchange rate and the leasing transaction volume are the significant indicators. The exchange rate is significant because the automobiles in this class are imported or assembled, and the leasing transaction volume is significant because consumers typically buy B class automobiles with only half of the price paid in cash. The model's validity is tested by means of statistical and diagnostic tests similar to those for the A class automobile demand.

Radial basis function neural network model for C class automobile demand:
According to Fig. 14, the results obtained by the RBF neural network indicate that the dollar vs. RLs exchange rate is by far the most important indicator for C class automobile demand. Automobiles in this class are imported and expensive, so even a small change in the exchange rate produces a large change in purchase price, which significantly affects C class demand.

CONCLUSION
This study proposed three econometric models (time series, regression and CHAID) and an RBF neural network for estimating automobile demand in the Iranian market. By fitting each model to the actual data for each class and contrasting actual demand with each model's estimated demand, the most appropriate model was selected in terms of RMSE. The results show that the most appropriate modeling type can differ between classes of automobile demand. Then, to analyze the factors that influence automobile demand, five indicators were defined based on previous literature and Iranian economic and political conditions. The significance and validity of each indicator were investigated by carrying out statistical and diagnostic tests on the results of the regression and RBF neural network models. Finally, the most important indicators were determined for each class.

Table 1: Previous research on automobile demand indicators

Table 2: Actual demand and estimated demand for A class

Table 4: Actual demand and estimated demand for C class

Table 5: RMSE calculation in A, B and C classes (columns: Model, RMSE in class A, RMSE in class B, RMSE in class C)

The proposed model for the analysis of A class automobile demand for Iran is (with R² = 0.97):