FORECASTING AIR PASSENGER TRAFFIC VOLUME: EVALUATING TIME SERIES MODELS IN LONG-TERM FORECASTING OF KUWAIT AIR PASSENGER DATA

Accurate estimation of air transport demand is vital for airlines, related aviation companies


Introduction
Reliable forecasts of civil aviation activity are critical in the planning processes of states, airports, airlines, and other relevant organizations. Quality forecasts of future passenger flow are essential for airport management when making investment decisions. Furthermore, accurate forecasts of traffic flow are necessary for airline companies to allocate their financial resources optimally, to adapt their flight frequencies, and to adjust their pricing policy. Various attempts have been made over the past decades to forecast traffic flows at airports around the globe (Grubb and Mason [9], BaFail [5], Al-Rukaibi and Al-Mutairi [3], Bougas [6] and Xie et al. [17]).
Time series forecasting involves fitting models to historical data and using them to predict future observations. There is an almost endless supply of time series forecasting problems, including forecasting the gross domestic product of a country, the air temperature, and pollution levels (Tsay [16] and Montgomery et al. [13]). A wide range of time series models has been applied to forecast the number of air passengers, including the autoregressive integrated moving average (ARIMA) model (Grubb and Mason [9]), exponential smoothing and Holt-Winters methods (Bougas [6]), the bagging Holt-Winters method (Dantas et al. [8]), and hybrid methods (Xie et al. [17]).
Several studies have performed empirical analyses to compare the performance of various methods in forecasting airport traffic flows (Al-Rukaibi and Al-Mutairi [3]), where performance evaluation is typically based on the forecasting accuracy of the models. Grubb and Mason [9] empirically demonstrated that Holt-Winters performs better than ARIMA in forecasting the number of passengers. Similarly, Xie et al. [17] found that ARIMA performed poorly due to the restrictive linearity assumption of the model. On the other hand, Bougas [6] showed that while the Holt-Winters method performs better in predicting the number of international travelers, ARIMA achieves superior predictive performance for the number of domestic travelers.
Although several studies on forecasting airport traffic flows have been conducted, relatively few researchers have paid attention to airports in the Persian Gulf region. Some exceptions are the work by BaFail [5] on forecasting air passenger numbers in Saudi Arabia, and the work by Al-Rukaibi and Al-Mutairi [3] on forecasting air traffic demand in Kuwait. Regression methods and neural network models are compared in Al-Rukaibi and Al-Mutairi [3], where the authors show that the neural networks achieved a better goodness-of-fit than the regression models. Compared with the work by Al-Rukaibi and Al-Mutairi [3], we consider a wider range of time series forecasting methods in this paper and perform a systematic comparison between them.
As a civil airport located in the State of Kuwait, Kuwait International Airport (KIA) mainly serves Kuwait Airways and Jazeera Airways. The former is the national airline, which operates both domestically and internationally and flies to 34 global destinations across Asia, Europe, North America, and the Middle East, with the aim of reaching more than 46 destinations (Kuw [2]). In comparison, Jazeera Airways, which was established in April 2004, is a low-cost commercial airline and the first non-government owned airline in the Middle East. It serves more than 1.2 million passengers every year and flies to 17 destinations across the Middle East, Africa and Europe (Jaz [1]). Kuwait International Airport plays a vital role in the development of the country. The number of passengers (arriving and departing) handled by the airport increased from 8.8 million in 2012 to 14.8 million in 2018.
The goal of this study is to forecast the total monthly air passengers (both arriving and departing) handled by KIA from 2019 to 2023 using data up to 2018. We compare six different time series forecasting models: ARIMA, the exponential smoothing model with error terms (ETS), Holt-Winters, Bayesian structural time series (BSTS), neural network autoregression, and a hybrid approach. ARIMA is the traditional approach to time series modeling and forecasting and makes a linearity assumption. ETS and Holt-Winters are exponential smoothing methods, whereas BSTS is a Bayesian approach. The neural network approach to time series forecasting has become increasingly popular due to its modelling flexibility. Finally, the hybrid approach combines both linear and nonlinear models and aims to borrow strength from several models. The predictive performance of the above-mentioned models is compared using multiple train-test splits, with the mean absolute percentage error (MAPE) as the evaluation criterion. The optimal model is the one that achieves the smallest average MAPE.
The rest of this paper is structured as follows. Section 2 reviews several time series forecasting methods and the MAPE criterion for evaluating forecasting accuracy. Section 3 discusses the historical air passenger data set used in this study. Section 4 presents the results obtained from fitting the time series models to the data set. Section 5 concludes the paper with a discussion of future work and potential extensions.
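An ARIMA($p$, $d$, $q$) model for a time series that has been differenced $d$ times can be written as
$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + e_t + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q},$$
where $y_t$ is the differenced time series, $e_t$ is the Gaussian error series, and $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ are the model parameters.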
To determine the optimal orders $p$ and $q$ of an ARIMA model, information criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are commonly used. Once the model orders are determined, parameter estimation is typically performed using the principle of maximum likelihood. The estimated model can then be used to obtain point forecasts and prediction intervals.
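As a rough illustration of order selection by information criteria, the sketch below computes the AIC and BIC from a maximized log-likelihood and picks the candidate order with the smallest AIC. The candidate orders, log-likelihood values, and sample size are made-up placeholders, not results from this study, and the parameter count (AR and MA coefficients plus the error variance) is one common convention assumed here.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n_obs) - 2 * log_lik


def count_params(order):
    """Assumed convention: p AR terms, q MA terms, plus the error variance."""
    p, q = order
    return p + q + 1


# Hypothetical candidates: (p, q) -> maximized log-likelihood (illustrative only).
candidates = {(1, 0): -520.4, (1, 1): -515.9, (2, 1): -515.2}

scores = {order: aic(ll, count_params(order)) for order, ll in candidates.items()}
best_order = min(scores, key=scores.get)  # smallest AIC wins
```

Because the BIC penalty grows with the sample size, it tends to favor smaller models than the AIC when the two disagree.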

Exponential smoothing methods
Exponential smoothing methods (Hyndman et al. [10]) provide an alternative approach to time series forecasting. The basic idea of smoothing methods is to construct forecasts of future values as weighted averages of past observations, where more recent observations receive higher weights. Exponential smoothing further assumes that the weights diminish exponentially with time. Simple exponential smoothing (N, N) is suitable for forecasting data with no clear trend or seasonal pattern. Holt's linear trend method extends simple exponential smoothing by incorporating a trend component. This method can be further extended to allow both trend and seasonal components.
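The weighting idea can be made concrete with a minimal sketch of simple exponential smoothing. The function names, the smoothing parameter `alpha`, and the choice to initialize the level with the first observation are assumptions made for illustration; they are not taken from the implementation used in this study.

```python
def simple_exponential_smoothing(y, alpha, level0=None):
    """Return the one-step-ahead forecasts for each point in y and the next forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    # Assumption: start the level at the first observation unless one is supplied.
    level = y[0] if level0 is None else level0
    forecasts = []
    for obs in y:
        forecasts.append(level)  # forecast made before seeing this observation
        level = alpha * obs + (1 - alpha) * level  # update the smoothed level
    return forecasts, level


def smoothing_weights(alpha, n):
    """Weights on the n most recent observations: alpha * (1 - alpha)^j."""
    return [alpha * (1 - alpha) ** j for j in range(n)]


forecasts, next_forecast = simple_exponential_smoothing([10.0, 12.0, 11.0], alpha=0.5)
```

A value of `alpha` close to 1 places almost all weight on the latest observation, while a value close to 0 spreads the weight over a long history.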

Holt-Winters seasonal method
The Holt-Winters seasonal method is suited to time series that exhibit seasonal variation. Two variations of this method have been proposed, namely, the additive method (A, A) and the multiplicative method (A, M). The additive method is designed to capture seasonal variations that are approximately constant over time, while the multiplicative method is better suited to time series whose seasonal variations change proportionally to the level of the series.
The Holt-Winters seasonal method can be described with four equations: the forecast equation and three smoothing equations for the level, the trend, and the seasonal component, respectively. The seasonal smoothing equation is a weighted average of the current seasonal index and the seasonal index $m$ time periods ago (Hyndman and Athanasopoulos [11]).
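For reference, the additive form can be written as below, following the standard textbook presentation (Hyndman and Athanasopoulos [11]). The symbols $\ell_t$, $b_t$, $s_t$ for the level, trend, and seasonal component, and $\alpha$, $\beta^*$, $\gamma$ for the smoothing parameters, are the textbook notation and are assumed here rather than taken from this paper.

$$
\begin{aligned}
\hat{y}_{t+h|t} &= \ell_t + h b_t + s_{t+h-m(k+1)}, \\
\ell_t &= \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}), \\
b_t &= \beta^* (\ell_t - \ell_{t-1}) + (1 - \beta^*) b_{t-1}, \\
s_t &= \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m},
\end{aligned}
$$

where $m$ is the seasonal period and $k$ is the integer part of $(h-1)/m$, so that the seasonal indices used for forecasting come from the final year of the sample.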

Exponential smoothing model with error term (ETS)
The exponential smoothing methods considered so far generate point forecasts only. In order to quantify the uncertainty associated with point forecasts, state space models have been developed which produce prediction intervals along with point forecasts. A state space model consists of an observation equation, which describes the observed data, and state equations, which govern the evolution over time of the unobserved components, namely the level, trend, and seasonal components. There are two variants of state space model for each exponential smoothing method: one assumes additive errors and the other multiplicative errors (Ramos et al. [14]).
Each state space model can be labeled as ETS(Error, Trend, Seasonal). In these models, $\mu_t$ is the mean one-step-ahead prediction of $y_t$ made at time $t-1$, which is a deterministic function of the previous states.


Estimation of the parameters of a state space model can be performed by maximizing the likelihood function. Information criteria such as the Akaike information criterion and the Bayesian information criterion can be employed to determine the optimal state space model for the given data.

Bayesian structural time series
Structural time series models are also called state space models for time series data (Brodersen et al. [7]). The general form of a structural time series model is given below:
$$y_t = Z_t^{\top} \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_t^2), \tag{1}$$
$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \qquad \eta_t \sim N(0, Q_t). \tag{2}$$
Equation (1) describes the relationship between the observation $y_t$ and the latent state $\alpha_t$ and is called the observation equation. In this paper, we consider a structural time series model which is a special case of the state space representation in (1)-(2). Inference under a Bayesian approach consists of simulating from the posterior distribution of the latent states conditional on the full set of observations.
Sampling from this posterior is challenging due to its high dimensionality and the correlation between the latent states. Developing efficient Bayesian computational methods has attracted much research interest.

Neural network autoregression
Artificial neural network-based time series modelling has attracted much research interest in recent years. Neural networks are composed of elementary computational units called neurons that are arranged in layers with a connectivity structure. The input layer consists of the inputs to the neural network, the hidden layers perform nonlinear transformations of the inputs, and the output layer consists of the outputs of the network. In a feedforward neural network with one hidden layer, a sigmoid activation function, and a single output neuron for regression problems, the hidden layer applies the sigmoid function to weighted combinations of the inputs, and the output neuron forms a weighted combination of the hidden-layer outputs using the weights of the output layer. Estimation of the weights of the network is typically accomplished by minimizing a user-defined cost function. The popularity of neural network methods in various prediction and forecasting tasks has grown substantially in recent years due to advances in deep learning technologies and the availability of graphics processing units. Furthermore, the universal approximation property of neural networks, that is, the ability of a neural network to approximate any continuous function arbitrarily well given sufficient width, is particularly attractive.
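The forward pass of such a single-hidden-layer network can be sketched as follows. The argument names (`W`, `b`, `v`, `c`) and the inclusion of bias terms are assumptions made for this illustration; the weights shown are arbitrary and untrained.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b, v, c):
    """Single-hidden-layer feedforward network with one regression output.

    x: list of inputs
    W: hidden-layer weight matrix, one row of input weights per hidden neuron
    b: hidden-layer biases (assumed), one per hidden neuron
    v: output-layer weights, one per hidden neuron
    c: output bias (assumed)
    """
    hidden = [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]
    # Linear output neuron: weighted combination of the hidden activations.
    return sum(v_i * h_i for v_i, h_i in zip(v, hidden)) + c


# With all hidden weights and biases at zero, every hidden neuron outputs 0.5.
prediction = forward([1.0, 2.0], W=[[0.0, 0.0], [0.0, 0.0]], b=[0.0, 0.0], v=[1.0, 1.0], c=0.5)
```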
We consider the neural network autoregressive model NNAR$(p, P, k)_m$ proposed by Sena and Nagwani [15] in this paper. In order to apply the model, the number of previous observations $p$ on which the current observation depends, the number of seasonal lags $P$, the period $m$, and the number of neurons $k$ in the hidden layer need to be specified.
Choosing the optimal number of neurons k is typically an iterative process and requires trial-and-error due to lack of theoretical basis for selection.A larger k will increase the representational power of the neural network.However, choosing a value of k that is too large may result in over-fitting and reduction of the generalisability of the model.
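To illustrate how $p$, $P$, and $m$ determine the network inputs, the sketch below builds the lagged input rows for a series, using the usual convention that the inputs are the last $p$ observations together with the observations at lags $m, 2m, \ldots, Pm$. The function name and return layout are hypothetical, chosen only for this example.

```python
def nnar_inputs(y, p, P, m):
    """Build (lags, input rows, targets) for an NNAR(p, P, k)_m style model."""
    # Non-seasonal lags 1..p and seasonal lags m, 2m, ..., P*m (duplicates removed).
    lags = sorted(set(range(1, p + 1)) | {j * m for j in range(1, P + 1)})
    if not lags:
        raise ValueError("at least one lag is required")
    max_lag = lags[-1]
    if len(y) <= max_lag:
        raise ValueError("series is too short for the requested lags")
    # Each row holds the lagged values used to predict y[t].
    rows = [[y[t - lag] for lag in lags] for t in range(max_lag, len(y))]
    targets = list(y[max_lag:])
    return lags, rows, targets


# Toy series 0, 1, ..., 19 with p = 2 recent lags and P = 1 seasonal lag of period 12.
lags, rows, targets = nnar_inputs(list(range(20)), p=2, P=1, m=12)
```

Each row would be fed to the input layer, so the number of input neurons equals the number of distinct lags, while $k$ only controls the width of the hidden layer.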

Hybrid model
The use of model averaging has become popular in econometric modelling. Model averaging consists of combining multiple base models to increase accuracy beyond that of the individual models. We construct a hybrid model by combining the ARIMA model, the ETS model, the neural network model, the seasonal and trend decomposition using Loess model (STLM), the Theta model (THETAM), and the TBATS model, as illustrated in Figure 1. The constructed hybrid model aims to incorporate the advantages of both linear (e.g., ARIMA) and nonlinear (e.g., neural network) models in order to increase prediction accuracy.
While one may consider optimizing the model weights $\omega_i$, the resulting optimization problem is computationally intensive. In this work, we consider equal weighting of the models for computational simplicity. That is, we set $\omega_i = 1/6$ for all $i$.
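Equal weighting amounts to averaging the base-model forecasts at each horizon, as in the minimal sketch below. The function name, the dictionary layout, and the forecast values are hypothetical and serve only to show the arithmetic.

```python
def combine_forecasts(forecasts, weights=None):
    """Weighted combination of base-model forecasts over a common horizon.

    forecasts: dict mapping model name -> list of forecasts
    weights:   dict mapping model name -> weight; equal weights if omitted
    """
    names = list(forecasts)
    if not names:
        raise ValueError("at least one model is required")
    horizon = len(forecasts[names[0]])
    if any(len(f) != horizon for f in forecasts.values()):
        raise ValueError("all models must forecast the same horizon")
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}  # equal weighting
    return [sum(weights[n] * forecasts[n][h] for n in names) for h in range(horizon)]


# Made-up two-step forecasts from six base models (illustrative values only).
base = {
    "arima": [100.0, 110.0],
    "ets": [102.0, 108.0],
    "nnetar": [98.0, 112.0],
    "stlm": [101.0, 109.0],
    "thetam": [99.0, 111.0],
    "tbats": [100.0, 110.0],
}
hybrid = combine_forecasts(base)
```

Passing an explicit `weights` dictionary would allow unequal weighting, at the cost of having to estimate those weights.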

Multiple train-test splits
As the primary goal of this work is to obtain accurate forecasts of the number of air passengers, it is not sufficient that a time series model achieves an adequate goodness-of-fit. Instead, it is more appropriate to evaluate a time series model based on its forecasting accuracy. We use multiple train-test splits of a time series to obtain a precise evaluation of the forecasting accuracy of a model. In each train-test split, the time series model is fitted to the training set and its forecasts are evaluated on the unseen test data. In this work, we use eight train-test splits, as illustrated in Figure 2. The size of the test set is fixed at 12 months for each train-test split, whereas the size of the training set is incremented by 1 month from one split to the next.
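The splitting scheme can be sketched as an expanding training window with a fixed 12-month test window. The sketch assumes that the last split ends at the final observation; the exact alignment used in Figure 2 is not stated in the text, and the function name and the toy series length are hypothetical.

```python
def train_test_splits(n_obs, n_splits=8, horizon=12):
    """Return (train_indices, test_indices) pairs with an expanding training window."""
    # Assumption: the final test window ends at the last observation.
    first_train_end = n_obs - horizon - (n_splits - 1)
    if first_train_end < 1:
        raise ValueError("series is too short for the requested splits")
    splits = []
    for i in range(n_splits):
        train_end = first_train_end + i  # training set grows by one month per split
        train = range(0, train_end)
        test = range(train_end, train_end + horizon)
        splits.append((train, test))
    return splits


# Toy example with 40 monthly observations.
splits = train_test_splits(40)
```

Because consecutive test windows overlap, the eight MAPE values are not independent, which is worth keeping in mind when reading their average.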

Forecast errors
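In this work, we use the mean absolute percentage error (MAPE) to evaluate the prediction accuracy of a time series model. The MAPE is a standard measure of the prediction accuracy of a forecasting method. The forecast errors of each time series model are calculated using the test sets only. Let $e_{T+h} = y_{T+h} - \hat{y}_{T+h|T}$ be the residual of the forecast at time $T+h$ for a forecast made at time $T$. The MAPE of a model is the average of the absolute values of the residuals, expressed as percentages of the observed values:
$$\mathrm{MAPE} = \frac{100}{H} \sum_{h=1}^{H} \left| \frac{e_{T+h}}{y_{T+h}} \right|,$$
where $H$ is the number of observations in the test set. Although the MAPE is undefined or unstable when an observed value is zero or very close to 0, this is less relevant in our case since our data contain values much larger than zero.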
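A minimal sketch of the MAPE computation and of selecting the model with the smallest average MAPE across test sets is given below. The function names and the numbers in the example are hypothetical and are not results from this study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent, over one test set."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def best_model(mape_by_model):
    """Return the model with the smallest average MAPE and all the averages."""
    averages = {name: sum(vals) / len(vals) for name, vals in mape_by_model.items()}
    return min(averages, key=averages.get), averages


# Illustrative values only: two test points and two hypothetical models.
example = mape([100.0, 200.0], [110.0, 180.0])
winner, averages = best_model({"model_a": [8.0, 9.0], "model_b": [12.0, 7.0]})
```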
Table 1 (Asrah et al. [4]) provides a guideline for interpreting the forecasting accuracy of a time series model based on the MAPE criterion. In particular, a forecasting method which achieves a MAPE of 10% or less is considered highly accurate, whereas a method that achieves a MAPE between 20% and 50% is considered reasonable.

Implementation
The six models described in this section are implemented using the TSstudio and forecast packages in R. We first perform model selection using the procedure described in Subsections 2.6.1 and 2.6.2, where the model with the smallest average MAPE on the multiple test sets is considered the optimal model and is then used to predict future air traffic flows.

Data Description
We obtained the total number of air passengers who travelled through Kuwait International Airport. The data set consists of a time series of daily total air passengers between 1 January 2012 and 31 December 2018. The daily figures are aggregated to form a time series of monthly total passengers. The resulting time series, which consists of 84 observations, is shown in Figure 3. We decompose the time series into trend, seasonal, and random error components. We observe that the time series exhibits an overall increasing trend with clear yearly seasonality. The peak in each year occurs in August, and February is the month with the fewest travelers. The variability of the original time series also increases over time, which indicates nonstationarity of the time series.

Results
The six models described in Section 2 are fitted to the air passenger data set. We assess the goodness-of-fit of the models by visualizing the residuals, as shown in Figure 4. We observe that all the models achieve an adequate fit to the data. The residuals appear to be temporally uncorrelated in all cases, as shown in the autocorrelation plots. Although we observe slight skewness in the residuals, with the exception of the ETS model, they appear to be reasonably centered around zero. We apply the multiple train-test split strategy with the MAPE criterion to evaluate the predictive performance of the models. The average MAPE on the 8 test sets is computed for each of the 6 models and shown in Table 2.
We observe that all the methods perform reasonably well according to the interpretation of the MAPE set out in Table 1. The BSTS model achieves superior predictive performance, with an average MAPE of 8.45%. On the other hand, the ETS, ARIMA, and neural network methods perform slightly worse than the other 3 methods. Figure 5 shows the MAPE of each method on each of the 8 test sets. We observe that on the fourth test set, the MAPE of the neural network method is over 30%. Without this outlier, we would expect the neural network method to perform more competitively against the other methods. In comparison, the predictive performance of the other 5 methods is highly consistent across the 8 test sets. This can also be verified by looking at the box plots shown in Figure 5.
Finally, Figure 6 shows the observed and fitted time series along with the predicted values for the 6 methods. We see that all methods achieve a good fit to the observed time series, with both the trend and seasonal components of the time series well captured.

Conclusions and Future Work
Strategic and tactical decisions of both airport and airline management depend on accurate forecasts of air passenger traffic flows. In particular, accurate estimation supports an airport's future financial planning. The present paper compares six different time series forecasting models, which were applied to estimate KIA's traffic from 2019 to 2023. To accurately evaluate the predictive performance of the different methods, we used multiple train-test splits and the average MAPE as the evaluation criterion. This study concludes that the Bayesian structural time series model achieved superior forecasting accuracy, and it was selected as the candidate model to predict the air traffic flow from 2019 to 2023.
Additional covariates such as gross domestic product (GDP), unemployment rate, ticket prices, and other economic information may be incorporated in the models to improve the forecasting accuracy. The inclusion of covariates is also helpful for airlines and other relevant aviation companies to understand the consequences of changes in their decision variables. Furthermore, more accurate predictions may be obtained if the time period covered by the analysis is extended. Finally, in applying the hybrid method, equal weights are assigned to the different models, which may limit the flexibility of the resulting model and hence its predictive performance. Finding the optimal weights to combine different models is a worthwhile research problem.

Several exponential smoothing methods are possible by considering various combinations of trend and seasonal components. The trend component can be absent (N), additive (A), or additive damped ($A_d$). Simple exponential smoothing, where the weights decrease exponentially as observations come from further in the past, corresponds to the case where both the trend and seasonal components are absent.
In the feedforward network described above, the hidden layer has its own weight matrix, and $m$ denotes the number of neurons in the hidden layer.

Figure 1. Fitting hybrid model. Let $f_{it}$ be the forecast value of model $i$ at time $t$ for $i = 1, \ldots, 6$.

Figure 3. Monthly total air passengers at KIA between 2012 and 2018.From top to bottom: the original time series, the trend component, the seasonal component, and the random error component.
Figure 4. The residual plots for each of the six methods. The temporal variation, the autocorrelation, and the histogram of the residuals are shown. Panels include Holt-Winters, (d) Hybrid, and (e) Neural networks.

Figure 5. The MAPE on the 8 test sets for each model.


Figure 6. The observed and fitted time series along with the forecasted values are shown for each of the models.

Table 1. Guidance on the interpretation of MAPE values.

Table 2. The average MAPE of the 6 models evaluated on the 8 test sets.