Forecasting international tourist arrivals in zanzibar using box – jenkins SARIMA model

The arrival of international tourists contributes to the generation of foreign currencies and creates employment opportunities to the local people. Modelling and forecasting tourist arrivals plays a major role in tourism planning and marketing and therefore crucial for policy decision-making towards sustainable tourism development. In this paper an attempt has been made to forecast international tourist arrivals in Zanzibar using Seasonal Autoregressive Integrated Moving Average (SARIMA) model. Data from January 1995 to December 2017 covering 276 observations were used. The SARIMA (1, 1, 1) × (1, 1, 2)12 model was found to be the best fitted model on the basis of Akaike`s Information Criterion (AIC). The adequacy of the fitted model was confirmed by Ljung-Box test statistic and the model was used to generate monthly forecasts from January 2018 to December 2019 with 95% confidence interval. The forecasting performances of candidate models were evaluated on the basis of Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). The forecasts indicate that the number of tourists visiting Zanzibar is likely to keep on increasing with seasonal pattern similar to that of the original data.


Introduction
Tourism has been one of the rapidly growing and important economic sectors across the world. The sector contributes largely to the foreign exchange earnings, provides employment and investment opportunities to the destination countries due to global increase in the number of tourist arrivals [7]. Zanzibar, the semi-autonomous island within the United Republic of Tanzania and one of the Indian Ocean islands, has numerous tourist attractions which are nature-based such as tropical beaches, forest reserves, palm trees, historical and cultural sites and the historical stone town. In 2000, UNESCO declared Zanzibar's stone town the world heritage site due to diverse of the race on its architecture and tradition [11]. The island has been attracting a great number of tourists from various areas of the world such as Europe, Asia, America, Africa, Oceania and few of them from emerging markets such as Russia, Poland, China, India and Israel [13]. Tourist arrivals being vital to the economic growth, has made modelling and forecasting tourist arrivals to gain huge consideration among researchers and policy makers globally. This paper reviews few recent studies which applied different time series models in forecasting different variables including tourist arrivals. A study carried out in India applied Holt-Winters Exponential Smoothing (HWES) and ARIMA models in forecasting foreign tourist arrivals [12]. Another study carried out in Thailand attempted to forecast international visitor arrivals by using different time series methods [6]. Also the use of SARIMA and GARCH models in forecasting tourist arrivals was applied in Sri-Lanka [10], while [2] applied ARIMA model and Double exponential smoothing to forecast tourist arrivals in Kenya. [9] Applied ARIMA model in an attempt to forecast wholesale price of maize in Tanzania. [1] Used hybrid method and ARIMA model in forecasting annual data of rain precipitation in the Province of Erbil-Iraq.
In this paper an attempt has been made to model and forecast international tourist arrivals in Zanzibar using Seasonal Autoregressive Integrated Moving Average (SARIMA) model suggested by Box-Jenkins.

Material and Methods
This study used time series data of international tourist arrivals in Zanzibar from January 1995 to December 2017 covering 276 observations and data were obtained from the Office of Chief Government Statistician Zanzibar (OCGS). Data in the training set (from January 1995 to December 2016) were used for model fitting while the remaining data in the test set (from January 2017 to December 2017) were used for validation purposes. Box-Jenkins procedures were used to obtain an appropriate SARIMA model for forecasting international tourist arrivals in Zanzibar. Data analysis was carried out using R version 3.5.1 and Microsoft-Excel 2013. where is the backshift operator such that = − , Ԑ = Ԑ − , j=0,1,2,……… p= non-seasonal AR order, d= degree (order) of non-seasonal differencing, q= non-seasonal MA order, P= seasonal AR order, D= degree (order) of seasonal differencing, Q= seasonal MA order, s= the number of seasons per year, Ԑ = the white noise.

Stationarity of the Time Series
SARIMA models are defined for stationary time series, thus there was a need to check whether the data are stationary or not [3]. Time plot and Augmented Dickey Fuller (ADF) test were used to test for stationarity of the series.

Differencing Technique
The differencing technique with both seasonal and non-seasonal differencing were used to transform the series from non-stationary to stationary. Seasonal differencing of the first order was employed to remove seasonality in the given time series data while non-seasonal differencing was employed to get rid of the trend. Seasonal and non-seasonal differencing of the first order can be expressed as given in equations (1) and (2) respectively.

Box-Jenkins Procedure
Box and Jenkins model building procedure which involved four steps was employed in order to find the best model to fit within the class of SARIMA models.

Step 1: Model Identification
This step involved the tentative identification of the model order, that is identifying the values of p, P, q, Q, d, and D. According to [4], model identification in seasonal time series is very difficult and normally trial and error is used. Usually the values of p, P, q, Q, d, and D selected should be less than or equal to two. Plots of ACF and PACF were used for this purpose.

Step 2: Model estimation and Selection
This step involved the estimation model parameters identified in the first step. The method of maximum likelihood was used in this case. The model with the lowest Akaike Information Criterion (AIC) was selected as the best model to fit among the candidate SARIMA models. The Akaike information criterion (AIC) is defined by the formula: where k is the number of parameters estimated and T is the number of observations.

Step 3: Diagnostic checking
This involved checking whether the fitted model has adequately captured the information in the time series data. The Ljung-Box test statistic defined in equation (6)  This involved predicting future number of international tourist arrivals using the fitted model selected. In this case two years ahead monthly forecasts were generated. That is the forecasts from January 2018 to December 2019.

Model Validation and Performance Evaluation
One of the most important tests of any model is how well it forecasts. The comparison of the forecasted values and observed values was made in the validation period to see how close the two values were. To understand the magnitude of the forecast error, the statistics which measure the performance of forecast by using techniques of minimizing forecast errors were used. The Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) were employed in this case to evaluate the performances of the forecasting model. A good model should have small RMSE and MAPE value which is close to zero [8]. RMSE is defined by the formula while .MAPE is defined by the formula where: is the monthly observed number of international tourists, is the forecasted number of international tourists in the corresponding months, is the number of observations in the validation period.

Empirical Results
The time plot in Fig 1 shows a general upward trend over a specified period of time. This upward trend which is in fact the consistently long term increasing in the number of international tourist arrivals may be attributed partly by the efforts that have been taken by the revolutionary government of Zanzibar and stakeholders in tourism industry to improve tourism sector in the Island. A strong seasonal variations with some peaks and troughs that appear regularly every year are also indicated. An upward trend and seasonality indicated in Fig 1 implies that the time series is non-stationary. The ADF test which was used to supplement the graphical methods, had a p-value of 0.4317 at lag order 6 which is not less than α=0.05, indicating that the series is non-stationary. Both seasonal and non-seasonal difference of first order were made to transform the series from non-stationary to stationary. The ADF test, after differencing, had the pvalue of 0.01 which is less than α=0.05 indicating a stationary series as shown in Table 1. Fig 2 is the time plot after first order difference which shows that the trend and seasonal variation are greatly reduced if not eliminated at all.

Model Identification
The plots of ACF and PACF of the stationary time series were examined in order to identify the values of p, q, P and Q. The values of d and D which are the number of non-seasonal and seasonal differencing respectively are both equal to one. That is d=1 and D=1 transformed the time series from non-stationary to stationary.  (2). Also PACF has significant spike at first seasonal lag (at 1.0) suggesting possible seasonal Autoregressive terms which are SAR (1).

Model Estimation and Selection
The SARIMA (1,1,1)×(1,1,2)12 model with the lowest Akaike Information Criteria (AIC=4637.752) was selected as the best model amongst the candidate SARIMA models. The selected model also perform best in term of forecasting with the lowest RMSE and MAPE compared to other candidate SARIMA models as shown in Table 2. The coefficients of the best model estimated by using the method of Maximum Likelihood (ML) and all the coefficients are statistically significant at 5% as shown in Table 3.  The best model, that is, SARIMA (1,1,1)×(1,1,2)12 model can explicitly be written as:

ACF for Adjusted Time series
is the number of international tourist arrivals in month Ԑ is the white noise s the backshift operator such that: = − , Ԑ = Ԑ − , j=0,1,2,………

Diagnostic Checking
This involved checking whether the fitted model has adequately captured the information in the data. Residual analysis which involves graphical procedures and the Ljung-Box statistical test were used as shown in Table 4 and Fig 5 respectively. The residuals from the fitted model shown in Fig 5 seem to be random as they have nearly constant variance and zero mean indicating that the fitted model is adequate. The Ljung-Box statistical value, Q * =24.128 is insignificant because the p-value shown in Table 4.9 is 0.1913 which is greater than 0.05 level of significance suggesting that the residuals of the best model are not statistically significantly distinguishable from white noise. That means the model SARIMA (1, 1, 1) × (1, 1, 2)12 is adequate.

Model Validation and Performance Evaluation
The forecasted and observed number of international tourist arrivals were compared in the validation period that is from January 2017 to December 2017 as it can be seen in Table 4 and Fig 6. The forecasting performances of SARIMA (1, 1, 1) × (1, 1, 2)12 model was evaluated on the basis of RMSE and MAPE during the validation period as shown in Table 4.  In Table 5, the values of RMSE and MAPE are 2313.578 and 12.37756 respectively. The forecasted value from the SARIMA (1, 1, 1) × (1, 1, 2)12 model are somewhat close to the observed values as it can be seen in Fig 6. 3.6. Forecasting using SARIMA (1, 1, 1) × (1, 1, 2)12 model The best fitted model which is SARIMA (1, 1, 1) × (1, 1, 2)12 was used to generate forecasts with 95% level of confidence. Twenty four monthly forecasts that is from January 2018 to December 2019 were produced. Based on results shown in Table 6 as well as its plot in Fig 7, the number of international tourists visiting Zanzibar is expected to continue increasing with similar seasonal pattern as that of the original time series.

Conclusion
Tourism is currently one of the important sectors for economic growth in Zanzibar. Therefore, to predict future number of international tourists visiting the Island is crucial for tourism planning and marketing. In this paper Box-Jenkins procedure was applied to find appropriate SARIMA model to forecast the number of international tourist arrivals in Zanzibar using data from January 1995 to December 2017. The results show that the SARIMA (1, 1, 1) × (1, 1, 2)12 is the best fitted model and the model was used to generate monthly forecasts from January 2018 to December 2019 with 95% confidence interval. The forecasts indicate that the number of tourists visiting Zanzibar is likely to keep on increasing with seasonal pattern similar to that of the original data.

Recommendations and Policy Implications
Based on the results from this paper, the following policy recommendations have been suggested. The arrival of international tourists is characterized by upward or increasing trend but at a slow rate as indicated in Fig 1. Thus the Zanzibar Commission for Tourism (ZCT) and stakeholders in Zanzibar's tourism industry should continuously keep on upgrading the quality of tourism services and products as well as enhancing tourism marketing strategies in order to attract more international tourists from different parts of the world.
The seasonal variations was also shown in this study, thus necessary measures should be taken in order to deal with the issue of decreasing number of international tourists and tourism revenue during low season. For instance the government should encourage domestic tourism by cutting costs for tourism services and products during low seasons.