Comparison of exponential smoothing method and autoregressive integrated moving average (ARIMA) method in predicting dengue fever cases in the city of Palembang

Forecasting is the process of making statements about events whose outcomes have not yet been observed. For pharmaceutical students, learning forecasting techniques can help in planning the treatment of various diseases, one of which is dengue fever. Dengue fever is an acute disease caused by the dengue virus, and it remains a public health issue in major Indonesian cities, including Palembang. According to the 2017 profile compiled by Palembang City’s Public Health Office, the number of dengue fever cases in the area tends to fluctuate from year to year. To obtain an overview of the number of dengue fever cases in the coming years, two time series forecasting methods are used, namely the Exponential Smoothing method and the Autoregressive Integrated Moving Average (ARIMA) method, and their predictions are then compared. The ARIMA method gives smaller MSE and MAE values, 108077.877 and 172.424 respectively, than the Exponential Smoothing method. This indicates that the ARIMA method is better at predicting the number of dengue fever cases in Palembang in the coming years.


Introduction
Dengue fever is an acute disease caused by the dengue virus. The disease is commonly found in tropical and sub-tropical regions and is widespread in many Southeast Asian countries, including Indonesia. Dengue fever in Indonesia was first reported in Surabaya in 1968 and in Jakarta in 1969. By 1994, cases of dengue fever had spread to 27 provinces in Indonesia. Since 1968, the number of dengue fever cases in Indonesia has continued to increase. In 1968, there were 53 cases of dengue fever with 24 deaths, whereas in 1988 the number had risen to 47,573 reported cases with 1,527 deaths [1].
Three factors play a role in the transmission of dengue virus infection: humans, the virus, and the intermediary vector. The dengue virus is transmitted to humans through the bite of the Aedes aegypti mosquito [2]. Dengue fever is still a public health issue in major cities in Indonesia, one of which is Palembang. According to the 2017 profile compiled by Palembang City's Public Health Office, the number of dengue fever cases in the area tends to fluctuate from year to year. The government is expected to maximize prevention efforts in order to reduce the number of dengue fever cases in the coming years. To obtain an overview of the number of dengue fever cases in the coming years, time series forecasting methods are used. Time series forecasting is a quantitative approach that analyzes past data collected at regular intervals using appropriate techniques. The results can be used as a reference for forecasting future data [3]. In recent years, time series forecasting methods have been divided into two groups. One of them comprises forecasting methods based on traditional mathematical models such as exponential smoothing, ARIMA, non-parametric regression methods, etc.
The exponential smoothing method is one method that can be used to forecast data by means of a weighted moving average. It comes in single, multiple, and more complicated variants. All of them share the same property: more recent values are given relatively greater weight than earlier observations. Munarsih [4] applied exponential smoothing methods to predict coffee exports from Indonesia. ARIMA (Autoregressive Integrated Moving Average) is a forecasting model that generates predictions based on a synthesis of historical data patterns.
The ARIMA method was used by Hermawan [5] to predict OLR pentad anomalies in the western region of Indonesia; compared with the Holt-Winters method, the ARIMA method gave better results. Rahayu, Istiomah, and Sari [6] tested the effectiveness of the Box-Jenkins and Exponential Smoothing methods for forecasting motor vehicle testing retribution at the Department of Transportation in Klaten, and found that the Box-Jenkins method predicted the data better than the Exponential Smoothing method. In contrast, a study comparing forecasts from the Holt-Winters exponential smoothing and ARIMA methods showed that the Holt-Winters exponential smoothing method produced a smaller error than the ARIMA method, or in other words performed better [7]. The purpose of this study is to predict the number of dengue fever cases in the city of Palembang using the exponential smoothing and ARIMA methods, and then to compare the predictions from the two methods.

Forecasting with the Exponential Smoothing Method
The exponential smoothing method is a family of forecasting methods comprising single, multiple, and more complicated variants. All of them share the same property: more recent values are given relatively greater weight than earlier observations. Exponential smoothing is a moving-average forecasting technique that weights past data exponentially, so that the latest data carries the greatest weight in the moving average [8].

Forecasting with the Exponential Smoothing Method
The single exponential smoothing method is a development of the simple moving average method. Given data up to observation t, the forecast value at time t + 1 is:
$F_{t+1} = \alpha X_t + (1 - \alpha) F_t$ (1)
where $F_{t+1}$ = forecast value for period t+1, $X_t$ = actual data at period t, $\alpha$ = parameter value between 0 and 1, $F_t$ = forecast value for period t.
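The recursion behind single exponential smoothing can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function name, the choice to start the recursion with the first observation, and the sample numbers are all assumptions made for the example.

```python
def single_exponential_smoothing(actual, alpha, initial_forecast=None):
    """Return forecasts F_1 .. F_{n+1} for observations X_1 .. X_n.

    Recursion: F_{t+1} = alpha * X_t + (1 - alpha) * F_t.
    F_1 defaults to X_1, which is an assumption; the initialisation
    used in the study is not stated.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if not actual:
        raise ValueError("actual must contain at least one observation")
    forecasts = [actual[0] if initial_forecast is None else initial_forecast]
    for x_t in actual:
        # New forecast = weighted mix of the latest observation and the old forecast.
        forecasts.append(alpha * x_t + (1.0 - alpha) * forecasts[-1])
    return forecasts


# Hypothetical yearly case counts, for illustration only (not the Palembang data).
cases = [100, 120, 90, 150]
forecasts = single_exponential_smoothing(cases, alpha=0.5)
```

The last element of `forecasts` is the forecast for the period after the final observation; the earlier elements are in-sample forecasts that can be compared with the observed values.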

Time Series Model
Autoregressive/ Integrated/ Moving Average (ARIMA) models were studied in depth by George Box and Gwilym Jenkins (1976). The Autoregressive (AR) model was first introduced by Yule (1926) and later developed by Walker (1931), while the Moving Average (MA) model was first used by Slutzky (1937). It was Wold (1938), however, who laid the theoretical foundations of the combined ARMA process [8]. AR, MA, and ARMA models assume that the time series being analyzed is stationary, that is, its mean and variance are constant and its covariance is not affected by time. In practice, most time series are non-stationary, or in other words, integrated. To make such a series stationary, it is differenced d times, so that the resulting stationary process is expressed as ARIMA (p, d, q) as follows:
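In the usual textbook backshift-operator notation, an ARIMA (p, d, q) process can be written as below. The symbols are assumptions chosen for this sketch and are not necessarily the notation of the original equation: $B$ is the backshift operator with $B X_t = X_{t-1}$, $X_t$ is the observed series, $e_t$ is a white-noise error, and $\phi_i$ and $\theta_j$ are the autoregressive and moving-average coefficients.

```latex
\phi_p(B)\,(1 - B)^d X_t = \theta_q(B)\, e_t,
\qquad
\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p,
\qquad
\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q
```

With $d = 1$ the factor $(1 - B) X_t$ reduces to the first difference $X_t - X_{t-1}$.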

ARIMA Process
and ARIMA $(p,d,q)(P,D,Q)^S$ (3)
where $(p,d,q)$ = the non-seasonal part of the model, $(P,D,Q)$ = the seasonal part of the model, and $S$ = the number of periods per season.
There are several stages in carrying out time series analysis, namely model identification, model estimation, and model diagnosis [9]. At the model identification stage, the stationarity of the data must be considered. If the data is not stationary, differencing is applied repeatedly until it becomes stationary. To make the data stationary, the following equation can be used:
The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are used in the second stage, to determine a temporary model. After the temporary model has been obtained, the model is estimated, in this case by the least squares method.
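The identification stage described above can be illustrated with a short Python sketch that differences a series and computes its sample autocorrelation function. It assumes ordinary first-order differencing, $W_t = X_t - X_{t-1}$, applied d times, and the standard sample ACF estimator; the function names and the data are hypothetical and are not taken from the study.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: W_t = X_t - X_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (max_lag must be < len(series))."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    acf = []
    for k in range(max_lag + 1):
        # Covariance between the series and itself shifted by k periods.
        num = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
        acf.append(num / denom)
    return acf


# Hypothetical series with an accelerating upward movement (illustration only).
cases = [10, 12, 15, 19, 24, 30]
once = difference(cases, d=1)    # still trending
twice = difference(cases, d=2)   # constant, so no trend remains
acf_once = sample_acf(once, max_lag=2)
```

A slowly decaying ACF on the raw or once-differenced series is the usual signal that further differencing may be needed before candidate values of p and q are read off the ACF and PACF plots.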

Mean Square Error (MSE)
The MSE is used as a measure of forecasting accuracy; it gives the average squared error of a forecasting method:
$MSE = \frac{1}{n} \sum_{t=1}^{n} (X_t - F_t)^2$
where $n$ = number of forecast values, $X_t$ = observed value at time t, $F_t$ = forecast value at time t.
After a good model has been obtained, it cannot be used directly to forecast future data; a model diagnosis must be carried out first, because the fitted model does not necessarily suit the data at hand. The model is therefore diagnosed by testing whether the residuals of the estimation are white noise.
The data used in this study are the dengue fever cases in the city of Palembang from 2003 to 2016, obtained from the health profile of the city of Palembang [10].
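The two accuracy measures used in the comparison can be written as small Python functions. The MSE follows the definition above; the MAE is not defined in this section, so the usual definition (the mean of the absolute errors) is assumed here. The function names and sample values are hypothetical.

```python
def mean_squared_error(actual, forecast):
    """Average of the squared differences between observed and forecast values."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - f) ** 2 for x, f in zip(actual, forecast)) / len(actual)


def mean_absolute_error(actual, forecast):
    """Average of the absolute differences (assumed standard MAE definition)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - f) for x, f in zip(actual, forecast)) / len(actual)


# Hypothetical observed and forecast values, for illustration only.
observed = [100, 120, 90]
predicted = [110, 115, 100]
mse = mean_squared_error(observed, predicted)   # errors are -10, 5, -10
mae = mean_absolute_error(observed, predicted)
```

Because the MSE squares each error, a few large misses dominate it, whereas the MAE weights all errors equally; reporting both, as the study does, shows whether a method's error is driven by occasional large deviations.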

Result and Discussion
The time series method is a forecasting approach that analyzes the variable being estimated as a function of time. An important step in choosing the right forecasting method is to consider the type of data pattern. The ARIMA and exponential smoothing methods are similar in that both can analyze univariate data, and both use past values and past errors as the basis for forecasting the future. However, the methods also differ. The ARIMA method analyzes stationary data, so data that is not stationary must first be made stationary by transformation or differencing, while the exponential smoothing method can analyze both stationary and non-stationary data. The weakness of the ARIMA method is its limited ability to produce reliable long-term forecasts, owing to the presumption of a random-walk type model, while the exponential smoothing method takes time because the alpha parameter is determined by trial and error. Nevertheless, the difference in forecasting accuracy between the exponential smoothing methods and the ARIMA method is fairly small.

Single Exponential Smoothing
The forecasting stage using the single exponential smoothing method begins with plotting the data to check for trend and seasonal patterns. From Figure 3, it can be seen that the data follows neither a trend nor a seasonal pattern. The next step is to determine the estimated initial value that affects the forecasting, which depends on the length of time and the value of the parameter, which is the mean ( ).
According to Hendikawati [11], trial and error can be used to find the parameter value that gives the best results. The best model is obtained by trying parameter values between 0 and 1. The iteration starts by selecting α between 0.1 and 0.9. A large α value (0.9) gives very little smoothing in the forecast, while a small α value (0.1) gives heavy smoothing [8]. The exponential smoothing model is then selected using the MAE and MSE: the model with the smallest MAE and MSE is considered the best. Table 1 shows that the search yields α = 0.9 as the value that gives the smallest error.
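The trial-and-error search over α can be sketched as a simple grid search. This is an illustrative sketch under several assumptions: the recursion is started with the first observation, candidates are ranked by MSE with MAE as a tie-breaker, and the series is hypothetical rather than the Palembang data.

```python
def one_step_forecasts(actual, alpha):
    """In-sample forecasts F_1 .. F_n, with F_1 = X_1 (assumed initialisation)."""
    forecasts = [actual[0]]
    for x_t in actual[:-1]:
        forecasts.append(alpha * x_t + (1.0 - alpha) * forecasts[-1])
    return forecasts


def evaluate_alpha(actual, alpha):
    """Return (MSE, MAE) of the one-step forecasts for a given alpha."""
    forecasts = one_step_forecasts(actual, alpha)
    # Skip t = 1, where the forecast equals the observation by construction.
    errors = [x - f for x, f in zip(actual[1:], forecasts[1:])]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae


def search_alpha(actual, candidates):
    """Try every candidate alpha; pick the one with the smallest (MSE, MAE)."""
    results = {alpha: evaluate_alpha(actual, alpha) for alpha in candidates}
    best = min(results, key=lambda a: results[a])
    return best, results


candidates = [round(0.1 * k, 1) for k in range(1, 10)]   # 0.1, 0.2, ..., 0.9
cases = [100, 200, 300, 400, 500, 600]                   # hypothetical, steadily rising
best_alpha, results = search_alpha(cases, candidates)
```

On a steadily rising series such as this one, larger α values track the data more closely, so the search settles on the upper end of the grid; on a noisy series without a trend, a smaller α can win instead.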

Autoregressive Integrated Moving Average Model (ARIMA)
The forecasting stage with the Autoregressive Integrated Moving Average (ARIMA) method starts by identifying whether the variable being predicted is stationary over time. A series is stationary when its values vary over time around a constant mean with constant variance. Figure 2 shows that the height of the fluctuations differs from year to year, which indicates that the data is not stationary in variance, because the variance still changes over time. Stationarity is required because an ARIMA model cannot be built on non-stationary time series data.

Figure 2. Data autocorrelation function
In Figure 2, the values decline gradually, which is characteristic of autocorrelation. Data that is not stationary in the mean and variance, or that shows the characteristics of autocorrelation, needs to be differenced. In the plots of Figure 3 and Figure 4, the bars already lie within the bounds, indicating that the data is stationary. The estimation process is then carried out by fitting various ARIMA models defined by the parameters p, d, and q. Because the data has been differenced once, d is set to 1. The candidate ARIMA models are then assessed.
Based on the paired ACF and PACF plots, three temporary models were obtained, namely ARIMA (1,1,1), ARIMA (2,1,1), and ARIMA (2,1,2). ACF and PACF plots give a rough statistical description of the relationship between observations in a time series, and they indicate which patterns or models suit the available data. In the next step, the models are diagnosed to find the best one using the Mean Squared Error (MSE) and Mean Absolute Error (MAE).
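To make the estimate-then-score loop concrete, the sketch below fits a deliberately simplified model, ARIMA (1,1,0) without a constant, by conditional least squares and reports its in-sample MSE and MAE. It is a stand-in, not one of the candidate models above: those include moving-average terms, whose estimation requires an iterative procedure that is normally delegated to a statistics package. The function name and data are hypothetical.

```python
def fit_arima_110(series):
    """Fit W_t = phi * W_{t-1} + e_t on first differences W_t by least squares.

    Returns (phi, mse, mae), with errors measured on the original scale using
    one-step forecasts X_hat_t = X_{t-1} + phi * (X_{t-1} - X_{t-2}).
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    w = [series[i] - series[i - 1] for i in range(1, len(series))]
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    if den == 0:
        raise ValueError("differenced series is all zeros; phi is undefined")
    phi = num / den   # closed-form least squares slope through the origin
    errors = [
        series[t] - (series[t - 1] + phi * (series[t - 1] - series[t - 2]))
        for t in range(2, len(series))
    ]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return phi, mse, mae


# Hypothetical series whose increments halve each period (illustration only).
series = [0, 8, 12, 14, 15]
phi, mse, mae = fit_arima_110(series)
```

In the same spirit as the model comparison described above, each candidate order would be fitted in turn and the one with the smallest MSE and MAE retained.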

Single Exponential Smoothing VS ARIMA
The single exponential smoothing and ARIMA methods are then compared. With the exponential smoothing method, α = 0.9 gives the smallest error. With the ARIMA method, the smallest error is found for the ARIMA (2,1,2) model. The smallest MSE and MAE values with the single exponential smoothing method are 118298.8 and 272.4295, while the smallest MSE and MAE values with the ARIMA method are 108077.877 and 172.424. These results show that the MSE and MAE of the ARIMA method are smaller than those of the single exponential smoothing method. Single exponential smoothing and ARIMA models were developed to predict the number of dengue cases in Palembang. There are two types of forecasting, namely sample period forecasting and post-sample period forecasting. Sample period forecasting is used to develop the model, which is then used to produce the actual forecasts for planning and other purposes.
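The size of the gap between the two methods can be worked out directly from the reported values. The short calculation below uses only the four numbers stated above; the helper name is hypothetical, and the root mean squared error is added as an assumption-free convenience to put the MSE back on the scale of case counts.

```python
import math

# Error values as reported in the text.
reported = {
    "single exponential smoothing": {"mse": 118298.8, "mae": 272.4295},
    "ARIMA (2,1,2)": {"mse": 108077.877, "mae": 172.424},
}


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline


ses = reported["single exponential smoothing"]
arima = reported["ARIMA (2,1,2)"]

mse_reduction = relative_reduction(ses["mse"], arima["mse"])
mae_reduction = relative_reduction(ses["mae"], arima["mae"])

# RMSE is expressed in the same unit as the data (number of cases).
rmse = {name: math.sqrt(values["mse"]) for name, values in reported.items()}
```

On these figures the ARIMA model lowers the MSE by about 8.6% and the MAE by about 36.7% relative to single exponential smoothing, so the advantage is much more visible in the absolute errors than in the squared errors.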

Conclusion
The smallest MSE and MAE results with the single exponential smoothing method are 118298.8 and 272.4295 while the smallest MSE and MAE results with the ARIMA method are 108077.877 and 172.424. From these results, it can be seen that MSE and MAE with the ARIMA method are smaller