A new model for a learning-based forecasting procedure combining k-means clustering and time series forecasting algorithms

This paper proposes a new model for time series forecasting that combines a forecasting algorithm with a clustering algorithm. It introduces a new scheme that improves forecasting results by grouping the time series data with the k-means clustering algorithm and then utilizing the clustering result to obtain the forecast data. Since user-defined parameters usually affect the forecasting results, a learning-based procedure is proposed to estimate the parameters used for forecasting; these parameter values are computed within the algorithm itself. Compared with other forecasting algorithms, the proposed model demonstrates good experimental results: it achieves the smallest mean squared error of 13,007.91 and an average improvement rate of 19.83%.


INTRODUCTION
Currently, climate change affects rainfall patterns. The negative impact of changes in rainfall patterns is the occurrence of extreme floods and droughts (Hecht, 2016; Mislan et al., 2015; Strategy, 2011). Rainfall forecast information is an important requirement for supporting water resource management and anticipating disasters, especially under climate change (Mislan et al., 2015). Time-series-based forecasting aims to study previous observations from the collected data and build a suitable forecasting model (Naim, Mahara & Idrisi, 2018). Previous studies on time series forecasting show that forecasting errors are still significant and that forecasts remain inaccurate for rainfall and weather prediction. One reason is that weather data have a non-linear structure (Haviluddin & Alfred, 2014; Shrivastava et al., 2012). However, in another study, statistical methods of rainfall forecasting were able to produce accurate forecasts (Farajzadeh, Fard & Lotfi, 2014). A good and accurate rainfall forecasting method is needed to anticipate the negative impact of extreme weather (Manton et al., 2001; Yusuf & Francisco, 2017). The lack of knowledge about the future, and the need for projections, whether short, medium, or long term, make forecasting methods indispensable for planning, management, and anticipating negative impacts (Dantas, 2018). Forecasting methods that accurately predict the future contribute significantly to quantifying uncertainty and allow more efficient decision making (Hyndman & Athanasopoulos, 2014). For decades, there have been many efforts to obtain accurate forecasting results, and researchers have developed numerous statistical models and forecasting methods (De Gooijer & Hyndman, 2006).
The exponential smoothing algorithm is a short-term method and is sometimes called an inconsistent forecasting method. One example is a decrease in agricultural production in an area caused by drought: the exponential smoothing model may still describe an increase in production (Burkom, Murphy & Shmueli, 2007; Hyndman et al., 2002). Forecasting with a smoothing algorithm is only effective in the short term (Hameed, 2015; Ngopya, 2009). In the exponential smoothing method, the important parameter is the smoothing constant (α), representing the percentage of the estimation error (Karmaker, 2017). The main weakness of this method is the process of determining the optimal smoothing constant, since the forecasting accuracy depends on its value. The optimal constant is the one that yields the lowest mean absolute error, mean absolute percentage error, and root mean squared error (Karmaker, 2017; Khairina et al., 2019; Ostertagová & Ostertag, 2013). To determine the optimal smoothing value with minimum error, forecasting is typically performed through trial and error (Hameed, 2015; Karmaker, 2017; Paul, 2011). Determining the smoothing constant by trial and error is considered ineffective, and an unsuitable smoothing constant gives inaccurate forecast results. Experimental results in previous studies indicate that single exponential smoothing is not suitable for predicting data with trends or seasonal time series (Green & Armstrong, 2015; Kourentzes, Rostami-Tabar & Barrow, 2017; Lim, 2011; Prema & Rao, 2015).
Despite these weaknesses, the exponential smoothing method is a very successful forecasting method and is widely used in theoretical research (Maia & de Carvalho, 2011; Chen & Seneviratna, 2014; Jose & Winkler, 2008; Kolassa, 2011; Kourentzes, Petropoulos & Trapero, 2014). The research conducted here focuses on improving the performance and accuracy of the exponential smoothing forecasting method, especially single exponential smoothing. The proposed new model is based on single exponential smoothing because it is a simple forecasting method that requires only a small data sample and has a comprehensive statistical framework for short-term forecasting (Khairina et al., 2019; Hyndman et al., 2002; Zhao, Mbachu & Zhang, 2019). The M-Competition found that the simplest extrapolation method suitable for time series forecasting is single exponential smoothing; its forecasting accuracy is close to that of 16 more complex forecasting methods (Gardner & Diaz-Saiz, 2008; Green & Armstrong, 2015). Empirical study shows that forecasting with complex and sophisticated statistical methods may be less accurate than forecasting with simple methods (Lee, Song & Mjelde, 2008).
Recently, machine learning has become popular worldwide, driven by advances in computing that have made high-performance servers available at low cost (Dantas & Oliveira, 2018). One branch of machine learning is clustering, which belongs to the unsupervised learning category (Haraty, Dimishkieh & Masud, 2015; Nataliani & Yang, 2019). The rest of this paper compares the proposed model with other time series forecasting methods using real rainfall data; graphs are used to present comparisons and analysis of the forecasting experiments. Finally, the conclusions are stated in 'Conclusions'.
The contributions of the paper can be summarized as follows.
• It proposes a new scheme to improve the forecasting method by clustering the data and utilizing the clustering result to forecast the data.
• It proposes a learning procedure for estimating the smoothing coefficient needed by the forecasting method. This smoothing coefficient is computed within the algorithm itself.

LITERATURE REVIEW
In this section, some related works are presented. Below are the abbreviations and notations used in this paper:
• X = {x1,...,xn} is the data, where xi is the ith data point and n is the number of data points
• V = {v1,...,vc} is the set of cluster centers, where vk is the kth cluster center and c is the number of clusters
• Z = [zik] is the partition matrix, where zik is the membership of the ith data point in the kth cluster
• Xt: the actual data in period t
• Ft: the forecast data in period t
• αk: the smoothing value parameter of the kth cluster
• Wk: the clustered data
• Ŵk: the normalized clustered data
Hyndman et al. (2002) proposed a new approach to perform automatic forecasting based on various exponential smoothing methods. The results of automatic forecasting using M-Competition data and IJF-M3 competition data show good forecasting accuracy for short-term prediction intervals, up to about six periods ahead (Hyndman et al., 2002). Subsequent research was motivated by the importance of efficiently studying temporal rainfall patterns in hydrological management. The authors modeled the rainfall trend in Pakistan over the past six decades, using a secondary dataset of average rainfall covering 65 years, from 1951 to 2015. In Pakistan, adverse consequences of rainfall had been observed in the form of droughts and flash floods with devastating effects on human settlements, water management, and agriculture. In that study, the data were analyzed using a sliced functional time series model, a relatively new forecasting method. The results showed a downward trend in the average rainfall across the country. A monthly forecast for the next ten years (2016-2025) was obtained, along with an 80% prediction interval, and was also compared with forecasts obtained from the ARIMA model and the exponential smoothing state space (ETS) model (Yasmeen & Hameed, 2018).
Subsequent research addressed time series forecasting using the single exponential smoothing method with the error measures MAPE, MAD, and MSE. The researchers conducted nine trials to determine the most optimal smoothing constant (α); the test results showed that a larger smoothing constant value gave better forecasting accuracy. The values of MAPE, MAD, and MSE decreased as the smoothing constant increased. The minimum error occurred at the optimal smoothing constant (α = 0.9), which resulted in a MAPE of 13.1, a MAD of 117.4, and an MSE of 26,912.1 (Karmaker, Halder & Sarker, 2017).
Another study compared the ability of three forecasting models using limited historical data. Based on monthly tourist-arrival data for the period 2001 to 2013, three simple forecasting models that do not require much historical data were used for model construction: the single exponential smoothing model, the Grey Model GM(1,1), and the Lotka-Volterra (LV) model. The GM and LV models were used for prediction, decision making, and conditional analysis. Mathematically, the GM model can be used despite its limitations on the data it can process, and it has been developed and extended to Multiple Criteria Decision Making (MCDM) (Chiou, Tzeng & Cheng, 2004; Ji, Zou & Hu, 2010; Liu & Lin, 2010). The GM model is a stochastic process whose amplitude varies in time, based on a generated series rather than the raw one. It is developed using shooting and the grey differential equation and needs little data, a minimum of four periods (Liu & Lin, 2010). Meanwhile, the Lotka-Volterra model is developed from the differential equations of predator and prey (Dang et al., 2016). It can be used for prediction with limited data and has proven better for short-term forecasting (Hung, Tsai & Wu, 2014).
The forecast results of the three models showed that single exponential smoothing had the lowest accuracy, the GM(1,1) model had better accuracy, and the LV model had the best accuracy. Based on several error measurements, the errors of the exponential smoothing model and GM(1,1) were greater than those of the LV model, meaning the LV model was the most accurate. In general, the average precision of the LV model was 89.7%, while the GM(1,1) and exponential smoothing models achieved 86.36% and 65.94%, respectively. Therefore, in addition to the LV model, the GM(1,1) model can be an alternative for short-term forecasting with limited historical data, whereas the exponential smoothing model was not suitable in this case. This study contributed a useful statistical tool that can be applied to time series data (Dang et al., 2016).
Exponential smoothing is a time series forecasting method that works from the previous estimate and a percentage of the forecast error. The main problem with this technique is determining the optimal smoothing constant; to minimize forecasting errors, choosing an appropriate smoothing constant value is very important. In one study, a framework was developed for selecting the optimal smoothing constant that minimizes forecast-error measures such as the mean squared error (MSE) and mean absolute deviation (MAD). Experiments to determine the smoothing constant were carried out by trial and error, and a non-linear method based on Excel Solver was also proposed. To validate the proposed model, the study used monthly time series data on demand for goods from 2010 to 2016. The most optimal smoothing constants found by trial and error were 0.31 and 0.14, with MAD and MSE values of 6.0205 and 53.4287, respectively, while the non-linear method gave optimal smoothing constants of 0.314 and 0.143 with a MAD of 6.0199 and an MSE of 53.4286. Although both methods gave similar results, the non-linear method was much easier to use and required less time to obtain the optimal smoothing constant (Karmaker, 2017).
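The trial-and-error search for the smoothing constant described above can be sketched as a simple grid search that evaluates SES forecasts for each candidate α and keeps the one with the lowest MSE. The demand series below is illustrative, not the data from the cited study.

```python
def ses_forecasts(data, alpha):
    """One-step-ahead SES forecasts; F[0] is initialized to X[0]."""
    forecasts = [data[0]]
    for x in data[:-1]:
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts

def best_alpha(data, step=0.01):
    """Grid search over alpha in (0, 1) minimizing the MSE."""
    candidates = [round(step * i, 4) for i in range(1, int(1 / step))]
    def mse(alpha):
        f = ses_forecasts(data, alpha)
        return sum((x - fx) ** 2 for x, fx in zip(data, f)) / len(data)
    return min(candidates, key=mse)

demand = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
alpha_opt = best_alpha(demand)
print(alpha_opt)
```

A non-linear solver, as proposed in the study, would search the same objective continuously instead of over a fixed grid.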
Hartomo, Subanar & Winarko (2016) conducted research on rainfall forecasting using the exponential smoothing method with monthly rainfall data from 2003 to 2014. They proposed a new way to find the smoothing constant using the Seasonal Planting Index (SPI) algorithm with the index seasonal planting (I_SP). Using I_SP, the parameter α was symbolized as α_ISP and formulated as α_ISP = 1 − exp(−I_SP). The exponential function was chosen to determine the smoothing value because α must satisfy 0 < α < 1. The rainfall prediction tests using the SPI algorithm obtained an RMSE of 51.37, an MAE of 35.19, an MSE of 32.05, and a MAPE of 56.25 (Hartomo, Subanar & Winarko, 2016).
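The ESSPI smoothing constant above is a direct formula and can be sketched in a few lines: because exp(−I_SP) lies in (0, 1) for I_SP > 0, the resulting α always stays in the required interval 0 < α < 1. The example index values are illustrative.

```python
import math

def alpha_from_isp(i_sp):
    """Map a seasonal planting index I_SP > 0 to a smoothing constant,
    following alpha_ISP = 1 - exp(-I_SP)."""
    return 1 - math.exp(-i_sp)

for i_sp in (0.25, 1.0, 3.0):
    print(round(alpha_from_isp(i_sp), 4))
```

Note how a larger planting index pushes α toward 1, giving more weight to recent observations.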
Recent research has successfully improved time series forecasting accuracy using Type-2 fuzzy time series. This model uses more observations in its forecast and was combined with the Particle Swarm Optimization (PSO) method. The combination of PSO and the Type-2 fuzzy model adjusts the lengths of the intervals in the universe of discourse employed in forecasting, without adding interval numbers. The test results showed the effectiveness and resilience of the proposed model compared to the fuzzy time series model and the conventional time series model (Singh & Borah, 2014). Another relevant study showed improved time series prediction accuracy using a PSO hybrid fuzzy method. The method, used to predict unknown future values, was proven to reduce the root mean squared error (RMSE) and improve accuracy compared to other models based on fuzzy time series (Huang, Hsieh & Lin, 2019).
Somewhat different from the previous research, there has been work on machine-learning-based prediction models to improve the accuracy of conventional methods. The prediction was performed using an Early-Terminated Hierarchical CNN (ETH-CNN) to predict the Hierarchical CU Partition Map (HCPM). The test results showed that the coding complexity of High Efficiency Video Coding (HEVC) intra mode could be drastically reduced by replacing the brute-force search with ETH-CNN. This approach exceeded other sophisticated approaches in reducing HEVC complexity (Xu et al., 2018).
Research has also been conducted on improving HEVC coding efficiency by optimizing a neural network for a Multiframe In-loop Filter (MIF). The research demonstrated that the approach could improve the visual quality of each encoded frame by using adjacent frames. The test results revealed that the MIF approach saved 1.621% of the Bjøntegaard Delta Bit-Rate (BD-BR) on average; in other words, it significantly surpassed the standard in-loop filter and other cutting-edge approaches (Li et al., 2019). The development of machine-learning-based prediction has continued by adding intrinsic features to the prediction model. That research used Python tools combined with a web service to process and predict the data, and the test results demonstrated better prediction accuracy than standard machine learning models (He et al., 2020).
Improving prediction and classification methods has also been addressed for Deep Neural Networks (DNNs) in Computer Vision (CV), which are vulnerable to Adversarial Examples (AEs). That research focuses on a classification method integrating three transformations with random coefficients, well-adjusted according to the number of changes in the retained sample. Compared to four advanced classification methods published at Artificial Intelligence (AI) conferences over the last two years, the proposed method shows an accuracy of more than 80% (Zeng et al., 2020).
A very recent study proposes the Ocean of Things (OoT) framework for monitoring the marine environment based on Internet of Things (IoT) technology. The OoT framework performs temperature predictions using a cloud model, and the test results show that it obtains good prediction accuracy (Yang et al., 2020). A different prediction approach addresses the limited resources of socially aware networks in online buying and selling with virtual currency. That research proposes an Equivalent-Exchange-based data forwarding Incentive Scheme (EEIS), which predicts the resource status of the two transacting parties to optimize network use. The test results show that the message delivery ratio increases significantly and that EEIS can address the limitations of network resources (Xiong et al., 2020). Research with yet another approach targeted scheduling efficiency to overcome bottlenecks in mmWave multi-Unmanned Aerial Vehicle (UAV) communications. The test results proved that predicting transmission conditions and optimizing the proposed multi-UAV scheduling algorithm reduce the possibility of bottlenecks and increase the spectral efficiency of multi-UAV communication (Zhao et al., 2020).
The development of artificial intelligence continues to accelerate. Further research evaluates and warns of the security risks of large-scale group activities based on the random forest algorithm. That research combines several model parameters of the random forest algorithm; optimization experiments and random forest training experiments are used for risk analysis, with a classification accuracy of up to 0.86. It can be concluded that the random forest algorithm has good predictive ability for risk assessment in large-scale group activities (Chen et al., 2021). Another approach uses a semi-supervised prediction model that employs an unsupervised clustering algorithm to form a fuzzy partition function and then combines it with a neural network model to construct an information prediction function. The results show that the proposed method produces better predictive accuracy than conventional methods (Wen et al., 2021).
Other research has combined classical time series forecasting methods with machine learning methods. Starting by validating a methodology that combines Bootstrap Aggregating (Bagging) with the exponential smoothing method (Bergmeir, Hyndman & Benítez, 2016), that research used time series data for air freight demand, later expanded with other time series data. After reviewing previous research on time series forecasting to identify open aspects and problems, a new method, the Bagged Cluster ETS method, was proposed, built on Bagging, clustering, and exponential smoothing.

Single exponential smoothing
The Single Exponential Smoothing (SES) model has been used in previous studies for smoothing fluctuations in sequential demand patterns to provide stable estimations (Sopipan, 2015; Pagourtzi & Assimakopoulos, 2018). SES can be used for rainfall prediction (Wichitarapongsakun et al., 2016) using Eq. (1):

F_t = αX_{t−1} + (1 − α)F_{t−1}, (1)

where F_t is the predicted rainfall at time t, X_{t−1} is the actual rainfall data at time t − 1, and α ∈ [0,1] is the smoothing constant, i.e., the significance or weight assigned to the data at time t − 1. If α is low, more weight is given to data from the past; if α is high, more weight is given to the most recent data.
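The recursion of Eq. (1) translates directly into code. The following minimal sketch initializes F_1 = X_0, as is common for SES; the rainfall figures are made up for illustration.

```python
def ses(actual, alpha):
    """One-step-ahead SES forecasts: F_t = alpha*X_{t-1} + (1-alpha)*F_{t-1}."""
    forecasts = [actual[0]]            # initialization: F_1 = X_0
    for x in actual[:-1]:              # each X_{t-1} drives F_t
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts

rain = [200.0, 150.0, 90.0, 30.0]
print(ses(rain, 0.8))   # a high alpha tracks recent observations closely
```

Running the same series with a small α (say 0.1) would keep the forecasts close to the initial level, illustrating the weighting behavior described above.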

Time series clustering
Large-scale time series data are identified and classified by grouping the time series. This type of grouping differs from grouping cross-sectional data, especially in determining the distance technique for each cluster (Riyadi et al., 2017). Grouping time series data requires a clustering algorithm or procedure to form clusters. Given a set of unlabeled data objects, the choice of the correct clustering algorithm depends on the types of data available and the purpose of the clustering. If the data to be clustered are time series data, one can analyze whether the data have discrete or real values, whether the samples are uniform, whether they are univariate or multivariate, and whether the series have the same length. Non-uniform sample data must be converted into uniform data before clustering can be performed. Grouping can be done using a variety of methods, from simple sampling based on the roughest sampling interval up to sophisticated modelling and estimation approaches (Liao, 2005). Various algorithms have been developed to classify different types of time series data. The aim of developing and modifying static-data clustering algorithms is to handle time series data as static data, so that static clustering algorithms can be applied directly (Chiou, Tzeng & Cheng, 2004). In general, the steps of a grouping algorithm are as follows.
Step 1: Start with an initial clustering, denoted by C, which has a defined number of clusters, k.
Step 2: For each time point, compute the dissimilarity matrices and save all resultant matrices, calculated over all time points, for the computation of trajectory similarity.
Step 3: In terms of the generalized Ward criterion function, find a clustering C′ that is better than C. C′ is obtained from C by relocating one member from C_p to C_q or by swapping two members between C_p and C_q, where C_p, C_q ∈ C; p, q = 1, 2, ..., k; and p ≠ q. If there is no such clustering, stop; otherwise, replace C with C′ and go back to Step 3. This algorithm only works for time series of the same length, because the distance between two time series is unclear at a point where one series has no value.

k-means clustering algorithm
Clustering is a useful tool for data analysis. It is a method for finding groups within data such that similarity is highest within the same cluster and dissimilarity is highest between different clusters. One of the most popular clustering algorithms is the k-means algorithm (Macqueen, 1967).
Let X = {x1,...,xn} be a data set in a d-dimensional Euclidean space R^d. For a given 2 ≤ c ≤ n, let V = {v1,...,vc} be the c cluster centers, with the Euclidean distance denoted by ||xi − vk||, and let Z = [zik] (an n×c matrix) be a partition matrix, where zik is the membership of data point xi in the kth cluster, satisfying zik ∈ {0,1} and Σ(k=1..c) zik = 1 for all i. The k-means objective function can be written as

J(Z,V) = Σ(i=1..n) Σ(k=1..c) zik ||xi − vk||².

The updating equations for memberships and cluster centers obtained by minimizing J(Z,V) are

zik = 1 if ||xi − vk||² = min(1≤j≤c) ||xi − vj||², and zik = 0 otherwise;
vk = Σ(i=1..n) zik xi / Σ(i=1..n) zik.

The k-means clustering algorithm is described below.
Algorithm 1: k-Means Clustering
Input: data (X) and cluster number (c). Given ε > 0 and initial cluster centers v^(0). Let t = 1.
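The alternation between the two updating steps (assign each point to its nearest center, then recompute each center as the mean of its members) can be sketched compactly for one-dimensional data such as monthly rainfall. The data and initial centers are illustrative.

```python
def kmeans(data, centers, eps=1e-9, max_iter=100):
    """Plain 1-D k-means; returns (labels, centers)."""
    for _ in range(max_iter):
        # membership update: nearest center wins
        labels = [min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
                  for x in data]
        # center update: mean of members (empty clusters keep old center)
        new_centers = []
        for k in range(len(centers)):
            members = [x for x, l in zip(data, labels) if l == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:        # convergence test against epsilon
            break
    return labels, centers

monthly_rain = [310, 295, 280, 150, 140, 130, 20, 15, 10, 25, 160, 300]
labels, centers = kmeans(monthly_rain, centers=[10.0, 150.0, 300.0])
print(sorted(centers))
```

With three well-separated seed centers, the months split cleanly into low, moderate, and high rainfall groups, which is exactly the behavior LSES relies on later.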

THE PROPOSED METHOD: LEARNING-BASED SINGLE EXPONENTIAL SMOOTHING ALGORITHM
As is known in forecasting, Single Exponential Smoothing (SES) is used for data without a trend or seasonal pattern, Double Exponential Smoothing (DES) is used for trend data, and Triple Exponential Smoothing (TES) is used for seasonal data. SES, DES, and TES need one (alpha), two (alpha and beta), and three (alpha, beta, and gamma) parameters, respectively, as their smoothing coefficients.
To simplify seasonal pattern data, Hartomo, Subanar & Winarko (2016) proposed the Exponential Smoothing Seasonal Planting Index (ESSPI), which groups the data into three groups according to their seasonal planting term. There are three seasonal planting terms in one year, each four months long: January-April, May-August, and September-December. The drawback of ESSPI is that the grouped data have fixed terms for every year, even though the seasonal planting period may change in coming years (Hartomo, Subanar & Winarko, 2016).
To overcome this drawback of ESSPI, this paper uses a clustering algorithm to group the data into seasonal clusters. Since the seasonal period can change every year (in either the length of months or the grouped months), the k-means clustering algorithm is used to group months with similar characteristics. After k-means is applied, SES is used to forecast each clustered data set; in this case, only one smoothing coefficient is needed. Thus, this paper proposes a modified single exponential smoothing, called the Learning-based Single Exponential Smoothing (LSES) algorithm. Figure 1 shows the idea of the LSES algorithm.
The existing literature suggests that the way to find the best smoothing value is to compare the MSEs of different smoothing values and choose the one with the minimum MSE. This procedure has proven ineffective. Therefore, this study provides a procedure that obtains the smoothing value by utilizing the clustering results.
Logically, a smaller smoothing value is used for data with large changes, while a larger smoothing value is used for data with small changes; a smoothing value closer to zero gives a stronger smoothing effect than one closer to one. The problem is how to determine the smoothing value. In the proposed method, the k-means clustering method is combined with the SES forecasting method: clustering groups the data with similar characteristics, and the clustering results are used to estimate the smoothing value. Since the mean of a data set can be used as a point estimator of the whole data, the mean of each cluster is used here to estimate the smoothing value of that cluster. Because the mean varies between clusters, the data of each cluster are normalized so that their values lie in the interval [0,1]; this normalization result can then be used directly to determine the smoothing value 0 < α < 1. The procedure to find the smoothing value is described in Algorithm 2.

Algorithm 2: Procedure to find the smoothing value
Input: the clustered data (W_k), k = 1,2,...,c.
IF there is only one data point in W_k, or all elements of W_k are 0, then α_k = p, where p is a constant; ELSE:
Step 1: For each cluster obtained from Algorithm 1, normalize each data point in W_k to obtain Ŵ_k.
Step 2: Compute the smoothing value for each cluster (α_k) as the average of the normalized clustered data Ŵ_k.
Output: the smoothing value for each cluster (α_k), k = 1,2,...,c.

There are two computation steps in LSES: the initialization and the time period t. As written in Eq. (1), SES uses X_{t−1} and F_{t−1} to obtain F_t, where F_1 is assumed to equal X_0 in the initialization process. In LSES, F_1 is instead computed from the averages of the clusters obtained from X_0. Then, in time period t, LSES computes the forecast data F_t from the actual data X_{t−1}.
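Algorithm 2 can be sketched as follows. One assumption on my part: min-max normalization is used to map each cluster into [0, 1], since the text only states that the normalized values must lie in that interval; the fallback constant p is taken from the algorithm's input.

```python
def smoothing_value(cluster, p=0.75):
    """Algorithm 2 sketch: alpha_k for one clustered data set W_k."""
    if len(cluster) <= 1 or all(x == 0 for x in cluster):
        return p                       # degenerate cluster: use the constant p
    lo, hi = min(cluster), max(cluster)
    if hi == lo:                       # constant cluster: normalization undefined
        return p
    # min-max normalization into [0, 1] (assumed), then average
    normalized = [(x - lo) / (hi - lo) for x in cluster]
    return sum(normalized) / len(normalized)

print(round(smoothing_value([120, 180, 300]), 4))   # cluster with spread
print(smoothing_value([0, 0, 0]))                   # all-zero cluster -> p
```

Because each normalized cluster contains both a 0 and a 1, the resulting average always lies strictly between 0 and 1, as the method requires.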
Furthermore, the smoothing values obtained from Algorithm 2 may differ between iterations, depending on the clustered data formed in each iteration. The detailed LSES algorithm is presented in Algorithm 3.
Step 1: Initialization period (t = 0, with actual data X_0 and forecast data F_1)
1. Group the actual data (X_0) using the k-means clustering algorithm in Algorithm 1 to obtain W_{0,k}, k = 1,...,c.
2. For each cluster k, compute the forecast data (F_0) as the average of that cluster (W_{0,k}). All data in one cluster have the same forecast data.
3. Find the smoothing coefficients for each cluster (α_k) using Algorithm 2.
4. For each cluster k, compute the forecast data (F_1) from α_k, X_0, and F_0 using Eq. (1): F_1 = α_k X_0 + (1 − α_k)F_0.
Step 2: Time period t (with actual data X_{t−1} and forecast data F_t)
1. Group the actual data (X_{t−1}) using the k-means clustering algorithm in Algorithm 1 to obtain W_{t−1,k}, k = 1,...,c.
2. Append W_{t−1,k} to W_k. That is, if t = 1, W_k consists of W_{0,k}; if t = 2, W_k consists of W_{0,k} and W_{1,k}; if t = 3, W_k consists of W_{0,k}, W_{1,k}, and W_{2,k}; and so on.
3. Find the smoothing coefficients for each cluster (α_k) using Algorithm 2.
4. For each cluster k, compute the forecast data (F_t) from α_k, X_{t−1}, and F_{t−1} using Eq. (1): F_t = α_k X_{t−1} + (1 − α_k)F_{t−1}.
5. Let t = t + 1 and go back to Step 2.1 until the prediction time t is reached.
Output: forecast data (F).
For clear understanding, the flowchart of the LSES algorithm is given in Fig. 2.
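The whole LSES loop can be sketched end to end. Several details here are assumptions, not the paper's exact implementation: a plain 1-D k-means with fixed seed centers, min-max normalization inside the Algorithm 2 step, and two years of made-up monthly rainfall instead of the paper's data.

```python
def assign(data, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            for x in data]

def kmeans(data, centers, iters=50):
    """Plain 1-D k-means; empty clusters keep their previous center."""
    for _ in range(iters):
        labels = assign(data, centers)
        centers = [sum(x for x, l in zip(data, labels) if l == k) / labels.count(k)
                   if labels.count(k) else centers[k]
                   for k in range(len(centers))]
    return assign(data, centers), centers

def smoothing_value(w, p=0.75):
    """Algorithm 2 sketch: average of min-max normalized cluster data."""
    if len(w) <= 1 or all(x == 0 for x in w) or min(w) == max(w):
        return p
    lo, hi = min(w), max(w)
    return sum((x - lo) / (hi - lo) for x in w) / len(w)

def lses(years, p=0.75):
    """Return forecast series F_1..F_T (12 months each), T = len(years)."""
    seeds = [min(years[0]), sum(years[0]) / 12, max(years[0])]
    # Initialization (t = 0): F_0 is the mean of each month's cluster.
    labels, centers = kmeans(years[0], seeds)
    f_prev = [centers[l] for l in labels]
    w = {k: [] for k in range(3)}
    forecasts = []
    for t in range(1, len(years) + 1):
        x_prev = years[t - 1]
        labels, _ = kmeans(x_prev, seeds)
        for k in range(3):                      # append W_{t-1,k} to W_k
            w[k] += [x for x, l in zip(x_prev, labels) if l == k]
        alphas = [smoothing_value(w[k], p) for k in range(3)]
        f_t = [alphas[l] * x + (1 - alphas[l]) * f   # Eq. (1) per cluster
               for x, f, l in zip(x_prev, f_prev, labels)]
        forecasts.append(f_t)
        f_prev = f_t
    return forecasts

years = [[300, 280, 260, 150, 120, 100, 20, 10, 15, 30, 170, 290],
         [310, 270, 250, 160, 130, 110, 25, 5, 20, 40, 180, 300]]
print(len(lses(years)), len(lses(years)[0]))
```

Note how the accumulated per-cluster history w grows each year, so the smoothing coefficients are re-learned at every iteration, which is the "learning-based" aspect of LSES.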

EXPERIMENTAL RESULTS
This section presents experimental results on rainfall data from Indonesia to show the performance of the proposed LSES algorithm. The rainfall data are obtained from the Meteorology, Climatology, and Geophysical Agency (http://www.bmkg.go.id), which carries out governmental tasks in the fields of meteorology, climatology, air quality, and geophysics in accordance with applicable laws and regulations. Indonesia has 34 provinces, one of which is Central Java, where there are 23 climatology stations; each station records the rainfall data of one area within its scope. We use the rainfall data recorded by the Adisumarmo climatology station for this experiment, from January 2007 until December 2019, as seen in Table 1.
According to the characteristics of annual rainfall data, the data within one year (12 months) can be divided into three categories: high, moderate, and low rainfall. Thus, there are three clustered data sets (X_1, X_2, X_3), with c = 3. The LSES algorithm needs one constant, p. For this annual rainfall prediction case, the constant can be calculated as c/n_c, where c is the number of clusters and n_c is the number of data points in one cluster. If 12 months are divided equally into three groups, each group has four months; therefore, the constant p = 3/4 = 0.75 is used in the computation.
The LSES algorithm is divided into two steps.
Step 1 starts by grouping the rainfall data of January-December 2007 into three clusters using the k-means clustering algorithm. The average of each cluster is computed to obtain the forecast data for January-December 2007, so months in the same cluster share the same forecast data. After that, the forecast data for January-December 2008 are computed using Eq. (1) with the actual and forecast data of January-December 2007. Here, the smoothing value for each cluster is obtained from each clustered data set (W_k, k = 1, 2, 3) of January-December 2007 using Algorithm 2, so three smoothing values are obtained.
Step 2 first groups the data of January-December 2008 into three clusters. The corresponding clusters obtained from Step 1 and Step 2 are combined into the clustered data (W_k, k = 1, 2, 3). The three clustered data sets are used to obtain the smoothing values for each cluster, and SES is then used to forecast the data of January-December 2009.
Step 2 is repeated until the year to be predicted is reached, which in this case is 2020.
For comparison, the LSES algorithm is compared with five other algorithms: SES, DES, TES, Auto Arima, and ESSPI. The line charts of actual and forecast data for all periods of SES, DES, TES, Auto Arima, ESSPI, and LSES are depicted in Figs. 3-8, respectively. The actual data run from January 2007 until December 2019 and are shown in blue, while the forecast data run from January 2009 until December 2020 and are shown in red. The x-axis is the prediction year and the y-axis is the rainfall prediction (in millimeters).
There are some parameters needed in SES, DES, TES, and Auto Arima. For SES, DES, TES, and Auto Arima, functions in Python are used to obtain the best parameter values; these values are listed in Tables 2, 3, 4, and 5, respectively. For ESSPI, since no parameter is needed, the algorithm is followed and applied directly to this data. Moreover, Fig. 9 shows the plot of the actual and forecast data for all methods in one figure.
where F is the forecast data, X is the actual data, t is the time period, and n is the number of time periods.
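With these symbols, the error measures compared below can be written out. The following are the standard definitions, stated here under the assumption that they match the ones used in the experiments; MAD is taken as the mean absolute deviation of the forecast data around its mean, consistent with the later remark that MAD uses the average of all forecast data:

\[
\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(X_t - F_t\right)^2, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|X_t - F_t\right|,
\]
\[
\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n}\left|F_t - \bar{F}\right|, \qquad
\mathrm{MASE} = \frac{\frac{1}{n}\sum_{t=1}^{n}\left|X_t - F_t\right|}{\frac{1}{n-1}\sum_{t=2}^{n}\left|X_t - X_{t-1}\right|},
\]

where \(\bar{F}\) is the mean of the forecast data and the MASE denominator is the in-sample naive (one-step) forecast error.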
The comparison results of MSE, MAE, MAD, and MASE are given in Tables 6, 7, 8, and 9, respectively. Figure 10 shows the error values in the form of graphs. The averages of MSE, MAE, MAD, and MASE for the 11 prediction years, from 2009 until 2019, are compared. From those tables, for the averages of MSE, MAE, MAD, and MASE, respectively, SES gives 13,212, 77.31, 152.23, and 0.654; DES gives 14,032.78, 91.12, and 147.9 for MSE, MAE, and MAD; TES gives 13,246.39, 90.32, 145.76, and 0.818; Auto Arima gives 17,287.5, 99.73, 145.69, and 0.901; ESSPI gives 35,866.34, 128.15, 152.06, and 1.030; and LSES gives 13,007.91, 75.87, 143.34, and 0.648. Thus, LSES obtains the smallest averages of MSE, MAE, MAD, and MASE compared with the other algorithms, i.e., SES, DES, TES, Auto Arima, and ESSPI, which means that LSES is a promising forecasting algorithm.

Moreover, the coefficient of variation (CoV) is used to assess forecast stability, where CoV = σ/µ, with σ the standard deviation and µ the average (mean). A smaller CoV indicates stability, since the variability of the data around their mean is small. In the experiments, the rainfall data are divided into three groups with LSES, i.e., high, moderate, and low rainfall data, so the CoV is computed for each of those groups. As seen from Table 10, the CoV values for the high, moderate, and low rainfall data are about 0.22, 0.25, and 1.10, respectively, which means their variations are small. Therefore, LSES is stable and can be used for forecasting data.
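The CoV computation for one group can be sketched in a few lines; the forecast values below are hypothetical stand-ins for one rainfall group across the prediction years.

```python
import statistics

def cov(data):
    """Coefficient of variation: population standard deviation / mean."""
    return statistics.pstdev(data) / statistics.mean(data)

# Hypothetical forecast values for one rainfall group over several years.
high_group = [290.0, 310.0, 250.0, 305.0, 275.0]
cov_high = cov(high_group)
```

A group whose forecasts barely vary around their mean yields a CoV near zero, which is the sense in which a small CoV indicates a stable forecast.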
Since LSES obtains the best performance, the LSES algorithm is used to predict the rainfall in 2020 and compare it with the actual data of 2020. The result is shown in Table 11, where the predictions obtained from SES, DES, TES, Auto Arima, and ESSPI are also presented.

Furthermore, the experiment is extended to a real case study of a network intrusion detection system to better reflect the applicability of the presented model. The data are obtained from the Canadian Institute for Cybersecurity (https://www.unb.ca/cic/datasets/). In this data set, a two-layered approach is used to generate benign and darknet traffic, which comprises Audio-Stream, Browsing, Chat, Email, P2P, Transfer, Video-Stream, and VOIP traffic generated at the second layer. Intrusion detection can be analyzed and identified visually by three features, i.e., average packet size, total length of forward packets, and total length of backward packets. This experiment uses four attributes, i.e., src_port (source port), dst_port (destination port), timestamp, and total_fwd_packet (total of forward packets), where the total of forward packets is the value being predicted (Lopez, 2019). Data with a unique combination of src_port, dst_port, and timestamp are chosen.

Table 12 shows the prediction of intrusion detection with the six methods. Since LSES uses clustering results for the prediction, the result of LSES can detect which ports have high and low values of total forward packets. Table 13 lists the MSE, MAE, MAD, and MASE of SES, DES, TES, Auto Arima, ESSPI, and LSES. From Table 13, LSES gives the smallest MSE, MAE, and MASE; for MAD, since LSES works with a clustering method and MAD uses the average of all forecast data, the MAD for LSES is not the smallest. Figure 11 expresses the error values in the form of graphs.

CONCLUSIONS
To sum up, this paper proposed the learning-based single exponential smoothing (LSES) forecasting algorithm. By combining the k-means clustering algorithm and single exponential smoothing, LSES produces good forecasting results. The algorithm groups the past data with the k-means clustering algorithm, according to their characteristics. Since single exponential smoothing needs one smoothing parameter value, LSES computes this smoothing value automatically from the clustering result through a learning-based procedure. Experimental results and comparisons demonstrate the effectiveness of the proposed LSES algorithm in obtaining future prediction data. It has the smallest mean squared error of 13,007.91 and an average improvement rate of 19.83%. For future research, since there is still a certain gap between the actual and forecast data of LSES, it would be better if some deep learning methods, such as MLP (Multilayer Perceptron), CNN (Convolutional Neural