COMPARISON BETWEEN SUPPORT VECTOR MACHINES AND K-NEAREST NEIGHBOR FOR TIME SERIES FORECASTING

This paper applies Support Vector Machine (SVM) and K-Nearest Neighbor (K-NN) methods to univariate time series prediction. The main goal of the study is to compare SVM and K-NN in predicting time series data. The dataset consists of monthly gold prices for the period Nov-1989 to Dec-2019, comprising 362 observations. SVM and K-NN models were fitted on 90% of the data as a training set, and their accuracy was then compared using the root mean square error (RMSE). The results indicated that SVM predicted future gold prices better than K-NN, with RMSE = 33.77.


INTRODUCTION
Time series forecasting and analysis have been a dynamic research area over the last few decades. Different types of forecasting models have been developed, and researchers have relied on statistical techniques to predict time series data [4]. Machine learning and statistics have much in common, but statistics is an older field that evolved within mathematics. The main difference between machine learning and statistics lies in the assumptions about the data. In machine learning, we often assume that the data have been generated by some unknown data-generating process, and different learning algorithms are used to approximate it. In statistics, we usually assume that a data model is the true data-generating process and try to estimate the parameters of this model. Thus, statistics deals with models while machine learning deals with learning algorithms or procedures [14]. However, time series data are often full of nonlinearity and irregularity, as in economic and financial series. To address this, Support Vector Machine (SVM) and K-Nearest Neighbor (K-NN) methods can be used as modern techniques to overcome the problems of forecasting non-linear and non-stationary time series data. SVM and K-NN have been used for forecasting in many fields, and their use in this domain has grown because of their ability to model complex non-linear systems from sample data. In particular, applications of SVM and K-NN have received great attention in recent years because of their storage capacity and their learning and prediction capabilities. Time series prediction techniques have been used in many real-world applications such as financial markets, electrical utility loads, and weather and environment forecasting. In this study, comparisons between the SVM and K-NN methods will be performed.
Python was used to build the forecasting models and to compare their results in order to determine the best among them. These models were chosen because most existing comparisons of SVM, K-NN and statistical models focus on data classification and multivariate time series rather than on univariate time series forecasting; in particular, they focus on extracting patterns and anomalies from data sets. The rest of the study is organized as follows: section 2 covers related studies, while the methodology used in this study is presented in section 3. Evaluation methods and results are presented in sections 4 and 5 respectively. We finally conclude in section 6.

RELATED WORKS
Related works are an important aspect of any research: they survey the work done so far and the developments in the area investigated. [6] created a system to forecast movements in the stock market on a given day by applying time series analysis to S&P 500 values.
They also performed market sentiment analysis on data collected from Twitter to find out whether adding it increased the prediction accuracy. They collected the S&P 500 values from Yahoo Finance and the Twitter data from the Twitter Census: Stock Tweets dataset from Infochimps.
The latter included around 2.3 million stock tweets. This dataset was then modified for the purposes of their work. They used three labels for the S&P index movement: up, down, and same. To predict the S&P movements, they used five different attributes, and to analyze the sentiments in the tweet dataset they used a Naive Bayes classifier. The sentiments were likewise labeled up, down or same. After incorporating the sentiment analysis results into the time series analysis, they found that the accuracy improved. [16] developed a model to forecast crop production, particularly of rice, using machine learning techniques. The researchers used a dataset pertaining to the region of Bangladesh for the experimental analysis. The study area is highly influenced by climatic variables such as rainfall and wind speed. Using machine learning algorithms, the researchers predicted crop yields, and with the model the authors also measured the effect of unknown climatic variables that may cause changes in crop production. The model was first trained on the relationship between past ecological patterns and crop yield rates. [3] reported overall better performance in prediction. The authors also concluded that data-driven machine learning models are more useful for prediction than statistical analysis alone. [9] stated that an accurate forecasting model is needed for sustainable integration of wind power into the electricity grid. The researchers proposed a machine learning ensemble to predict wind power. In the initial study, they used regressors as a single base learning algorithm. The experimental results were compared with ensemble models built from multiple learning algorithms, which gain predictive ability by combining weak predictors.
The experimental analysis was carried out with an ensemble model built from a combination of decision trees and support vector regression. This ensemble model achieved a 37% improvement in accuracy compared with a single base classifier using support vector regression. The developers also showed that ensemble prediction can be highly useful for high-dimensional patterns. The experimental analysis was based on a large wind time series dataset from simulations and real measurements. [12] stated that machine learning algorithms play an important role in forecasting, in particular rainfall prediction.
Machine learning methods have a significant impact on rainfall prediction using complex satellite-based data containing rainfall-retrieval variables. Developments in parallel computing combined with machine learning algorithms are highly helpful in training on the dataset and predicting future trends, which is why machine learning technology is used in real-time practical situations. In the study, the researchers analyzed MSG SEVIRI data for Germany using four machine learning algorithms, namely Neural Networks (NN), Random Forests (RF), Support Vector Machines (SVM) and Averaged Neural Networks (AVNN), for rainfall detection and rate assignment. Satellite-based predictor variables such as cloud top height, cloud top temperature, cloud phase and cloud water path were taken into account. The researchers found that NNET and AVNNET perform better than the other models, and NNET's computational speed is an added advantage in both detection and rate assignment of rainfall. The authors concluded that further research is urgently needed to provide suitable and accurate rainfall prediction. [17] proposed a new forecasting model using three data-analytic forecasting methods, namely support vector machine, adaptive neuro-fuzzy inference system and artificial neural network. The dataset used for the experimental analysis was the Borsa Istanbul (BIST) 100 index, with records covering the 8-year period from 2007 to 2014. The performance of the models was evaluated using metrics such as accuracy, sensitivity and specificity. To minimize bias, the study used ten-fold stratified cross-validation. The experimental results show that the accuracy of forecasting down movements of the index outweighs that of up movements.
The researchers also showed that the study gives good results with fewer input factors than previous forecasting models. [11] applied machine learning methods to time series forecasting, compared their accuracy with that of conventional statistical methods, and found that the statistical methods performed better on both measures of accuracy. They provided reasons why the accuracy of learning models is lower than that of statistical models and suggested some possible remedies. [18]

PROPOSED METHODOLOGY
In this section, the dataset and the proposed methods to predict future values for the same time series of the monthly gold price are introduced. First, the dataset will be introduced and then, predictive models will be explained.

Dataset
The monthly gold price data used in our forecasting models span from Nov-1989 to Dec-2019. The

Support vector machines (SVM)
Support vector machine (SVM) is a supervised learning method, introduced by Vapnik in the early 1990s, that has proven to be a very effective computational tool in machine learning. SVM has outperformed most other computational intelligence methodologies mainly because it is based on the sound mathematical principles of statistical learning theory. SVM is usually applied to classification problems but is also used for regression analysis [7]. In the SVM literature, the algorithm is called Support Vector Regression (SVR) when used for regression problems and Support Vector Classification (SVC) when used for classification problems. Thanks to its ability to solve nonlinear regression estimation problems, SVR has in recent years been widely used to forecast time series that come from unstable and nonlinear systems, such as economic and financial data.
The idea of SVM is to map the data into a high-dimensional feature space and find the hyperplane that maximizes the margin. In regression problems, a linear learning machine learns a nonlinear function in a kernel-induced feature space: it uses an implicit mapping of the input data into a high-dimensional feature space defined by a kernel function. Using a kernel function is useful when the data are far from linearly separable. The goal of SVM is to find a decision rule with good generalization ability by selecting a particular subset of the training data, called support vectors. Training an SVM requires solving a Quadratic Programming (QP) problem over a solution space known to be convex; therefore, every local optimum is also a global solution [11]. Hence, SVM training always finds a global solution that is usually unique, which is superior to ANN, a technique that often gets stuck in local optima.
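As a concrete illustration (not the authors' code, whose listing is not given in the paper), the sketch below fits a linear-kernel SVR with scikit-learn on a synthetic stand-in series, casting forecasting as regression on lagged values; the lag order p = 3 and the synthetic series are assumptions for illustration, while C = 0.5 and epsilon = 0.001 are the values reported later in the paper.

```python
# Illustrative SVR forecasting sketch on synthetic data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200)) + 100.0  # synthetic price-like series

# Build a supervised dataset: predict y[t] from the p previous values.
p = 3
X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
y = series[p:]

model = SVR(kernel="linear", C=0.5, epsilon=0.001)  # parameter values from the paper
model.fit(X[:-20], y[:-20])     # train on all but the last 20 points
preds = model.predict(X[-20:])  # one-step-ahead predictions for the held-out tail
print(preds.shape)
```

The same lagged-feature construction applies to any univariate series; only the series, lag order and holdout length would change for the gold-price data.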
Assume a non-linear function f(x) = w^T φ(x) + b, where w is the weight vector, b is the bias or threshold, and φ(x) represents a high-dimensional feature space that is nonlinearly mapped from the input space x.
The goal of SVM is to determine the values of w and b that orientate the hyperplane to be as far as possible from the closest samples. The coefficients w and b are estimated by minimizing the regularized risk function

minimize  (1/2)||w||^2 + C Σ_{i=1}^{n} (ξ_i + ξ_i*)
subject to  y_i − w^T φ(x_i) − b ≤ ε + ξ_i,
            w^T φ(x_i) + b − y_i ≤ ε + ξ_i*,
            ξ_i, ξ_i* ≥ 0,

where the slack variables ξ_i and ξ_i* measure deviations beyond the ε-insensitive band. To obtain the best SVR model for the monthly gold price data, three parameters peculiar to SVR must be set: the type of kernel function, the regularization constant C, and the maximum allowable deviation ε [14]. For the kernel, the linear function was selected, as it was found to be superior to the other kernel types, as seen in Table 1: the linear kernel has the minimum RMSE, and this kernel function is used for monthly gold price forecasting in this paper.
To obtain the optimal values of the parameters C and ε for the monthly gold price data, a grid search algorithm is used: one parameter is held constant while the other is varied to find the minimum RMSE for the chosen kernel function. The methodology used is depicted in Fig. 3.
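The one-parameter-at-a-time search described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the candidate grids for C and ε, the synthetic series, and the validation split are all assumptions, since the paper does not list its candidate values.

```python
# Sketch of the search: fix one SVR parameter, scan the other by RMSE.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=150)) + 100.0  # synthetic stand-in series
p = 3
X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
y = series[p:]
X_tr, X_te, y_tr, y_te = X[:-15], X[-15:], y[:-15], y[-15:]

def rmse(c, eps):
    """RMSE of a linear-kernel SVR on the held-out tail."""
    m = SVR(kernel="linear", C=c, epsilon=eps).fit(X_tr, y_tr)
    return mean_squared_error(y_te, m.predict(X_te)) ** 0.5

# Hold epsilon fixed and scan C; then hold the best C and scan epsilon.
best_c = min([0.1, 0.5, 1.0, 10.0], key=lambda c: rmse(c, 0.001))
best_eps = min([0.0001, 0.001, 0.01, 0.1], key=lambda e: rmse(best_c, e))
print(best_c, best_eps)
```

On the actual gold-price data, the paper reports that this kind of search selects C = 0.5 and ε = 0.001.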

To sum up the analysis above: after several rounds of testing, C = 0.5 and ε = 0.001 were selected as the best choices for the training set, since they gave the smallest RMSE. These parameters were used to retrain the model and then to predict the testing set. The testing set (Dec-2016 to Dec-2019) was composed of the remaining 37 data points (10% of the series). Fig. 4 depicts a comparison of forecast values with actual values, while Table 2 shows the actual and forecast values of the monthly gold price for the 37 points at the end of 2019. The results in Fig. 4 and Table 2 indicate that the predicted value curve closely follows the actual one, and the predicted values fit the actual values very well.

K-nearest neighbor
The k-nearest neighbor (K-NN) method is a supervised learning method introduced by Fix and Hodges Jr in 1951 and later formalized for classification tasks by Cover and Hart [15]. In general, K-NN is a developed version of instance-based learning based on the difference between features in a labelled dataset. For a regression task, the prediction is the mean of the output variable, and for a classification task it is the most common class value.
For a given instance x, to compute ŷ, the K-NN model uses the k closest instances in the training set, and the prediction is the average of the corresponding targets. In its simplest form the model can be written as

ŷ = (1/k) Σ_{x_i ∈ N_x} y_i,

where N_x is the set of the k closest points in the training sample. Determining the value of k is difficult: as k goes to 1, the error on the training set goes to 0 but the error on the test set starts increasing, because the K-NN model then has low bias and high variance. To identify closeness and measure the distance d between two data points, a similarity metric (a distance function) is used. The most common choice is the Euclidean distance, which measures the distance between points x and y as

d(x, y) = sqrt( Σ_{i=1}^{m} (x_i − y_i)^2 ).

The K-NN algorithm searches for the k closest training samples in the feature space based on the calculated distance. The parameter k therefore plays an important role in the performance of K-NN and is the key tuning parameter, so the goal is to find the optimal k for the model on the given dataset [12]. We used two procedures to find the optimal parameter: first, we split the monthly gold price data into a 90% training set and a 10% test set, using 10-fold cross-validation. We then looped over a reasonable range of k values, (1, 12), and used 10-fold cross-validation to estimate the optimal value of k, recording the accuracy based on R² between the feature test set and the response test set. For each iteration of the loop, the K-NN model is instantiated with n_neighbors = k and the accuracy is measured on each fold.
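The k-selection loop just described can be sketched with scikit-learn as below. This is a minimal illustration, assuming a synthetic stand-in series and a lag order of 3 (the paper's feature construction is not spelled out); the 10-fold cross-validation with an R² score follows the procedure above.

```python
# Sketch: choose k for a K-NN regressor by mean R^2 over 10-fold CV.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
series = np.cumsum(rng.normal(size=200)) + 100.0  # synthetic stand-in series
p = 3
X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
y = series[p:]

scores = {}
for k in range(1, 12):  # candidate neighbour counts
    knn = KNeighborsRegressor(n_neighbors=k)
    # Mean R^2 across 10 folds, as in the procedure above.
    scores[k] = cross_val_score(knn, X, y, cv=10, scoring="r2").mean()

best_k = max(scores, key=scores.get)
print(best_k)
```

On the gold-price data itself, the paper reports that this procedure selects k = 5 with a best score of 0.621.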
Within the range from 1 to 12, we found the optimal value k = 5, with the highest score of 0.621 under k-fold cross-validation. So we chose k = 5, set it as the n_neighbors value, trained the K-NN regressor model, and forecast the monthly gold price.
• Forecasting using K-nearest neighbor
Fig. 6 and Table 3 show the comparison of the forecast values with the actual gold price data. The root mean square error is defined as

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2 ),

where y_i is the actual value, ŷ_i is the predicted value, and n is the number of observations [2]. Table 4 gives the accuracy results for the ARIMA, SVR, and K-NN models. The smaller the RMSE, the better the performance and the closer the predicted values are to the actual values; the minimum RMSE of the monthly gold price series determines the best model. From the table, the SVR model performs better than the K-NN model in terms of RMSE.

RESULTS
In our study we applied two of the most popular machine learning techniques, support vector machine with a linear kernel function and K-nearest neighbor (K-NN), as modern techniques to overcome the problems of forecasting non-linear and non-stationary time series data.
From the previous sections, the following results can be summarized: • The SVM model, with C = 0.5 and ε = 0.001, is the best fit for monthly gold price forecasting among all SVM models with different parameter values.
• The K-NN model, with (k = 5), is the best fit for monthly gold price forecasting among all other K-NN models.
• Forecasting the monthly gold price with SVM is more efficient than with the K-NN method.

CONCLUSION
This study aimed to construct the best SVM and K-NN models for the monthly gold price time series from Nov-1989 to Dec-2019, and to compare the models to see which one forecasts the monthly gold price better. The results of the SVM and K-NN methods were compared using RMSE. From the presented discussion, it can be concluded that the SVM results were more accurate (with the lowest RMSE) and that SVM is a more efficient forecasting technique for the monthly gold price than the K-NN model. In future work, we intend to improve our results by using a hybrid of K-NN and SVR to benefit from the qualities of both models.