Comparison between fuzzy robust kernel c-means (FRKCM) and fuzzy entropy kernel c-means (FEKCM) classifier for intrusion detection system (IDS)

Technology is advancing rapidly, and the internet can now be accessed anywhere and at any time. Internet security is therefore important, since users are constantly exposed to online fraud, property damage, and theft. An Intrusion Detection System (IDS) can be used to detect attacks on a system or network. In this empirical study, we use the KDD Cup 1999 dataset, which consists of five classes: normal, probe, DoS, U2R, and R2L. Several classification methods exist for IDS; in this study, we use Fuzzy Robust Kernel C-Means (FRKCM) with a polynomial kernel and Fuzzy Entropy Kernel C-Means (FEKCM) with an RBF kernel, aiming to improve the accuracy of network attack detection, and we compare the accuracy of the two methods. The accuracy obtained in this study is 99%, with a faster execution time.


Introduction
Technology is advancing rapidly, and the internet can be accessed anywhere and at any time. The internet is a group of smaller computer networks connected to each other. An internet connection is very important because it makes accessing information and communicating far easier. The internet matters not only to businesses and citizens but also to governments, since it gives them an opportunity to function in a more innovative, engaging, and cost-effective manner. However, this reliance on the internet leads to a growing number of cyber-attacks and data breaches, along with numerous other risks and challenges. One example of such a threat is hackers [1]. Hackers can illegally gain access to a network and view the information in a local database, some of it highly confidential. The threat of hackers cannot be underestimated, since they are now well organized and their attacks may go undetected [2]. That is why internet security is needed to protect internal networks. One tool that can be used against system or network attacks is the Intrusion Detection System (IDS).
An IDS is a system that detects attacks by unauthorized users from other networks who try to obtain information from the network, by checking the pattern of attacks on the computer network. However, IDSs have notable disadvantages, as they cannot identify new attacks [3]. They most commonly detect known attacks, based on defined rules or on behaviour analysis against a baseline of the network. An IDS can also become a point of failure: when it is turned off, hackers have an opportunity to attack the system [4].
Nowadays, researchers are attempting to apply machine learning methods to IDS as a solution for detecting anomalous threats. Machine learning trains a computer to process information and act when required. Machine learning techniques enable a computer to carry out thinking processes such as logical reasoning and trial and error [5]. Various machine learning algorithms can be used for IDSs, such as Support Vector Machines, Decision Trees, Fuzzy Logic, Bayes Net, and Naïve Bayes [2]. In this study, we use Fuzzy Entropy Kernel C-Means (FEKCM) and Fuzzy Robust Kernel C-Means (FRKCM) as classification methods. We use the 10% corrected KDD CUP 1999 data to see which classifier works best.
Fuzzy C-Means is a clustering algorithm that solves classification problems by finding the most accurate cluster centers. However, Fuzzy C-Means can be disturbed by outliers, since the memberships of each data point must sum to one [6]. The mechanism of Fuzzy Robust C-Means and Fuzzy Entropy C-Means is the same as that of Fuzzy C-Means. In Fuzzy Robust C-Means, the outliers are forced into a cluster [7]. In Fuzzy Entropy C-Means, an entropy measure is used to identify the number of clusters and their centers. This measure differs from other similar methods because, after determining a cluster center, it does not revise the values of all other data points.
A common problem in machine learning is the assumption that data can be separated linearly. In practice, many data sets are hard to separate linearly; stock data, for example, are nonlinear. A kernel function addresses this problem so that the clustering process runs smoothly and efficiently. A kernel function represents the inner product in a high-dimensional feature space, so the distance between data points in that space can be calculated without explicitly transforming the data. In this research, the fuzzy robust c-means algorithm is modified with a kernel function, giving Fuzzy Robust Kernel C-Means (FRKCM).
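The kernel idea described above can be illustrated with a minimal sketch. It assumes the common Gaussian form exp(-||x - y||^2 / (2 sigma^2)) for the RBF kernel and (<x, y> + coef0)^degree for the polynomial kernel; the exact parametrization used in this study is not stated, so the function names and default values below are illustrative assumptions.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; the 2*sigma^2 scaling is an assumed convention."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (<x, y> + coef0)^degree; defaults are illustrative."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def kernel_sq_distance(x, v, kernel):
    """Squared feature-space distance from kernel values only:
    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v).
    The feature map phi is never computed explicitly."""
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)
```

For the RBF kernel, K(x, x) = 1 for every x, so the squared distance reduces to 2(1 - K(x, v)); with a degree-1 polynomial kernel and coef0 = 0, it reduces to the ordinary squared Euclidean distance.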

Intrusion Detection System
IDSs have been used to protect computer networks against both known and unknown attacks since the 1970s [8,9,10]. An IDS detects attacks by unauthorized users from other networks who try to obtain information from the network, by checking the pattern of attacks on the computer network. Based on its location in a network, an IDS is either a Host-based Intrusion Detection System (HIDS) or a Network-based Intrusion Detection System (NIDS) [11]. HIDS can be further classified into misuse-based and anomaly-based systems [12]. A misuse-based HIDS detects computer activity suspected to be an intrusion, based on prior information about specific attacks. A NIDS consists of a large number of sensors that analyse both inbound and outbound data packets and offer real-time detection [13]. The challenge faced by NIDS is identifying new attacks on the system.
Based on KDD CUP 1999, the benchmark data for IDS research, attacks are classified into four categories [9]:
• Denial of Service (DoS): an attack that shuts down or weakens the computer, causing the system to crash or operate poorly.
• Remote to Local (R2L): the attacker sends packets to find a weakness in the system and then acts as a local user to gain access.
• User to Root (U2R): the attacker first gains access through a normal account and then looks for a weakness that gives root (superuser) privileges.
• Probing (PROBE): the attacker scans the computer network to gather information.
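As an illustration of how raw connection labels relate to these four categories, the following sketch groups the labels into the five classes (normal plus four attack types). The label names are those commonly documented for the 10% KDD CUP 1999 subset and are an assumption here, not taken from this paper; they should be checked against the data file actually used. The helper name `to_category` is hypothetical.

```python
# Assumed mapping: 22 attack labels plus "normal", i.e. 23 raw classes.
ATTACK_CATEGORY = {
    "normal": "normal",
    # Denial of Service
    "back": "dos", "land": "dos", "neptune": "dos",
    "pod": "dos", "smurf": "dos", "teardrop": "dos",
    # Probing
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    # Remote to Local
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
    # User to Root
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
}

def to_category(raw_label):
    """Map a raw label such as 'smurf.' to one of the five classes.

    Raw labels in the data file end with a dot, which is stripped first.
    Unknown labels raise KeyError rather than being guessed.
    """
    return ATTACK_CATEGORY[raw_label.strip().rstrip(".").lower()]
```

Under this assumed mapping, the 23 raw labels give the 23-class task and the grouped labels give the 5-class task.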

Fuzzy C-Means
Fuzzy C-Means (FCM) is an extension of the K-means method [13,14]. FCM clustering assigns each data point membership values in the range [0,1] to find significant clusters [15]. The objective function of Fuzzy C-Means can be written as [13]:

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert^{2}$$

where $c$ is the number of clusters ($2 \le c \le n$), $n$ is the number of data points $x_k$, $m$ is the weighting exponent ($1 \le m < \infty$), $U = [u_{ik}]$ is the fuzzy partition, $V = (v_1, \dots, v_c)$ is the vector of centers, and $v_i$ is the center of cluster $i$.
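A minimal sketch of the standard FCM iteration may help make the roles of the memberships and centers concrete. It alternates between updating each center as a membership-weighted mean and updating memberships from the inverse squared distances, and it assumes a weighting exponent m > 1. The function name, the random initialization, and the stopping rule are illustrative assumptions, not details reported in this study.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fcm(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means sketch (requires m > 1). Returns (centers, u)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships; each point's memberships sum to one.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the data weighted by u_ik^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[k] * data[k][d] for k in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        new_u = []
        for k in range(n):
            d2 = [max(sq_dist(data[k], centers[i]), 1e-12) for i in range(c)]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        shift = max(abs(new_u[k][i] - u[k][i])
                    for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:  # stop when memberships no longer change
            break
    return centers, u
```

A kernel variant would replace `sq_dist` with a kernel-induced distance; the center update then also changes, which this sketch does not cover.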

Fuzzy Entropy Kernel C-Means
This method is a modification of Fuzzy Entropy C-Means, which is sensitive to outliers and noise. To reduce these effects, we use a kernel function, which dampens the influence of outliers and can handle data that are not linearly separable. With the kernel function, the modified objective function is written as in [18], where $c$ is the number of clusters, $n$ is the number of data points, and $u_{ik}$ is the typicality of $x_k$ in class $i$. The objective includes a normalization term that depends on the sample mean $\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k$. The update expressions hold for all $i$ and $m \ge 1$, provided the data set contains $c < n$ different data points, and the kernel satisfies $K(x_k, x_k) = K(v_i, v_i) = 1$.

Fuzzy Robust Kernel C-Means
The FRKCM objective produces a membership function and a prototype update rule, where $T$ is the maximum number of iterations and $t$ is the iteration counter. The value of the associated parameter is calculated separately.

Datasets
In this study, we used the Intrusion Detection System data from KDD CUP 1999. There are 494,021 samples, with 42 features and labels containing information on 5 classes. Table 2 lists the features of the KDD CUP 1999 data.

Results
This section presents the accuracy, sensitivity, and running time of FEKCM and FRKCM on the IDS problem. We randomly select 10,000 data points from each class. Formally, accuracy is defined as [19]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives. Sensitivity can be expressed as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Table 3, Table 4 and Table 5 show the accuracy and sensitivity achieved when FEKCM and FRKCM with the RBF kernel (kernel parameter values 5 and 1000) were applied to 2 classes.

Table 3 and Table 4 show the accuracy achieved when FEKCM and FRKCM with kernel parameter 5 were applied to the KDD CUP 1999 data. The highest accuracy with FEKCM (100%), with sensitivity 100%, was obtained with 90% training data and a running time of 0.33 s. The highest accuracy with FRKCM (100%), with sensitivity 100%, was obtained with 90% training data and a running time of 0.19 s.

Table 5. Accuracy and sensitivity of FEKCM with RBF kernel (kernel parameter = 1000)
Table 6. Accuracy and sensitivity of FRKCM with RBF kernel (kernel parameter = 1000)

Table 5 and Table 6 show the accuracy achieved when FEKCM and FRKCM with kernel parameter 1000 were applied to the KDD CUP 1999 data. The highest accuracy with FEKCM (100%), with sensitivity 100%, was obtained with 90% training data and a running time of 0.42 s. The highest accuracy with FRKCM (100%), with sensitivity 100%, was obtained with 90% training data and a running time of 0.19 s.

Table 7 and Table 8 show that the accuracies of FEKCM and FRKCM on the KDD CUP 1999 data differ slightly. From these tables, we can see that FRKCM gives the best accuracy for 5 classes and 23 classes. The highest accuracy using FRKCM, 94.27%, was obtained with kernel parameter 1000, 90% training data, and a running time of 9.56 s.
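The two evaluation measures can be computed directly from the confusion counts: accuracy is the share of all predictions that are correct, (TP + TN) / (TP + TN + FP + FN), and sensitivity is the share of actual positives that are detected, TP / (TP + FN). The sketch below shows this for a binary task; the function names and the choice of which label counts as positive are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN, treating `positive` as the attack class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # attack missed
        elif pred == positive:
            fp += 1  # false alarm
        else:
            tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of actual positives that were detected."""
    return tp / (tp + fn)
```

In a normal-versus-DoS task, for example, choosing "dos" as the positive label makes sensitivity the fraction of DoS connections that the classifier flags.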

Discussion
In this research, we compare the FEKCM and FRKCM methods on the IDS problem. We classify the data into two classes (Normal_Dos, Normal_U2r, Normal_R2L), 5 classes, and 23 classes. For each case, we use n% (n = 10, 20, ..., 90) of the data for training and the remaining (100 − n)% for testing.
We create a table consisting of 5 groups: normal-DoS, normal-Probe, normal-U2R, normal-R2L, and all. We use normal-DoS to determine whether a connection is a DoS attack or not. The same applies to normal-Probe and the rest. To determine whether a connection is a DoS attack, a Probe attack, a U2R attack, an R2L attack, or not an attack at all, we use the group labelled all.
From Tables 3, 4, 5 and 6, we can see that the accuracy and sensitivity of both FEKCM and FRKCM on 2 classes reached 100% with 90% training data. However, FRKCM gave the better result, since it needed only 0.19 s. For the 5-class and 23-class classification, the accuracy is noticeably lower, at 94% in 9.56 seconds.
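The experimental setup described here can be sketched as two small steps: building a two-class subset (normal plus one attack type) and splitting it into n% training and (100 - n)% testing data. The helper names, the record layout (feature list paired with a category label), and the use of a seeded shuffle are assumptions made for illustration; the paper does not state how the split was drawn.

```python
import random

def binary_subset(records, attack):
    """Keep only 'normal' records and records of one attack category.

    Each record is assumed to be a (features, category) pair.
    """
    return [(x, y) for x, y in records if y in ("normal", attack)]

def split_train_test(records, train_percent, seed=0):
    """Shuffle and split records into train_percent% / remainder."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    shuffled = list(records)          # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]
```

Looping `train_percent` over 10, 20, ..., 90 would reproduce the range of training proportions used in the tables.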

Conclusion
The best classification result on the Intrusion Detection System data was given by FRKCM with the RBF kernel. This research thus achieved satisfying accuracy with a short running time. Future work could build on this research to find better methods for the IDS data classification problem, which might obtain better results.