IDS Using Machine Learning - Current State of Art and Future Directions

The worldwide growth of technology has caused security concerns to increase rapidly. The enormous use of internetworking has raised the need to protect systems as well as networks from unauthorized access or intrusion. An intrusion is an activity of breaking into a system by compromising its security policies.


INTRODUCTION
The Internet has become an essential part of daily life and aids people in diverse areas such as business, education, and entertainment. Both businesses and customers rely on Internet applications for business activities [1]. With the popularity of the Internet, however, comes the risk of network attacks or intrusions and the need to secure networks against them. An intrusion, an attack on confidentiality, availability, or integrity, is a series of activities aimed at compromising the security of a computer network system [2]. It takes many forms: external attacks, internal misuse, network-based attacks, information gathering, Denial of Service, and so on [3]. No system can be made perfectly secure because of financial and complexity constraints, so an attacker will eventually find a way to break in. To analyze network data for possible intrusions (attacks), an IDS has therefore become an essential component of computer security that supplements existing defenses. Conventional intrusion prevention strategies such as access control schemes, firewalls, and encryption have failed to protect networks and systems effectively from increasingly sophisticated attacks and malware. Intrusion Detection Systems (IDS) have become a crucial component of any security infrastructure, detecting threats before they cause widespread damage. An IDS is hardware, software, policy, or a combination of these, responsible for uncovering possible intrusions from network audit data. What distinguishes an IDS from an intrusion prevention system (IPS) is that an IPS is proactive and tries to prevent an intrusion from occurring in the network, whereas an IDS is reactive: it works on the assumption that, no matter how secure a network is, intrusions are bound to take place, and it tries to uncover whether any have occurred.
An attacker follows a well-defined, ordered series of steps to break into a system. The attacker starts by gathering information about the system, such as the protocols used and the systems available on the network. Once the list of systems on the network is available, the attacker probes each system to list its vulnerabilities, the applications running, and the open ports. After a vulnerability has been identified and the target system marked, the attacker tries to gain initial access to the target system by performing a Remote to Local (R2L) attack. Once the attacker gains user access on the system, he tries to escalate his privileges by performing User to Root (U2R) attacks. After obtaining superuser privileges, the attacker carries out the attack by stealing or modifying confidential or valuable information, modifying web pages, implanting a backdoor as a stepping stone for future attacks, and so on. Once a target is compromised, the attacker can do anything he wishes.
To counter the problem of network attacks, many devices have been developed over the last few decades, some proactive and some reactive in nature. An IDS can be classified as either host-based or network-based by its defensive scope [4]. A host-based IDS captures and analyzes data on the attacked system itself, whereas a network-based IDS captures and inspects packets at the network gateway before the attack can reach the end system [5]. A network-based IDS is installed as the second line of defense behind the firewall to protect the LAN and is aimed at detecting intrusions caused by multiple hosts. A host-based system, in contrast, needs to be installed on every machine, which makes it efficient at detecting U2R and R2L attacks but at a high operation and maintenance cost [6,7]. Host-based and network-based systems have different monitoring domains, and each detects different attacks effectively.
The rest of the paper is structured as follows: Section 2 gives an overview of the detection techniques employed in IDS. Section 3 discusses the supervised and unsupervised machine learning techniques used in the surveyed works. Feature selection techniques are discussed in section 4. The datasets and tools available are discussed in section 5. Section 6 discusses the performance parameters used to check the effectiveness of the surveyed works. Problems pertaining to current IDS, their possible solutions, and future research directions are discussed in section 7. Section 8 presents the surveyed works on machine learning for IDS in tabular form. Finally, section 9 concludes the paper.

DETECTION TECHNIQUES
An intrusion detection system (IDS) revolves around the assumption that user behavior is observable and that normal user behavior differs from intrusive behavior [8]. At the heart of intrusion detection lies the ability to distinguish acceptable, normal system behavior from that which is abnormal (possibly indicating unauthorized activities) or actively harmful [9]. Two approaches to this problem can be distinguished, with some IDS implementing a combination of both.

Anomaly Detection
An anomaly detection model attempts to model normal behavior. This technique observes user behavior over a period of time and builds a model that closely represents the user's legitimate (normal) behavior. Events that differ greatly from this model are considered suspicious. For example, a normally passive public web server attempting to open connections to a large number of addresses may be indicative of a worm infection. Anomaly detection raises an alert for any activity that does not look normal, which makes it suitable for detecting zero-day attacks. The problems with the anomaly detection model are how to define a model of normal behavior and how to handle evolving normal user behavior. A high false positive rate is another disadvantage of anomaly detection systems; it results from the model's inability to change and adapt over time [10].
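The idea of modeling normal behavior and flagging large deviations can be illustrated with a minimal sketch. It assumes, purely for illustration, that normal behavior is summarized by the mean and standard deviation of a single measurement (outbound connections per minute); the sample values, function names, and the three-standard-deviation threshold are hypothetical and are not taken from any surveyed work.

```python
from statistics import mean, stdev


def fit_normal_profile(samples):
    """Summarise observed normal behaviour as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical outbound connections per minute for a normally passive web server.
normal_counts = [2, 3, 1, 2, 4, 3, 2, 3]
profile = fit_normal_profile(normal_counts)

print(is_anomalous(3, profile))    # a typical minute
print(is_anomalous(250, profile))  # a sudden burst of outbound connections
```

A fixed profile like this one also shows the weakness noted above: if legitimate usage drifts, the profile must be refitted or the false positive rate rises.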

Misuse Detection
A misuse detection model attempts to model abnormal behavior and compares the network traffic against a signature base of known attacks [11]; any match clearly indicates system abuse. For example, an HTTP request referring to the cmd.exe file may indicate an attack. Misuse detection produces fewer false alarms than anomaly detection.
Misuse and anomaly detection differ in that anomaly detection uses a model of normal data to detect anomalous activities, whereas misuse detection uses signatures of well-known attacks and looks for their occurrence in the network data. The advantage of misuse detection over anomaly detection is higher accuracy and fewer false alarms for known attacks. The problems with misuse detection models are how to represent the signatures of all possible attacks and how to write signatures that differ clearly from the normal data pattern. Another problem implicit in the misuse detection model is how to update the signature base when newer attacks appear on the scene.
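Signature matching of the kind described above can be sketched in a few lines. The signature base below is hypothetical: it holds two hand-written regular expressions, one of them for the cmd.exe example, and is not drawn from any real IDS rule set.

```python
import re

# Hypothetical signature base: signature name -> pattern over the HTTP request line.
SIGNATURES = {
    "cmd.exe access": re.compile(r"cmd\.exe", re.IGNORECASE),
    "directory traversal": re.compile(r"\.\./"),
}


def match_signatures(request_line):
    """Return the names of all signatures that match the request line."""
    return [name for name, pattern in SIGNATURES.items()
            if pattern.search(request_line)]


print(match_signatures("GET /scripts/cmd.exe?/c+dir HTTP/1.0"))
print(match_signatures("GET /index.html HTTP/1.0"))
```

A request that matches no signature is passed as normal, which is why an attack with no entry in the signature base goes undetected until the base is updated.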

Hybrid Approach
Signature and anomaly detection are usually employed together so that they complement each other. This fusion of signature and anomaly detection techniques leads to the hybrid approach, which combines the strengths of both techniques. The surveyed works show that hybrid techniques work better than either of the two techniques alone. The problems with the hybrid approach are the added complexity of combining the two approaches into one system and the choice of the order in which the two should process the data.

MACHINE LEARNING
Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. It explores the construction and study of algorithms that can learn from, and make predictions on, data [12]. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions. Intrusion detection can be modeled as a multi-class classification problem in which network events are classified as normal or as attack events such as Denial of Service (DoS), Probe, U2R, and R2L.
The three prerequisites for machine learning are:
• Data should be present.
• There should be some pattern in the data.
• There is no simple mathematical model for the data.
Machine learning techniques are broadly classified as supervised or unsupervised depending on the presence or absence of labeled data and on what we are trying to predict from the dataset. Fig. 1 gives a pictorial representation of the approaches that have been taken to design IDS over the last two and a half decades. In the next section we give a brief introduction to each of the machine learning techniques.

Supervised Machine Learning Techniques
Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. In other words, the goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances, where the values of the predictor features are known but the value of the class label is unknown.

Decision trees
Decision trees are trees that classify instances by sorting them based on feature values. Each node in a decision tree represents a feature in an instance to be classified, and each branch represents a value that the node can assume. Instances are classified starting at the root node and sorted based on their feature values. The construction of an optimal decision tree is an NP-hard problem, and a few heuristic approaches have been put forward. At each level of the decision tree, the feature that best divides the training set into subclasses is selected, typically using a criterion based on entropy or information gain. The division of the tree continues until any of the following conditions is met.
• All instances in the training set belong to a single class.
• The maximum tree depth has been reached.
• The best splitting criterion is not greater than a certain threshold.
The selection of the best attribute node is based on the gain ratio GainRatio(S, A), where S is a set of records and A is a non-categorical attribute. The gain defines the expected reduction in entropy due to sorting on A. It is calculated as

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

In general, if we are given a probability distribution $P = (p_1, p_2, \ldots, p_n)$, then the information conveyed by this distribution, called the entropy of P, is

$$\mathrm{Entropy}(P) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

If we consider only Gain(S, A), then an attribute with many values will be automatically selected. One solution is to use GainRatio instead:

$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}, \qquad \mathrm{SplitInfo}(S, A) = -\sum_{i} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$

where $S_i$ is the subset of S for which A has the value $v_i$.
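The entropy, gain, and gain ratio computations used to pick a splitting attribute can be sketched as follows. The four toy records and the attribute names `protocol` and `id` are hypothetical; the split information term follows the usual C4.5 definition, which is assumed here to be the one intended.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of the class-label distribution in `labels`."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the labels by the value each row takes on attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def gain(rows, labels, attr):
    """Expected reduction in entropy obtained by splitting on `attr`."""
    total = len(labels)
    groups = partition(rows, labels, attr).values()
    return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups)


def split_info(rows, labels, attr):
    """Entropy of the partition sizes themselves (penalises many-valued attributes)."""
    total = len(labels)
    groups = partition(rows, labels, attr).values()
    return -sum(len(g) / total * log2(len(g) / total) for g in groups)


def gain_ratio(rows, labels, attr):
    si = split_info(rows, labels, attr)
    return gain(rows, labels, attr) / si if si > 0 else 0.0


# Hypothetical toy data: `id` is unique per record, `protocol` has two values.
rows = [{"protocol": "tcp", "id": 1}, {"protocol": "tcp", "id": 2},
        {"protocol": "icmp", "id": 3}, {"protocol": "icmp", "id": 4}]
labels = ["normal", "normal", "attack", "attack"]

for attr in ("protocol", "id"):
    print(attr, gain(rows, labels, attr), gain_ratio(rows, labels, attr))
```

On this toy data both attributes achieve the same gain, but the unique `id` attribute has a larger split information and therefore a lower gain ratio, which is the bias correction the gain ratio is meant to provide.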

Neural networks
Neural networks are a programming paradigm inspired by the human brain. A neural network is composed of a large number of neurons, each having inputs, an output, and an activation function. The input to a neural network is applied at the input layer. In the activation stage, a calculation is carried out on the inputs and weights, and the output is produced depending on whether the resulting sum is greater than some predefined threshold. A neural network is usually laid out in layers: the first layer is called the input layer, the last layer is called the output layer, and the layers in between are called hidden layers. Finding the optimal number of layers and the number of nodes in each layer is an NP-hard problem; in practice they are selected by trying different combinations and settling on the one that gives the best performance for the problem at hand.
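The thresholded weighted sum and the layered arrangement can be illustrated with a small sketch. The weights and thresholds below are hand-picked (not learned) so that a network with one hidden layer computes XOR; they are assumptions made only for this example.

```python
def neuron_output(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


def forward(inputs, layers):
    """Propagate inputs through `layers`, a list of (weight_rows, thresholds) pairs."""
    activations = inputs
    for weight_rows, thresholds in layers:
        activations = [neuron_output(activations, w, t)
                       for w, t in zip(weight_rows, thresholds)]
    return activations


# Hand-picked network: the hidden layer computes OR and AND, the output combines them.
xor_layers = [
    ([[1, 1], [1, 1]], [0.5, 1.5]),  # hidden layer: OR neuron, AND neuron
    ([[1, -1]], [0.5]),              # output layer: OR and not AND
]

for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, forward(pair, xor_layers))
```

In a trained network the weights would be adjusted by a learning algorithm rather than set by hand, and smooth activation functions are normally used in place of the hard threshold.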
Article no.BJAST.23668

Support vector machines
The most recent supervised technique on the scene is the Support Vector Machine (SVM). An SVM maps the data into a higher-dimensional space and finds the hyper-plane that best separates the data. SVMs revolve around the notion of a "margin" on either side of a hyper-plane that separates two data classes. Maximizing the margin, and thereby creating the largest possible distance between the separating hyper-plane and the instances on either side of it, has been proven to reduce an upper bound on the expected generalization error.
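The notion of margin can be made concrete with a short sketch that computes the geometric margin of a given separating hyper-plane, that is, the distance from the hyper-plane to the closest instance. This is not an SVM trainer: the two candidate hyper-planes and the toy points are hypothetical and serve only to show which of them an SVM would prefer.

```python
from math import sqrt


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all labelled points."""
    norm = sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in zip(points, labels))


# Hypothetical two-class toy data with labels -1 and +1.
points = [(1, 1), (2, 2), (4, 4), (5, 5)]
labels = [-1, -1, 1, 1]

centered = geometric_margin((1, 1), -6, points, labels)  # midway between classes
shifted = geometric_margin((1, 1), -5, points, labels)   # closer to the -1 class

print(centered, shifted)
```

Both hyper-planes separate the data, but the centered one leaves a larger margin; a maximum-margin classifier selects the hyper-plane for which this quantity is largest.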

Fuzzy logic
Fuzzy logic is a form of knowledge representation suitable for notions that cannot be defined precisely but which depend upon their contexts. Fuzzy literally means "not clear, distinct, or precise; blurred". What makes fuzzy logic different from traditional programming approaches is that a fuzzy variable can take any value between zero and one, whereas a Boolean variable can take only zero or one. Traditional computing logic permits propositions to take a value of truth or falsity, whereas fuzzy logic allows us to express the degree of truth, which makes it well suited to modeling real-world problems.
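Degrees of truth can be illustrated with a minimal sketch of a triangular membership function and the common min/max/complement operators. The fuzzy set "moderately high failed-login count" and its breakpoints (0, 10, 20) are hypothetical choices made for this example.

```python
def triangular(x, left, peak, right):
    """Membership degree in [0, 1] for a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_not(a):
    return 1.0 - a


# Hypothetical fuzzy set over the number of failed logins in a session.
for count in (0, 5, 10, 15, 25):
    print(count, triangular(count, 0, 10, 20))
```

A Boolean rule would have to pick one cut-off value, whereas the membership function lets a rule fire partially for borderline counts.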

Genetic algorithm
The concept of the genetic algorithm comes from the "adaptive survival in natural organisms" [1]. Genetic algorithms use the computer to implement natural selection and evolution [13]. They belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover [14]. A genetic algorithm starts by randomly generating a large population of candidate solutions and then iteratively replaces weak solutions with high-performing ones. The performance of a solution is checked against a fitness function. In each iteration, new solutions are derived from existing ones using mutation and crossover, while solutions with low performance are deleted and do not survive to the next iteration.
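The loop of selection, crossover, and mutation can be sketched on a toy problem. The fitness function (number of ones in a bit string), the population size, the mutation rate, and the rule of keeping the better half of the population are all illustrative assumptions, not parameters from any surveyed IDS.

```python
import random


def fitness(bits):
    """Toy fitness: the number of ones in the candidate."""
    return sum(bits)


def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def evolve(length=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[: pop_size // 2]  # the weak half does not survive
        children = [mutate(crossover(rng.choice(survivors),
                                     rng.choice(survivors), rng), rate, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness), best_per_generation


best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the surviving half is carried over unchanged, the best fitness recorded per generation never decreases in this sketch.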

Unsupervised Machine Learning
Unsupervised machine learning techniques take an unlabeled dataset and assign its items to classes. The absence of a training set, and hence of cross-validation, marks the difference between clustering and classification. A second important difference is that although most clustering algorithms are phrased in terms of an optimality criterion, there is typically no guarantee that the globally optimal solution has been obtained. The reason is that one would have to consider all partitions of the data, which is not possible for even moderate sample sizes, so some heuristic approach is taken. In unsupervised learning we are not concerned with predicting the label of a data item; rather, our aim is to uncover the hidden groups in the data. The discovered groups do not have any meaning of their own, and it is left to the analyst to derive meaning from them. In cluster analysis there is hardly any hyper-parameter that can be tuned other than the number of clusters into which the dataset should be divided.
Once supplied with the number of clusters to find, the algorithms divide the dataset into that number of clusters by using some optimization function. Computationally, the problem of finding the clusters is difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum [15].
All clustering algorithms use some distance measure to group the data into clusters. The data items grouped together in a cluster are much more similar to each other than to the data items in other clusters. Given an n-dimensional dataset, clustering approaches calculate some feature and use that feature value to assign each data item to a cluster. Table 1 lists some of the well-known and commonly used distance measures for clustering algorithms, defined on two data items x and y consisting of m features. Of the variants given in the table, Euclidean distance is the most widely used.
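A few commonly used distance measures between two data items x and y with m features can be sketched as follows. Which measures appear in Table 1 is not reproduced here, so the selection below (Euclidean, Manhattan, Chebyshev, and the general Minkowski form) is an assumption.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def euclidean(x, y):
    return minkowski(x, y, 2)


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Largest difference over any single feature."""
    return max(abs(a - b) for a, b in zip(x, y))


x, y = (0, 0), (3, 4)
print(euclidean(x, y), manhattan(x, y), chebyshev(x, y))
```

Features measured on very different scales (for example bytes transferred versus a 0/1 flag) dominate these distances unless the data are normalized first.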
In the following section some of the clustering techniques used for intrusion detection system are discussed.

K-means
One of the most widely used partitioning cluster algorithms is k-means. Given M points in N dimensions, the k-means algorithm divides the M points into K clusters so that the within-cluster sum of squares is minimized. It is not practical to require that the solution has the minimal sum of squares over all partitions, except when M and N are small and K = 2. We seek instead "local" optima, solutions such that no movement of a point from one cluster to another will reduce the within-cluster sum of squares.
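A minimal sketch of k-means follows. It uses the common assign-then-update heuristic (Lloyd's algorithm), which is one of several ways of reaching a local optimum and is not necessarily the variant used in the surveyed works; the toy points, the seed, and the random initialization are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Alternate between assigning points to the nearest centre and updating centres."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        new_assignment = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                          for p in points]
        if new_assignment == assignment:  # converged to a local optimum
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = mean_point(members)
    return centers, assignment


def within_cluster_ss(points, centers, assignment):
    """The objective that k-means tries to minimise."""
    return sum(sq_dist(p, centers[a]) for p, a in zip(points, assignment))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = kmeans(points, k=2)
print(assignment, within_cluster_ss(points, centers, assignment))
```

Different seeds can lead to different local optima on harder data, so in practice the algorithm is often restarted several times and the run with the smallest within-cluster sum of squares is kept.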

Self-organizing maps
Self-organizing maps (SOMs) were proposed by Kohonen (1995) as a simple method for allowing data to be sorted into groups. The basic idea is to lay out the data on a grid, and to then iteratively move observations (and the centers of the groups) around on that grid, slowly decreasing the amount that centers are moved, and slowly decreasing the number of points considered in the neighborhood of a grid point. A SOM is a sheet-like artificial neural network, whose cells become specifically tuned to various input signal patterns or classes of patterns through an unsupervised learning process. In the basic version, only one cell or local group of cells at a time gives the active response to the current input.
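The training loop described above can be sketched for a one-dimensional grid of cells. The grid size, the linear decay of the learning rate and neighborhood radius, the Gaussian neighborhood function, and the toy data are all assumptions chosen to keep the example short.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the grid cell whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, grid_size=4, epochs=30, lr0=0.5, radius0=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(grid_size)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs        # shrinks from 1 towards 0
        lr = lr0 * decay                    # centres are moved less over time
        radius = max(radius0 * decay, 0.5)  # neighbourhood narrows over time
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (x[d] - w[d])
    return weights


data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.9], [0.85, 0.95]]
weights = train_som(data)
print([best_matching_unit(weights, x) for x in data])
```

After training, each input is represented by its best matching cell, and inputs mapped to the same or neighboring cells are treated as belonging to the same group.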

K-medoids
Like k-means, the k-medoids algorithm is a partitioning algorithm, and like k-means it tries to minimize the squared error, that is, the distance between the points assigned to a cluster and the point designated as the center of that cluster. Rather than taking the mean of all the data points as the center, as k-means does, k-medoids takes a medoid. The medoid of a finite dataset is a data point from the set whose average dissimilarity to all the data points is minimal, i.e. it is the most centrally located point in the set.
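The definition of a medoid translates directly into code. The sketch below finds the medoid of a single cluster; the one-dimensional toy data and the absolute-difference dissimilarity are hypothetical.

```python
def medoid(points, dissimilarity):
    """The member of `points` with the smallest total dissimilarity to all others."""
    return min(points, key=lambda c: sum(dissimilarity(c, p) for p in points))


def abs_diff(a, b):
    return abs(a - b)


values = [1, 2, 3, 4, 100]
print(medoid(values, abs_diff))   # an actual data point
print(sum(values) / len(values))  # the mean, pulled towards the outlier
```

Minimizing the total dissimilarity is equivalent to minimizing the average, since the number of points is fixed. Unlike the mean, the medoid is always one of the data points and is less affected by the outlier.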

Bayesian clustering
Bayesian Clustering is an unsupervised classification program that uses Bayesian inference to find the most probable classification given the description of cases in the dataset. Although Bayesian Clustering is best suited for the problems where training samples are unlabeled, by ignoring the expert knowledge the system can be used for classifying the labeled data.

Types of Classifiers
As is clear from subsections 3.1 and 3.2, there are various machine learning techniques, and they can be combined in many ways to solve a problem. An approach that solves a problem using machine learning techniques can be classified as single, ensemble, or hybrid depending on the number of techniques involved and the way in which they work together.

Single
These are the simplest approaches and use a single machine learning technique to solve the problem at hand. This technique can be any clustering, classification, or association technique. Single learning techniques are easy to grasp and fast to implement, but they often do not produce satisfactory results and are therefore seldom used nowadays.

Ensemble
Another way to solve a problem using machine learning is to use more than one weak classifier and then fuse their results. Fusing more than one learning technique yields better predictive performance than can be obtained from any of the constituent learning algorithms alone. Ensemble models achieve their performance by combining the opinions of multiple learners; in doing so we can often get away with using simple classifiers and still achieve good performance. Being inherently parallel, ensemble methods can have efficient training and testing times provided we have access to multiple processors. Ensemble methods can be realized in two ways: by training multiple classifiers on the same dataset, or by training a single classifier on multiple datasets. Once the ensemble is trained, a data item at testing time is assigned to the class that the majority of the classifiers point to.
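Majority voting over several weak classifiers can be sketched as follows. The three rule-based classifiers and their thresholds are hypothetical stand-ins for trained models; only the field names loosely echo connection features.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, record):
    return majority_vote([classify(record) for classify in classifiers])


# Hypothetical weak classifiers, each looking at a single feature.
def by_failed_logins(r):
    return "attack" if r["failed_logins"] > 3 else "normal"


def by_src_bytes(r):
    return "attack" if r["src_bytes"] > 10000 else "normal"


def by_duration(r):
    return "attack" if r["duration"] > 300 else "normal"


classifiers = [by_failed_logins, by_src_bytes, by_duration]
record = {"failed_logins": 5, "src_bytes": 20000, "duration": 10}
print(ensemble_predict(classifiers, record))
```

An odd number of voters avoids ties in the two-class case; with more classes or an even number of voters a tie-breaking rule has to be chosen.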

Hybrid
Hybrid approaches combine two machine learning techniques to solve the problem; here the machine learning algorithms work in combination rather than competing with each other, as is the case with ensemble techniques. Hybrid techniques can be built by cascading two techniques, for example clustering followed by classification, or by integrating two different techniques. Combining two or more techniques in this way gives better performance than the other two kinds of approach.

FEATURE SELECTION TECHNIQUES
Intrusion detection is a classification problem and is based on building a model that depicts normal or anomalous behavior. The datasets available for intrusion detection have a large number of features; using all of them for classification is not computationally feasible and may result in reduced performance. Researchers have therefore devised and used a large number of feature selection algorithms over the years. To name a few, Ant Colony Optimization, the Cuttlefish Algorithm, and Genetic Algorithms are widely used nowadays. A feature selection algorithm can be classified as a filter, wrapper, or hybrid method [16].
• Filter: Filter methods select features from the dataset irrespective of the classifier that will be used to build the model. They take data with a large number of features and select only the best features based on some characteristic. These methods use intrinsic characteristics of the dataset to select feature subsets, typically by ranking individual features without taking any data mining algorithm into consideration. A filter method analyzes individual features independently of the classifier and decides which features should be kept [17].
• Wrapper: Wrapper methods use a predetermined mining algorithm to evaluate generated subsets of features from the dataset. These methods usually have superior performance because they identify features that are better suited to the predetermined mining algorithm. Wrapper-based approaches are considered to generate better features, but they run much slower and need more computing resources [18].
• Hybrid Methods: Some researchers have gone a step further by combining the two kinds of feature selection algorithm; such a system is called a hybrid system. The hybrid system selects reliable features for each class but is computationally more expensive and harder to realize than either of the two techniques.
An alternative to feature selection is feature extraction. This technique takes an n-dimensional dataset and transforms it into another dataset whose features are not the actual features of the original dataset. The approaches in [19][20][21] take an n-dimensional dataset and convert it into a one-dimensional distance vector, which is then used for both training and testing. The transformed features are linear combinations of the original attributes.
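Collapsing an n-dimensional record into a single distance value can be sketched as below. This is only an illustration of the general idea: the choice of the Euclidean distance to the centroid of a reference set is an assumption and is not claimed to be the construction used in [19][20][21].

```python
from math import sqrt


def centroid(records):
    """Feature-wise mean of a set of numeric records."""
    return [sum(column) / len(records) for column in zip(*records)]


def distance_feature(record, center):
    """Euclidean distance from one record to the reference centre."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(record, center)))


def transform(records, center):
    """Replace every n-dimensional record by a single distance value."""
    return [distance_feature(r, center) for r in records]


# Hypothetical reference records (e.g. known normal traffic) and new records.
reference = [[0, 0], [2, 2]]
center = centroid(reference)
print(transform([[1, 1], [4, 5]], center))
```

The transformed dataset has one feature per record, so any classifier trained on it works with far less data per instance, at the cost of discarding information that the distance does not capture.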

DATASET AND TOOLS AVAILABLE
The next two subsections briefly discuss the tools and the various versions of the dataset used for intrusion detection. The attacks present in the dataset are also listed by protocol, and the meaning of each attribute of the dataset and the type of value it takes are given in tabular form.

Dataset
Many datasets have been used in practice to check the effectiveness of the techniques. Most works on intrusion detection follow a passive approach: from time to time the IDS is fed with network data, to which it applies mining techniques to uncover any intrusions. A number of datasets are publicly available for testing purposes. A brief introduction to them is given below.

KDDCup99 dataset
The publicly available and most widely used dataset for intrusion detection is the KDDCup99 dataset. It is divided into two subsets: a training set consisting of 5 million records and a testing set consisting of 3 million records. The tables below give the exact count of each type of attack present in the KDDCup99 dataset. Each record has 41 features derived for each connection and a label that classifies the connection record as either normal or a specific attack type. The features fall into four categories. The intrinsic features include, for example, the duration of the connection, the type of protocol (tcp, udp, etc.), and the network service (http, telnet, etc.). The content features include, for example, the number of failed login attempts. The same-host features examine the connections established in the past two seconds that have the same destination host as the current connection and calculate statistics related to protocol behavior, service, etc. The same-service features examine the connections in the past two seconds that have the same service as the current connection.
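A record with 41 features followed by a class label can be split as sketched below. The comma-separated layout and the trailing period on the label are assumptions about the file format, and the example line is a synthetic placeholder, not a record taken from the dataset.

```python
NUM_FEATURES = 41


def parse_record(line):
    """Split one comma-separated line into (features, label)."""
    fields = line.strip().split(",")
    if len(fields) != NUM_FEATURES + 1:
        raise ValueError(f"expected {NUM_FEATURES + 1} fields, got {len(fields)}")
    features = fields[:NUM_FEATURES]
    label = fields[NUM_FEATURES].rstrip(".")  # assumed trailing period on labels
    return features, label


def is_attack(label):
    return label != "normal"


# Synthetic placeholder line: 41 dummy feature values followed by a label.
synthetic_line = ",".join(["0", "tcp", "http"] + ["0"] * 38 + ["normal."])
features, label = parse_record(synthetic_line)
print(len(features), label, is_attack(label))
```

Nominal features such as the protocol and service remain strings after parsing and need to be encoded numerically before most classifiers can use them.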

Corrected KDDCup99 dataset
The KDDCup99 dataset contains highly redundant records. This causes learning algorithms to be biased towards frequent records and thus prevents them from learning infrequent records, which are usually more harmful to networks, such as U2R and R2L attacks [22]. In the Corrected KDDCup99 dataset all redundant records have been removed, which reduces the chance of the classifier being biased.

10% KDDCup99 dataset
The complete dataset is seldom used for training or testing. Rather, 10% of the complete dataset is used; this dataset has fewer instances of the attacks. Training the classifier on the reduced dataset makes it computationally feasible. Table 2 [22] gives the count of instances in each variant of the dataset and the number of particular attacks present in each variant.
In all three versions of the dataset the attacks fall into one of four categories. Table 3 lists the attack groups and the attacks present in the KDDCup99 dataset.
A complete description of each of the 41 features and the data they take is given in Table 4. Features may be continuous or nominal, marked by C and N respectively.
As mentioned above, the records of the KDDCup99 dataset have 41 attributes. To show clearly what the dataset looks like, Table 5 lists two records from the dataset, one normal and one a smurf attack. As can be seen from the table, some features of the dataset are nominal while others are continuous; the last field of a record represents the class to which the record belongs. Table 7 gives the frequency of each attack in the dataset.

Weka: Data mining software in Java
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or be called from Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes [24]. Weka is preferred by users because it is freeware and has an easy-to-use GUI. A user need not be an expert in computer systems to use Weka.

Matlab
Matlab, which has more than 1 million users across academia and industry, is a multi-paradigm numerical computing environment and fourth-generation programming language. The MATLAB application is built around the MATLAB scripting language [25]. Although Matlab is very rich in features and has a broader scope than Weka, it is more difficult to operate and requires the user to have a proper understanding of computer programming.

PERFORMANCE PARAMETERS
A number of performance metrics have been used to check the effectiveness of an IDS and to document the results; they are mentioned below. Researchers have used these metrics to compare their results with already existing approaches [26].
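The metrics themselves are not reproduced in this section, so the sketch below assumes the ones most commonly reported for IDS and computes them from the four confusion-matrix counts, treating "attack" as the positive class. The counts in the example are hypothetical, and all denominators are assumed to be non-zero.

```python
def ids_metrics(tp, tn, fp, fn):
    """Common IDS metrics from confusion-matrix counts (attack = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "detection_rate": tp / (tp + fn),    # also called recall
        "false_alarm_rate": fp / (fp + tn),  # normal traffic flagged as attack
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: 100 attacks and 900 normal connections.
for name, value in ids_metrics(tp=90, tn=880, fp=20, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Accuracy alone can be misleading when attacks are rare, which is why detection rate and false alarm rate are usually reported alongside it.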

EMERGING PROBLEMS AND PROBABLE SOLUTIONS
Even though a lot of research effort has been devoted to intrusion detection using machine learning techniques, many problems remain and need to be solved in order to move forward.

Problems
The existing systems suffer the following problems • The problem with most of the techniques surveyed is that each of the technique generates too many false alarms. The reason for this could be that the models assume that the user behavior is perfectly observable, legitimate behavior is different from the intrusive behavior and the user usage pattern is steady throughout [10,27,28]. • The low detection rate for the U2R and R2L attacks is another problem present in the currently existing technologies. This could be because the U2R and R2L attacks are very similar to the normal data and are many times misclassified as either the normal data or some other class.
Another reason for low detection rate could be the low frequent occurrence of these classes of attacks which causes the classifier biased to them and hence has reduced detection rate [29][30][31]. • An IDS by itself is a resource and is prone to be attacked. IDS can be attacked by the attackers and if the attempt to break in the IDS succeeds the network will be left open to the attacker and hence IDS can prove to be single point of failure. There is a long list of the possible attacks on the IDS [7,32]. • An IDS is usually trained on some benchmark dataset and if any implemented in the real environment is all together left in alien conditions. Training and testing the IDS in two different environments reduces the performance. • There are lots of studies each of them documenting altogether different results even using the same classifier. There is no documentation about the maximum accuracy or the detection rate that an algorithm could attain on a given problem. • There is no study about which classifier is best for any environment, as there are lot of machine learning techniques available and research has not been up to the point where would compare the classifiers and say that a particular algorithm outnumber some algorithm is all aspects. • Normal data is very common and anomalous data is rare causing classifier to be biased towards less frequently occurring data items and in case of attacks also some attacks occur very frequently and some occur with less frequency. This restricts us to have an unbiased classifier. • An IDS raises too many alarms administrator has to deal with, alarming a network administrator for each of the attack independently would lead to too many alarms a manager can deal with. A way to group alarms in groups and raising alarm for each group can be looked into. • All the techniques surveyed need too much training and testing time which is undesirable and hence none of the techniques is feasible to be implemented in the real environment. 
• Although more advanced and sophisticated detection approaches have been developed, very few have focused on feature representation for normal connections and attacks.
• Even though a lot of research effort has gone into anomaly detection, present IDSs are still based on misuse detection only, because the models devised all rely on labeled data, which is not available in real networks.
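The effect of skewed class distribution on the reported metrics can be illustrated with a minimal sketch. The label counts below are made up for illustration and are not taken from KDDCup99 or from any surveyed work, and the function names are hypothetical. The sketch shows that a classifier which labels every rare attack as normal can still report high overall accuracy while detecting no U2R or R2L records, which is why per-class detection rate should be reported alongside accuracy and false alarm rate.

```python
from collections import Counter


def per_class_detection_rate(y_true, y_pred):
    """Fraction of records of each true class that received the correct label."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def false_alarm_rate(y_true, y_pred, normal="normal"):
    """Fraction of normal records that were flagged as any kind of attack."""
    preds_for_normal = [p for t, p in zip(y_true, y_pred) if t == normal]
    return sum(p != normal for p in preds_for_normal) / len(preds_for_normal)


# Hypothetical, made-up class counts: normal and DoS dominate, R2L and U2R are rare.
y_true = ["normal"] * 900 + ["dos"] * 90 + ["r2l"] * 8 + ["u2r"] * 2

# A degenerate classifier that gets the frequent classes right
# and labels every rare attack as normal.
y_pred = ["normal"] * 900 + ["dos"] * 90 + ["normal"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
rates = per_class_detection_rate(y_true, y_pred)

print("accuracy:", accuracy)
print("per-class detection rate:", rates)
print("false alarm rate:", false_alarm_rate(y_true, y_pred))
```

In this toy setting the overall accuracy hides the complete failure on the two rare classes, so a comparison between classifiers based on accuracy alone would be misleading.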

Possible Solutions
Possible solutions to the problems discussed in 7.1 are:
• To minimize false alarms, an IDS should be capable of online learning, should handle concept drift, and should be customizable to suit any environment.
• To improve the detection rate for R2L and U2R attacks, a proper mix of feature extraction, feature selection, data transformation, clustering, and classification techniques should be considered, together with the selection of attributes that are very specific to these two classes of attacks.
• To add reliability and scalability and to eliminate the single point of failure, the feasibility of implementing an IDS as a distributed system can be examined.
• To reduce classifier bias, an IDS should be capable of handling skewed class distributions.
• To reduce the number of alarms an administrator has to deal with, alarms can be grouped and one alert issued for each attack rather than for each packet.
• To reduce training and testing time, the features of the data need not be used as such; instead, the data can be transformed so that each record is represented as a single point in space, and the transformed single-dimensional dataset used for training and testing, which we believe will be faster than working on the original features.
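Two of the suggestions above, grouping alarms and handling skewed class distributions, can be sketched in a few lines. This is only an illustration under stated assumptions, not a method taken from the surveyed works: the alert format (timestamp, source address, attack label), the 60-second window, and the function names group_alerts and inverse_frequency_weights are all hypothetical. The weighting rule is the common inverse-frequency heuristic, n / (k * count), where n is the number of records and k the number of classes.

```python
from collections import Counter


def group_alerts(alerts, window=60.0):
    """Merge alerts that share (source, attack) and arrive within `window`
    seconds of the previous alert in the same group."""
    groups = []
    latest = {}  # (source, attack) -> index of the most recent group for that key
    for ts, src, attack in sorted(alerts):
        key = (src, attack)
        idx = latest.get(key)
        if idx is not None and ts - groups[idx]["last"] <= window:
            groups[idx]["last"] = ts
            groups[idx]["count"] += 1
        else:
            latest[key] = len(groups)
            groups.append({"src": src, "attack": attack,
                           "first": ts, "last": ts, "count": 1})
    return groups


def inverse_frequency_weights(labels):
    """Give each class a weight inversely proportional to its frequency,
    so that rare classes count more during training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


# Hypothetical per-packet alerts: (timestamp in seconds, source address, attack label).
alerts = [
    (0.0, "10.0.0.5", "portscan"),
    (1.5, "10.0.0.5", "portscan"),
    (2.0, "10.0.0.9", "dos"),
    (3.0, "10.0.0.5", "portscan"),
    (200.0, "10.0.0.5", "portscan"),
]
groups = group_alerts(alerts)
for g in groups:
    print(g)

# Hypothetical, made-up class counts with a heavy skew toward normal traffic.
labels = ["normal"] * 900 + ["dos"] * 90 + ["u2r"] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

The grouping key and the window length decide how aggressively alarms are merged, and both would need tuning for a given network. The resulting class weights can be passed to any learner that accepts per-class or per-sample weights.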

COMPARISON OF WORKS
In this section the surveyed works are presented in tabular form, and for each work the model employed, dataset used, features selected, implementation environment, and performance metrics are given. A dash (-) in a cell indicates that the author(s) did not mention that item in their paper. The surveyed works are classified into three tables: single classifiers in Table 9, ensemble classifiers in Table 10, and hybrid classifiers in Table 11. At the end of this section a list of the abbreviations used in the tables is given.

CONCLUSION
In this work a survey of intrusion detection systems using machine learning techniques was given. Many related works were surveyed and classified into three groups (single, ensemble, or hybrid), and for each work the dataset used, the implementation environment, the feature selection method if any, and the performance measures reported were documented in tabular form. A complete list of the attacks present in the KDDCup99 dataset is also given. The survey revealed that hybrid machine learning techniques with a proper feature selection algorithm outclass their single or ensemble counterparts. This paper also points out problems with current systems and provides directions for future research.