Applying one-class classification techniques to IP flow records for intrusion detection

Flow-based intrusion detection systems analyze IP flow records to detect attacks against computer networks. IP flow records contain aggregated information about network traffic packets, so the amount of data processed by the intrusion detection system is reduced. However, flow-based techniques are not yet mature enough to be used alone and are therefore combined with payload-based techniques in a multi-stage intrusion detection system. The first stage of the multi-stage intrusion detection system uses flow-based detection to identify potentially malicious traffic. This traffic is forwarded to a second stage, where detailed intrusion detection is performed using payload inspection. Since there is only one class of interest (malicious) at the first stage, we propose a one-class classification model for the detection of malicious flows. We review various one-class classification techniques and evaluate them on a flow-based dataset to determine their performance in detecting malicious flows. Our results show that one-class classification techniques using boundary methods give the best results in the detection of malicious IP flows.


Introduction
Intrusion detection systems (IDS) secure computer networks from unauthorized access and cyber attacks. An intrusion detection system analyzes network traffic and raises an alert if an attack is detected. Traditional approaches to intrusion detection use payload-based and protocol-based inspection. Payload-based inspection techniques scan the complete packet payload to detect attacks and have full access to network traffic.
However, payload inspection can slow down network traffic on high-speed backbone links (Husak et al., 2015). Also, payload inspection is not possible when the packet content is encrypted. Protocol-based techniques check the header fields of every packet against the protocol specification. Any out-of-range value in the protocol fields of the packet header is considered malicious. Protocol inspection techniques are protocol specific and cannot be generalized to unknown protocols.
An alternative approach to payload and protocol inspection is flow-based inspection, which uses IP flow records for intrusion detection. An IP flow is defined as a set of IP packets passing through an observation point in the network during a certain time interval. All packets belonging to a particular flow have a set of common properties (Claise et al., 2013). A flow export and collection protocol collects flow data from the network and makes the IP flow records available to a flow analysis application in the desired format. A common flow export protocol is Cisco's NetFlow, which is supported by almost all major vendors. The Internet Engineering Task Force (IETF) adopted NetFlow version 9 and standardized it as the IP Flow Information Export (IPFIX) protocol (Sperotto and Pras, 2011). IPFIX specifies a standard architecture for the collection and processing of IP flow records.
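To make the notion of a flow record concrete, the following minimal sketch groups packets that share a set of common properties into flow records. It assumes that the common properties are the usual 5-tuple (source address, destination address, source port, destination port, protocol) and that a flow is closed after an idle timeout. The function name `aggregate_flows`, the packet dictionary layout and the 60-second timeout are illustrative assumptions and are not taken from NetFlow or IPFIX.

```python
def aggregate_flows(packets, timeout=60.0):
    """Group packets into flow records keyed by a shared 5-tuple.

    Each packet is a dict with keys ts, src, dst, sport, dport, proto, size
    (a hypothetical layout). A new flow starts when the gap since the last
    packet with the same key exceeds `timeout` seconds.
    """
    active = {}      # 5-tuple -> flow record that is still open
    finished = []    # flow records closed by the idle timeout
    for p in sorted(packets, key=lambda pkt: pkt["ts"]):
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flow = active.get(key)
        if flow is not None and p["ts"] - flow["end"] > timeout:
            finished.append(active.pop(key))
            flow = None
        if flow is None:
            flow = {"key": key, "start": p["ts"], "end": p["ts"],
                    "packets": 0, "bytes": 0}
            active[key] = flow
        # Update the aggregated counters of the open flow.
        flow["end"] = p["ts"]
        flow["packets"] += 1
        flow["bytes"] += p["size"]
    finished.extend(active.values())
    return finished


web = {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80, "proto": "TCP"}
dns = {"src": "10.0.0.3", "dst": "10.0.0.2", "sport": 5353, "dport": 53, "proto": "UDP"}
packets = [
    dict(web, ts=0.0, size=60),
    dict(web, ts=0.5, size=1500),
    dict(dns, ts=1.0, size=80),
    dict(web, ts=200.0, size=40),   # same 5-tuple, but after the idle timeout
]
flows = aggregate_flows(packets)
```

Four packets collapse into three flow records here, which illustrates why a flow-based detector has less data to process than a packet-based one.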
Flow-based intrusion detection systems have several advantages over payload-based and protocol-based techniques (Golling et al., 2014). Flow-based approaches inspect only the packet headers and do not consume any resources analyzing packet payloads. Since no payload inspection is involved, flow-based intrusion detection is not affected by the use of encryption. IP flow records contain aggregated information, so flow-based inspection does not need to inspect every packet header or payload. IP flow records are also independent of higher-layer protocols. Flow-based techniques also have some disadvantages. These systems rely only on header fields and have no access to relevant information residing in packet payloads. Flow-based detection systems are therefore unable to detect attacks that are hidden in the packet payload and do not cause a significant change in traffic flow data. In addition, the flow export and collection process introduces a delay in intrusion detection, during which slow and small ramped attacks can go undetected (Vykopal et al., 2013). Researchers have therefore suggested multi-stage intrusion detection models that combine flow-based and payload-based intrusion detection (Sperotto et al., 2010; Golling et al., 2014).
Flow-based intrusion detection is combined with payload-based detection techniques (Golling et al., 2014). Flow-based detection is performed at the first stage, while detailed analysis is performed at the second stage using payload-based inspection. In this paper, we propose a one-class classification model for flow-based detection of malicious traffic. One-class classification is employed when labeled training data is available for only one class. Training data for the other classes is either not available or difficult to obtain (Khan and Madden, 2014). The class for which training examples are available is called the target class. The focus of our work is to apply one-class classification to the detection of malicious flows. We also evaluate various one-class classification techniques on a flow-based dataset to determine their performance in flow-based intrusion detection. The training dataset contains malicious IP flows, and the test dataset contains both normal and malicious flows. On the basis of the results, we discuss the application of the available one-class classification techniques to flow-based detection of malicious traffic.
The paper is organized as follows. Section 2 presents existing work on the use of one-class classification techniques for intrusion detection. Section 3 describes the concept of one-class classification and reviews various one-class classification methods. In Section 4, we propose one-class classification for the detection of malicious flows in the first stage of a multi-stage intrusion detection system. We evaluate different one-class classification techniques on a flow-based dataset and discuss the results in Section 5. Finally, the conclusion and future work are presented in Section 6.

Related Work
Machine learning algorithms have remained a primary focus in the design of intrusion detection systems (Liao et al., 2013). One-class classification has also been applied to the intrusion detection problem. An ensemble of one-class classifiers for intrusion detection is proposed in (Giacinto et al., 2008). The authors use a modular approach in which each module models a group of similar network protocols and services. Parzen density estimation, k-means, and ν-SVM are used to construct the one-class classifier ensemble. The technique is evaluated on the KDD99 dataset, and the results show that dividing the problem into different modules attains high detection rates with lower false alarm rates.
A flow-based intrusion detection system using one-class SVM (OC-SVM) classification is presented in (Winter et al., 2011). The OC-SVM detects malicious flows and discards normal flows. A small subset of the flow-based dataset developed by Sperotto et al. (2009) is used for evaluation.
A differentiated support vector data descriptor (SVDD) based one-class classification method for detecting more harmful attacks in host-based intrusion detection is proposed in (Kang et al., 2012). Experimental results show that the differentiated intrusion detection method performs better than existing techniques in detecting harmful attacks.

An application of one-class classification for intrusion detection in Supervisory Control and Data Acquisition (SCADA) networks is presented in (Nader et al., 2013). SCADA networks monitor and control industrial and public service processes such as nuclear power plants, electrical power grids, gas pipelines and water distribution systems. The authors employed two one-class classification methods: Support Vector Data Description (SVDD) (Tax, 2001) and Kernel Principal Component Analysis (Hoffmann, 2007). Both techniques were evaluated on a SCADA network dataset. Results indicate that both techniques tightly enclose the normal flow behavior in the SVM hyper-sphere and also detect intrusions.
Amer et al. (2013) proposed two enhancements to the one-class SVM for unsupervised anomaly detection. Both enhancements reduce the effect of outliers on the SVM model during training. The authors compared the proposed techniques with nine other unsupervised anomaly detection algorithms and obtained promising results.
An industrial communication intrusion detection algorithm based on one-class SVM is presented in (Shang et al., 2015). The authors use particle swarm optimization (Couceiro and Ghamisi, 2016) to tune the kernel parameters.

A robust one-class SVM for outlier detection is presented in (Yang et al., 2016). The authors assign dynamic weights to the training examples so that their influence on the one-class SVM is smoothed. Experimental analysis shows that the proposed weighted method has improved performance and robustness compared to the conventional one-class SVM.
A survey of the literature shows that, although one-class classification has been used for intrusion detection, many one-class classification techniques have yet to be explored in detail for flow-based intrusion detection.

One-class classification
Mathematically, we assume that $x_i$ is a training example from the target class dataset $X = \{x_1, x_2, x_3, \ldots, x_n\}$. The one-class classifier uses the training dataset $X$ to learn a model with the optimized parameter set $\theta$. After learning, the classifier can identify target class examples in an unseen test dataset $Z$. The classifier defines an output function $f$ using the optimized parameter set $\theta$ such that:
$$f(z_i; \theta) = c_i$$
where $z_i$ is an unseen example and $c_i$ is the class probability. The classifier applies a mapping function $h(z_i)$ to the output $f(z_i)$ to classify unseen examples into the target or outlier class. If the class probability is higher than a pre-defined threshold $t$, the target class is selected; otherwise the example is declared an outlier.

Density Estimation
Density estimation methods calculate the density of the target class from the training data using a density estimation function f(z). The function f(z) calculates the density of a test example, and a threshold value is used to accept the example into the target class. If the density of the test example is higher than the threshold, it is classified into the target class. Otherwise, the example is considered an outlier (Pimentel et al., 2014). We have evaluated three density estimation methods: a simple Gaussian distribution using the Mahalanobis distance, a mixture of Gaussians, and the Parzen distribution.

Simple Gaussian distribution using Mahalanobis distance
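This technique uses the Mahalanobis distance to calculate the density of a test example $z$ under a Gaussian distribution:
$$f(z) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(z-\mu)^{T}\Sigma^{-1}(z-\mu)\right)$$
where $\mu$ is the mean, $\Sigma$ is the covariance matrix, and $d$ is the dimension of $z$. The term $(z-\mu)^{T}\Sigma^{-1}(z-\mu)$ is the squared Mahalanobis distance of $z$ from the mean.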
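As a rough illustration of this density-based classifier, the sketch below fits a Gaussian to a handful of target examples and accepts a test example when its density reaches a threshold. It is restricted to two features so that the covariance matrix can be inverted in closed form. The toy data, the helper names and the choice of threshold (the lowest density observed on the training examples) are assumptions made for the example.

```python
import math


def fit_gaussian_2d(X):
    """Estimate the mean and the 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    n = len(X)
    mx = sum(p[0] for p in X) / n
    my = sum(p[1] for p in X) / n
    sxx = sum((p[0] - mx) ** 2 for p in X) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in X) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in X) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(z, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def gaussian_density(z, mean, cov):
    """Gaussian density at z for d = 2."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    return math.exp(-0.5 * mahalanobis_sq(z, mean, cov)) / (2.0 * math.pi * math.sqrt(det))


# Toy target-class examples (hypothetical 2-D flow features).
target = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
mean, cov = fit_gaussian_2d(target)
# Hypothetical threshold: the lowest density seen on the training examples.
t = min(gaussian_density(x, mean, cov) for x in target)


def is_target(z):
    """Accept z into the target class when its density reaches the threshold."""
    return gaussian_density(z, mean, cov) >= t
```

In practice the threshold would be tuned on held-out data rather than taken from the training set.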

Mixture of Gaussian
A single Gaussian distribution does not fit most data distributions. To model more complex data, a mixture of Gaussian distributions is used. The density at point $z$ for a mixture of $m$ Gaussian components is defined as:
$$f(z) = \sum_{i=1}^{m} P_i \, \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(z-\mu_i)^{T}\Sigma_i^{-1}(z-\mu_i)\right)$$
where $P_i$ is the probability that $z$ belongs to the $i$th Gaussian component, $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix of that component, and $d$ is the dimension of $z$.
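The benefit of a mixture over a single Gaussian can be seen in a one-dimensional sketch. The component weights, means and standard deviations below are hand-picked for illustration; in practice they would be estimated from the training data, for example with expectation maximization.

```python
import math


def normal_pdf(z, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_density(z, components):
    """Weighted sum of Gaussian densities.

    `components` is a list of (weight, mean, std) tuples whose weights sum to 1.
    """
    return sum(w * normal_pdf(z, mu, sigma) for w, mu, sigma in components)


# Hypothetical bimodal target class: two clusters around 0 and around 10.
components = [(0.7, 0.0, 1.0), (0.3, 10.0, 2.0)]
d_mode_a = mixture_density(0.0, components)
d_mode_b = mixture_density(10.0, components)
d_valley = mixture_density(5.0, components)
```

The density is high at both modes and low in the valley between them, whereas a single Gaussian fitted to the same data would place its peak between the two clusters.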

Parzen Distribution
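Parzen density estimation uses a hypercube $\Re$ with dimension $d$ and width $h$ to calculate the density at point $z$. The hypercube $\Re$ is centered at $z$, and the density is calculated with respect to all examples $x_i$. We define a function $\phi$ which gives a value of 1 if an example $x_i$ is within the hypercube $\Re$ and 0 otherwise.
We use a Gaussian estimate for the function $f(z)$ such that:
$$f(z) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{(h\sqrt{2\pi})^{d}} \exp\left(-\frac{\|z - x_i\|^{2}}{2h^{2}}\right)$$
where $n$ is the number of training examples.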
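A Parzen estimate with a Gaussian kernel can be sketched in a few lines. The width `h`, the toy training examples and the threshold value are assumptions chosen for the example.

```python
import math


def parzen_density(z, X, h):
    """Gaussian-kernel Parzen density estimate at z.

    X is a list of training examples (lists of floats) and h is the width.
    """
    d = len(z)
    norm = (h * math.sqrt(2.0 * math.pi)) ** d
    total = 0.0
    for x in X:
        sq_dist = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        total += math.exp(-sq_dist / (2.0 * h * h)) / norm
    return total / len(X)


# Toy target-class examples clustered around the origin.
X = [[0.0, 0.0], [0.1, 0.2], [-0.2, 0.1], [0.2, -0.1]]
h = 0.5
t = 0.05   # hypothetical acceptance threshold

near = parzen_density([0.0, 0.0], X, h)
far = parzen_density([3.0, 3.0], X, h)
near_is_target = near > t
far_is_target = far > t
```

Every call iterates over all of `X`, so the complete training set has to be kept available at test time.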

Reconstruction methods
Reconstruction methods use the training dataset to model the generating process for all examples. A reconstruction error is used to measure how well an example fits the generating model. If the example does not fit the model and the reconstruction error is high, the example is more likely to be an outlier (Pimentel et al., 2014). Reconstruction methods include neural networks (auto-encoder neural networks, self-organizing maps) and subspace-based approaches (Principal Component Analysis).

Auto-encoder Neural Networks
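An auto-encoder neural network encodes the input by passing it through a compact hidden stage. The input is reconstructed (decoded) at the output stage and compared with the original input to calculate the reconstruction error:
$$f(z) = \|z - \hat{z}\|^{2}$$
where $\hat{z}$ is the reconstruction of $z$ produced at the output stage.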

Self-organizing map
The self-organizing map is a clustering technique in which a high-dimensional input space is mapped to low-dimensional output clusters. The self-organizing map has input and output layers, which are connected through a competitive learning network. The output layer contains all clusters. An unseen example is placed at the output layer and its distance from the closest cluster center is calculated. This distance is the reconstruction error, defined by:
$$f(z) = \min_{1 \le j \le k} \|z - \mu_j\|^{2}$$
where $\mu_j$ is the center of the $j$th cluster and $k$ is the number of output clusters.
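The reconstruction error of a self-organizing map can be sketched as the squared distance to the nearest cluster center. The sketch assumes that the cluster centers have already been learned. The centers and test examples are made up, and the training step (competitive learning) is omitted.

```python
def som_reconstruction_error(z, centers):
    """Squared Euclidean distance from z to the closest cluster center."""
    return min(sum((a - b) ** 2 for a, b in zip(z, c)) for c in centers)


# Hypothetical cluster centers, as if produced by a trained map.
centers = [[0.0, 0.0], [5.0, 5.0]]
err_close = som_reconstruction_error([4.0, 5.0], centers)    # nearest center is [5, 5]
err_far = som_reconstruction_error([10.0, -3.0], centers)    # far from both centers
```

An example would be declared an outlier when this error exceeds a chosen threshold.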

Principal Component Analysis
Principal component analysis (PCA) is a dimension reduction technique which maps an input space of size N to an output space of size M such that M < N. The projection of an input object $z$ onto the M-dimensional subspace is defined by:
$$y = W (W^{T} W)^{-1} W^{T} z$$
where $W$ is the $N \times M$ matrix whose columns are the M principal eigenvectors of the covariance matrix of the training data. PCA uses the reconstruction error as the classification function. The reconstruction error is defined as the squared distance between the projection and the original example:
$$f(z) = \|z - y\|^{2} \qquad (11)$$

Boundary Methods

ν-SVM
The ν-SVM constructs a boundary around the target class examples in the form of a hyper-plane during training (Schölkopf et al., 2001). An unseen example $z$ belongs to the target class if it lies on the same side of the hyper-plane as the training examples and is considered an outlier otherwise.
The SVM uses a feature mapping function $\phi : X \to H$ which maps the input feature space $X$ to a high-dimensional feature space $H$ (Li, 2015). The similarity between two inputs $x$ and $y$ in the feature space $H$ is calculated using a simple kernel function:
$$K(x, y) = (\phi(x) \cdot \phi(y))_{H} \qquad (12)$$
To separate the input examples from the origin with maximum margin using a hyper-plane, the following quadratic minimization problem is solved:
$$\min_{w,\,\xi,\,\rho} \; \frac{1}{2}\|w\|^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \qquad (13)$$
subject to
$$(w \cdot \phi(x_i)) \ge \rho - \xi_i, \quad \xi_i \ge 0 \qquad (14)$$
where $w$ is the normal vector of the hyper-plane, $\xi_i$ are slack variables that penalize the outliers, $\nu \in (0,1)$ is a user-defined error control parameter that sets an upper bound on the fraction of outliers and a lower bound on the fraction of support vectors, and $\rho$ is the offset that determines the margin of the hyper-plane from the origin.
Using Lagrange multipliers $\alpha_i$ and constructing a Lagrangian, the decision function for the classification of a test example $z$ is defined as follows (Schölkopf et al., 2001):
$$f(z) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i K(x_i, z) - \rho\right) \qquad (15)$$
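The decision function of a trained ν-SVM has the standard form sgn(Σ α_i K(x_i, z) − ρ) (Schölkopf et al., 2001). The sketch below only evaluates that function. The support vectors, multipliers `alphas` and offset `rho` are hand-picked placeholders rather than the solution of the quadratic problem, and the RBF kernel with `gamma = 0.5` is an assumed kernel choice.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; the value of gamma is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def ocsvm_decision(z, support_vectors, alphas, rho, kernel=rbf_kernel):
    """Return +1 (target) or -1 (outlier) from the kernel expansion."""
    score = sum(a * kernel(x, z) for a, x in zip(alphas, support_vectors)) - rho
    return 1 if score >= 0 else -1


# Placeholder model parameters (not obtained from a solver).
support_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
alphas = [0.4, 0.3, 0.3]
rho = 0.5

label_inside = ocsvm_decision([0.3, 0.3], support_vectors, alphas, rho)
label_outside = ocsvm_decision([5.0, 5.0], support_vectors, alphas, rho)
```

Only the support vectors enter the sum, so prediction cost depends on their number and not on the size of the full training set.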

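Support Vector Data Descriptor (SVDD)
The support vector data descriptor constructs a hyper-sphere around the target class training examples. The sphere has center $a$ and radius $R > 0$. The volume of the sphere is minimized such that it contains all training examples (Tax, 2001).
The following error function is minimized for SVDD:
$$F(R, a) = R^{2} \qquad (16)$$
such that all target examples $x_i$ lie within the sphere:
$$\|x_i - a\|^{2} \le R^{2}, \quad \forall i \qquad (17)$$
This strict data description, which only minimizes the sphere radius, may not fit in all cases.
To allow for the possibility of outliers in the training set, and to penalize larger distances for $x_i$, slack variables $\xi_i$ are introduced. Thus, the minimization problem becomes:
$$F(R, a, \xi) = R^{2} + C\sum_{i}\xi_i \qquad (18)$$
such that most target examples $x_i$ lie within the sphere:
$$\|x_i - a\|^{2} \le R^{2} + \xi_i, \quad \xi_i \ge 0 \qquad (19)$$
The parameter $C$ is analogous to $\nu$ in ν-SVM and controls the acceptable number of outliers. The parameters $R$, $a$, and $\xi$ are optimized by using Lagrange multipliers $\alpha_i$ and constructing a Lagrangian. This leads to the following quadratic programming problem, which is maximized with respect to the multipliers $\alpha_i$ (Tax, 2001):
$$L = \sum_{i}\alpha_i (x_i \cdot x_i) - \sum_{i,j}\alpha_i\alpha_j (x_i \cdot x_j), \quad 0 \le \alpha_i \le C, \quad \sum_{i}\alpha_i = 1 \qquad (20)$$
A test example $z$ is classified into the target class if its distance from the center $a$ does not exceed the radius of the sphere. The decision function for classification is defined as follows:
$$f(z) = \|z - a\|^{2} = (z \cdot z) - 2\sum_{i}\alpha_i (z \cdot x_i) + \sum_{i,j}\alpha_i\alpha_j (x_i \cdot x_j) \le R^{2} \qquad (21)$$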
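The SVDD decision rule can be evaluated purely through inner products, using the fact that the center is a weighted sum of training examples, a = Σ α_i x_i (Tax, 2001). The sketch below uses a symmetric toy set for which equal multipliers place the center at the origin. The multipliers are chosen by hand, not computed by a solver, and the linear kernel is an assumption.

```python
def dot(x, y):
    """Linear kernel (plain inner product)."""
    return sum(a * b for a, b in zip(x, y))


def svdd_distance_sq(z, X, alphas, kernel=dot):
    """Squared distance from z to the center a = sum_i alpha_i x_i."""
    zz = kernel(z, z)
    cross = sum(a * kernel(z, x) for a, x in zip(alphas, X))
    quad = sum(ai * aj * kernel(xi, xj)
               for ai, xi in zip(alphas, X)
               for aj, xj in zip(alphas, X))
    return zz - 2.0 * cross + quad


# Four target examples on the unit circle; equal weights put the center at the origin.
X = [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
alphas = [0.25, 0.25, 0.25, 0.25]
# The squared radius is the distance from the center to any example on the boundary.
R2 = svdd_distance_sq(X[0], X, alphas)


def is_target(z):
    """Accept z when it lies inside or on the sphere."""
    return svdd_distance_sq(z, X, alphas) <= R2
```

Replacing `dot` with another kernel function gives a more flexible boundary without changing the decision rule.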

Proposed approach
The goal of this paper is to evaluate the application of one-class classification to flow-based intrusion detection. We consider various one-class classification methods, including density estimation, reconstruction methods and boundary methods, for the detection of malicious IP flows. We evaluate the one-class classification techniques on a flow-based dataset and use the results to determine a suitable one-class classification technique for the detection of malicious IP flows.

Dataset
We have used two scenarios of the CTU-13 dataset to evaluate the one-class classification techniques. The CTU-13 dataset was created at CTU University, Czech Republic (Garcia et al., 2014). During the labeling process, all traffic was initially given the background label. The normal label was given to traffic that originated from switches, proxies, and legitimate computers. All traffic that came from known infected machines was labeled botnet. We have used the fourth and first scenarios for the experiment. Table 1 gives details of the malware and traffic flow records in each scenario.
The CTU dataset contains bidirectional NetFlow records. Every flow record has 15 attributes. Table 2 shows a sample of the dataset with the important attributes. The start time and duration fields are used to calculate the flow duration. The protocol value shows the transport layer protocol type. The direction field shows the direction of the flow, which can be incoming, outgoing or bi-directional. The total packets and total bytes fields contain the total number of packets and bytes transmitted in either direction. Another field, source bytes, also exists.
We have extracted a sample of the dataset for the evaluation of one-class classification techniques. Details of the IP flows in the sample are given in Table 3.

Performance Measures
We have evaluated the one-class classification techniques using two well-known performance measures: the area under the Receiver Operating Characteristic (ROC) curve (AUC) and the F1 score (Wu and Banzhaf, 2010). The ROC curve plots the true positive rate against the false alarm rate. In intrusion detection, the ROC curve shows how the detection rate changes as the false alarm rate varies. The ROC curve is quantified by measuring the area under the curve (AUC). An AUC value near 1 denotes a good intrusion detection process.
The second performance measure is the F1 score, which is equal to the harmonic mean of precision and recall:
$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
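Both measures can be computed directly from labels and classifier scores. In the sketch below, label 1 stands for a malicious (target) flow and 0 for a normal flow. The scores and the threshold of 0.45 are made up for illustration. The AUC is computed through its rank interpretation, the probability that a randomly chosen malicious flow scores higher than a randomly chosen normal flow, which equals the area under the ROC curve.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores for six flows.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
threshold = 0.45   # hypothetical decision threshold
y_pred = [1 if s > threshold else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
area = auc(y_true, scores)
```

The F1 score depends on the chosen threshold, while the AUC summarizes the ranking quality over all thresholds.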

Discussion
Parzen density estimation requires all training data to be available during testing (Mazhelis, 2006). This is particularly challenging for intrusion detection in network traffic, where large volumes of training records are pushed at regular intervals.
In the next step, one-class classifiers based on reconstruction methods were evaluated. Reconstruction methods use a generating process to model the data. The parameters of the generating process are optimized for the correct representation of new objects. Although reconstruction methods can be applied to one-class classification problems, they are not primarily meant for this purpose (Tax, 2001; Wozniak, 2014). If the data does not fit the model, a bias value is used to minimize the reconstruction error. The bias value destroys important characteristics of the dataset (Tax, 2001). Accurate modeling of IP flow records using reconstruction methods requires a large number of parameters to be optimized and can result in a high reconstruction error. These methods are computationally expensive for one-class classification (Mazhelis, 2006). Another drawback of reconstruction methods is the difficulty of training in high-dimensional spaces (Pimentel et al., 2014). This aspect limits the use of reconstruction methods for the classification of IP flows, because flow records can be very high dimensional: IPFIX includes around 280 attributes to define an IP flow record.
In the next experiment, we applied two neural network approaches: the auto-encoder neural network and the self-organizing map. Results show that the auto-encoder neural network has relatively good accuracy, but it requires some parameters to be specified by the user. These include the number of hidden layers, the number of input units, and the stopping criteria (Mazhelis, 2006). The auto-encoder neural network also needs a large training set for correct estimation of the weights associated with the hidden and input units.
The self-organizing map (SOM) requires the user to specify the number of output clusters. The SOM also needs $k^d$ neurons for a $d$-dimensional dataset. It is computationally expensive to estimate the weights of $k^d$ neurons if both $k$ and $d$ are moderately large (Tax, 2001). Another type of reconstruction method is principal component analysis (PCA). PCA also has lower accuracy for one-class classification of IP flows. PCA does not perform well if the data has variance in all feature directions; in this case, PCA is unable to reduce the dimensionality of the data (Tax, 2001).
One-class classification techniques using boundary methods give the best results for the identification of malicious IP flows. Boundary methods are specifically designed for one-class classification and also perform better when only a limited training sample is available (Tax, 2001). These methods avoid estimating the complete probability density and can work efficiently if the dataset is high dimensional. In our experiment, we used two SVM-based one-class classification techniques: ν-SVM and the Support Vector Data Descriptor (SVDD). The ν-SVM and SVDD use a hyperplane and a hypersphere, respectively, to enclose the target class examples. Finding the smallest sphere containing all target points is equivalent to finding the segment containing the required points (Schölkopf et al., 2001). In our experiment, both methods show similar results and perform better than all the other reviewed classification methods. However, a drawback of SVM-based methods is the difficulty involved in estimating the value of the parameter that controls the number of outliers within the boundary. The accuracy of SVM-based one-class classifiers is very sensitive to a slight change in the value of this parameter.

Conclusion and Future Work
In this paper, we have applied different one-class classification techniques to the detection of malicious IP flows. The techniques include density estimation, reconstruction methods, and boundary methods. We used a flow-based dataset to train the one-class classifiers on malicious IP flows and evaluated the trained classifiers on a test dataset containing both normal and malicious IP flow records. We used two performance measures, the area under the ROC curve (AUC) and the F1 score, to compare the results. On the basis of the results, we discussed the pros and cons of all the techniques used for one-class classification. The results show that boundary methods, i.e. SVM-based one-class classifiers, give higher accuracy in the identification of malicious IP flows. We therefore consider SVM-based one-class classification techniques suitable for the detection of malicious IP flows.
In the future, our focus will be on implementing a multi-stage intrusion detection system that uses an SVM-based one-class classifier at the first stage. The second stage will classify IP flows into different attack categories and provide deeper insight into the malicious traffic. Another point of interest will be to combine multiple one-class classifiers to improve performance. The use of unsupervised learning for one-class classification is also a promising research area.