Hybrid Intrusion Detection System



Abstract
With the enormous growth of computer networks and the huge increase in the number of applications that rely on them, network security is gaining increasing importance. Moreover, almost all computer systems suffer from security vulnerabilities which are both technically difficult and economically costly for manufacturers to fix. Therefore, the role of Intrusion Detection Systems (IDSs), as special-purpose devices to detect anomalies and attacks in a network, is becoming more important.
Traditionally, intrusion detection techniques are classified into two categories: misuse (signature-based) detection and anomaly detection. However, some researchers have recently proposed the idea of hybrid detection to reap the advantage of misuse detection by having a high detection rate on known intrusions as well as the ability of anomaly detectors in detecting brand-new attacks. Despite the inherent potential of hybrid detection, there are still two important issues that highly affect the performance of these hybrid systems.
First, anomaly-based methods cannot achieve an outstanding performance without a comprehensive labeled and up-to-date training set with all different attack types, which is very costly and time-consuming to create if not impossible. Second, efficient and effective fusion of several detection technologies becomes a big challenge for building an operational hybrid intrusion detection system.
With respect to the aforementioned shortcomings, in this thesis, we introduce an adaptive hybrid network-based intrusion detection system to recognize malicious network activities and report them to the system administrator. The proposed detection system is based on a multi-layer model which consists of three processing layers: 1) Packet Analysis; 2) Intrusion Detection; and 3) Security Information and Event Management (SIEM).
The Packet Analysis layer contains two important modules, namely the Flow Analyzer and the Traffic Classification module, which are responsible for grouping the packets into flows and labeling them with the proper application name, respectively. Analyzed packets are then forwarded to the Intrusion Detection layer for further investigation. Depending on its application label, each flow is treated by a specific detection module and triggers an alert if it is identified as malicious activity by the hybrid detection system. These alerts are fed into the Security Information and Event Management (SIEM) layer to notify the administrator of potential breaches.
Generally, IDSs use two fundamental approaches. The first one is misuse detection, also called signature-based detection. In this type of IDS, the search for evidence of attacks is based on knowledge accumulated from known attacks. This knowledge is represented by attack signatures, which are patterns or sets of rules that can uniquely identify an attack. Being designed based on knowledge of past intrusions or known vulnerabilities, misuse-based IDSs are also referred to as knowledge-based detection. The advantages of knowledge-based approaches are that they have very good accuracy and a very low false alarm rate. Furthermore, the analysis is detailed, meaning that there is enough information about the type of detected attacks; thus, it is easier for the system administrator to take preventive and corrective action.
On the contrary, drawbacks include the difficulty of gathering the required information on the known attacks and keeping it up to date with new vulnerabilities. Moreover, misuse-based IDSs are not complete, i.e., they do not have the ability to detect all types of attacks, especially new ones and those involving an abuse of privileges.
The second type of IDSs is anomaly detection or behavior-based detection.
In this approach, models of legitimate activities are built based on normal data, and any deviation from the normal model is considered an attack or anomaly. The main advantage of this approach over misuse detection is that it can detect attempts to exploit new and unforeseen vulnerabilities. It can also help detect "abuse of privileges" types of attacks that do not actually involve exploiting any security vulnerability. However, this approach has its own shortcomings. The main reported problem is a high false alarm rate, which is caused by two kinds of problems. The first is the lack of a training data set that covers all the legitimate areas, and the other is that abnormal behavior is not always an indicator of intrusions. It can happen as a result of factors such as policy changes or the offering of new services by a site. Besides the aforementioned problem (high false alarm rate), behavior-based approaches suffer from some other shortcomings as well:
• They cannot explain why a detected event is an attack.
• They cannot provide an explanation for the type of attack.
• They cannot provide information to respond to the attack.
In order to overcome these challenges and keep the advantages of misuse detection, some researchers have proposed the idea of hybrid detection. This way, the system achieves the advantage of misuse detection, having a high detection rate on known attacks, as well as the ability of anomaly detectors to detect unknown attacks. Despite the inherent potential of hybrid detection, there are still two important issues that highly affect the performance of these hybrid systems. First, anomaly-based methods cannot achieve an outstanding performance without a comprehensive, labeled, and up-to-date training set with all different attack types, which is very costly and time-consuming to create, if not impossible. Second, efficient and effective fusion of several detection technologies is a big challenge for building an operational hybrid intrusion detection system.

Contributions
In this thesis, we introduce an adaptive hybrid network-based intrusion detection system to recognize malicious network activities and report them to the system administrator. The main contributions of this thesis are as follows:
• An online traffic classification method is proposed based on the weighted unigram distribution of the payloads. A genetic algorithm based scheme is then employed to find appropriate weights in order to achieve a higher accuracy.
• We provide a novel hybrid intrusion detection system which can evolve as the network behavior changes. In addition, we have proposed an efficient fusing algorithm that is able to model the uncertainty attached to each detection method.
• Through a statistical analysis on the KDDCUP99 data set, we found two important issues which highly affect the performance of evaluated systems and result in a very poor evaluation of anomaly detection approaches. To solve these issues, we have proposed a new data set, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of the identified drawbacks.
• To overcome the shortcomings of available data sets in reflecting real life conditions, we have prepared a data set of full network traces, including packet payloads, which is publicly available to researchers through our website. The ultimate goal is to prepare a public data set for the evaluation of network-based IDSs by imitating our center's network and conducting several attack scenarios against it.

Thesis Organization
The rest of the thesis is organized as follows: Chapter 2 provides a comprehensive review of three important areas of research that have significant implications for the proposed framework, namely anomaly detection systems, hybrid intrusion detection systems, and network traffic classification. Chapter 3 introduces the drawbacks to the existing evaluation methods of IDSs and provides some solutions for obtaining more realistic results.
In Chapter 4, we propose an adaptive hybrid intrusion detection system to overcome the main shortcomings of the existing IDSs. The proposed detection system is based on a multi-layer model which consists of three processing layers: 1) Packet Analysis; 2) Intrusion Detection; and 3) Security Information and Event Management (SIEM).
Chapter 5 provides two data sets to address some inherent problems in the DARPA and KDD data sets which are widely used as the only publicly available data sets for network-based anomaly detection systems. Chapter 6 presents the experimental results of this thesis, followed by a detailed discussion on the provided results.
The thesis is finally concluded in Chapter 7 which summarizes the current work and presents several suggestions for further research in the area of intrusion detection.

Literature Review
In this chapter, we review three important areas of research that have significant implications for the proposed framework. First, we explain anomaly detection systems in more detail. In the second section, we review existing hybrid intrusion detection systems. Network traffic classification methods will be discussed in the third section. Finally, we provide some information about the existing evaluation data sets for network-based intrusion detection systems (NIDS).

Anomaly Detection Systems
As mentioned in the previous chapter, the idea of anomaly detection is to build models of legitimate activities based on the normal data, and then any deviation from the normal model will be considered as an attack or anomaly.

To this end, anomaly detectors basically consist of two phases: a training phase and a testing phase. In the former, the normal model is automatically built based on the training data set applying machine learning techniques; in the latter, the learned model is applied to the new testing instances.
The training data set contains a collection of data instances each of which can be described using a set of attributes (features) and the associated labels.
The attributes can be of different types such as categorical or continuous.

Data Labels
Since labeling is often done manually by human experts, obtaining an accurately labeled data set which is representative of all types of behaviors is rather costly. As a result, based on the availability of the labels, three operating modes are defined for anomaly detection techniques: supervised, semi-supervised, and unsupervised.
Semi-supervised Anomaly Detection: Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Generally speaking, semi-supervised methods employ unlabeled data in conjunction with a small amount of labeled data. As a result, they greatly reduce the labeling cost, while maintaining the high performance of supervised methods.
In the area of anomaly detection, however, semi-supervised learning methods assume that the training data has labeled instances for only the normal class. This way, they are more practicable than supervised methods to operate in real networks since they do not require any labels for the anomaly class. One-class SVM is one of the most well-known classifiers of this type which makes a discriminative boundary around the normal instances, and any test instance that does not fall within the learned boundary is declared as anomalous. Although the typical approach in semi-supervised techniques is to model the normal behavior, there exist a limited number of anomaly detection techniques that assume availability of the anomalous instances for training [2,3]. Such techniques are not widely used since obtaining a training set which covers every possible anomalous behavior is almost impossible.
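As a small illustration of this idea, the following is a minimal sketch of a one-class SVM detector, assuming scikit-learn and a matrix of numeric flow features; the feature dimensions and parameter values are illustrative placeholders, not those used in this thesis.

# Semi-supervised anomaly detection: train only on normal traffic,
# flag anything outside the learned boundary as anomalous.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

X_normal = np.random.rand(1000, 10)   # training flows assumed to be normal only
X_test = np.random.rand(200, 10)

scaler = StandardScaler().fit(X_normal)
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.01)
model.fit(scaler.transform(X_normal))

# +1 = inside the learned boundary (normal), -1 = outside (anomalous)
predictions = model.predict(scaler.transform(X_test))
anomalies = np.where(predictions == -1)[0]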
Unsupervised Anomaly Detection: unsupervised techniques do not require training data. Instead, this approach is based on two basic assumptions [4]. First, it assumes that the majority of the network connections represent normal traffic and that only a very small percentage of the traffic is malicious. Second, it is expected that malicious traffic is statistically different from normal traffic. Based on these two assumptions, data instances that build groups of similar instances and appear very frequently are supposed to represent normal traffic, while instances that appear infrequently and are significantly different from the majority of the instances are considered to be suspicious.
While supervised methods are very dependent on the labeled training data which are usually error-prone, time consuming and costly, unsupervised learning techniques avoid such complications by not using the labeled training data and any prior knowledge of attacks or normal instances. Instead, they partition the data into normal operations and anomalies using statistical models.

Applying such techniques to the problem of anomaly detection is one of the possible avenues to build large, reliable anomaly detection systems without the need for extensive and costly manual labeling of instances.
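A minimal sketch of an unsupervised detector built on the two assumptions above is shown below; the model is fitted on unlabeled traffic, and the rare, statistically different instances receive the lowest scores. scikit-learn is assumed, and the contamination value is an illustrative guess rather than a measured attack ratio.

import numpy as np
from sklearn.ensemble import IsolationForest

X_unlabeled = np.random.rand(5000, 10)   # mixed normal and (rare) malicious flows

detector = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
detector.fit(X_unlabeled)

scores = detector.score_samples(X_unlabeled)   # lower score = more anomalous
suspicious = np.argsort(scores)[:50]           # 50 most anomalous flows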

Output Format
In addition to the type of training set, anomaly detection methods can also be categorized based on the way they report anomalies. Typically, there are two types of output, namely anomaly scores and binary labels.
Scores: in this technique, anomaly detectors assign a numeric score to each instance, indicating how likely it is that the test instance is an anomaly. The advantage of this technique is that the analyst can rank the malicious activities, set a threshold, and select the most significant ones.
Binary labels: detectors of this type, in contrast, directly label each test instance as either normal or anomalous, without providing any ranking information.
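A small sketch of the score-based output format follows, assuming the scores come from any detector (such as the ones sketched earlier); the numbers and threshold are hypothetical.

import numpy as np

scores = np.array([0.1, 0.93, 0.4, 0.88, 0.05])   # hypothetical anomaly scores
threshold = 0.8

ranked = np.argsort(scores)[::-1]                  # most suspicious first
alerts = [i for i in ranked if scores[i] >= threshold]
print(alerts)                                      # e.g., [1, 3]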

Data Collection
Regardless of the types of data labels and output, anomaly intrusion detection systems (AIDS) can also be classified based on the source of data being acquired and analyzed. Generally speaking, AIDSs are divided into three main categories based on the locus of data which is used to learn normal behavior.
Application-based AIDS: in this approach, sources of data are application log files or network traffic. The research by Kruegel et al. [5] is a good example of application level AIDS. It presents an anomaly detection system that detects Web-based attacks using the Web server log files as input and then produces an anomaly score for each web request.
Host-based AIDS: these techniques analyze activities on a protected host by acquiring data from a source that resides on that host. The main data sources of this type of AIDS are audit-logs and system-calls. In the first method, i.e., audit-logs, the system uses information about a set of events that the operating system (OS) is generating, while in the case of system-calls, the system independently monitors the behavior of each user-critical application that runs on the operating system.

Network-based AIDS:
Network-based AIDSs are designed to scan and analyze network packets and are not restricted to one host. The applicability of network-based anomaly detectors to large networks has made them very popular and has attracted many researchers to focus their work on network-based anomaly detection.

Techniques Used in Anomaly Detection
From the time anomaly detection was formalized by Denning [6], different methods have been proposed for it. In the following, we briefly explain some of the methods applied in network-based AIDSs.
Bayesian Networks: a Bayesian network is a probabilistic graphical model that represents a set of variables and their probabilistic independencies.
Bayesian networks, literally, are directed acyclic graphs whose nodes represent variables, and whose edges encode conditional dependencies between the variables [7]. They have been applied in anomaly detection in different ways; for example, Valdes et al. [8] developed an anomaly detection system that employed Naive Bayes, which is a two-layer Bayesian network that assumes complete independency between the nodes.
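A minimal sketch in the spirit of such Naive Bayes detectors is shown below, assuming scikit-learn and labeled connection records; the features and labels are placeholders and this is not the system of Valdes et al.

import numpy as np
from sklearn.naive_bayes import GaussianNB

X_train = np.random.rand(1000, 8)                    # numeric connection features
y_train = np.random.randint(0, 2, 1000)              # 0 = normal, 1 = anomalous

clf = GaussianNB().fit(X_train, y_train)
X_new = np.random.rand(5, 8)
posterior = clf.predict_proba(X_new)                 # P(class | features) per record
labels = clf.predict(X_new)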
Genetic Algorithms: a genetic algorithm (GA) is a search technique used to find exact or approximate solutions to optimization and search problems.
Because of their flexibility and robustness as a global search method, genetic algorithms have been applied in anomaly detection in different ways. Some approaches [9] have used genetic algorithms directly for the classification of instances, while others [10] have applied this technique for feature selection.
Neural Networks: a neural network is a network of computational units that jointly implement complex mapping functions. Initially, the network is trained with a labeled data set; testing instances are then fed into the network to be classified as either normal or anomalous. Support Vector Machines (SVMs), a closely related kernel-based learning technique, are also widely used in anomaly detection [11].
Immune System Approach: the natural immune system uses a variety of evolutionary and adaptive mechanisms to protect organisms from foreign pathogens and misbehaving cells in the body. Artificial immune systems (AIS) seek to capture some aspects of the natural immune system in a computational framework, either for the purpose of modeling the natural immune system or for solving engineering problems. In either form, a fundamental problem solved by most AIS can be thought of as learning to discriminate between "self" (the normally occurring patterns in the system being protected, e.g., the body) and "nonself" (foreign pathogens, such as bacteria or viruses, or components of self that are no longer functioning normally). Since the role of immune systems in human body is similar to the role of intrusion detection systems, AISs have widely been applied in anomaly detection [12].
Inductive Rule Generation Algorithms: the most famous technique in this category is the decision tree, a predictive model that maps observations about an item to conclusions about its target value.
Clustering: these techniques are usually based on two important assumptions [4]. First, the majority of the network connections represent normal traffic and that only a very small percentage of the traffic is malicious. Second, malicious traffic is statistically different from normal traffic. If these two assumptions hold, anomalies can be detected based on the cluster size, i.e., large clusters correspond to normal data, and the rest correspond to attacks.
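To make the cluster-size heuristic concrete, a minimal sketch follows; scikit-learn is assumed, and the number of clusters and the size cut-off are illustrative choices rather than recommended settings.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(5000, 10)                       # unlabeled flow features
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X)

sizes = np.bincount(km.labels_, minlength=20)
small_clusters = np.where(sizes < 0.01 * len(X))[0]   # clusters holding < 1% of the data
suspicious = np.isin(km.labels_, small_clusters)      # boolean mask over instances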
Outlier Detection: this approach is based on the idea of semi-supervised learning in which the system is trained based on normal data, and then any instances that do not fit in the normal profile will be considered as anomaly.

Hybrid Intrusion Detection Systems
Despite the great capability of anomaly detection systems in detecting unknown or zero-day attacks, these methods suffer from a major deficiency, namely their high false alarm rate. This is mainly caused by two factors. The first is the lack of a training data set that covers all the legitimate areas, and the second is that abnormal behavior is not always an indicator of intrusions; abnormal behavior can also happen as a result of factors such as policy changes or the offering of new services by a site.
In order to overcome these challenges and keep the advantages of misuse detection, some researchers have proposed the idea of hybrid detection. This helps the system take advantage of misuse detection to have a high detection rate on known attacks as well as the ability of anomaly detectors in detecting unknown attacks. Based on the fusion approach, current hybrid IDSs can be divided into two categories: 1) sequence-based, in which either anomaly detection or misuse detection is applied first, and the other one is applied next; 2) parallel-based, in which multiple detectors are applied in parallel, and the final decision is made based on multiple output sources.
The most common type of hybrid system is to combine a misuse detection and an anomaly detection together. In such a hybrid system, the signature detection technique detects known attacks, and the anomaly detection technique detects novel or unknown attacks.
In [13], Tombini et al. applied an anomaly detection approach to create a list of suspicious items. A signature detection approach is then employed to classify these suspicious items into three categories, namely false alarms, attacks, and unknown attacks. The approach relies on the assumption that the anomaly detection component achieves a high detection rate, since intrusions missed in the first step will not be found by the follow-up signature detection component.

Zhang et al. proposed a hybrid IDS combining both misuse detection and anomaly detection components, in which a random forest algorithm was first applied in the misuse detection module to detect known intrusions [14].
The outlier detection provided by the random forest algorithm was then utilized to detect unknown intrusions. Evaluations on a part of the KDDCUP'99 data set showed that their misuse detection module generated a high detection rate with a low false positive rate, and at the same time the anomaly detection component had the potential to find novel intrusions.
In [15], Peng et al. proposed a two-stage hybrid intrusion detection and visualization system that leverages the advantages of signature-based and anomaly detection methods. It was claimed that their hybrid system could identify both known and unknown attacks on system calls. However, evaluation results for their system are missing in the paper. The work is more like an introduction on how to apply a multiple stage intrusion detection mechanism for improving the detection capability of an IDS.
Similar to [14], Depren et al. proposed a novel hybrid IDS consisting of an anomaly detection module, a misuse detection module, and a decision support system [16]. The decision support system was used to combine the results of the two detection modules. In the anomaly detection module, a Self-Organizing Map (SOM) structure was employed to model normal behavior, and any deviation from the normal behavior was classified as an attack. In the misuse detection module, a decision tree algorithm was used to classify various types of attacks.
Another hybrid approach combines a signature-based method, Snort, and an anomaly detection system. In contrast to the other hybrid IDSs, this approach relies only on Snort to generate the alerts, and the anomaly detection component is used solely to generate Snort rules automatically. To this end, normal traffic is passed to the anomaly system to build a normal profile of frequent episode rules (FERs). Once the training phase is complete, live traffic is fed to the system; FERs generated from the live traffic are compared to the normal profile and considered suspicious if they do not match any of the FERs in the normal profile. When such a suspicious rule occurs more often than a threshold, it is reported as an anomaly and the system automatically adds the corresponding rule to the Snort rule set.
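A rough sketch of this rule-promotion idea is given below. The rule representation, the normal profile, and the threshold are hypothetical simplifications, so this illustrates the workflow rather than the cited system.

from collections import Counter

normal_profile = {("GET", "/index.html", 80), ("POST", "/login", 80)}   # mined FERs
threshold = 5
unmatched_counts = Counter()
signature_rules = []

def process_rule(rule):
    # Rules matching the normal profile are ignored; recurring unmatched
    # rules are promoted to the signature set once they exceed the threshold.
    if rule in normal_profile:
        return
    unmatched_counts[rule] += 1
    if unmatched_counts[rule] >= threshold and rule not in signature_rules:
        signature_rules.append(rule)

for observed in [("GET", "/cmd.exe", 80)] * 6:
    process_rule(observed)
print(signature_rules)   # the recurring unmatched rule was promoted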

Traffic Classification
Early common techniques for identifying network applications rely on the association of a particular port with a particular protocol [22]. Such a port-number-based traffic classification approach has been proved to be ineffective.

Signature-Based Traffic Classifier
An alternative to traditional port-number-based application classification is to inspect the content of the payload and search for deterministic character strings that model the applications. In [23], for example, Gummadi et al. used payload signatures to identify KaZaA traffic, and a number of similar signature-based classifiers have been proposed [28,29,30,31]. Based on the payload signatures, the application classifier can obtain an extremely high accuracy. However, the biggest limitation is that all the above-mentioned approaches focus on identifying only one single application (e.g. KaZaA in [23] or Skype in [26]) or one application group (e.g. chat traffic identification in [29] or P2P traffic identification in [25]).

Statistical Traffic Classifier
The usage of statistical properties for network traffic classification or at least traffic behavioral modeling is not new. The early studies on the subject can be traced back to the seminal report by Paxson et al. [32,33], in which some statistical variables (e.g. packet length, inter-arrival times and flow duration) have been proved to be suitable to express the behavior of a few protocols.
With the increase of newly emerging network applications, the problem has become how to associate a given flow, characterized by a set of statistics, to a specific application. Machine learning techniques can naturally achieve such a classification task through their training and learning capabilities. One line of work builds statistical protocol fingerprints: according to an anomaly score, the fingerprints allow the measurement of "how far" an unknown flow is from the basic characteristics of each protocol. A simple classification algorithm is then applied to classify flows dynamically as packets pass through the classifier, deciding whether a flow belongs to a given application layer protocol or was generated by an "unknown" (i.e., non-fingerprinted) protocol. As acknowledged in the authors' evaluation, the limitation of the approach is that it can identify only three protocols, namely SMTP, POP3 and HTTP.

Other statistical approaches basically group traffic into coarse application types, such as bulk transfer and small transactions; however, further work is necessary in order to obtain more specific application groups.
Although all these techniques show some capability for traffic classification, the number of applications they can identify is very limited. In addition, the definition of application classes is very rough and is not precise enough to obtain fine-grained applications. One of the few exceptions is the work conducted by Erman et al. [39], in which a semi-supervised learning technique was employed to classify over 29 applications.
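A minimal sketch of a statistical traffic classifier of the kind surveyed above follows: per-flow statistics are fed to a supervised learner that predicts the application label. scikit-learn is assumed, and the features and labels are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# columns: mean packet length, std of packet length, mean inter-arrival time, flow duration
X_train = np.random.rand(2000, 4)
y_train = np.random.choice(["HTTP", "SMTP", "P2P"], 2000)

clf = DecisionTreeClassifier(max_depth=10).fit(X_train, y_train)
predicted_apps = clf.predict(np.random.rand(3, 4))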

Performance Evaluation
Conducting a thorough analysis of the recent research done in anomaly detection, we encounter some machine learning methods reported to have a very high detection rate of 98% while keeping the false alarm rate at 1% [40].
However, when we look at the state of the art IDS solutions and commercial tools, there is little evidence of using the anomaly detection approach, and people still think that it is an immature technology. Recent studies show that there are some inherent problems in DARPA and KDD data sets which are widely used as the only publicly available data sets for network-based anomaly detection systems.

KDD CUP 99 Data Set Description
Since 1999, KDD'99 [41] has been the most widely used data set for the evaluation of anomaly detection methods. This data set was prepared by Stolfo et al. [42] and is built based on the data captured in the DARPA'98 IDS evaluation program [43]. DARPA'98 is about 4 gigabytes of compressed raw data of 7 weeks of network traffic, which can be processed into about 5 million connection records. It is important to note that the test data is not from the same probability distribution as the training data, and it includes specific attack types not present in the training data, which makes the task more realistic. Some intrusion experts believe that most novel attacks are variants of known attacks and that the signatures of known attacks can be sufficient to catch novel variants.
The data sets contain a total of 24 training attack types, with an additional 14 types appearing in the test data only. The names and detailed descriptions of the training attack types are listed in [44].
KDD'99 features can be classified into three groups: 1. Basic features: This category encapsulates all the attributes that can be extracted from a TCP/IP connection. Most of these features lead to an implicit delay in detection.

Traffic features:
This category includes features that are computed with respect to a window interval and is divided into two groups: "same host" features, which examine only the connections in the past two seconds that have the same destination host as the current connection, and "same service" features, which examine only the connections in the past two seconds that have the same service as the current connection.
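A small sketch of one such window-based feature is shown below: the number of connections to the same destination host within the last two seconds. The connection record format is a simplified assumption; only the two-second window follows the KDD'99 style.

from collections import deque

def same_host_count(connections, window=2.0):
    """connections: list of (timestamp, dst_host) tuples sorted by timestamp."""
    counts, recent = [], deque()
    for ts, dst in connections:
        recent.append((ts, dst))
        while recent and ts - recent[0][0] > window:
            recent.popleft()                       # drop records outside the window
        counts.append(sum(1 for _, d in recent if d == dst))
    return counts

conns = [(0.0, "10.0.0.1"), (0.5, "10.0.0.1"), (1.0, "10.0.0.2"), (2.6, "10.0.0.1")]
print(same_host_count(conns))   # [1, 2, 1, 1]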

Inherent Problems of KDD'99 Data Set
As mentioned in the previous section, KDD'99 is built based on the data captured in DARPA'98 which has been criticized by McHugh [45], mainly because of the characteristics of the synthetic data. As a result, some of the existing problems in DARPA'98 remain in KDD'99. However, there are some deliberate or unintentional improvements, along with additional problems.
In the following, we first review the issues in DARPA'98 and then discuss the possible existence of those problems in KDD'99. Finally we discuss the author's observations of the KDD data set.
1. For the sake of privacy, the experimenters chose to synthesize both the background and the attack data, and the data are claimed to be similar to that observed during several months of sampling from a number of Air Force bases. However, neither analytical nor experimental validation of the data's false alarm characteristics was undertaken.
Furthermore, the workload of the synthesized data does not seem to be similar to the traffic in real networks. Fortunately, the aforementioned simulation artifacts do not affect the KDD data set, since the 41 features used in KDD are not related to any of the weaknesses mentioned in [46]. However, KDD suffers from additional problems not present in the DARPA data set.
In [4], Portnoy et al. partitioned the KDD data set into ten subsets, each containing approximately 490,000 instances, or 10% of the data. However, they observed that the distribution of the attacks in the KDD data set is very uneven, which made cross-validation very difficult. Many of these subsets contained instances of only a single attack type. For example, the 4th, 5th, 6th, and 7th 10% portions of the full data set contained only smurf attacks, and the data instances in the 8th subset were almost entirely neptune intrusions.
Similarly, the same problem with smurf and neptune attacks in the KDD training data set has been reported in [47]. The authors mention two problems caused by including these attacks in the data set. First, these two types of DoS attacks constitute over 71% of the testing data set, which completely skews the evaluation. Second, since they generate large volumes of traffic, they are easily detectable by other means, and there is no need to use anomaly detection systems to find them.

Summary
In this chapter, we have provided a brief introduction to anomaly intrusion detection systems (AIDS) and their various taxonomies based on the availability of data labels, output format, and sources of data. We have also given an overview of some existing machine learning techniques that have been successfully applied to intrusion detection. Since the main focus of this thesis is to propose an application-specific hybrid intrusion detection system, we have devoted two separate sections to reviewing the existing research in the area of hybrid IDSs and traffic classification methods. Finally, we have discussed the current issues and difficulties that must be overcome to have an accurate performance evaluation of intrusion detection systems.

Evaluation of Anomaly Detection Systems
During the last decade, anomaly detection has attracted the attention of many researchers as a way to overcome the weakness of signature-based IDSs in detecting novel attacks. Conducting a survey of the recent research done in anomaly detection, we encountered some machine learning methods reported to have a very high detection rate of 98% while keeping the false alarm rate at 1%.
However, when we look at the state-of-the-art IDS solutions and commercial tools, there is little evidence of the anomaly detection approach being used, and people still consider it an immature technology.
To find the reason for this contrast, we studied the details of the research done in anomaly detection and considered all aspects, such as the learning and detection approaches, the applied data sets, and the evaluation methods. Our study shows that there are some problems with the employed data sets and evaluation methods. In this chapter, we discuss the drawbacks of the existing evaluation methods and provide some solutions for obtaining more realistic results.

Evaluation Methodology
In order to conduct a comprehensive survey of anomaly detection systems, we collected the research papers indexed by the Digital Bibliography and Library Project (DBLP).

Employed Data Sets
The applied data sets, both training and testing, play an important role in the evaluation of anomaly detection methods. However, due to the criticism of existing data sets and also privacy issues of employing real traffic, preparing a data set has become one of the biggest challenges in the area of intrusion detection.
As shown in Figure 3.1, which summarizes the data sets applied in the surveyed papers, the Defence Advanced Research Projects Agency (DARPA) evaluation data set (24%) and the Knowledge Discovery and Data mining (KDD) data set (28%) are the most widely used publicly available data sets for anomaly detection. In total, these two data sets are used in more than 50% of the studied papers. DARPA [44] refers to a series of data sets generated in 1998, 1999 and 2000 at the MIT Lincoln Laboratories, specifically for the testing of intrusion detection systems. The sets consist of simulated normal traffic and manually generated network attacks. The KDD set [41], known as KDD'99, was prepared by Stolfo et al. [42] and is built based on the data captured in the DARPA'98 IDS evaluation program [43]. Although these two data sets have had a significant role in the evaluation of IDSs, their accuracy in simulating real network conditions has been extensively criticized in [45,48,46].
For host-based intrusion detection systems, researchers have mostly used a synthetic data set called the University of New Mexico (UNM) data set [49], which is employed in 36.5% of the host-based research papers; DARPA ranks second among the host-based research papers with 26%. However, there is no evidence of the KDD data set being used, since it is specifically designed for network-based IDSs. The detailed statistics of the applied data sets in both host-based and network-based intrusion detection systems are listed in Table 3.3.

Data Processing
In addition to the current issues with the employed synthetic data sets, data processing also plays an important role in obtaining an accurate evaluation.
In contrast to the papers published in the area of machine learning and data mining, much of the published research in the intrusion detection field does not pay attention to this important stage. In the following, we briefly explain some of the common pitfalls identified in the surveyed papers.

Definition of anomaly
One of the biggest issues affecting the performance of anomaly detection systems is the interpretation of data and labeling it as either normal or anomalous. As defined in the seminal paper of Denning [6], any deviation from normal traffic is considered an anomaly. In addition, in her paper, she assumes that malicious behavior is anomalous, and therefore detecting anomalous behavior will result in the detection of malicious activities. Although this assumption might have been true in 1987, especially for host-based IDSs, it is not applicable to today's networks for the purpose of intrusion detection. Apparently based on this assumption, people in academia still define an anomaly as abnormal behavior, while network administrators think of anomalies as any activities with the potential to threaten their networks. The necessity of having a clear definition of anomalous behavior has also been highlighted by Gates and Taylor [50] and Ringberg et al. [51]. However, the majority of current research on anomaly detection, 88% of the surveyed papers, does not explicitly define what type of information is considered an anomaly; the implicit assumption in academia is to refer to abnormal traffic as anomalous, while the main target of the proposed methods and evaluation systems is to detect attacks and malicious behavior.

Data scaling and normalization
Although scaling is not always necessary, omitting this step when the features are not on comparable scales results in a bias toward attributes with a larger range of values, and consequently affects the final outcome of the algorithm. Equation 3.1 shows one of the most common forms of scaling numeric variables into the range [0, 1]:

x' = (x − x_min) / (x_max − x_min)    (3.1)

The drawback of this scaling method is that when there is an outlier in the data set, the "normal" data will be mapped into a very small interval. Normalization, on the other hand, transforms the data to have a mean value of zero so that outliers can be more easily detected. One of the common forms of normalization, also called standardization, is depicted in Equation 3.2:

x' = (x − μ) / σ    (3.2)

where μ and σ are the mean and the standard deviation of the attribute. The main problem of this method is that it assumes the data is generated by a Gaussian law with a certain mean and standard deviation, which might not be true in reality.
While the necessity of scaling cannot be judged for the data sets where little information is available, we can point out with certainty that publicly available data sets such as DARPA and KDD require such normalization.
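A short code sketch of the two preprocessing options above follows, assuming NumPy; it applies column-wise min-max scaling (Equation 3.1) and standardization (Equation 3.2) to a small example matrix.

import numpy as np

X = np.array([[2.0, 100.0], [4.0, 300.0], [6.0, 900.0]])

# Equation 3.1: min-max scaling into [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Equation 3.2: standardization (zero mean, unit standard deviation)
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)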

Feature selection
Data sets usually contain various features, and a proper feature selection method has a high impact on the performance of detection methods. However, our survey shows that only 24% of the network-based approaches have applied feature selection (see Table 3.4).

Experiments
Experiment setup is another factor affecting the evaluation results of anomaly detection systems. This factor can be investigated from two different aspects, experimentation procedure and proper documentation. Table 3.5 lists a detailed statistic of the experiments in the reviewed papers.

Experimentation procedure
It is a common practice in machine learning to run the experiments in several rounds and on various data sets to verify that the analysis is data set independent and can be generalized to real applications. Cross-validation is one of the techniques that is widely used in machine learning research. Each round of cross-validation involves partitioning the data set into two complementary parts for training and testing. To increase the reliability and validity of the results, researchers perform multiple rounds of cross-validation and average the results over the rounds. As an example, in 10-fold cross-validation the data set is divided into 10 equal parts, and the experiments are done in 10 rounds such that in round i, the ith part is used as the testing set and the remaining parts compose the training set used to build the model. A total of 12 papers reported the use of cross-validation. There were also a few papers that reported the best obtained result as the overall performance of the system.
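A minimal sketch of 10-fold cross-validation as described above is given below, assuming scikit-learn; the classifier and the data are placeholders.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, 1000)

scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print(scores.mean(), scores.std())   # report the average over the 10 rounds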

Proper documentation
This phase is crucial for researchers who want to reproduce the experimental results for evaluation purposes. However, it does not receive proper attention from researchers working on intrusion detection. This is especially important in the case of very large data sets such as DARPA and KDD, of which people employ only a portion for their experiments due to limited time and available resources. Unfortunately, out of 194 papers using the publicly available data sets, close to two-thirds (63%) did not properly specify which sets were used for training and testing of the approach. Among papers using experimental data sets that were not public, the situation was similar: out of 88 papers, 59% did not describe the training and testing sets.
Another aspect related to the employed testing set is the ratio of anomalous and normal records in the testing data. An assumption of rareness of anomalies, i.e., the existence of a small portion of anomalous records compared to the volume of normal activity, is common in the intrusion detection domain [50]. Recently there have been several studies showing that this picture is changing and nowadays abnormal traffic on the Internet (including scanning activity) cannot be quantified as rare [52,53].
Reviewing the network-based studies on the DARPA and KDD data sets, we noticed great variability in the employed ratio of abnormal to normal activity. Among the 34 papers that specified this ratio for the testing set, the majority of the studies (24 out of 34) experimented with a high percentage of abnormal activity (30%-82%) in the data. It should also be noted that 19 of these studies worked with the KDD data set, which has 81% abnormal activity; as some of these studies employed random sampling, the final percentage of abnormal activity ranged from 80% to 82%. Two papers experimented with 6%-20% abnormal activity in the set, and only 8 papers out of 34 (23.5%) assumed a low probability of intrusive activity, using a 1%-2% abnormal to 99%-98% normal activity ratio.

Performance Evaluation
The performance of anomaly detection techniques can be generally evaluated from two perspectives: 1. Efficiency: This measure deals with the resources needed to be allocated to the system including CPU cycles and main memory.

Effectiveness:
This measure which is also called classification accuracy represents the ability of the system to distinguish between normal and intrusive activities.
As the current intrusion detection systems are struggling to achieve acceptable detection and false alarm rates, researchers have mostly concentrated on the effectiveness measures and have not paid much attention to the efficiency of their systems. Our survey shows that only 19% of the papers conducted a performance study on time and memory complexities.
In contrast, the effectiveness of intrusion detection systems has been extensively studied, and many approaches have been proposed for a better evaluation of IDSs. In the rest of this section, we summarize the most popular evaluation metrics that have been employed for the comparison of IDSs.

Confusion Matrix
Anomaly detection methods are usually applied to distinguish between anomalous and normal traffic, so here we are interested in performance measures applicable to binary classification. To the best of our knowledge, the confusion matrix is the best way of presenting a binary classification result: its four cells count the true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). These quantities can also be illustrated graphically:
• The big circle defines the space of the whole data (i.e., normal and intrusive data).
• The small ellipse defines the space of all predicted intrusions by the classifier; thus, it is shared by both TP and FP.
• The ratio between the real normal data and the intrusions is graphically represented by the use of a horizontal line.

ROC Curves
Perhaps the oldest measure used to evaluate machine learning methods is Accuracy. As mentioned above, this measure is the ratio of correctly classified samples to the total number of samples in the data set, i.e., (TP + TN) / (TP + TN + FP + FN). Since, in the data sets used in intrusion detection, normal data usually outnumbers intrusions, e.g. 95% normal traffic and 5% attacks, the Accuracy measure is misleading: a system that always classifies all data as normal would have a high accuracy (95% in our example). So, we need measures that can evaluate the performance independently of the bias in the distribution of labels in the data. Receiver Operating Characteristic (ROC) curves were introduced to meet this aim by using a trade-off between the false positive and detection rates.
Originating from signal detection theory [54], ROC curves are used on the one hand to visualize the relation between detection and false positive rates of a certain classifier while tuning it, and on the other hand to compare the accuracy of several classifiers.
Although this measure is very effective and widely used in anomaly detection, it has some shortcomings. The first drawback is that it is dependent on the ratio of attacks to normal traffic. In [55] and [14], the authors used different data sets with various ratios of attack to normal traffic. Their results show that as the ratio of attacks in the data set decreases, the ROC curve looks better. This issue is not very problematic when different methods are compared on the same data set. However, comparing anomaly detection methods run on different data sets is not meaningful unless the data sets have the same ratio of attack to normal traffic. The second problem with ROC curves is that they might be misleading and simply incomplete for understanding the strengths and weaknesses of a proposed method [48,56,57,58].
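A minimal sketch of producing a ROC curve from anomaly scores is given below, assuming scikit-learn; y_true and scores are placeholders for a labeled test set and the detector's output.

import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.random.randint(0, 2, 500)   # 1 = intrusion, 0 = normal
scores = np.random.rand(500)            # detector anomaly scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
print(auc(fpr, tpr))                    # area under the ROC curve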

Precision, Recall and F-Measure
As previously mentioned, under normal operating conditions there is a big difference between the rate of normal and intrusion data. Thus, the Precision, Recall, and F-Measure metrics ignore the normal data that has been correctly classified by the IDS (TN).
Precision: this metric is defined with respect to the intrusion class. It shows how many of the examples predicted by an IDS as being intrusive are actual intrusions [59]. The aim of an IDS is to obtain a high Precision, meaning that the number of false alarms is minimized.
Recall: this metric measures the part missing from the Precision, namely the percentage of the real intrusions covered by the classifier. Consequently, it is desirable for a classifier to have a high Recall value [59].
This metric is equivalent to the detection rate (DR).
F-Measure: due to the fact that the two previously discussed metrics (i.e., Precision and Recall) do not completely define the accuracy of an IDS on their own, a combination of them is more appropriate. Being defined as the harmonic mean of Precision and Recall, the F-Measure mixes the properties of the previous two metrics [59]. An ideal classifier, with an F-Measure of 1, generates no false alarms and detects 100% of the attacks. Thus, the F-Measure of a classifier is desired to be as high as possible.
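For reference, the standard confusion-matrix-based definitions of these metrics, consistent with the descriptions above, are:

Precision = TP / (TP + FP)
Recall (Detection Rate) = TP / (TP + FN)
F-Measure = (2 × Precision × Recall) / (Precision + Recall)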

Positive and Negative Predictive Values
Base-rate fallacy is an error that occurs in conditional probabilities when there is no prior probability of the hypothesis and the evidence [60]. As pointed out by Axelsson [58], this problem occurs in the evaluation of intrusion detection systems since there is a huge difference between the prior probabilities of normal and intrusive connections.
Let us assume that I denotes intrusion and ¬I denotes normal behavior. Similarly, let A denote an alarm generated by the IDS. Applying the general form of Bayes' theorem (Equation 3.9), we can conclude that:

P(I|A) = P(A|I) · P(I) / (P(A|I) · P(I) + P(A|¬I) · P(¬I)) = (p · DR) / (p · DR + (1 − p) · FPR)    (3.10)

where P(I|A) is called the Bayesian detection rate or positive predictive value (PPV), which shows what percentage of alerts are true positives; DR and FPR are the detection rate and false positive rate, respectively; and p, also called the base rate, is the prior probability of having an intrusion in the data set, which can be estimated as:

p = (number of intrusive instances) / (total number of instances)    (3.11)

To make this problem clearer, let p = 10^−5, which means that on average only 1 out of 100,000 connections is an attack. Assuming that the detection rate is 100% and the false positive rate is 1%, we have:

PPV = (10^−5 · 1) / (10^−5 · 1 + (1 − 10^−5) · 0.01) ≈ 10^−3    (3.12)

This result shows that even for the unrealistically high detection rate of 100% and a false positive rate of 1%, only about one alert out of 1,000 is a real intrusion and the rest are false alarms. In other words, to have a Bayesian detection rate of 50% (half of the alarms being a true indication of intrusive activity), we need a very low false alarm rate of about 10^−5.
Due to the very low number of attacks in real networks, Axelsson argues that the Bayesian detection rate (PPV) is a more effective metric than the detection rate (DR). However, as can be seen in Equation 3.10, PPV is maximized when the false positive rate approaches zero, no matter what the value of the detection rate is. Therefore, he defined another metric called the negative predictive value (NPV), Equation 3.13, and noted that there has to be a trade-off between the values of PPV and NPV:

NPV = P(¬I|¬A) = ((1 − p) · (1 − FPR)) / ((1 − p) · (1 − FPR) + p · (1 − DR))    (3.13)
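A tiny numeric illustration of the base-rate fallacy discussed above follows: the positive predictive value is computed for several base rates, with a fixed detection rate of 1.0 and false positive rate of 0.01.

def ppv(base_rate, dr=1.0, fpr=0.01):
    # Equation 3.10: Bayesian detection rate / positive predictive value
    return (base_rate * dr) / (base_rate * dr + (1 - base_rate) * fpr)

for p in (1e-2, 1e-3, 1e-5):
    print(p, round(ppv(p), 4))   # e.g., p = 1e-5 gives a PPV of roughly 0.001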

B-ROC Curves
In order to overcome the deficiencies of ROC curves and the predictive values, Cárdenas and Baras proposed a new metric called B-ROC curves [61]. This new metric uses the same intuition as the ROC curves; however, instead of using the false positive rate for the X axis, it uses 1 − PPV. This measure is called the Bayesian false alarm rate, P(¬I|A), and indicates what percentage of the generated alarms are false positives.

Proposed Framework
In addition to the problems discussed in Chapter 3, Sommer and Paxson [62] point out some other issues which result in the rare deployment of anomaly detection systems in real networks. In their paper, they mention that intrusion detection has special characteristics that make it harder to effectively deploy machine learning techniques. Keeping these issues in mind, we have proposed a new framework for network-based anomaly detection, which is discussed in the rest of this chapter.

Overall View
To overcome the shortcomings of existing intrusion detection systems, we propose a multi-layer model (Figure 4.1) which consists of three processing layers: 1) Packet Analysis; 2) Intrusion Detection; and 3) Security Information and Event Management (SIEM).
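A schematic, runnable sketch of this three-layer model is given below: packets are grouped into flows and given an application label, each flow is checked by a detector, and alerts are passed to the SIEM layer. All names and the toy logic are illustrative placeholders, not the actual implementation.

from collections import defaultdict

def packet_analysis(packets):
    """Group packets by 5-tuple (Flow Analyzer) and attach a dummy application label."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[pkt["five_tuple"]].append(pkt)
    return [(key, pkts, "HTTP") for key, pkts in flows.items()]   # Traffic Classification stub

def intrusion_detection(flows):
    """Application-specific check; a toy volume threshold stands in for the hybrid detector."""
    return [key for key, pkts, app in flows if len(pkts) > 100]

def siem(alerts):
    for a in alerts:
        print("ALERT:", a)   # notify the administrator

packets = [{"five_tuple": ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP")}] * 150
siem(intrusion_detection(packet_analysis(packets)))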

Packet Analysis
Being responsible for all the preprocessing tasks required for intrusion detection, the Packet Analysis layer contains two important modules, namely the Flow Analyzer and the Traffic Classification module.

Intrusion Detection
As indicated in Figure 4.1, analyzed flows are forwarded from the Packet Analysis layer to this layer, where, depending on its application label, each flow is handled by a specific detection module of the hybrid detection system.

Security Information and Event Management
Alerts generated by the Intrusion Detection layer are fed into this layer to notify the system administrator of potential breaches.

Traffic Classification Module
Accurate classification of network traffic has received a lot of attention due to its important role in many subjects such as network planning, QoS provisioning, and class-of-service mapping, to name a few. Traditionally, traffic classification relied to a large extent on the association of a particular port with a specific protocol [22]. Such a port-number-based traffic classification approach has been proved to be ineffective, mainly due to the constant emergence of new applications that do not use fixed or registered port numbers.
To have a better understanding of the signature-based approach, Table 4.1 illustrates the signatures of 11 typical applications; to print the signatures, alphanumeric characters are represented in their normal form, while non-alphanumeric ones are shown in hex form starting with "0x".
As can be seen in Table 4.1, each application has a unique set of characters. However, these signatures do not necessarily start from the beginning of the payload. For example, to identify HTTP Image Transfer traffic, we should search for the string "image/" starting from the 5th byte of the payload. This starting point is referred to as the offset in Table 4.1. Moreover, it is important to know which side of the connection, client or server, produces the signature.
For example, the signature to detect HTTP Web traffic is in the source payload, i.e., the ASCII string "GET" is sent by the client, the initiator of the connection, to the server. This information helps signature-based methods improve their performance by looking for fewer signatures in either the source or the destination payload.
Although this approach has a high accuracy for identifying network applications, it fails to detect 20% to 40% of the network flows. Some of the reasons that cause this problem are:
• Whenever a newer version of an application is released, all the signatures should be updated, which usually cannot be done immediately.
• There are hundreds of applications being developed by small groups around the world which networking companies are not aware of.
Having specified the applied network features, we focus on the selection of a high-performance classifier. In order to choose an appropriate classifier, we selected some of the most popular classification methods implemented in Weka [68] and performed evaluations to compare their accuracy, learning time, and classification time. We finally selected the J48 decision tree, the Weka implementation of C4.5 [69], since it has a high accuracy while maintaining a reasonable learning time. In addition, decision trees are shown to have a reasonable height and need few comparisons to reach a leaf, which holds the final label, and therefore have a very short classification time. Taking advantage of the J48 decision tree as a traffic classifier, we evaluated our proposed method on two real networks.
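To make the classification step concrete, the following is a minimal sketch assuming byte-frequency (unigram) payload features, as described in the contribution summary; a scikit-learn decision tree stands in for Weka's J48, so this illustrates the idea rather than the exact experimental setup used in the thesis.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def unigram_features(payload: bytes, length=256):
    """Frequency of each of the 256 possible byte values in the payload."""
    counts = np.bincount(np.frombuffer(payload, dtype=np.uint8), minlength=length)
    return counts / max(len(payload), 1)

payloads = [b"GET /index.html HTTP/1.1", b"HELO mail.example.com", b"GET / HTTP/1.0"]
labels = ["HTTP Web", "SMTP", "HTTP Web"]

X = np.vstack([unigram_features(p) for p in payloads])
clf = DecisionTreeClassifier().fit(X, labels)
print(clf.predict([unigram_features(b"GET /img.png HTTP/1.1")]))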
Although our evaluations on the two networks showed promising results, we still believe that the performance can be improved by assigning different weights to the payload bytes based on the degree of importance. However, finding the appropriate weights is a challenging task. To this end, we employ a genetic algorithm based scheme to find the weights.
In the next section, we formally define our problem and explain how we apply genetic algorithms to improve the accuracy of our proposed traffic classification method.

Problem Formulation
In this section, we formally describe how the network application discovery problem can be solved through the combination of genetic algorithms and decision trees. Essentially, we formulate the network application discovery problem as a classification problem: given the values of a specific set of features extracted from the network flows, we identify the application that is most likely to have generated the payload using a statistical machine learning technique (a decision tree). In other words, we assume that each payload uniquely exhibits certain characteristics that represent well the application that produced it; therefore, these features can be employed to discover that network application.
As was mentioned earlier, Wang et al. [66] were the first to view the payload as a frequency distribution over its byte (unigram) values. Based on this definition, we need to employ a classifier that infers the correct network application label given the features extracted from the payload. To achieve this we employ the J48 decision tree learning algorithm [69], which builds a decision tree classifier from a set of sample payloads and their corresponding network application labels. The learned decision tree classifier enables us to find the most likely network application label for a payload generated by an unknown network application. Here, the learned J48 classifier is our required mapping function Φ.
Now let us consider the HTTP Web application payload as an example. As can be seen in Table 4.1, its signature characters appear at particular positions in the payload; accordingly, the weight of a character κ^(s/d) is defined over the set P of positions in which κ^(s/d) has appeared in Ψ.
Simply put, this definition states that each position in the payload has its own significance in the application discovery process; therefore, some of the features may be more discriminative of the applications and hence should receive a higher weight. Based on this weighting scheme, we revise the dimensions of our vector space such that the frequency of each ASCII character observed in the payload is weighted by the position at which it is located. For instance, if the character corresponding to the 236th dimension is observed in positions 5 and 210, and the weights for positions 5 and 210 are 1 and 8, respectively, the value of the 236th dimension of the representative vector would be 9 rather than simply 2.
Accordingly, we have:

Definition 4 (Extends Definition 2). Φ_ω : Ψ → Λ is a learning classification machine that maps a payload Ψ, represented by its position-weighted frequency vector, to a network application λ ∈ Λ.
It is obvious that Φ is a special case of Φ_ω when ∀p ∈ [0, η], ω(Ψ_p) = 1. Now, since Φ_ω is sensitive to the weights given to the positions in the payload, a method needs to be devised to compute the appropriate weights for the positions. For this purpose, we employ a genetic algorithm based process to find the weights.
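Before turning to the weight search, a small sketch of the position-weighted unigram representation defined above is given; the weight vector here is a placeholder for the one found later by the genetic algorithm, and the payload and positions are illustrative.

import numpy as np

def weighted_unigram(payload: bytes, weights):
    vec = np.zeros(256)
    for position, byte_value in enumerate(payload[:len(weights)]):
        vec[byte_value] += weights[position]   # add the position weight instead of a plain count
    return vec

weights = np.ones(64)          # uniform weights reduce this to the plain unigram case
weights[5] = 8.0               # e.g., position 5 judged more discriminative
print(weighted_unigram(b"GET /index.html HTTP/1.1", weights)[ord("G")])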
Briefly stated, genetic algorithms are non-deterministic and chaotic search methods that use real world models to solve complex and at times intractable problems. In this method the optimal solution is found by searching through a population of different feasible solutions in several iterations. After the population is studied in each iteration, the best solutions are selected and are moved to the next generation through the applications of genetic operators.
After an adequate number of generations, better solutions dominate the search space; therefore, the population converges towards the optimal solution. We employ this process over the weights of the positions in the payload to find the optimal weight for each location. The process is as follows:
• The objective is to find the optimal weight vector ω_O for a payload Ψ of length η. Initially, a pool of random weight vectors of length η is generated: ω_1, ..., ω_n;
• The process of the genetic algorithm is repeated for Gen generations;
• Once the algorithm reaches a steady state and stops, the weight vector with the best fitness is selected and used as the most appropriate weight vector (ω_O) for the application discovery process using Φ_{ω_O}.
In the above process shown in Figure 4.4, the weight vectors that are needed for the payload are defined using the genes in the genetic algorithm and the fitness function is defined as the accuracy of the learning classification machine developed based on that specific weight vector (gene). The outcome of the genetic algorithm process provides us with the optimal set of weights for the positions of the ASCII characters in the payloads. This optimal set of weights can be used to learn a classifier that can best find the network application for new payloads whose generating application is not known.
To implement this process, we employed the JGAP (Java Genetic Algorithms Package) software package [70].
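As a condensed illustration of this search, the sketch below runs a simple generational loop in Python instead of JGAP (which is a Java package); fitness is the cross-validated accuracy of a classifier trained on the weighted features, and it reuses the weighted_unigram helper from the earlier sketch. Population size, mutation scale, and generation count are illustrative choices.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def fitness(weight_vector, payloads, labels):
    X = np.vstack([weighted_unigram(p, weight_vector) for p in payloads])
    return cross_val_score(DecisionTreeClassifier(), X, labels, cv=3).mean()

def evolve(payloads, labels, eta=64, pop_size=20, generations=30):
    population = [np.random.rand(eta) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda w: fitness(w, payloads, labels), reverse=True)
        parents = scored[: pop_size // 2]                              # selection
        children = [p + 0.1 * np.random.randn(eta) for p in parents]   # mutation
        population = parents + children
    return max(population, key=lambda w: fitness(w, payloads, labels))   # best weight vector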

Intrusion Detection Module
Intrusion detection has been extensively studied since the seminal work by Anderson [71]. Traditionally, intrusion detection techniques are classified into two categories: misuse (signature-based) detection and anomaly detection.
Misuse detection is based on the assumption that a large number of cyber attacks leave a set of signatures in the stream of network packets or in audit trails, and thus attacks are detectable if these signatures can be identified by analyzing the audit trails or network traffic behavior. However, misuse detection is strictly limited to the known attacks and detecting new attacks is one of the biggest challenges faced by misuse detection.
To address the weakness of misuse detection, the concept of anomaly detection was formalized in the seminal report of Denning [6]. In this approach, models of normal behavior are built from normal traffic, and any deviation from the normal model is then considered an attack or anomaly.
The main advantage of this approach over misuse detection is that it can detect attempts to exploit new and unforeseen vulnerabilities. It can also help detect "abuse of privileges" types of attacks that do not actually involve exploiting any security vulnerabilities. However, this approach has its own shortcomings.

As discussed in the beginning of this chapter, the original idea of anomaly detection, i.e., learning normal behavior and labeling outliers as anomalies, is not practical in real-life settings since it results in a high false alarm rate. This is mainly because of two well-known issues: 1) the lack of a training data set that covers all legitimate behavior; and 2) the fact that abnormal behavior is not always an indicator of intrusion, as it can result from factors such as policy changes or the offering of new services by a site.
In order to overcome these challenges while keeping the advantages of misuse detection, some researchers have proposed the idea of hybrid detection. In this way, the system combines the high detection rate of misuse detection on known attacks with the ability of anomaly detectors to detect unknown attacks.
Although, given the characteristics of signature-based and anomaly-based methods, the fusion of these two approaches should theoretically provide an effective IDS, there are still two important issues that make this task cumbersome. First, anomaly-based methods cannot achieve an outstanding performance without a comprehensive, labeled, and up-to-date training set covering all different attack types, which is very costly and time-consuming to create, if not impossible. Second, the efficient and effective fusion of several detection technologies is a big challenge in building an operational hybrid intrusion detection system.
In the rest of this section, we provide a novel solution to overcome the aforementioned problems.

Anomaly-based Detector
As the first step towards an effective anomaly detector, we should extract robust network features that have the potential to discriminate anomalous behavior from normal network activities. Since most current network intrusion detection systems use network flow data (e.g., NetFlow, sFlow, IPFIX) as their information source, we focus on features generated from these flows. The names and descriptions of the applied features are listed in the flow feature table; they include, among others, SrcBytes, DstBytes, SrcPackets, DstPackets (the total number of packets sent from the destination to the source), and the ratios SrcBytes/DstBytes, SrcPackets/DstPackets, SrcBytes/SrcPackets, and DstBytes/DstPackets.

Signature-based Detector
As our first signature-based detector we chose Snort [5] because of its popularity and availability to researchers. However, our proposed hybrid detection scheme is completely independent of Snort, and any other signature-based detector can be used instead. As mentioned earlier, our anomaly-based detector works on flows, whereas Snort is designed to work on packets. To make our detectors consistent, we matched Snort alerts with the existing flows based on the source IP, source port, destination IP, destination port, and time stamp. Since the flows and the Snort alerts are generated by different devices, we were not very strict with the time stamps and considered a deviation of up to 5 seconds acceptable.
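As a rough illustration of this matching step (the field names and record layout below are our assumptions, not the actual implementation), a Python sketch could look as follows:

MAX_SKEW_SECONDS = 5   # detectors run on different devices, so allow clock drift

def matches(flow, alert):
    """Return True if a Snort alert belongs to a given flow record."""
    return (flow["src_ip"] == alert["src_ip"]
            and flow["src_port"] == alert["src_port"]
            and flow["dst_ip"] == alert["dst_ip"]
            and flow["dst_port"] == alert["dst_port"]
            and abs(flow["timestamp"] - alert["timestamp"]) <= MAX_SKEW_SECONDS)

def label_flows(flows, alerts):
    """Mark each flow as intrusive if any alert matches it."""
    return [dict(flow, snort_label="intrusive"
                 if any(matches(flow, a) for a in alerts) else "normal")
            for flow in flows]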
In addition to Snort, we employed QRadar Rules [64] as our second source of signature-based detection system. Analyzing the extracted information from network packets and flows, QRadar Rules provided a very strong mechanism to discover malicious activities hidden in network communications.

Proposed Hybrid Detector
The most important issues that current anomaly detectors face are, first, preparing a comprehensive labeled data set and, second, keeping that data set up to date. To solve these problems, we propose to apply the idea of adaptive learning. To this end, we define learning time intervals, e.g., one day, at the end of which the anomaly-based detector is retrained on the two most recent training sets. These training sets are the flows labeled by the hybrid detector in the previous intervals. Figure 4.5 illustrates the structure of the hybrid detector. In the first interval, when no training set exists yet, the hybrid system relies only on the labels from the signature-based detectors. These labels are used as a training set for the anomaly-based detectors in the next time interval. From the second time interval onwards, the final labels of the hybrid system are used as a training set to feed the anomaly-based detectors. In order to optimize the performance of the anomaly-based detectors, we have added a filtering module, the Train Set Optimizer, which ensures that only records with a support level of 50% or higher are forwarded to the learners. Finally, we use a fusing algorithm to combine the results from the signature-based and anomaly-based detectors. The final result is both reported to the administrator and used as a portion of the training set in the next time interval.
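A minimal sketch of one learning interval of this adaptive scheme is given below (Python; the detector objects and the fusing function are simplified placeholders rather than the actual implementation):

SUPPORT_THRESHOLD = 0.5   # Train Set Optimizer: keep records with >= 50% support

def run_interval(flows, signature_detectors, anomaly_detector,
                 previous_training_sets, fuse):
    """Process one learning interval of the adaptive hybrid detector."""
    # Retrain the anomaly detector on the two most recent training sets, if any.
    history = previous_training_sets[-2:]
    if history:
        training = [r for batch in history for r in batch
                    if r["support"] >= SUPPORT_THRESHOLD]
        anomaly_detector.train(training)

    labeled = []
    for flow in flows:
        votes = [d.label(flow) for d in signature_detectors]
        if history:                      # anomaly detector only votes once trained
            votes.append(anomaly_detector.label(flow))
        label, support = fuse(votes)     # e.g. a Dempster-Shafer based fusion
        labeled.append({"flow": flow, "label": label, "support": support})

    previous_training_sets.append(labeled)   # becomes training data next interval
    return labeled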

Fusing Algorithm
Investigating state-of-the-art fusion methods for combining the detection results of various IDSs, we found that the existing methods are not able to properly model the uncertainty involved in the detectors' decisions. As discussed in [72], classic probability theory is not capable of modeling epistemic uncertainty due to the following issues: 1. It requires the probability of all possible events. When this information is not available, the uniform distribution is often used, as justified by Laplace's Principle of Insufficient Reason. This approach, however, will not produce accurate results as it assumes that all events with unknown probability distributions are equally likely.
2. It is not capable of assigning probabilities to sets of events. As a result, the summation of all probabilities assigned to the possible singleton elements must equal one. This assumption is not in line with epistemic uncertainty, which requires assigning probabilities to sets of events.

Dempster's Rule of Combination
The Dempster-Shafer theory of evidence [73, 74] is the most widely used approach to addressing uncertainty in probability theory. Dempster-Shafer theory can be considered a generalization of probability theory in which probabilities are assigned to sets as opposed to singleton elements. This feature makes D-S theory capable of quantifying the lack of knowledge with regard to a certain phenomenon.

Definition 5. Given a sequence of basic probability assignments m_1, ..., m_n over a frame of discernment Θ, the generalized Dempster's rule of combination is defined as:

m(A) = \frac{1}{1-K} \sum_{A_1 \cap \dots \cap A_n = A} \prod_{i=1}^{n} m_i(A_i), for every A ⊆ Θ, A ≠ ∅,

where the conflict factor K is

K = \sum_{A_1 \cap \dots \cap A_n = \emptyset} \prod_{i=1}^{n} m_i(A_i).
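For illustration only, a direct and unoptimized sketch of this rule, applied pairwise over focal elements represented as sets, could look as follows (Python; not the implementation used in the system):

from itertools import product

def combine_two(m1, m2):
    """Dempster's rule for two basic probability assignments.

    m1, m2 -- dicts mapping frozenset focal elements to masses.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb            # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

def combine_all(assignments):
    """Fold the pairwise rule over a sequence m_1, ..., m_n."""
    result = assignments[0]
    for m in assignments[1:]:
        result = combine_two(result, m)
    return result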

Dempster's Rule of Combination for Binary Variables
Although Dempster's rule of combination is one of the most effective fusion methods, it is computationally very complex, which prevents it from being widely used in real-world applications. To overcome this shortcoming, Srivastava proposed an alternative form of Dempster's rule for combining sources of evidence that pertain to binary variables [75].
For n binary basic probability assignments m_1, ..., m_n over the frame Θ = {a, ¬a}, the combined assignment takes the form:

m(a) = \frac{1}{K} \left( \prod_{i=1}^{n} (m_i(a) + m_i(\Theta)) - \prod_{i=1}^{n} m_i(\Theta) \right),

m(\neg a) = \frac{1}{K} \left( \prod_{i=1}^{n} (m_i(\neg a) + m_i(\Theta)) - \prod_{i=1}^{n} m_i(\Theta) \right),

m(\Theta) = \frac{1}{K} \prod_{i=1}^{n} m_i(\Theta),

where K is defined as:

K = \prod_{i=1}^{n} (m_i(a) + m_i(\Theta)) + \prod_{i=1}^{n} (m_i(\neg a) + m_i(\Theta)) - \prod_{i=1}^{n} m_i(\Theta).

In order to provide an efficient mechanism for the fusion of intrusion detectors, we employ this general form of Dempster's rule of combination for binary variables (Theorem 1). The detailed proof of this theorem is provided in Appendix A.
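Under the same notation, a sketch of the binary form is shown below (Python, for illustration only); its cost grows linearly with the number of detectors instead of exponentially with the size of the frame:

def combine_binary(assignments):
    """Combine binary-variable mass assignments (m(a), m(not a), m(theta)).

    Each element of `assignments` is a tuple (m_a, m_nota, m_theta) with
    m_a + m_nota + m_theta == 1. Returns the combined (m_a, m_nota, m_theta).
    """
    prod_a = prod_nota = prod_theta = 1.0
    for m_a, m_nota, m_theta in assignments:
        prod_a *= m_a + m_theta
        prod_nota *= m_nota + m_theta
        prod_theta *= m_theta

    k = prod_a + prod_nota - prod_theta        # normalization constant
    if k <= 0.0:
        raise ValueError("totally conflicting evidence")
    return ((prod_a - prod_theta) / k,
            (prod_nota - prod_theta) / k,
            prod_theta / k)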

Quantification Method
In order to benefit from Dempster's rule of combination, one needs to quantify the support (belief) of labels assigned to each set. As most of the intrusion detection systems are not capable of providing support for their final decision, we have proposed a general approach to quantify the decisions made by IDSs.
The idea behind our approach is that intrusive activities are not isolated but related, forming different stages of attack sequences, with the early stages preparing for the later ones. As a result, intrusions cause a series of alarms to be generated in the detection systems. These series of alarms can usually be related through IP addresses and port numbers.
Taking advantage of this similarity amongst related alarms, we propose a method to cluster network traffic based on four important features: 1) source IP address; 2) destination IP address; 3) source port number; and 4) destination port number. Each cluster will then be analyzed separately to assign a support factor to each network flow labeled as either "normal" or "intrusive".
Since well-known distance functions such as the Euclidean distance are not suitable for measuring the similarity between either IP addresses or port numbers, two customized distance functions are introduced to provide an effective approach for measuring these similarities.

Definition 6. Let Ω_1 and Ω_2 be either the source or destination IP addresses from two different network flows. Assuming Γ is a set of four binary variables, Γ = {γ_1, γ_2, γ_3, γ_4}, one per address octet, the distance ∆(Ω_1, Ω_2) is defined as a sum over k = 1, ..., 4 of exponential terms weighted by the γ_k.

Definition 7. Let ρ_1 and ρ_2 be either the source or destination port numbers from two different network flows. Assuming Γ is a set of two binary variables, Γ = {γ_1, γ_2}, the distance between ρ_1 and ρ_2 is defined analogously as a sum of exponential terms weighted by the γ_k.

The experimental results on test data sets show a large improvement over the Euclidean distance. It is also observed that the exponential structure of the defined distance functions is very effective in separating similar flows into distinct clusters.
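Since the exact coefficients belong to Definitions 6 and 7, the following Python fragment is only one plausible rendering of the general shape of such exponential, component-wise distances; the decay constants and component weights are illustrative assumptions, not the thesis formulas:

import math

def ip_distance(ip1: str, ip2: str, gamma=(1.0, 1.0, 1.0, 1.0)):
    """Illustrative exponential distance between two IPv4 addresses.

    Octets are compared left to right; a mismatch in a high-order octet
    contributes far more than one in a low-order octet, which keeps hosts
    from the same subnet close together.
    """
    o1 = [int(x) for x in ip1.split(".")]
    o2 = [int(x) for x in ip2.split(".")]
    return sum(g * (1.0 - math.exp(-abs(a - b))) * 2 ** (3 - k)
               for k, (g, a, b) in enumerate(zip(gamma, o1, o2)))

def port_distance(p1: int, p2: int, gamma=(1.0, 1.0)):
    """Illustrative exponential distance between two port numbers, using the
    well-known/registered split at 1024 as the first binary component."""
    same_range = 0.0 if (p1 < 1024) == (p2 < 1024) else 1.0
    return gamma[0] * same_range + gamma[1] * (1.0 - math.exp(-abs(p1 - p2) / 1024.0))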
For the experiments, we employed the X-Means clustering algorithm [76], as it achieved the best performance among the clustering algorithms evaluated in our preliminary experiments. In addition to the choice of clustering algorithm, choosing proper parameters is of high importance. After several rounds of experiments, we configured the X-Means clustering algorithm with the following parameters:
• Minimum number of clusters: 5
• Maximum number of clusters: 20
• Cluster update interval (λ): 3,600,000 ms

Performance Analysis
The proposed framework does not impose any performance constraints on the hybrid detection system, and a better performance can always be achieved by applying more efficient algorithms. However, to gain a better understanding of the system's performance limitations and bottlenecks, we analyze the performance of the implemented detector used in the evaluation. In this analysis, we do not consider any of the employed commercial products.

Summary
In this chapter, we introduced a Multi-layer Intrusion Detection System (MIDS) that overcomes the main shortcomings of the existing IDSs.
As one of our main contributions, we proposed an online traffic classification method, in which the unigram payload distribution model is applied to extract the required set of features. Thereafter the J48 decision tree is employed to classify the network applications based on the unigram features.
Through a detailed analysis of application signatures, we observed that the signatures appear at certain designated positions in the payload, and that it is important to place more weight on the features that appear in these more important positions. This is achieved through a weighting scheme over the features. However, finding the appropriate weights is a challenging task; to this end, we employ a genetic algorithm-based scheme to find them.
We also proposed a new hybrid network intrusion detection framework, combining signature-based and anomaly-based detectors through adaptive learning and a Dempster-Shafer-based fusing algorithm.

NSL-KDD Data Set
Conducting a thorough analysis of recent research trends in anomaly detection, one encounters several machine learning methods reported to achieve a very high detection rate of 98% while keeping the false alarm rate at 1% [40]. However, looking at state-of-the-art IDS solutions and commercial tools, there are few products that use anomaly detection approaches.
Practitioners still believe that it is not yet a mature technology. To find the reason behind this gap, we study the KDD CUP 99 data set [41], which is widely used as one of the few publicly available data sets for network-based anomaly detection systems. In this section we perform a set of experiments to show the existing deficiencies of KDD.

Redundant Records
One of the most important deficiencies of the KDD data set is the huge number of redundant records, which causes the learning algorithms to be biased towards the frequent records and thus prevents them from learning the infrequent records, which are usually more harmful to networks, such as U2R and R2L attacks. In addition, the existence of these repeated records in the test set causes the evaluation results to be biased towards the methods that have better detection rates on the frequent records. While removing the redundant records, we encountered two invalid records in the KDD test set, numbers 136,489 and 136,497. These two records contain an invalid value, ICMP, as their service feature; therefore, we removed them from the KDD test set.
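A straightforward way to perform this de-duplication is sketched below (Python), under the assumption that the KDD files are comma-separated text with one record per line:

import csv

def remove_duplicates(in_path, out_path):
    """Write a copy of a KDD-style CSV file with exact duplicate rows removed."""
    seen = set()
    kept = dropped = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)
            if key in seen:
                dropped += 1
                continue
            seen.add(key)
            writer.writerow(row)
            kept += 1
    return kept, dropped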

Level of Difficulty
The typical approach for performing anomaly detection using the KDD data set is to employ a customized machine learning algorithm to learn the general behavior of the data set in order to be able to differentiate between normal and malicious activities. For this purpose, the data set is divided into test and training segments, where the learner is trained using the training portion of the data set and is then evaluated for its efficiency on the test portion.

Many researchers within the general field of machine learning have attempted to devise complex learners to optimize accuracy and detection rate over the KDD'99 data set. In a similar approach, we have selected seven widely used machine learning techniques, namely J48 decision tree learning [69], Naive Bayes [77], NBTree [78], Random Forest [79], Random Tree [80], Multi-layer Perceptron [81], and Support Vector Machine (SVM) [82] from the Weka [68] collection to learn the overall behavior of the KDD'99 data set. For the experiments, we applied Weka's default values as the input parameters of these methods.
Investigating the existing papers on anomaly detection that have used the KDD data set, we found that there are two common approaches to applying KDD. In the first, the KDD'99 training portion is employed for sampling both the train and test sets. In the second approach, the training samples are randomly collected from the KDD train set, while the samples for testing are arbitrarily selected from the KDD test set.
In order to perform our experiments, we randomly created three smaller subsets of the KDD train set, each of which included fifty thousand records. Each of the learners was trained over the created train sets.
We then employed the 21 learned machines (7 learners, each trained 3 times) to label the records of the entire KDD train and test sets, which provides 21 predicted labels for each record. Further, we annotated each record of the data set with a #successfulPrediction value, which was initialized to zero. Since the KDD data set provides the correct label for each record, we compared each predicted label against the correct one and incremented #successfulPrediction whenever a match was found. Through this process, we calculated the number of learners that were able to correctly label each given record. The highest value for #successfulPrediction is 21, which means that all learners were able to correctly predict the label of that record.
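The annotation step can be summarized by the following sketch (Python), where predictions holds the 21 predicted label vectors and truth holds the correct labels of the data set:

def successful_predictions(truth, predictions):
    """Count, per record, how many of the learned machines predicted correctly.

    truth       -- list of correct labels, one per record
    predictions -- list of 21 lists, each with one predicted label per record
    """
    counts = [0] * len(truth)
    for predicted in predictions:                 # one learned machine at a time
        for i, (p, t) in enumerate(zip(predicted, truth)):
            if p == t:
                counts[i] += 1                    # ranges from 0 (hard) to 21 (easy)
    return counts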

Our Solution
To solve the issues mentioned in the previous section, we first removed all the redundant records in both the train and test sets. Furthermore, to create a more challenging subset of the KDD data set, we randomly sampled records from the #successfulPrediction value groups. The generated data sets, KDDTrain+ and KDDTest+, include 125,973 and 22,544 records, respectively. Furthermore, one more test set, KDDTest−21, was generated by excluding the records that were correctly classified by all 21 learners. As can be seen in Figure 5.3, the accuracy of the classifiers on KDDTest is relatively high. This shows that the original KDD test set is skewed and disproportionately distributed, which makes it unsuitable for testing network-based anomaly detection classifiers. The accuracy and performance results of learning machines on the KDD'99 data set are hence unreliable and cannot be used as good indicators of a classifier's ability to serve as a discriminative tool in network-based anomaly detection. On the contrary, the KDDTest+ and KDDTest−21 test sets provide more accurate information about the capability of the classifiers. As an example, the classification rate of SVM on KDDTest is 65.01%, which is quite poor compared to the other learning approaches. However, SVM is the only learning technique whose performance improves on KDDTest+. Analyzing both test sets, we found that SVM wrongly classifies one of the most frequent records in KDDTest, which strongly affects its detection performance. In contrast, since this record occurs only once in KDDTest+, it has little effect on the classification rate of SVM, and the set therefore provides a better evaluation of the learning methods.
The new version of the KDD data set, NSL-KDD, is publicly available to researchers through our website. Although the data set still suffers from some of the problems discussed by McHugh [45] and may not be a perfect representative of existing real networks, we believe that, given the lack of public data sets for network-based IDSs, it can still be applied as an effective benchmark to help researchers compare different intrusion detection methods.

Benchmark Data Sets
Due to the criticism of existing data sets and the privacy issues of employing real traffic, preparing a data set has become one of the biggest challenges in the area of intrusion detection. Although applying the NSL-KDD data set as an interim solution for the preliminary results of this research was quite helpful, the results were not fully reliable due to the inherent issues of the KDD data set. As a result, we decided to generate a benchmark data set in a testbed environment. In order to simulate a real network environment, we employed real machines with various operating systems. We then analyzed real traces to create profiles for agents that simulate real traffic for HTTP, SMTP, SSH, IMAP, POP3, and FTP. Having generated normal traffic in our testbed, we carried out various state-of-the-art multi-step attacks to represent the current cyber-threats that organizations are dealing with.
The main advantages of our generated benchmark data set are the following:
• Realistic Traffic: It has been determined that inserting post-capture traces and merging them into real traffic causes inconsistencies in network packet parameters such as TTL and sequence numbers. To prevent these issues, both normal and malicious traffic are generated using physical devices.
• Labeled Data Set: Having a labeled data set plays an important role in the evaluation of detection systems. As a result, we put a lot of effort to control our testbed environment to distinguish anomalous activities from normal behavior.
• Total Traffic Capture: Intrusion detection systems require various types of information to provide an optimal result. Thus, it is very important to capture the complete network traffic rather than a filtered or sampled subset.

Network Architecture
As illustrated in Figure 5.6, our testbed network consists of 21 workstations divided into four distinct VLANs, which helps reduce the broadcast domain for each workstation. All workstations run selected versions of the Windows operating system, as listed in Table 5.5. The reason behind this selection is to be able to apply known exploits to attack the machines. Table 5.6 specifies more details regarding our servers and the various services installed on them.
The sixth LAN, whose traffic is not captured, enabled us to conduct non-disruptive monitoring and maintenance tasks such as loading applications and tuning certain services, among others.
A single layer 3 switch is utilized to provide the required layer 2 and layer 3 switching as well as the required mirroring of traffic. All connections are explicitly set to 10 Mbps. This setting allows the networked devices to operate effectively while keeping the maximum throughput well below the maximum switching capacity; it was chosen to reduce the probability of packets being dropped by the switch or the capturing devices.
An Ethernet tap (running at 100 Mbps) was efficiently employed to transmit the mirrored traffic to multiple devices without any processing overhead or disruption. These devices provided the means for redundant capturing (e.g. tcpdump), alert generation through various Intrusion Detection Systems (IDS) (e.g. Snort), IDS management systems (e.g. QRadar, OSSIM), and visualization (e.g. ntop).

Data Generation
In order to generate realistic background traffic, we captured and analyzed four weeks' worth of network activity associated with the Information Security Center of Excellence (ISCX). This was done to initially determine the composition of the traffic in terms of the applications and protocols used. Figure 5.7 depicts the distribution of protocols and applications used throughout the capturing period for our center. To fully simulate the behavior of a single user with an agent, we analyzed the behavior of a real user for specific applications, namely HTTP, SMTP, POP3, IMAP, SSH, and FTP. We then extracted the data distribution of different features such as request intervals and application payload length.

Figure 5.7: Centre's Traffic Composition
However, by analyzing the extracted information, we failed to observe a clear statistical distribution in the data when considering the following candidate distributions: a) Normal; b) Beta; c) Weibull; d) Erlang; e) Triangular; f) Gamma; g) Exponential; h) Uniform; i) Lognormal.
Moreover, even the behavior of a single user varies during the weekdays.
These outcomes are in line with previous research, which has suggested that more complex distributions are required to model HTTP activity [83].
To address this, we applied an inverse transform sampling method [84], which proved to be more accurate than fitting any of the well-known distributions.
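A compact sketch of this idea is shown below (Python): the empirical distribution of an observed feature, for example request inter-arrival times, is sampled via inverse transform sampling; the feature values themselves are assumed inputs measured from the real traces:

import bisect
import random

def make_empirical_sampler(observed_values):
    """Return a sampler that draws values via inverse transform sampling
    from the empirical distribution of `observed_values`."""
    values = sorted(observed_values)
    n = len(values)

    def sample():
        u = random.random()                     # uniform draw in [0, 1)
        idx = min(int(u * n), n - 1)            # invert the empirical CDF
        return values[idx]

    return sample

# Example: generate a synthetic request interval for a traffic agent from
# intervals measured in the real traces, e.g. [0.4, 0.7, 1.2, 2.5, 0.9].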
Having prepared the required infrastructure, services, and simulation agents, we generated and captured traffic over the experiment period. The number of flows per second is depicted in Figure 5.8, and the overall statistics for our data set are reported in the accompanying figure.

Attack Scenarios
Since the generated data set is intended to be used in the area of intrusion detection, we aim to provide a diverse set of attack scenarios representing state-of-the-art multi-step attacks being conducted by malicious hackers.
Performed on a separate day, each attack scenario follows a sequence of steps designed to simulate real-world attacks. It is the intent of every attacker to move through a network as quietly as possible, raising the fewest alarms and leaving minimal tracks behind; editing or deleting system or application level log files is often a useful mechanism for avoiding detection by system administrators. In the following, we explain the four attack scenarios conducted in our data set. All four scenarios have been carefully designed based on the aforementioned steps, although not every scenario necessarily includes all five. In order to be as realistic as possible, later attack scenarios build on the results of earlier ones, making them sophisticated, powerful, and harder to detect.

Scenario 1: Infiltrating the network from the inside
Many of today's networks are built on what is called the eggshell principle: hard on the outside and soft on the inside. This means that if an attacker gains access to a host on the inside, she can then use the compromised host as a pivot to attack systems not previously accessible via the Internet such as a local intranet server or a domain controller.
Our attack scenario starts by gathering information about our target including network IP ranges, nameservers, mail servers and user email accounts. This is achieved by querying the DNS for resource records using network administrative tools like nslookup and dig.
Having done the initial reconnaissance, we found that the only system on our target network exposed to the Internet is a NAT server. This makes the internal users of our target network inaccessible from the outside, since the NAT server acts as a firewall, dropping any connection initiated from outside the network. This is where conventional attack techniques are not useful, and we are forced to use client-side attacks. Based on the nslookup output, we can start enumerating the mail server, guessing potential email addresses that we need in order to penetrate the system.
We use the Adobe Reader util.printf() buffer overflow vulnerability as the starting point for our scenario. An attacker can exploit this vulnerability to execute arbitrary code with the privileges of the user running the application. We create a malicious PDF file using Metasploit and embed a Meterpreter reverse TCP shell on port 5555 inside it. The PDF file is attached to a system upgrade email and sent on behalf of admin@t3lab.com to all 21 users of the testbed. We have also set up a listener on port 5555 to capture the reverse connection on our attacking machine. Clicking on the file opens Adobe Reader but shows a grey window that never reveals a PDF; instead, it makes a reverse TCP connection back to our attacking computer listening on port 5555.

Scenario 4: Dictionary Attack against SSH
Brute force attacks are very common against networks as they tend to break into accounts with weak username and password combinations. Our final scenario was designed with the goal of acquiring an SSH account by running a dictionary brute force attack against the t3lab.com server. We use brutessh as the main tool for this scenario, as it can easily be configured to use our custom-made dictionary list. The dictionary is composed of over 5,000 alphanumeric entries of varying length. We run the attack for a 30-minute period, resulting in the credentials of a sudo user account being returned. These credentials are then used to log in to the server and to download the /etc/passwd and /etc/shadow files.

Capturing Traffic
In our network configuration, a single layer 3 switch (OmniSwitch 6850) was used to mirror the captured traffic to the monitoring devices. Our preliminary test runs indicated no substantial effect on the overall quality of the data set. Consequently, lower hardware specifications were sufficient to capture, monitor, and analyze the traffic, thus enabling the use of commodity, off-the-shelf hardware. As an example, we utilized a simple Ethernet tap to replicate the mirrored traffic, on the fly, for the various monitoring devices.

Summary
In this chapter, we have provided two data sets to address some of the inherent problems of the DARPA and KDD data sets, which are widely used as the only publicly available data sets for network-based anomaly detection systems.

The first data set, called NSL-KDD, consists of selected records of the entire KDD data set. Although the proposed data set still suffers from some of the problems discussed by McHugh and may not be a perfect representative of existing real networks, we believe that, given the lack of public data sets for network-based IDSs, it can still be applied as an effective benchmark data set to help researchers compare different intrusion detection methods.
As our next effort, we generated a benchmark data set in a testbed environment. In order to simulate a real network environment, we employed real machines with various operating systems. We then analyzed real traces to create profiles for agents that simulate real traffic for HTTP, SMTP, SSH, IMAP, POP3, and FTP. Having generated normal traffic in our testbed, we carried out various state-of-the-art multi-step attacks to represent the current cyber-threats that organizations are dealing with.

Framework Evaluation
In this chapter, we provide the evaluation results of our proposed framework.
In order to have a detailed comparison of our hybrid system, we conduct the experiments in three phases. In the first phase, we focus on the Traffic Classification Module and provide a thorough analysis of the classification module using two benchmark data sets. A comprehensive performance analysis of the Intrusion Detection Module is conducted in phase 2; this includes a detailed study of the detection module's performance on the benchmark data set described in Chapter 5. Having studied the performance of each module individually, in the last phase we analyze the overall performance of our proposed hybrid system.

Phase 1: Evaluation of Traffic Classification Module
To evaluate our proposed method, we prepared two data sets from different networks; the first data set was prepared using traffic captured in our own center. For the experiments we applied the J48 decision tree classifier. In the first step, we evaluated the classifier using the payload unigram features with equal weights, employing 10-fold cross-validation to obtain reliable results. We then applied the genetic algorithm technique to find the appropriate weights and obtain a higher accuracy. This experiment took approximately five days to complete, since the classifier model must be retrained during each evaluation of the fitness function, which takes a few seconds. The results show that the genetic algorithm is a promising method for finding appropriate weights for our unigram payload model. We also believe that the weighted unigram model can be applied as an effective intrusion detection method to find malicious activities on the Internet.

Phase 2: Evaluation of Intrusion Detection Module
In this phase of our evaluation, we focus on the performance of the Intrusion Detection Module independently. This means that incoming traffic is not separated by the Traffic Classification Module; all of it is analyzed using the same general-purpose intrusion detection systems.
As explained in Chapter 4, for the experiments we decided to employ the C4.5 decision tree algorithm [69] as our anomaly detection system. As the signature-based detector, we chose Snort [5] because of its popularity and availability to researchers. As with any other intrusion detection solution, we conducted a lot of fine-tuning to enhance the performance of Snort.
As an example, since our FTP traffic was not encrypted, during the first experimental setup, Snort generated a lot of false positives warning that the FTP traffic is sent in clear text format. After several rounds of tuning, we succeeded in gaining a 43% reduction in the number of false positives generated by Snort.
In addition to Snort, we employed QRadar Rules [64] as our second source of signature-based detection system. Analyzing the extracted information from network packets and flows, QRadar Rules provided a very strong mechanism to discover malicious activities hidden in network communications.
As discussed in Chapter 5, due to the inherent issues of publicly available data sets, we decided to prepare our own benchmark data set in a testbed environment. Although the generated data set was prepared in a testbed environment, we believe it is a good representative of the real traffic in our Information Security Center of Excellence (ISCX). In order to simulate a real network environment, we employed real machines with various operating systems. We then analyzed real traces to create profiles for agents that simulate real traffic for HTTP, SMTP, SSH, IMAP, POP3, and FTP. Having generated normal traffic in our testbed, we carried out various state-of-the-art multi-step attacks to represent the current cyber-threats that organizations are dealing with. Figure 6.3 and Table 6.4 illustrate the distribution of normal versus intrusive activities during the seven days of experiments.
As indicated in Table 6.5, the signature-based detectors have a noticeable impact on reducing the false positives of the hybrid detector, as in many cases Snort and QRadar together had more support than the anomaly detector.

Analysis of Day 2:
The attack conducted on Day 2 includes four steps. The first step, querying the DNS servers to gather information, is quite normal and cannot be detected by any detector; therefore, we did not tag any of the DNS network flows as attacks in our database. The next three steps, however, should definitely be identified by the detectors as intrusive activities.
Analyzing the behavior of the signature-based detectors, we found that Snort has a very high detection rate with an acceptably low rate of false alarms.
Further detailed analysis indicates that the Adobe Reader buffer overflow exploit is successfully detected by Snort, as the proper signature had already been added to its known-attack database. Snort also has a very effective mechanism for detecting traffic from IP/port scanners. However, the SQL injection attack, which consists of three network flows, is not detected by Snort.
On the contrary, as indicated in Table 6.6, QRadar shows a very poor performance on this day. Nevertheless, the fusion of the detectors has an acceptable impact on reducing the false alarm rate.

Analysis of Day 4:
Analyzing the results in Table 6.8, we see a pattern very similar to the previous day, as most of the steps are conducted in the same way, with two exceptions. First, since the communication between the bots is done using the IRC protocol, this traffic looks very normal and none of the detectors is able to identify it. Second, since in the distributed denial of service (DDoS) attack the packets are forwarded from a large group of machines in different locations, this type of attack is more difficult to identify.
As a result, QRadar performance is not as impressive as that of day 3.

Analysis of Day 5:
Similar to Day 1, we only focus on the false positives (FP) generated by each detector, as no attack was conducted on Day 5 (see Table 6.9). During this day, Snort generated 13,630 false alarms, which is the highest number of the entire week amongst all detectors. Similarly, QRadar generated 7,547 false positives, which is also a very high number. Turning to the last day of the experiment and analyzing the results in Table 6.11, we find that all detectors show a poor performance in detecting the intrusive SSH traffic. In addition, QRadar also generates a high number of false positives and dramatically affects the ability of the hybrid detector to keep the number of false positives acceptably low.
In summary, the overall analysis of the detectors shows that the hybrid detector has the best performance of all detectors in terms of identifying intrusive traffic, as indicated in the overall detection results for this phase.

Phase 3: Evaluation of the Overall Detection System
The purpose of this phase of evaluation is to measure the impact of applying application-based detectors on the overall performance of the system.
For the experiments, we employ Ax3soft Sax2 [85] as a signature-based detector to analyze web traffic. Sax2 is a network-based intrusion detection system mainly designed to detect web-based intrusions such as CGI/WWW attacks and SQL injection. In addition, we employ the C4.5 decision tree algorithm [69] as our anomaly detection system.
For the rest of the network traffic, we use the general purpose hybrid detector which is explained in the previous section. This detector is composed of Snort, QRadar and C4.5 detectors. Table 6.12 illustrates the distribution of web-based traffic during the 7 days of experiments.
As illustrated in Figure 6.6, network packets will be first sent to Packet Analysis Module in which they will be grouped as network flows, and each flow will be labeled by an application name. The output will be then forwarded to the Intrusion Detection Module for investigation of intrusive activities. Depending on the application type, flows will be either analyzed by web-based or general purpose detectors. Table 6.13 summarizes the application groups that are considered as web traffic for our experiments.
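The routing step described above can be sketched as follows (Python; the detector objects and the set of labels treated as web applications are illustrative assumptions):

WEB_APPLICATIONS = {"HTTP", "HTTPS"}   # assumed set of labels treated as web traffic

def dispatch(flows, web_detector, general_detector):
    """Send each labeled flow to the application-specific or general detector."""
    results = []
    for flow in flows:
        detector = (web_detector if flow["application"] in WEB_APPLICATIONS
                    else general_detector)
        results.append((flow, detector.analyze(flow)))
    return results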
In the remainder of this section, we provide a detailed analysis of the performance of the hybrid detector in general mode (phase 2) versus application-based mode (phase 3).

Analysis of Day 1:
As indicated in Table 6.14, out of the 3,186 false positives generated by Snort, only 4 were related to web traffic. Similarly, only 435 of the false positives reported by QRadar are categorized as web traffic. As a result, applying an application-based IDS to the web traffic does not have a large impact on the overall performance of the detector on this day.

Analysis of Day 2:
Since a large portion of the attacks during this day are not web-based (for example, the Adobe Reader buffer overflow and port scanning), we do not gain much, statistically, by applying the application-based IDS. However, as indicated in Table 6.15, the web-based IDS is able to detect the SQL injection attack that was missed by Snort in phase 2.

Analysis of Day 3:
There are about 676 web-based intrusive flows on Day 3, related to the Slowloris stealthy denial of service attack. Out of these 676 flows, only 159 and 299 flows were detected by Snort and QRadar, respectively, in phase 2. However, as illustrated in Table 6.16, by applying a web-based IDS solution we were able to successfully detect 527 flows without a major change in the total number of false positives.

Analysis of Day 4:
Employing Sax2 to monitor the web traffic yields a noticeable improvement on Day 4 in detecting the distributed denial of service attack against the Apache web server. As can be seen in Table 6.17, the new configuration in phase 3 results in 1,136 more flows being successfully detected as intrusive while keeping the false alarm rate at the same level.

Analysis of Day 5:
As can be seen in Table 6.18, there is not a big difference in the performance of the detector between phases 2 and 3. This is mainly because there is no attack during this day, and the web traffic is quite distinct from the attack signatures.

Analysis of Day 6:
Similar to Day 5, there is no noticeable difference in the performance of the detector between phases 2 and 3 (see Table 6.19).

Analysis of Day 7:
Analyzing the results in Table 6.20, we find that there is no change in the detection rate of hybrid systems during phase 2 and 3.
The reason is that the attacks on Day 7 were conducted against the SSH service, which does not involve any web traffic. There is only a slight change in the number of false positives, as Sax2 performs better than QRadar on web traffic.
In conclusion, we gained a slightly better performance by employing application-specific detectors. During Day 2, the web-based detector was successful in detecting the SQL injection attack; however, since this attack consists of only three flows, its effect on the overall detection rate is small. In contrast, as our web-based detector performed very well in detecting the distributed denial of service (DDoS) attack, we observed a significant improvement in the detection rate during Day 4 (Table 6.21).
Similarly, Table 6.22 illustrates the captured false alarm rate during the seven days of experiment. As can be seen in the table, although we have achieved a better detection rate applying application-based detectors, there is no impact on the false alarm rate of the system.

Summary
In this chapter, we have evaluated the performance of our hybrid detector.
To have a detailed comparison of the proposed system, we have conducted the experiments in three phases.
In the first phase, we focused on the Traffic Classification Module, in which we achieved high classification rates of 90.97% and 86.55% on the two applied data sets. A comprehensive performance analysis of the Intrusion Detection Module was conducted in phase 2. The results show that the hybrid detector has the best performance of all detectors, with a detection rate of 62.13% while keeping the false alarm rate as low as 1.62%.
Having studied the performance of each module individually, in the last phase, we analyzed the overall performance of our proposed hybrid system applying an application-based IDS to monitor web traffic. The conducted experiments show a significant success in detecting SQL injection and DDoS attacks against the web server.

Conclusions and Future Work
In this thesis, we have proposed an adaptive hybrid intrusion detection system to overcome the main shortcomings of existing IDSs. With regard to hybrid intrusion detection, we identified two main issues that highly affect the performance of such a system. First, anomaly-based methods cannot achieve an outstanding performance without a comprehensive, labeled, and up-to-date training set covering all different attack types, which is very costly and time-consuming to create, if not impossible. Second, the efficient and effective fusion of several detection technologies is a big challenge in building an operational hybrid intrusion detection system.
To solve the first issue, we proposed applying the idea of adaptive learning. To meet this goal, we defined learning time intervals, e.g., one day, at the end of which the anomaly-based detector is trained on the two most recent training sets. These training sets are the flows labeled by the hybrid detector in the previous intervals. Moreover, by applying Dempster's rule of combination along with a clustering-based quantification method, we introduced an efficient fusing algorithm.
For the experiments, we generated a benchmark data set in a testbed environment to overcome the inherent issues of publicly available data sets. In order to simulate a real network environment, we employed real machines with various operating systems. We then analyzed real traces to create profiles for agents that simulate real traffic for HTTP, SMTP, SSH, IMAP, POP3, and FTP. Having generated normal traffic in our testbed, we carried out various state-of-the-art multi-step attacks to represent the current cyber-threats that organizations are dealing with.
Having prepared our data set, we have conducted the experiments in three phases to have a detailed comparison of the proposed system.
In the first phase, we focused on the Traffic Classification Module, in which we achieved high classification rates of 90.97% and 86.55% on the two applied data sets. A comprehensive performance analysis of the Intrusion Detection Module was conducted in phase 2. The results show that the hybrid detector has the best performance of all detectors, with a detection rate of 62.13% while keeping the false alarm rate as low as 1.62%.
Having studied the performance of each module individually, in the last phase, we analyzed the overall performance of our proposed hybrid system applying an application-based IDS to monitor web traffic. The conducted experiments show a significant success in detecting SQL injection and DDoS attacks against the web server.

Future Work
The work performed in this thesis provides a basis for future research on intrusion detection systems in several areas. One area of future work is applying a broader range of features for anomaly detection. These features need to be flow-based and computable in real time to enable the detector to keep up with existing gigabit networks. Moreover, customized machine learning methods should be devised to minimize the CPU and memory consumption of anomaly detectors. It would also be beneficial to employ a powerful random sampling method to reduce the huge number of flows that are fed to the system as the training set.
One of the areas that needs a lot of improvement is the fusing algorithm.
Although Dempster's rule of combination has proved to be effective, assigning the probabilities is quite challenging. This can be improved by either applying more features or utilizing a more efficient clustering technique.
In addition, as discussed in this thesis, one of the main advantages of applying the Traffic Classification Module is to balance the load between different detectors. However, more than 50% of network traffic is web-based, which imposes a lot of pressure on the web-based detector and requires more attack signatures, causing more delay. To deal with these issues, a future direction is to separate the traffic based on content or destination. For example, IIS and Apache web servers have their own vulnerabilities and can be analyzed separately. Similarly, popular social networking web sites such as Facebook can be dealt with separately, as they might need their own set of signatures.
Also as future work, we are interested in applying more intrusion detection systems, both anomaly-based and signature-based, to analyze the effect of the number of detectors on the overall performance of the hybrid system.

Appendix A

As discussed earlier, Srivastava proposed an alternative form of Dempster's rule for combining sources of evidence that pertain to binary variables [75].
In order to derive the general binary form of Dempster's rule, we start with the two-source case and extend it to n sources, which yields the combined assignments and the normalization constant K given in Theorem 2. Proof. The above proposition can be proved by extending Equations A.11, A.15, A.16, and A.17 through induction.
In order to provide an efficient mechanism for the fusion of intrusion detectors, we employ the general form of Dempster's rule of combination for binary variables (Theorem 2).