Using Artificial Intelligence Techniques For Intrusion Detection System



ABSTRACT
The growth of the Internet and the rapid expansion of the World Wide Web and local network systems have changed the computing world in the last decade. As more people use the Internet, their computers and the valuable data stored on them become more exposed to attackers. There is therefore an increasing need to protect computers and networks from attacks and unauthorized access, and network intrusion classification and detection systems are used to prevent unlawful access. This work takes advantage of the classification and detection abilities of Artificial Intelligence Techniques (AITs) to recognize intrusions (attacks) and to detect new attacks. The algorithms are used as both multi-class and binary classifiers for network intrusions. Unsupervised and supervised fuzzy clustering algorithms (Fuzzy C-Means, FCM; Gustafson-Kessel, GK; and Possibilistic C-Means, PCM) were applied to classify intrusions into 23 classes according to the subtype of attack. The same dataset was also classified into 5 classes according to the type of attack (Normal, DoS, Probe, U2R, R2L) and into 2 classes (Normal and Attack), one for normal traffic and one for attacks. These algorithms were also used to detect intrusions.
An artificial neural network (ANN) was also used, represented by the counter propagation neural network (CPN), which uses hybrid learning (supervised and unsupervised). It was applied to classify intrusions into 23, 5, and 2 classes and to detect network intrusions. We then combined fuzzy c-means with the two layers of the CPN (the Kohonen layer and the Grossberg layer) to produce the proposed approach, called the fuzzy counter propagation neural network (FCPN), which was applied to the same classification and detection tasks. The DARPA 1999 (Defense Advanced Research Projects Agency) dataset, represented by the Knowledge Discovery and Data Mining (KDD) Cup 99 dataset, was used for both training and testing. This research evaluates the performance of these approaches, which obtained high classification and detection rates with a low false alarm rate. The proposed FCPN approach performed best when compared with the other approaches used here and with previous works. All algorithms were run on a laptop with a 2.27 GHz CPU and 2.00 GB of RAM.

General Introduction
Network security is fast becoming an absolute necessity for protecting the information contained in computer systems worldwide. With the rapid expansion of computer networks during the past decade [1], networks have grown in size and complexity and computer services have expanded, so vulnerabilities within local area and wide area networks have become numerous and problematic. The problems arise from the increasing number of intrusion tools and exploit scripts, which allow almost anyone to launch an attack on a vulnerable machine. An attack can be launched as a fast attack or a slow attack. A fast attack uses a large number of packets or connections within a few seconds, whereas a slow attack takes a few minutes or a few hours to complete. Both kinds of attack have a great impact on the network environment because of the resulting security breach [2]. The number of intrusions into computer networks has grown extensively, and many new hacking tools and intrusive methods have appeared for attackers to use [3]. Intrusion detection techniques can be categorized into misuse detection and anomaly detection.
- Misuse detection uses the patterns of well-known attacks or vulnerable spots in the system to identify intrusions [4]. It is based on knowledge of system vulnerabilities and known attack patterns, and is concerned with finding intruders who attempt to break into a system by exploiting a known vulnerability. Ideally, a system security administrator should be aware of all known vulnerabilities and eliminate them [5].
- Anomaly detection attempts to determine whether deviations from established normal usage patterns can be flagged as intrusions.
There are three types of intrusion detection systems: the Host-based Intrusion Detection System (HIDS), the Network-based Intrusion Detection System (NIDS), and a combination of both types (the Hybrid Intrusion Detection System) [4][6].

KDD Cup 99 Dataset
Since 1999, the KDD'99 (Knowledge Discovery and Data Mining) dataset has been the most widely used dataset for evaluating intrusion detection. The network data was distributed by MIT Lincoln Lab for DARPA [3][4]. The dataset is built on the data captured in the DARPA'98 IDS evaluation program. DARPA'98 consists of about 4 gigabytes of compressed raw tcpdump data covering 7 weeks of training traffic and 2 weeks of test traffic. It is important to note that the test data is not from the same probability distribution as the training data, and it includes specific attack types that are not in the training data, which makes the task more realistic. The "10% KDD" dataset contains a total of 23 training attack types, with 15 additional types that appear only in the test data, giving 38 attack types in "Corrected KDD". A connection record in the KDD data is a sequence of TCP packets starting and ending at well-defined times, between which data flows from a source IP address to a target IP address under a well-defined protocol. The KDD Cup 99 dataset includes a set of 41 features derived for each connection and a label that marks the connection record as either normal or a specific attack type [7][8]. Attack types fall into four main categories [4][9][10]:
- Denial of Service (DoS) attacks, which prevent a computer from complying with legitimate requests by consuming its resources.
- Probe attacks, which are scanning and polling activities that gather information on vulnerabilities for a future attack.
- Remote-to-Local (R2L) attacks, which are unauthorized local access attempts from a remote machine.
- User-to-Root (U2R) attacks, which have the goal of obtaining illegal or unauthorized super-user or root privileges.
The training dataset (the 10% KDD dataset) contains 494020 connection records, and the testing dataset (the corrected KDD dataset) contains 311029 connection records. The dataset consists of symbolic and numeric values. All symbolic values were transformed into numeric values [11]: the three protocol types (tcp, udp, icmp), the 68 service types, and the 11 flag types each take a value from [1..N]. All input data of the 10% KDD dataset were then normalized [12].

Preprocessing Dataset
From the KDD Cup 99 intrusion detection dataset, 41 features were derived to summarize each connection. Several enumeration and normalization operations were necessary before an architecture could be trained. As a first step, the symbolic variables in the dataset were enumerated and all variables were normalized. Each instance of a symbolic feature was first mapped to a sequential integer value: the three protocol types (tcp, udp, icmp), the 68 service types in KDD Cup 99, and the 11 flag types each take a value from [1..N], as described in table (2) and in figures (1) and (2) [14]. Each numerical value in the dataset is then normalized to the range 0.0 to 1.0 according to the following equation:

$$x' = \frac{x - \min}{\max - \min}$$

where $x$ is the numerical value, $\min$ is the minimum value of the attribute that $x$ belongs to, and $\max$ is the maximum value of that attribute [15].
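The enumeration and normalization steps can be illustrated with a minimal Python sketch. The function names `enumerate_symbols` and `min_max_normalize` are hypothetical, and assigning integer codes in order of first appearance is an assumption; the codes actually used in this work are those listed in table (2). Mapping a constant attribute to 0.0 is likewise an assumption made only to avoid division by zero.

```python
def enumerate_symbols(values):
    """Map each distinct symbolic value to an integer in 1..N.

    Assumption: codes are assigned in order of first appearance.
    """
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes) + 1
    return [codes[v] for v in values], codes


def min_max_normalize(column):
    """Scale a numeric column to [0.0, 1.0] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant attribute is mapped to 0.0 everywhere
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Toy example with the three protocol types named in the text
protocols = ["tcp", "udp", "tcp", "icmp"]
encoded, mapping = enumerate_symbols(protocols)
normalized = min_max_normalize(encoded)
print(mapping)     # integer code assigned to each protocol
print(normalized)  # the same column scaled to [0.0, 1.0]
```

In practice the same two functions would be applied column by column, first to the symbolic features (protocol, service, flag) and then to every feature of the 10% KDD dataset.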

Performance Measures
The following indicators were used to measure the accuracy of the IDS [16]:
- True positive (TP): correctly classifying an intrusion as an intrusion. The true positive rate is synonymous with the detection rate, sensitivity, and recall, which are other terms often used in the literature.
- False positive (FP): incorrectly classifying normal data as an intrusion. This is also known as a false alarm.
- True negative (TN): correctly classifying normal data as normal. The true negative rate is also referred to as specificity.
- False negative (FN): incorrectly classifying an intrusion as normal [17].
The performance metrics calculated from these include the detection rate (DR) and the false alarm rate (FAR):

$$DR = \frac{TP}{TP + FN}, \qquad FAR = \frac{FP}{FP + TN}$$

The overall classification rate, also referred to as accuracy, can be calculated as follows [18]:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
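As an illustration of how these indicators combine, the following minimal Python sketch counts TP, FP, TN, and FN for a binary (normal versus attack) labeling and derives the detection rate, false alarm rate, and accuracy from them using the standard definitions. The function names and the toy label lists are hypothetical and are not taken from the experiments.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count TP, FP, TN, FN, treating `positive` as the intrusion class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (detection rate, false alarm rate, accuracy).

    A rate is reported as 0.0 when its denominator is zero.
    """
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return detection_rate, false_alarm_rate, accuracy


# Hypothetical toy labels, for illustration only
actual = ["attack", "attack", "normal", "normal", "attack"]
predicted = ["attack", "normal", "normal", "attack", "attack"]
counts = confusion_counts(actual, predicted)
dr, far, acc = rates(*counts)
print(counts, dr, far, acc)
```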

Clustering
We live in a world full of data. Every day, people encounter a large amount of information and store or represent it as data for further analysis and management. One of the vital means of dealing with such large data is to classify or group it into a set of categories or clusters. Clustering is the process of grouping a dataset in such a way that the similarity between data within a cluster is maximized, while the similarity between data of different clusters is minimized. Clustering or classification systems are either supervised or unsupervised. Unsupervised clustering takes an unlabelled set of data and partitions it into groups of examples without additional knowledge. Supervised clustering, on the other hand, assumes that the class structure is already known and takes a set of examples with class labels [19].

Fuzzy C-Means (FCM) Algorithm
The most popular fuzzy clustering algorithm is fuzzy c-means (Bezdek). It is a data clustering technique in which each data point belongs to a cluster to a degree specified by a membership grade [20]. It is based on minimization of the objective function in equation (8) [21]:

$$J = \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_{ki}^{m} \, \lVert x_k - v_i \rVert^{2} \qquad (8)$$

where $c$ and $m$ are user-defined parameters that represent the number of clusters and the fuzzification factor, respectively, $N$ denotes the number of patterns, $x_k$ is the $k$-th pattern, $v_i$ is the center of the $i$-th cluster, and $\mu_{ki}$ is the membership of $x_k$ in cluster $i$. The conventional FCM algorithm includes the following steps:
1. Initialize the cluster centers $V = \{v_1, \ldots, v_i, \ldots, v_c\}$, or initialize the membership matrix $\mu_{ki}$ and then calculate the centers.
2. Calculate the fuzzy membership $\mu_{ki}$ using

$$\mu_{ki} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)}}$$

where $k = 1, \ldots, N$ and $i = 1, \ldots, c$.
3. Compute the fuzzy centers $v_i$ using

$$v_i = \frac{\sum_{k=1}^{N} \mu_{ki}^{m} \, x_k}{\sum_{k=1}^{N} \mu_{ki}^{m}}$$

4. Repeat steps (2) and (3) until the minimum value of $J$ is achieved.
5. Finally, defuzzification is necessary to assign each data point to a specific cluster (i.e. by assigning a data point to the cluster for which its degree of membership is maximal).
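The FCM steps above can be sketched in plain Python as follows. This is a minimal illustration using the standard FCM membership and center updates, not the implementation used in the experiments. The function name `fcm`, the random initialization of the membership matrix, and the stopping test on the largest membership change (used here in place of monitoring $J$ directly) are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means; returns (centers, memberships, hard labels)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random membership matrix u[k][i]; each row sums to 1
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    centers = []
    for _ in range(max_iter):
        # Step 3: centers are membership-weighted means of the points
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[k] * points[k][d] for k in range(n)) / total for d in range(dim)]
            )
        # Step 2: membership update from distances to the centers
        new_u = []
        for k in range(n):
            d2 = [sq_dist(points[k], centers[i]) for i in range(c)]
            if min(d2) == 0.0:
                # Point coincides with a center: give it crisp membership
                hits = [i for i in range(c) if d2[i] == 0.0]
                row = [1.0 / len(hits) if i in hits else 0.0 for i in range(c)]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                s = sum(inv)
                row = [v / s for v in inv]
            new_u.append(row)
        # Step 4 (assumption): stop when memberships no longer change
        change = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if change < tol:
            break
    # Step 5: defuzzification by maximal membership
    labels = [max(range(c), key=lambda i: u[k][i]) for k in range(n)]
    return centers, u, labels


# Two well-separated one-dimensional groups (toy data)
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
centers, memberships, labels = fcm(data, c=2)
print(labels)
```

Because the squared distance is used, the exponent $-1/(m-1)$ in the code corresponds to the exponent $2/(m-1)$ applied to the unsquared distance ratio.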

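Gustafson-Kessel (GK) Algorithm
The Gustafson-Kessel algorithm is an extension of the fuzzy c-means algorithm [22]. It uses the Mahalanobis distance. The objective function is:

$$J = \sum_{i=1}^{c} \sum_{j=1}^{N} \mu_{ij}^{m} \, D_{ij}^{2}$$

where $D_{ij}$ is the distance norm between data point $x_j$ and cluster center $v_i$. The steps involved in the GK algorithm include the following [23]:
5. Calculate the covariance matrices using equations (13) and (14).
6. Calculate the distance norms using equation (15).
7. Update $\mu_{ij}$ using equation (16).
If the change in the partition matrix between two successive iterations is ≤ ε, then stop.
End for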
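To make the covariance and distance-norm steps concrete, the sketch below works through them for two-dimensional data in plain Python. It assumes the commonly cited GK formulation, in which the fuzzy covariance matrix $F_i$ is the membership-weighted scatter around the center and the distance is $(x - v_i)^T A_i (x - v_i)$ with $A_i = (\rho_i \det F_i)^{1/n} F_i^{-1}$; whether equations (13) to (15) take exactly this form is an assumption. The function names are hypothetical, and the code is restricted to $n = 2$ so that the determinant and inverse can be written out by hand.

```python
def fuzzy_covariance_2d(points, memberships, center, m=2.0):
    """Membership-weighted 2x2 covariance matrix of one cluster."""
    weights = [u ** m for u in memberships]
    total = sum(weights)
    sxx = sxy = syy = 0.0
    for (x, y), w in zip(points, weights):
        dx, dy = x - center[0], y - center[1]
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
    return [[sxx / total, sxy / total], [sxy / total, syy / total]]


def gk_sq_distance_2d(point, center, cov, rho=1.0):
    """Squared GK distance for n = 2 (assumed standard formulation)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # Inverse of a 2x2 matrix, written out explicitly
    inv = [[d / det, -b / det], [-c / det, a / det]]
    scale = (rho * det) ** (1.0 / 2.0)  # exponent 1/n with n = 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return scale * quad


# A cluster elongated along the x axis (toy covariance)
cov = [[4.0, 0.0], [0.0, 1.0]]
along = gk_sq_distance_2d((2.0, 0.0), (0.0, 0.0), cov)
across = gk_sq_distance_2d((0.0, 2.0), (0.0, 0.0), cov)
print(along, across)
```

The two printed distances differ even though both points lie at the same Euclidean distance from the center, which is the property that lets GK fit elongated clusters where FCM assumes spherical ones.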

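Possibilistic C-Means (PCM) Algorithm
The possibilistic c-means (PCM) algorithm is based on a modification of the objective function of FCM. The objective function is:

$$J = \sum_{i=1}^{c} \sum_{j=1}^{N} \mu_{ij}^{m} \, d_{ij}^{2} + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{N} (1 - \mu_{ij})^{m}$$

where $d_{ij}$ is given by $\lVert x_j - v_i \rVert$, $i$ indexes the clusters, and $j$ indexes the data points. The steps of the PCM algorithm are as follows [24]:
1. Initialize the cluster centers $V = \{v_1, \ldots, v_i, \ldots, v_c\}$, or initialize the membership matrix $\mu_{ij}$ and then calculate the centers.
2. Calculate the fuzzy membership $\mu_{ij}$ using

$$\mu_{ij} = \frac{1}{1 + \left( d_{ij}^{2} / \eta_i \right)^{1/(m-1)}} \qquad (18)$$

where $\eta_i$ is a suitable positive number.
3. Compute the fuzzy centers $v_i$ using

$$v_i = \frac{\sum_{j=1}^{N} \mu_{ij}^{m} \, x_j}{\sum_{j=1}^{N} \mu_{ij}^{m}}$$

4. Repeat steps (2) and (3) until the minimum value of $J$ is achieved.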
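The difference between PCM and FCM memberships can be seen in a short Python sketch. It assumes the membership update takes the commonly used possibilistic form $\mu = 1 / (1 + (d^2/\eta)^{1/(m-1)})$; the function name `pcm_membership` and the toy values of $\eta$ are hypothetical.

```python
def pcm_membership(sq_distance, eta, m=2.0):
    """Possibilistic membership (typicality) of a point in one cluster.

    sq_distance: squared distance from the point to the cluster center
    eta: positive scale parameter of that cluster
    """
    return 1.0 / (1.0 + (sq_distance / eta) ** (1.0 / (m - 1.0)))


# One point at 0.5 and two cluster centers at 0.0 and 2.0 (toy values)
point, centers, etas = 0.5, [0.0, 2.0], [1.0, 1.0]
memberships = [pcm_membership((point - v) ** 2, eta) for v, eta in zip(centers, etas)]
print(memberships, sum(memberships))
```

Each membership depends only on the distance to its own cluster, so the values for one point are not forced to sum to 1 as they are in FCM. A point far from every center therefore receives a low membership in all clusters, which is what makes PCM less sensitive to outliers.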

Supervised Fuzzy C-Means (SFCM) Algorithm
Class labels provide useful guidance during the training process, as in all supervised learning methods. Hence, labeled samples are used in the training phase and unlabeled samples in the testing phase to improve the performance of FCM. This idea led to the development of the 'Supervised Fuzzy C-Means' algorithm, a slight modification of FCM (Hong-Bin). The SFCM clustering technique aims to develop classifiers that can utilize both labeled and unlabeled samples. The objective function of the SFCM is defined in terms of the following quantities:
$\mu_{ik}$: membership degree of the $k$-th data point belonging to the $i$-th cluster.
$f_{ik}$: membership degree of the $k$-th labeled sample belonging to the $i$-th cluster.
The coefficient $a$ denotes the scaling factor and $m$ denotes the fuzzy coefficient. The role of $a$ is to maintain a balance between the supervised and unsupervised components within the optimization mechanism, and the parameter $m$ controls the amount of fuzziness in the classification. Here $a = L/n$, where $L$ denotes the number of labeled samples [25]. The steps of this algorithm are as follows:
1. Fix the number of clusters $c$. Initialize the membership values of the matrix $F$ of size $c \times n$ with 0 or 1 in accordance with the class labels. Initialize the fuzzy partition matrix $U^{(0)}$ with random values between 0 and 1.
2. Start the iterative procedure and set the iteration count $t = 1$.
3. Calculate the cluster centers (prototypes) using equation (21).
4. Calculate the distance $d_{ik}^{(t)}$ between the $i$-th cluster center and the $k$-th data point. The distance measure used is the Euclidean distance, as given by equation (22).
5. Update the fuzzy partition matrix $U^{(t+1)}$ for the next iteration.
6. If $\lVert U^{(t+1)} - U^{(t)} \rVert \le \varepsilon$ ($\varepsilon$ being the iterative accuracy), stop the iteration and output $v$ (the cluster centers) and $U$ (the fuzzy matrix); else increment the iteration count and return to step 3.
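The supervised ingredients of step 1, the label matrix $F$ and the scaling factor $a = L/n$, can be sketched in Python as follows. The function names are hypothetical, and representing an unlabeled sample by `None` is an assumption made for illustration.

```python
def build_label_matrix(labels, classes):
    """Build F of size c x n: F[i][k] is 1 if sample k is labeled with class i, else 0.

    Assumption: unlabeled samples are represented by None and get an all-zero column.
    """
    return [[1 if lab == cls else 0 for lab in labels] for cls in classes]


def scaling_factor(labels):
    """Return a = L / n, where L is the number of labeled samples."""
    labeled = sum(1 for lab in labels if lab is not None)
    return labeled / len(labels)


# Four samples, one of them unlabeled (toy example)
labels = ["normal", "attack", None, "attack"]
classes = ["normal", "attack"]
F = build_label_matrix(labels, classes)
a = scaling_factor(labels)
print(F, a)
```

When every training sample is labeled, as with the 10% KDD training set, $L = n$ and the scaling factor becomes $a = 1$.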

Supervised Gustafson-Kessel (SGK) Algorithm
In the same way that FCM was modified by (Hong-Bin) into SFCM, as explained above in section (7.1), we have modified the unsupervised Gustafson-Kessel (GK) algorithm into a supervised Gustafson-Kessel (SGK) algorithm by adding the two parameters $a$ and $f$ to the fuzzy membership $\mu_{ij}$ of equation (16), which gives equation (24). The steps of the algorithm are otherwise the same.

Supervised Possibilistic C-Means (SPCM) Algorithm
In the same way that FCM was modified by (Hong-Bin) into SFCM, as explained above in section (7.1), we have modified the unsupervised possibilistic c-means (PCM) algorithm into a supervised possibilistic c-means (SPCM) algorithm by adding the two parameters $a$ and $f$ to the fuzzy membership $\mu_{ij}$ of equation (18), which gives equation (25), where $d_{ij}$ is given by $\lVert x_j - v_i \rVert$.

Counterpropagation Network
The CP network was first developed by Hecht-Nielsen [26] and combines the Kohonen network with a Grossberg layer [27]. The general form of the CP network can be seen in figure (1). The input nodes are connected to the Kohonen neurons by the weights $w_{ij}$, while the Kohonen outputs are connected to the Grossberg layer by the weights $v_{ij}$ [28]. The learning of the CPN can be split into two stages, unsupervised and supervised. Unsupervised learning is used during the first stage to cluster the input vectors into distinct sets of input data. During the second stage, the weight vectors between the Kohonen and Grossberg layers are adjusted by supervised learning to reduce the errors between the CPN outputs and the corresponding desired targets. During the first stage, the distances between the $n$-dimensional input vector and the weight vectors of all Kohonen nodes are determined, and the nodes compete to be the winner. The training steps of the counter propagation network (CPN) are as follows [29][30]:
1. A vector pair $(x, y)$ of the training set is selected at random.
2. Normalize the input vector $x$ to obtain $x'$ by equation (26):

$$x' = \frac{x}{\lVert x \rVert} \qquad (26)$$

3. The weights are obtained as in equation (27):

$$w_j = x' \qquad (27)$$

namely, the weight vector of the winning Kohonen neuron (the $j$-th neuron in the Kohonen layer) equals (best approximates) the input vector.
4. In the hidden competitive layer, the distance between the weight vector and the current input vector is calculated for each hidden neuron $j$ according to equation (28):

$$D_j = \sqrt{\sum_{i=1}^{n} (x_i - w_{ij})^{2}}, \qquad j = 1, \ldots, k \qquad (28)$$

where $k$ is the number of hidden neurons and $w_{ij}$ is the weight of the synapse that joins the $i$-th neuron of the input layer with the $j$-th neuron of the Kohonen layer.
5. The winner neuron $W$ of the Kohonen layer is identified as the neuron with the minimum distance value $D_j$.
6. The synaptic weights between the winner neuron $W$ and all neurons of the input layer are adjusted according to equation (29):

$$w_{iW}(t+1) = w_{iW}(t) + \alpha \, \big( x_i - w_{iW}(t) \big) \qquad (29)$$

where the coefficient $\alpha$ is known as the Kohonen learning rate.
7. The weights $v_{ij}$ between the Kohonen layer and the Grossberg layer are obtained in the same way as the weights $w_{ij}$ between the input layer and the Kohonen layer, as in equation (27) above.
8. Only weights from non-zero Kohonen neurons (non-zero Grossberg layer inputs) are adjusted. The weight adjustment is as follows:

$$v_{ij}(t+1) = v_{ij}(t) + \beta \, \big[ T_i - v_{ij}(t) \, k_j \big] \qquad (30)$$

$T_i$ being the desired outputs (targets) and $\beta$ a small number that represents the learning rate of the Grossberg layer.
9. A major asset of the Grossberg layer is the ease of its training. The output of the Grossberg layer is calculated as in equation (31):

$$g_i = \sum_{j} k_j \, v_{ij} \qquad (31)$$

$k_j$ being the Kohonen layer outputs and $v_{ij}$ denoting the Grossberg layer weights.
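The CPN training steps can be sketched in plain Python as below. This is a minimal illustration under several assumptions, not the implementation used in the experiments: the input is normalized to unit length, the winner is chosen by Euclidean distance, the Kohonen weights start as copies of the first few normalized inputs, the Grossberg weights start at zero, and the Kohonen layer is winner-take-all (output 1 for the winner and 0 otherwise). The function names and the learning rates `alpha` and `beta` are hypothetical.

```python
import math
import random


def normalize(x):
    """Scale x to unit length (assumed form of the normalization step)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm else list(x)


def winner(x, w):
    """Index of the Kohonen neuron whose weight vector is closest to x."""
    dists = [math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, wj))) for wj in w]
    return dists.index(min(dists))


def train_cpn(samples, targets, n_hidden, alpha=0.7, beta=0.1, epochs=20, seed=0):
    """Train Kohonen weights w[j] and Grossberg weights v[i][j]."""
    rng = random.Random(seed)
    xs = [normalize(x) for x in samples]
    # Assumption: Kohonen weights start as copies of the first n_hidden inputs
    w = [list(xs[j]) for j in range(n_hidden)]
    n_out = len(targets[0])
    # Assumption: Grossberg weights start at zero
    v = [[0.0] * n_hidden for _ in range(n_out)]
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # present the training pairs in random order
        for idx in order:
            x, t = xs[idx], targets[idx]
            j = winner(x, w)
            # Kohonen update: move only the winner toward the input
            w[j] = [wi + alpha * (xi - wi) for xi, wi in zip(x, w[j])]
            # Grossberg update: only weights leaving the winner are adjusted
            for i in range(n_out):
                v[i][j] += beta * (t[i] - v[i][j])
    return w, v


def predict(x, w, v):
    """Grossberg output when the winner's Kohonen output is 1 and all others are 0."""
    j = winner(normalize(x), w)
    return [row[j] for row in v]


# Toy two-class problem with two features (hypothetical data)
samples = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
targets = [[1, 0], [0, 1], [1, 0], [0, 1]]
w, v = train_cpn(samples, targets, n_hidden=2)
print(predict([0.95, 0.15], w, v))
```

For the binary classifier described in this work the same structure would use 41 inputs, 2 Kohonen nodes, and 2 output nodes; the toy example uses 2 inputs only to keep the data readable.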

Hybrid Counterpropagation Network With FCM
Counterpropagation, developed by Hecht-Nielsen, can be generalized to a fuzzy counterpropagation network by extending its two layers (Kohonen's layer and Grossberg's layer). The basic objective of this network is to cluster the input patterns in such a way that the total Euclidean distance between each pattern and its nearest cluster centroid is minimized in the Kohonen layer; we take the minimum-distance output for the winner neuron in the Kohonen layer and the maximum-output neuron in the Grossberg layer. A novel method, called FCPN, is proposed in this research by using the fuzzy c-means algorithm in the Grossberg layer. Steps (4) and (5) of the following algorithm implement this modification, and the algorithm has been applied to the KDD 99 dataset. The algorithm for fuzzy counterpropagation is shown below.
1. A vector pair $(x, y)$ of the training set is selected randomly. It is normalized and used as an input to obtain the weights by equations (26) and (27).

The weight update uses the fuzzy scaling function $z_i$, which is computed from the distance measure between the input pattern $x_k$ and the neuron weights $w_i$. The scaling function $z_i$ depends on the fuzzy generator $m$, which is a real number greater than 1.

Compute the membership between the winner neuron and the Grossberg layer based on the distance measure, and update the weight associated with each neuron. The weight update is again performed in accordance with a rule that uses the fuzzy scaling function $z_i$, which depends on the fuzzy generator $m$, a real number greater than 1.

Calculate the output of the Grossberg layer as in equation (31).
The CPN and FCPN were used for the classification and detection of network intrusions. Both methods were used as binary and multi-class classifiers for the dataset. Figure (4) shows the system design of the two methods for binary classification. The system uses the input dataset (normal and attack), which contains 41 features, equal to the number of nodes in the input layer. The Kohonen, or clustering, layer has 2 Kohonen nodes, one for normal and the other for attack. Finally, the output layer has 2 output nodes, according to the target output.
Figure (5) shows the system architecture of CPN and FCPN for multi-class classification. The system uses the same input dataset, so there are 41 nodes in the input layer and 5 nodes in the Kohonen layer. The output layer consists of 5 output nodes, one for normal and the others for the four types of attack (DoS, Probe, U2R, and R2L). Figure (6) shows the system architecture of CPN and FCPN for classifying the dataset into 23 classes, one for normal and 22 for the attack subtypes; the clustering layer and the output layer each have 23 nodes.

1) Experiment 1
We applied the fuzzy clustering algorithms (FCM, PCM, GK) and (SFCM, SPCM, SGK), as well as CPN and FCPN, to the 10% KDD dataset, which contains 494020 records. In the first experiment, we applied these algorithms to classify the dataset into 23 classes or clusters: one for normal traffic and the rest for the attack subtypes {DoS (pod, land, back, neptune, teardrop, and smurf), Probe (ipsweep, portsweep, satan, and nmap), U2R (buffer_overflow, loadmodule, perl, and rootkit), R2L (ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, and warezmaster)}. Table (3) shows the clustering results after training the fuzzy clustering algorithms, CPN, and FCPN. The classification rate obtained is 100%, but the algorithms took different numbers of iterations and different times. The "corrected KDD" dataset, which contains 311029 records, was used in the testing stage for the fuzzy clustering algorithms (FCM, GK, PCM) and (SFCM, SGK, SPCM). Table (5) compares the supervised (SFCM, SGK, SPCM) and unsupervised (FCM, GK, PCM) fuzzy clustering algorithms for 23 classes. The overall detection rate is 91.659% for FCM and 94.030% for SFCM, 83.021% for GK and 92.672% for SGK, and 94.284% for PCM and 95.971% for SPCM.
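The three labelings used across the experiments (23, 5, and 2 classes) can be derived from the subtype label of each record, as in the following Python sketch. The mapping reproduces the grouping of the 22 attack subtypes listed above; the function names are hypothetical, and stripping a trailing period from the label is an assumption about how labels appear in the raw data files.

```python
# Attack subtype -> attack category, following the grouping given in the text
ATTACK_CATEGORY = {
    "pod": "DoS", "land": "DoS", "back": "DoS",
    "neptune": "DoS", "teardrop": "DoS", "smurf": "DoS",
    "ipsweep": "Probe", "portsweep": "Probe", "satan": "Probe", "nmap": "Probe",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
}


def to_five_class(label):
    """Map a subtype label to Normal, DoS, Probe, U2R, or R2L."""
    name = label.strip().rstrip(".").lower()  # assumption: labels may end with "."
    if name == "normal":
        return "Normal"
    return ATTACK_CATEGORY[name]


def to_two_class(label):
    """Map a subtype label to Normal or Attack."""
    return "Normal" if to_five_class(label) == "Normal" else "Attack"


for raw in ["normal.", "smurf.", "nmap.", "rootkit.", "phf."]:
    print(raw, to_five_class(raw), to_two_class(raw))
```

A label outside the 23 training classes raises a `KeyError` in this sketch, so the additional attack types that appear only in the test data would need their own entries in the mapping.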

2) Experiment 2
The same dataset (494020 records) was used, after preprocessing, in the training stage to classify it into 5 classes. Table (6) shows the results of the experiment for (FCM, GK, PCM), (SFCM, SGK, SPCM), CPN, and FCPN. Table (7) shows the results after applying these algorithms to classify the dataset into 5 classes when the fuzzification factor equals 1.011. In this table, SPCM classified the dataset faster than the other algorithms, because it takes fewer iterations and less time, whereas FCM takes more time than the other algorithms. The classification rate obtained from all these algorithms is 100% in the training stage. In the testing stage, the 'corrected KDD' file, which contains 311029 records, was used with the fuzzy clustering algorithms (FCM, GK, PCM) and (SFCM, SGK, SPCM) and with CPN and FCPN. For 5 classes, the overall detection rate is 98.543% for FCM and SFCM, 80.836% for GK and 81.155% for SGK, 99.955% for PCM, and 99.977% for SPCM and CPN, while FCPN obtained the highest detection rate, 100%. Table (8) shows the comparison between the (FCM, GK, PCM), (SFCM, SGK, SPCM), CPN, and FCPN algorithms.

3) Experiment 3
The same dataset (494020 records) was also used, after preprocessing, in the training stage to classify it into 2 classes. Table (9) shows the results of the experiment for the (FCM, GK, PCM), (SFCM, SGK, SPCM), CPN, and FCPN algorithms, while table (10) shows the results after applying these algorithms to classify the dataset into 2 classes. As shown in this table, SPCM classified the dataset faster than the other algorithms, because it takes fewer iterations and less time, whereas CPN takes more time than the other algorithms. The classification rate obtained from all these algorithms is 100%. The 'corrected KDD' file, which contains 311029 records, was used in the testing stage for the fuzzy clustering algorithms (FCM, GK, PCM) and (SFCM, SGK, SPCM) and for CPN and FCPN; table (11) shows the testing results after applying these algorithms. Table (12) shows the comparison between the (FCM, GK, PCM), (SFCM, SGK, SPCM), CPN, and FCPN algorithms in the testing stage. This table shows that SPCM is the fastest algorithm, because it takes less time than the other algorithms.

Conclusions
The main conclusions of this work are as follows:
1. Classification or accuracy improvement: the applied approaches, based on unsupervised and supervised fuzzy clustering algorithms (FCM, GK, PCM, SFCM, SGK, SPCM), CPN, and the hybrid of fuzzy clustering with CPN called FCPN, achieved a high classification or accuracy rate.
2. Reduced training time: the intrusion detection mechanisms used took little time to train on the dataset compared with the other approaches.
3. Reduced computational overhead: the approaches used in this work reduce memory and computational overhead during the training and testing processes, because they took fewer iterations and little execution time.
4. Architectural framework improvement: the application of these approaches made the intrusion analysis engine simpler and more efficient.
5. Detection improvement: these approaches obtained a high detection rate and a low false alarm rate on the KDD Cup 99 dataset. The FCPN algorithm was found to be the best approach.
6. IDS performance: to enhance the performance of IDS, this work proposes supervised methods (SGK and SPCM) and also proposes the FCPN method, which achieves the best performance.

Figure 1. The Original Data of KDD Cup 99

Compute the distance from the input pattern $x_k$ to each of the competing neurons $w_i$. Compute the membership of the winner neuron based on the distance measure. Update the weight associated with each neuron; the weight update is performed in accordance with the weight update rule.

Manar Y. Ahmed, Bayda I. Khaleel
College of Computer Sciences and Mathematics, University of Mosul

Table (2). Numerical Values of KDD Dataset Features

Table (4) shows the result of the first experiment using (FCM, PCM, GK), (SFCM, SPCM, SGK), CPN, and FCPN clustering for 23 classes. As shown in this table, SPCM classified the dataset faster than the other algorithms, because it takes fewer iterations and less time, whereas CPN takes more time than the other algorithms.