A novel two-phase cycle algorithm for effective cyber intrusion detection in edge computing

Edge computing extends traditional cloud services to the edge of the network, closer to users, and is suitable for network services with low latency requirements. With the rise of edge computing, its security issues have received increasing attention. In this paper, a novel two-phase cycle algorithm is proposed for effective cyber intrusion detection in edge computing, based on a multi-objective genetic algorithm (MOGA) and a modified back-propagation neural network (MBPNN), namely TPC-MOGA-MBPNN. In the first phase, the MOGA is employed to build a multi-objective optimization model that tries to find the Pareto optimal parameter set for the MBPNN. The Pareto optimal parameter set is applied for simultaneous minimization of the average false positive rate (Avg FPR), the mean squared error (MSE) and the negative average true positive rate (Avg TPR) on the dataset. In the second phase, some MBPNNs are created based on the parameter set obtained by the MOGA and are trained to search for a more optimal parameter set locally. The parameter set obtained in the second phase is used as the input of the first phase, and the training process is repeated until the termination criteria are reached. A benchmark dataset, KDD cup 1999, is used to demonstrate and validate the performance of the proposed approach for intrusion detection. The proposed approach can discover a pool of MBPNN-based solutions, and combining these solutions can significantly improve detection performance; a GA is used to find the optimal MBPNN combination. The results show that the proposed approach achieves an accuracy of 98.81% and a detection rate of 98.23%, outperforming most systems reported in previous works in the literature. In addition, the proposed approach is a generalized classification approach applicable to problems in any field with multiple conflicting objectives.


INTRODUCTION
With the advance of edge computing, the number of edge services running on mobile devices has grown explosively [1]. Edge computing can reduce processing times and improve application performance, but it also brings new challenges. Edge computing drives the need for more bandwidth across the network, and 5G has been introduced to offer sufficient communication bandwidth [2]. Privacy is a serious issue in edge computing, as users' data is collected, processed, transmitted, and shared over edge nodes, so securing user privacy before these data are integrated has become a necessity [3]. The explosive growth and variety of information available on edge nodes frequently overwhelm users, and recommender systems are a promising way for users to quickly find the valuable information they are interested in within massive data [4,5,6]. In addition to these challenges, edge computing introduces a scale of cyber security challenges that regular data center operators may not be used to dealing with. An intrusion, one of the main cyber security challenges, is any set of actions intended to compromise the confidentiality, integrity, or availability of a resource [7]. Cyber intrusions are prevalent, increasingly sophisticated, and adept at hiding from detection [8]. To counteract this ever-evolving threat, the Network-based Intrusion Detection System (NIDS) has been considered one of the most promising methods.
Intrusion detection techniques have become a significant topic of research in recent years. Many researchers have proposed algorithms in different categories, such as artificial neural networks (ANNs) [9], SVMs [10], k-nearest neighbors [11], random forests [12], deep learning approaches [13], Bayesian approaches [14] and decision trees [15]. As a result, the performance of detection techniques continues to improve.
The Artificial Neural Network (ANN), a computing paradigm that mimics the way the neural system of the human brain works, is widely used in cyber intrusion detection. Hodo, Bellekens, et al. [16] present a multi-layer perceptron, a type of supervised ANN, to detect distributed denial of service (DDoS/DoS) attacks; their experiments demonstrate 99.4% accuracy in detecting various DDoS/DoS attacks. Chuan-Long Y., Yue-Fei Z., et al. [17] propose a deep learning approach for intrusion detection using recurrent neural networks (RNN-IDS); their results show that RNN-IDS is suitable for building classification models with high accuracy. Sheraz Naseer and Yasir Saleem [18] propose a deep convolutional neural network (DCNN) based intrusion detection system (IDS), whose experimental results are promising for real-world anomaly detection applications. Benmessahel I., Xie K., et al. [19] present an evolutionary neural network (ENN), a combination of an ANN and an evolutionary algorithm (EA), and show experimentally that this approach is effective for cyber intrusion detection. A. Arul Anitha and L. Arockiam [20] propose an ANN-based IDS (ANNIDS) built on a multilayer perceptron (MLP) to detect attacks initiated by the destination-oriented directed acyclic graph information solicitation (DIS) attack and the version attack in IoT environments. Zichao Sun and Peilin Lyu [21] use an LSTM neural network with long- and short-term memory to train on the KDD99 dataset and identify DoS attacks with the trained model, systematically adjusting the hyperparameters to find the optimal solution after processing the data. Shenfield, Day and Ayesh [22] present a novel approach to detecting malicious cyber traffic using artificial neural networks suitable for use in deep-packet-inspection-based IDS.
Their results show that this novel classification approach is capable of detecting shellcode with extremely high accuracy and a minimal number of false identifications. Amruta and Talha [23] present a denial-of-service attack detection system using an ANN for wired LANs; the proposed ANN classifier achieves 96% accuracy on their training dataset. Most of these systems have produced promising classification accuracy.
An ANN has the ability to approximate an arbitrary function mapping and learn from examples, much like the human brain. In many cases, ANNs surpass conventional statistical methods for classification tasks in various fields of application [24]. However, designing an ANN is a difficult process. Its performance depends on the optimization of various design parameters, such as choosing an optimal number of hidden nodes, a suitable learning algorithm, the learning rate and the initial values of the weights, and some objectives conflict with each other, such as accuracy and complexity. Therefore, multi-objective optimization (MOO) is considered a more realistic approach to the design of an ANN than the single-objective approach [25]. In addition, ANNs have shortcomings such as slow convergence speed, entrapment in local optima, and unstable network structure [26]. In contrast, the genetic algorithm (GA) exhibits global search capability and quick convergence, and is the most widely used technique in data mining and knowledge discovery [27]. On the other hand, the method of Pareto-optimality has been widely used in MOO [28]. It offers a pool of non-inferior individual solutions and ensemble solutions instead of a single optimum, and accordingly provides more degrees of freedom in selecting proper solutions. The multi-objective genetic algorithm (MOGA) [29] and the non-dominated sorting genetic algorithms (NSGA-II [30,31], NSGA-III [32]) are two examples of GA-based MOO that apply the concept of Pareto-optimality.
Several MOGA-based approaches have been proposed for effective intrusion detection on benchmark datasets. Elhag S., Altalhi A., et al. [33] propose a multi-objective evolutionary fuzzy system that can be trained using different metrics; the system obtains more accurate solutions and allows the final user to decide which solution is better suited for the current network characteristics. M. Stehlík, A. Saleh, et al. [34] propose multi-objective evolutionary algorithms (NSGA-II and SPEA2 [35]) for intrusion detection parametrization, focusing on the impact of an evolutionary algorithm (and its parameters) on the optimality of the found solutions, the speed of convergence and the number of evaluations. Kumar G. and Kumar K. [27] propose a three-phase MOGA-based approach using the Micro Genetic Algorithm 2 (AMGA2) [36], which considers conflicting objectives simultaneously, such as the detection rate of each attack type, error rate, accuracy and diversity. In the first phase, a Pareto front of non-inferior individual solutions is approximated. In the second phase, the entire solution set is further refined, and an improved Pareto front of ensemble solutions over that of individual solutions is approximated. In the third phase, a combination method such as majority voting is used to fuse the predictions of individual solutions into the prediction of the ensemble solution. Experiments conducted on two benchmark datasets demonstrate that this approach can discover individual solutions and ensemble solutions for intrusion detection.
This paper aims to develop a novel two-phase cycle training algorithm for intrusion detection. A MOGA-based approach is used to find the Pareto optimal parameter set for the neural networks. An MBPNN set is created based on the Pareto optimal parameter set and trained to find a more optimal parameter set locally. The proposed approach can discover a pool of MBPNN-based solutions that detect intrusions accurately.
The rest of this paper is organized as follows: Section II presents an overview of the proposed methodology. Experimental results and discussion are presented in Section III. Finally, the concluding remarks of the study are provided in Section IV.

The Proposed Approach
The TPC-MOGA-MBPNN includes a training session, a testing session and the combined classification method, which are described below.

The training session
The training session, as illustrated in Figure 1, is implemented by TPC-MOGA-MBPNN. In the first phase, a MOGA tries to find the Pareto optimal parameter set for the neural networks. The MOGA considers the negative Avg TPR, the Avg FPR and the MSE on the training dataset as the objectives to be minimized. Meanwhile, the weights, biases and gain factors of the neural networks form the genotype evolved simultaneously by the MOGA. In this phase, we use the global search capability of the MOGA to search for the initial parameter values of the neural network, thereby avoiding entrapment in local minima. In the second phase, a neural network set is trained to find a more optimal parameter set locally. The nondominated parameter set obtained from the first phase is taken as the input archive. A neural network set is generated by selecting excellent parameter sets from the input archive, and back propagation is then used to update the weights of the neurons in order to bring the error function to a minimum. In this phase, we use the local search and fast learning capabilities of the neural network to refine the parameter set from the first phase and obtain a more optimal nondominated parameter set. The nondominated parameter set obtained in the second phase is used as the input of the first phase, and the training process of the two-phase algorithm is repeated until the termination criteria are met. In addition, a new self-adaptive parameter adjustment strategy is used in the training processes of both the genetic algorithm and the neural network.
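The cycle described above can be sketched as a simple loop. The following Python sketch uses toy stand-ins for both phases — the function names, the scalar "solutions" and the stopping target are illustrative only, not the authors' implementation (which was written in C#):

```python
import random

def run_moga(archive):
    # Toy stand-in for phase 1: pretend the MOGA keeps the nondominated
    # half of the archive (here simply the smaller objective values).
    return sorted(archive)[: max(1, len(archive) // 2)]

def train_mbpnn(params):
    # Toy stand-in for phase 2: pretend local back-propagation training
    # slightly improves (lowers) the objective value.
    return params * 0.9

def two_phase_cycle(init_population, max_cycles=5, target=0.1):
    """Skeleton of the TPC-MOGA-MBPNN training loop with toy phases."""
    archive = list(init_population)
    pareto_set = archive
    for _ in range(max_cycles):
        # Phase 1: global search -- MOGA evolves weights, biases, gain factors
        pareto_set = run_moga(archive)
        if min(pareto_set) < target:       # termination criteria reached
            break
        # Phase 2: local search -- back-propagation refines selected solutions
        refined = [train_mbpnn(p) for p in pareto_set]
        # The refined parameter sets seed the next MOGA cycle
        archive = pareto_set + refined
    return pareto_set

print(min(two_phase_cycle([random.uniform(0.5, 1.0) for _ in range(8)])))
```

The key structural point is that the output of phase 2 feeds back into the phase 1 archive, so each cycle starts the global search from locally refined solutions.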
The detailed implementation of the proposed algorithm is as follows:

Step 1: Generate a random initial population. Create a random initial population and maintain it in a solution archive. The structure of the chromosomes that make up the population is illustrated in Figure 2 (the chromosome represents the parameters of the neural network). The weight segment of a chromosome represents the weights between the input layer and the hidden layer, as well as the weights between the hidden layer and the output layer. The bias segment represents the biases of the hidden and output nodes, and the gain factor segment represents the gain factors of the hidden and output nodes. After studying multiple training results of the neural network, the numerical range of the parameters was estimated, and the initial parameter values are limited to the interval [-1000, 1000].

Step 2: Evaluate the objective fitness functions. Calculate the objective fitness values of the neural network corresponding to each solution, then sort all solutions based on these values and update the solution archive. First, create neural networks based on the parameters represented by all chromosomes in the MOGA population. Then, these networks classify the samples in the training dataset one by one. Next, calculate and record the TPR and FPR for the five output types. The TPR and FPR are the most important criteria for the classifier, while the MSE is an important criterion for evaluating the classification performance of a neural network. The MSE is recorded and calculated as follows:

MSE = (1/n) Σ_{l=1}^{n} (D_l − Y_l^o)²   (1)

where D_l and Y_l^o represent the desired output and the actual output for the l-th sample of the neural network, respectively, and n is the number of samples in the training dataset. After that, calculate the Avg TPR and Avg FPR over the five types; together with the MSE, they constitute the three objective fitness functions of the MOGA. Using the Avg TPR instead of the TPR as an objective fitness function avoids bias against specific attack types, especially those with few training samples, such as R2L and U2R; for the same reason, the Avg FPR is also used as an objective fitness function. Finally, sort all chromosomes and update the solution archive. The chromosomes are sorted according to the following two rules [37]: (1) chromosomes are first sorted by non-inferior order, and chromosomes with small non-inferior order values are ranked at the top of the solution archive; (2) chromosomes with the same non-inferior order are then sorted by crowding degree, and less crowded chromosomes are placed closer to the top. The first rule serves to find non-inferior solutions, and the second rule ensures that the distribution of non-inferior solutions is as dispersed as possible.

Step 3: Stopping criteria or designated generation. The MOGA uses two criteria to determine when to stop the solver: it stops if the best fitness value has not changed for a given number of generations (the stall generation limit), or when the designated number of generations is reached (300 by default). If either stopping criterion is met, the algorithm goes to step 5; otherwise it goes to step 4.
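The two sorting rules in step 2 match the non-dominated sorting used in NSGA-II-style algorithms. As a minimal illustration (crowding-degree tie-breaking omitted), the non-inferior order of a set of objective vectors can be computed as follows; the objective triples below are hypothetical examples, not results from the paper:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_inferior_rank(points):
    """Assign each point its non-inferior order; front 0 holds the best points."""
    ranks = {}
    remaining = set(range(len(points)))
    front = 0
    while remaining:
        # Points not dominated by any other remaining point form the next front
        current = {i for i in remaining
                   if not any(dominates(points[j], points[i])
                              for j in remaining if j != i)}
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks

# Three objective vectors: (Avg FPR, MSE, -Avg TPR), all to be minimized
pts = [(0.1, 0.2, -0.9), (0.2, 0.3, -0.8), (0.05, 0.4, -0.85)]
print(non_inferior_rank(pts))
```

Here the second vector is dominated by the first (worse in every objective), so it lands on front 1 while the other two share front 0.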
Step 4: Selection, crossover and mutation. The MOGA uses selection, crossover and mutation operators on the solution archive to generate a new population. The new population is added to the solution archive, and the algorithm returns to step 2 to repeat the above steps. The MOGA used in this paper is a variant of the Stud GA [38]. First, some of the best solutions in the archive obtained in step 2 are moved to a stallion archive, and the remaining solutions are moved to a temporary archive. Then, linear ranking selection is used to pick one solution from the stallion archive as the stallion, and another solution from the temporary archive. Next, the two selected solutions are paired for crossover, and two offspring are created by the arithmetic crossover operator. After that, all chromosomes resulting from the crossover operation go through a mutation process. Finally, the selection, crossover and mutation operators are applied repeatedly to generate new individuals until the maximum population size is reached.

Step 5: Termination criteria. The algorithm terminates only if the following three conditions are all met: (1) The Avg TPR value is greater than the designated value.
(2) The Avg FPR value is smaller than the designated value.
(3) The number of non-inferior individual solutions is greater than the designated value. If the termination criteria are satisfied, the algorithm terminates; otherwise it goes to step 6.

Step 6: Generate a neural network set for training. Select some of the best solutions from the archive of nondominated solutions, and generate neural networks using the parameters represented by these solutions. The MBPNN [39] used in our approach adds gain factors G to change the steepness of the neural network activation function. During the learning process, the gain factors change along with the weights and biases, so as to speed up convergence. The main modifications of the algorithm are as follows:

(1) The activation function is still the sigmoid function, but its value range is changed to [-0.5, +0.5]. This overcomes the problem that changes of the weights and biases do not affect the computation when learning zero-valued samples. The modified sigmoid is:

f(x) = 1 / (1 + e^(−x)) − 0.5   (2)

(2) Let f be the activation function and add a gain factor G_j to the net input I_j, which is computed as the sum of the input features multiplied by the weights; the output y_j is then defined as:

y_j = f(G_j · I_j)   (3)

(3) The update rule for the gain factor is the same as the rule for the weights and biases, and the update value of the gain factor ΔG_j is calculated as:

ΔG_j = η · δ_j · I_j   (4)

where η is the learning rate and δ_j is the error term at node j.

Step 7: Train the neural networks. Each neural network created in step 6 is trained on the training dataset by the back-propagation algorithm, which reduces the error values and updates the weights, biases and gain factors, so that the actual outputs become close enough to the desired outputs.
The method to update the gain factors is expressed in Equation (4).

Step 8: Generate a new population from the trained neural network set. When the learning process for the neural networks is completed, new chromosomes are constructed from the parameters of each trained neural network, and these chromosomes form a new population. The new population is added to the solution archive, and the algorithm returns to step 2 to repeat the above steps.
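The MBPNN modifications described in step 6 can be sketched in a few lines of Python. The helper names are ours, and the gain-update form is our reading of the description in step 6 (the same shape as the weight update, with the net input I_j in place of the input feature):

```python
import math

def modified_sigmoid(x):
    """Sigmoid shifted to the range [-0.5, +0.5], as used by the MBPNN."""
    return 1.0 / (1.0 + math.exp(-x)) - 0.5

def node_output(inputs, weights, bias, gain):
    """y_j = f(G_j * I_j), where I_j is the weighted net input of node j."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return net, modified_sigmoid(gain * net)

def gain_update(eta, delta, net):
    """Delta G_j = eta * delta_j * I_j -- same form as the weight update rule."""
    return eta * delta * net

net, y = node_output([0.2, -0.4], [0.5, 0.3], 0.1, gain=1.5)
print(round(y, 4))
```

Because the shifted sigmoid outputs 0 for a zero net input, a zero-valued sample no longer produces an output of 0.5 regardless of the weights, which is the problem modification (1) addresses.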

The combined classification method
In the testing session, some non-inferior solutions are selected from the archive of nondominated solutions obtained in the training session, and an MBPNN set is generated for prediction. How to choose these non-inferior solutions is therefore critical to the performance of the combined prediction. A genetic algorithm is used to find the optimal combination solution set. Each chromosome represents one combination solution set; the structure of the chromosomes is illustrated in Figure 4. Each gene of a chromosome corresponds to a solution in the solution archive: a gene value of 1 means the corresponding solution is selected into the combination solution set, and 0 means it is not. Since the number of solutions in the solution archive is too large, we choose a subset of the archive to form the chromosomes. The Avg TPR segment of a chromosome represents some solutions with maximum Avg TPR values selected from the archive, the Avg FPR segment represents some solutions with minimum Avg FPR values, and the Accuracy segment represents some solutions with maximum accuracy values. The GA takes the Avg TPR, Avg FPR and accuracy of the combination solution set on the training dataset as the optimization objectives. To simplify the calculation, the three objectives are combined into a single objective using a linear weighting method:

F(I) = Σ_{k=1}^{3} a_k · f_k(I)

where f_k(I) represents the k-th objective, which is one of the three objectives Avg TPR, Avg FPR or accuracy, and a_k is the weight of the k-th objective.
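As an illustration of the linear weighting, the combined objective might look like the following. The weight values and the sign convention (Avg FPR is to be minimized, so it is subtracted) are our assumptions, since the paper does not list the a_k values:

```python
def combined_fitness(avg_tpr, avg_fpr, accuracy, weights=(0.4, 0.3, 0.3)):
    """Linear weighted sum of the three ensemble objectives.

    avg_tpr and accuracy are maximized; avg_fpr is minimized, so it enters
    with a negative sign. The weights are illustrative, not the paper's.
    """
    a = weights
    return a[0] * avg_tpr - a[1] * avg_fpr + a[2] * accuracy

print(combined_fitness(0.98, 0.01, 0.99))
```

Any ensemble with a higher detection rate, lower false alarm rate and higher accuracy will score higher under this fitness, which is all the GA needs to rank candidate combination sets.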

The Intrusion Detection dataset
The performance of the proposed approach is measured on the KDD cup 1999 dataset [40], which is the most widely used dataset for validation of an IDS. Each record of the KDD dataset contains 41 feature attributes and 1 label attribute. The dataset covers five major types: Normal, Probe, Denial of Service (DoS), User-to-Root (U2R) and Remote-to-Local (R2L) attacks; the last four are attack types that can be subdivided into 39 different attack types. The dataset is very large, including 5 million training records and 2 million test records, so it is practically very difficult to use the whole dataset. In this study, we first remove records that have the same values for all features, and then randomly select records to form subsets containing different proportions of normal and attack instances. The subsets used in our experiments are depicted in Table 1. The validation set is created by extracting 10% of the records (14,558) from the training set. The test set has the same number of records as the validation set and is never exposed to the training session.

Table 1 The selected subsets used in the experiments

Type    Train set  Test set  Validation set  Subtotal
Normal  79049      8783      8783            87832
Probe   1918       213       213             2131
DoS     49115      5457      5457            54572
U2R     47         5         5               52
R2L     899        100       100             999
Total   131028     14558     14558           145586

Data transformation. We use a one-hot encoding scheme to transform the three categorical features: protocol, service and state. A dimension is added for each new category value. Numeric features are normalized by min-max scaling:

x' = (x − min(x)) / (max(x) − min(x))

where x' represents the normalized value, x represents the raw value, min(x) finds the minimum value of the current feature, and max(x) finds the maximum value of the current feature.
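The two transformation steps can be sketched in Python (the paper's tool was written in C#; the category list below is an example, not the full KDD value set):

```python
def one_hot(value, categories):
    """Encode a categorical feature as a 0/1 vector, one dimension per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_normalize(column):
    """Scale a numeric feature column to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: map everything to 0 to avoid division by zero
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]

print(one_hot("tcp", ["tcp", "udp", "icmp"]))
print(min_max_normalize([0, 5, 10]))
```

One-hot encoding grows the 41-attribute record by one extra dimension per category value, which is why the number of network input nodes exceeds 41 after transformation.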

Experimental setup
To evaluate the proposed approach, an experimental application is implemented in C#. MBPNN is used as a basic classifier.

Performance Metrics
In order to evaluate the performance of the proposed IDS, we use the following widely known metrics: accuracy, true positive rate and false positive rate, defined as follows: 1) True Positive (TP) is the number of attack records classified correctly; 2) True Negative (TN) is the number of normal records classified correctly;

3) False Positive (FP) is the number of normal records classified incorrectly; 4) False Negative (FN) is the number of attack records classified incorrectly.
The true positive rate (TPR), also known as the detection rate, recall or sensitivity, is the proportion of positive cases that are correctly identified:

TPR = TP / (TP + FN)

The false positive rate (FPR), also known as the false alarm rate (FAR), is the proportion of negative cases that are incorrectly identified as positive:

FPR = FP / (FP + TN)

Accuracy is the proportion of the total number of predictions that are correct:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The closer the values of these metrics (except for the FPR) are to one, the better the network topology is; the FPR should be close to zero for better topologies.
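These definitions translate directly into code; a minimal sketch with hypothetical confusion counts:

```python
def tpr(tp, fn):
    """True positive rate (detection rate): TP / (TP + FN)."""
    return tp / (tp + fn)

def fpr(fp, tn):
    """False positive rate (false alarm rate): FP / (FP + TN)."""
    return fp / (fp + tn)

def accuracy(tp, tn, fp, fn):
    """Overall accuracy: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Example counts: 98 attacks caught, 2 missed, 1 false alarm, 99 true normals
print(tpr(98, 2), fpr(1, 99), accuracy(98, 99, 1, 2))
```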

Design of Experiments
The proposed approach involves two algorithms: MOGA and MBPNN. The implementation of the MOGA takes the parameters depicted in Table 2. The designated generation sets the maximum number of cycles for the genetic algorithm. To find more solutions, the population size is kept large. The stallion population keeps a few optimal solutions. The trained population size is the number of selected chromosomes used to create MBPNNs for phase 2.
The crossover and mutation probabilities change with the quality of the obtained solutions: bad solutions increase the probabilities, which cannot exceed the set maximum values.
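The arithmetic crossover of step 4 together with a self-adaptive mutation probability can be sketched as follows. The adaptation rule and its constants are illustrative assumptions, not the paper's exact scheme; only the cap at a configured maximum comes from the text:

```python
import random

def arithmetic_crossover(parent1, parent2):
    """Create two offspring as complementary convex combinations of the parents."""
    a = random.random()
    child1 = [a * x + (1 - a) * y for x, y in zip(parent1, parent2)]
    child2 = [(1 - a) * x + a * y for x, y in zip(parent1, parent2)]
    return child1, child2

def adaptive_mutation_prob(solution_quality, base=0.02, p_max=0.1):
    """Worse solutions (quality in [0, 1], 1 = best) get a higher mutation
    probability, capped at the configured maximum (0.1 in Table 2)."""
    return min(p_max, base + (1.0 - solution_quality) * p_max)

def mutate(chromosome, p_mut, scale=10.0):
    """Perturb each gene with probability p_mut (Gaussian noise, toy scale)."""
    return [g + random.gauss(0.0, scale) if random.random() < p_mut else g
            for g in chromosome]

c1, c2 = arithmetic_crossover([0.0] * 4, [100.0] * 4)
print(mutate(c1, adaptive_mutation_prob(0.5)))
```

Note that arithmetic crossover keeps each offspring gene inside the interval spanned by the parents, which suits the bounded real-valued parameter encoding used here.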

Table 2 Configuration of the MOGA

Parameter                         Value
Designated generation             400
Population size                   300
Stallion population size          10
Trained population size           20
Maximum crossover probability     0.4
Maximum mutation probability      0.1
The parameters of the MBPNN are depicted in Table 3. The designated number of training epochs sets the maximum number of cycles. The number of input nodes is the number of cyber intrusion feature attributes after one-hot encoding. The number of hidden nodes is kept small for less computation. The number of output nodes corresponds to the five types of cyber intrusion. The learning rate increases when the network error is large and decreases otherwise, and it cannot exceed the set maximum value. The maximum and minimum initial values set the range of the initial parameter values. The parameters of the GA for combined prediction are depicted in Table 4. The number of MBPNNs indicates how many solutions are selected from the solution archive to participate in the optimization calculation of the genetic algorithm. The designated generation is set to 300, the population size is set to 30, and the number of elite individuals is as given in Table 4 (Configuration of the GA).

RESULTS AND DISCUSSION
Here, the two-phase cycle training algorithm is applied to optimize the parameter values of the MBPNN. We then select some non-inferior solutions obtained by the training algorithm to create MBPNN classifiers for the final ensemble. Finally, each created MBPNN classifies the samples in the test dataset, and the prediction results are combined by the majority voting method to give the final output of the ensemble. Applied to the KDD cup 1999 dataset, the proposed approach produces a set of non-inferior MBPNN-based ensemble solutions. The performance of the ensemble solutions on the training data is depicted in Figure 5, and on the test data in Figure 6. A very clear Pareto front can be seen in both figures, which demonstrates the excellent optimization performance of the algorithm.
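The majority-voting combination used for the ensemble can be sketched as below; the class labels are examples, and ties resolve to the label seen first, which is a simplification of whatever tie-breaking the authors' C# implementation uses:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the per-classifier labels for one sample by majority voting."""
    # most_common(1) returns the label with the highest count;
    # on ties, Counter preserves first-insertion order.
    return Counter(predictions).most_common(1)[0][0]

# Five ensemble members vote on one sample
votes = ["DoS", "DoS", "Normal", "DoS", "Probe"]
print(majority_vote(votes))
```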
It is worth noting that although some individual non-inferior solutions with high DR and low FAR perform well, combining these solutions can still significantly improve performance. We choose the solution with the largest Avg TPR value, the solution with the smallest Avg FPR value, and the solution with the smallest MSE value to carry out predictions respectively. At the same time, an experiment called Combined was performed, which classifies by combining solutions selected by the genetic algorithm.
Figure 6 Test performance of the proposed approach

The detection rates on the test dataset are shown in Table 5. In the table, the highest DR and the lowest FAR for each major type are emphasized in bold, and the shaded cells represent the methods that have a good trade-off between DR and FAR. Based on the DR and FAR, the best-performing classifiers are selected as local experts as follows: Avg TPR for the Probe detector, and Combined for the Normal, DoS, U2R and R2L detectors. The combined classification method clearly performs better than the individual classifications. Table 6 compares the overall accuracy, detection rate and false alarm rate, and also shows that the combined classification method is a feasible classification method with better performance for intrusion detection. More experiments were done to validate the performance of the proposed approach.

Conclusion
In this paper, a novel TPC-MOGA-MBPNN algorithm based on a MOGA and an MBPNN is proposed for effective intrusion detection. The proposed approach is capable of producing a pool of non-inferior individual solutions that exhibit classification trade-offs for the user. By using certain heuristics or prior domain knowledge, a user can select an ideal solution or combined solution per application-specific requirements. The proposed approach attempts to tackle the issues of low DR, high FPR and the lack of classification trade-offs in the field of intrusion detection. It encodes chromosomes that provide optimized parameter values for the MBPNN. The MOGA is employed to build a multi-objective optimization model that generates Pareto optimal solutions with simultaneous consideration of the Avg TPR, Avg FPR and MSE on the dataset. The two-phase cycle training algorithm can rapidly generate numerous non-inferior solutions. In the first phase, a MOGA tries to find the Pareto optimal parameter set for the neural networks. In the second phase, some MBPNNs selected based on the chromosomes obtained by the MOGA are trained to find a more optimal parameter set locally. The nondominated parameter set obtained in the second phase is used as the input of the first phase, and the two-phase training is repeated until the termination criteria are reached. The KDD cup 1999 intrusion detection dataset is used to demonstrate and validate the performance of the proposed approach. The proposed approach exhibits excellent optimization performance, and a very clear Pareto front is obtained. The optimized set of MBPNNs exhibits the classification trade-offs for the users, and a user may select an ideal solution per application-specific requirements. We also demonstrate that combining a few MBPNNs for classification is feasible and performs better than using an individual MBPNN.
A genetic algorithm is used to find the optimal MBPNN combination, and it can discover an optimized set of MBPNNs with good accuracy and detection rate from the benchmark dataset. The results show that the proposed approach reaches an accuracy of 98.81% and a detection rate of 98.23%, which outperform most systems of previous works found in the literature. This work also provides an alternative for selecting an optimal solution among the non-dominated Pareto-optimal solutions. The major issue in the proposed approach is that the MOGA takes a long time to compute the fitness functions across generations; this may be overcome by computing the function values in parallel or by limiting the population size. The proposed approach uses only a small subset of the benchmark dataset for validation, and its applicability can be further validated by more experiments on real cyber traffic in the field of intrusion detection.