Design and Development of an Efficient Network Intrusion Detection System Using Machine Learning Techniques

Today's internet is made up of nearly half a million different networks. In any network connection, identifying attacks by their types is difficult, as different attacks may involve different connections, and their number may vary from a few to hundreds of network connections. To solve this problem, this manuscript proposes a novel hybrid network IDS called NID-Shield that classifies the dataset according to different attack types. Furthermore, the attack names found within each attack type are classified individually, which helps considerably in predicting the vulnerability of individual attacks in various networks. The hybrid NID-Shield NIDS applies an efficient feature subset selection technique called CAPPER together with distinct machine learning methods. The UNSW-NB15 and NSL-KDD datasets are used for evaluation. Machine learning algorithms are trained on the reduced, accurate, high-merit feature subsets obtained from CAPPER and then assessed by cross-validation on the reduced attributes. Various performance metrics show that the hybrid NID-Shield NIDS with the CAPPER approach achieves a good accuracy rate and low FPR on the UNSW-NB15 and NSL-KDD datasets and performs well when compared with various approaches found in the existing literature.


Introduction
Research in network security is a rapidly emerging topic in the domain of computer networking due to the ever-increasing density of advanced cyberattacks. Intrusion detection systems (IDSs) are designed to avert intrusions and to protect programs, data, and computer systems from illegitimate access. IDSs can classify intrinsic and extrinsic intrusions in the computer networks of an organization and raise an alarm if a security infringement occurs in the organization's network [1]. One notable definition of an intrusion is that it produces malignant, outwardly activated functional violations. The primary goal of intrusion detection systems is to recognize a broad variety of intrusions, both previously identified and unidentified attacks; to discover and adapt to unfamiliar attacks; and to detect and recognize intrusions promptly [2]. The preliminary work on IDSs was carried out by Anderson [3], who recommended means of examining data. Subsequent to Anderson's work, early research was aimed at developing algorithms and procedures for online automated systems. The Sytek project [4] started producing audit trails with enhanced security and considered different approaches for analyzing automated systems. These observations contributed the first empirical evidence that end users can be distinguished from each other through their actions when using the computer [5]. The evidence from the SRI and Sytek studies [6] was the foundation of real-time IDS. These systems continuously monitor the behavior of users, whether normal or suspected. Real-time IDS relies on two techniques: (1) intrusions, whether normal or suspected, can be tracked by flagged departures from the established patterns of the respective users, and (2) known system vulnerabilities and various infractions of system-aimed security protocols are best tracked by rule-based expert systems.
Precision and detection stability are the two main measures applied to assess IDSs [7], and in recent years, many IDS research studies have sought to enhance these measures [8]. In the early stages, most research focused on rule-based expert systems and statistical approaches. However, performance results show that these approaches are not accurate and precise when applied to large datasets [9].
To get the better of the above-mentioned problem, data mining approaches [10,11] and machine learning techniques were introduced [12]. Some machine learning paradigms containing Graph-based methods [13], Linear Genetic Programming [14], Bayesian Network [15], k-NN [16], K-means clustering [17], Hidden Markov Model [18], Self-organizing map [19], etc. have been explored for the architecture of IDSs. Machine learning [20] can detect the correlation between features and classes found in training data and identify relevant subsets of attributes by feature selection and dimensionality reduction, then use the data to build a model for classifying data to perform predictions. The data dimensionality related to data mining and machine learning has doubled in the last decade that leads to several questions to current learning approaches [21]. Due to the presence of excessive cardinal features, the model that tends to learn gets overfitted, resulting in the performance degradation of the model.
To solve the problem of data dimensionality in machine learning and data mining, various dimensionality reduction approaches have been assessed, and dimensionality reduction is considered an essential step in both areas. Feature selection is an extensively employed and efficient technique for dimensionality reduction. The main aim of feature selection is to select a limited feature subset from the primary features according to a relevance evaluation criterion, so that the trained model achieves better performance, reduced execution time, and higher predictability. Most classification problems require supervised learning, where the class-conditional probabilities and the underlying class are not known and the class labels and their instances are associated with each other [22]. In real-world applications, there is little prior knowledge about which features are relevant. Many candidate features are therefore acquired to represent the domain more completely, which results in features that are irrelevant or redundant to the target concept or objective function. A relevant feature is neither irrelevant nor redundant to the target concept; a feature that is not directly correlated with the target concept or objective function can still affect the learning process. A redundant feature adds nothing new to the target concept or objective function. In the majority of classification problems, it is difficult to learn well, even with a competent classifier, because of the enormous amount of data, until the redundant features are excluded. Once the features have been generated for a classification problem, feature selection derives feature subsets from the initial features, and these subsets, rather than the full data, are passed to the learning algorithm.
The feature selection approach selects minimally sized feature subsets for the classification problem according to the following criteria:
(i) The classifier accuracy does not decline considerably
(ii) The class distribution given only the values of the selected features is as close as possible to the original class distribution given all features
To obtain high-merit feature subsets from the 2^m candidate subsets of m features, feature subset selection approaches search the subsets according to a few significant evaluation criteria. However, an exhaustive search for the best subset is computationally intensive, and even for an intermediate-sized feature set (m), the strategy is expensive and restrictive. Approaches such as heuristic and random search lower the computational complexity by accepting a trade-off. To avoid searching the feature subsets exhaustively, a stopping criterion is required. A feature selection approach [23] proceeds through subset generation, subset evaluation, a stopping criterion, and result validation. Using the chosen search approach, candidate feature subsets are passed to subset evaluators with the relevant evaluation criteria. Once the stopping criterion is met, the feature subset that best fits the evaluation criterion is selected and finally validated using domain knowledge or a validation procedure.
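The generation, evaluation, and stopping steps described above can be illustrated with a greedy forward search, one common heuristic alternative to exhaustive search. This is a minimal sketch under stated assumptions, not the procedure used in the manuscript: the function names, the `min_gain` threshold, and the toy merit function are all hypothetical.

```python
from typing import Callable, FrozenSet, List


def forward_select(n_features: int,
                   evaluate: Callable[[FrozenSet[int]], float],
                   min_gain: float = 1e-9) -> List[int]:
    """Greedy forward search over feature indices 0..n_features-1."""
    selected: List[int] = []
    best_score = evaluate(frozenset())
    while len(selected) < n_features:
        # Subset generation: extend the current subset by one unused feature.
        candidates = [f for f in range(n_features) if f not in selected]
        # Subset evaluation: score every one-feature extension.
        scored = [(evaluate(frozenset(selected + [f])), f) for f in candidates]
        score, feature = max(scored)
        # Stopping criterion: no extension improves the score enough.
        if score - best_score < min_gain:
            break
        selected.append(feature)
        best_score = score
    return selected


# Hypothetical toy evaluator: features 0 and 2 carry signal, and every
# selected feature costs a small penalty, so useless ones are rejected.
USEFUL = {0: 0.6, 2: 0.3}


def toy_merit(subset: FrozenSet[int]) -> float:
    return sum(USEFUL.get(f, 0.0) for f in subset) - 0.05 * len(subset)


chosen = forward_select(5, toy_merit)
exhaustive_subsets = 2 ** 5  # size of the full search space for m = 5
```

In practice the evaluator would be a filter score or a classifier's validation accuracy; the greedy search trades the guarantee of finding the best subset for far fewer evaluations than the full subset space requires.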
The detection methods of intrusion detection systems are classified into three major types: anomaly-based, signature-based, and hybrid-based. The signature-based IDS and anomaly-based IDS were the most favored methods in organizations until numerous shortcomings were observed, which led to the development of hybrid intrusion detection systems. In designing an IDS, classifying the datasets according to attack types and selecting good feature subsets are hard problems. Classifying datasets according to attack types aids in predicting the vulnerability of individual attacks in various networks. Moreover, the selected features should be neither irrelevant nor redundant so that accurate, high-merit feature subsets are obtained. To address these issues, a new hybrid network intrusion detection system called NID-Shield is designed that classifies the dataset according to attack types. Furthermore, the hybrid CAPPER approach is applied for feature subset selection. The CAPPER approach applies screening to features that are redundant despite having a high class correlation. Moreover, machine learning algorithms are applied for selecting high-merit and accurate feature subsets.
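To make the idea of classifying records by attack type and by attack name concrete, the following sketch groups labeled records both ways. It assumes the commonly used NSL-KDD training-set grouping of attack names into DoS, Probe, R2L, and U2R; the manuscript's own mapping is not shown in this section, and the function names and sample labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed grouping of NSL-KDD training-set attack names into attack types.
ATTACK_TYPE = {
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def attack_type(label: str) -> str:
    """Map a record label (attack name or 'normal') to its attack type."""
    if label == "normal":
        return "Normal"
    return ATTACK_TYPE.get(label, "Unknown")


def group_by_type(labels: Iterable[str]) -> Dict[str, Counter]:
    """Count records per attack name, nested under each attack type."""
    counts: Dict[str, Counter] = {}
    for label in labels:
        counts.setdefault(attack_type(label), Counter())[label] += 1
    return counts


# Hypothetical sample of record labels.
sample = ["normal", "neptune", "neptune", "nmap", "rootkit", "guess_passwd"]
grouped = group_by_type(sample)
```

Keeping the per-name counts inside each type makes it easy to see, for example, that a type such as U2R is represented by only a handful of connections while DoS dominates.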
The major contributions of this manuscript are as follows:
(i) An efficient hybrid NID-Shield NIDS is proposed in this manuscript that classifies the UNSW-NB15 and NSL-KDD datasets according to the attack types and attack names
(ii) An effective hybrid feature subset selection method called CAPPER is applied as a feature subset selection that combines the CFS and Wrapper approaches for obtaining the reduced accurate and high merit feature subsets
(iii) The reduced accurate and high merit datasets obtained from CAPPER are trained by the machine learning approaches and assessed by a 10-fold cross-validation method
(iv) The hybrid NID-Shield network intrusion detection system shows overall good improvement results on the different approaches found in the existing literature studies
Wireless Communications and Mobile Computing
The remainder of this article is organized as follows. Section 2 focuses on related work. Section 3 proposes the architecture of the hybrid NID-Shield NIDS. Section 4 relates to the characteristics of the UNSW-NB15 and NSL-KDD datasets. Section 5 discusses the performance evaluation of the hybrid NID-Shield NIDS approach against various existing approaches on the UNSW-NB15 and NSL-KDD datasets, and Section 6 concludes the work.
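The 10-fold cross-validation named in the contributions can be sketched as follows. This is an illustrative, library-free version with hypothetical function names; it shuffles once with a fixed seed and does not stratify by class, which a real evaluation on imbalanced attack types would likely need.

```python
import random
from typing import Callable, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering every sample exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validate(n_samples: int,
                   fit_and_score: Callable[[List[int], List[int]], float],
                   k: int = 10) -> float:
    """Average the score returned by fit_and_score over the k folds."""
    scores = [fit_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


splits = k_fold_indices(25, k=10)
```

Here `fit_and_score` stands in for training a classifier on the training indices and scoring it on the held-out fold; each record is used for testing exactly once.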

Related Work
This section introduces the existing literature on hybrid network intrusion detection systems. Moreover, it discusses the advantages of a hybrid intrusion detection system over a traditional intrusion detection system. Furthermore, distinct machine learning approaches are introduced, and the usefulness of selecting specific machine learning techniques is discussed.
2.1. State-of-the-Art Network IDSs. The research in the manuscript is focused on studying the appropriateness of intrusion detection approaches to recognize network-level intrusions, as the network structures generate resources more susceptible to intrusions than autonomous machines. Three facets of network structures generate resources more exposed to attack by an autonomous machine: (1) networks generally provide additional resources than autonomous machines; (2) networks are usually formed to aid resource sharing; and (3) the global security policies that are applied to the IDS are limited [24]. Moreover, the hybrid methods are suggested over the signature and anomaly-based IDS, as the integration of multiple approaches into a distinct hybrid system retains the advantages of multiple techniques, while reducing many of the deficiencies [25].
Acharya and Singh [26] conclude that for obtaining the best possible detection and accuracy rate, the hybrid learning approaches can be a good choice and proposed intelligent water drop (IWD) algorithm, introduced by Shah-Hosseini [27]. This approach applies the support vector machine (SVM) as a classification algorithm and IWD approach as a feature selection technique that is inspired by the nature. IWD approach selects the best feature subsets, and the evaluation of the subsets is executed by the SVM classifier. The proposed model lowers the forty-five features from the applied dataset to the lowest of ten features. KDD-Cup '99 dataset is used for the appraisal of metrics. The proposed approach attains an accuracy, detection, and precision of 99%. The disadvantage of applying the elemental IWD algorithm is the likelihood of choosing the adjacent node for a water drop to stream.
Arif et al. [28] introduced the hybrid approach for IDSs. In this approach, pruning of the node is performed by PSO and pruned decision tree is applied for the classification purpose in a NIDS. The proposed approach applies the single and multiple-objective particle swarm optimization (PSO) algorithms. The KDD-Cup '99 dataset is used as an experimental evaluation approach. From the 10% KDD-Cup '99 training and testing dataset, thirty arbitrary samples are chosen for evaluation purposes. The statistical records in every training and testing dataset are 12,000 and 24,000 accordingly for the appraisal of the metrics. The precision of 99.95% and accuracy of 93.5% are achieved using the above approaches. But there are some primary problems involved with traditional PSO when adopted as a feature selection approach. The most significant problem submits the following question: in a random initialization, from the initial population, how far is it to reach an optimal solution. If the optimum answer tells that the predicted prediction is far distant, then it may not be possible to obtain the global optimal solution within the allocated time. The second problem involves the conventional upgrading mechanism of global best and personal best of the PSO approach, as these mechanisms may result in losing some valuable features.
Ahmed et al. [29] applied a triple strategy to build a hybrid IDS in which the Naive Bayes feature subset selector (NBFS) technique is applied for dimensionality reduction. For outlier rejection, optimized support vector machines (OSVM) are applied, whereas prioritized k-nearest neighbors (PKNN) are applied as a classifier. The NSL-KDD, KDD-Cup '99, and Kyoto 2006+ datasets are used for evaluation purposes. Eighteen efficient features are selected from the KDD-Cup '99 dataset with a detection ratio of 90.28%. Twenty-four features are selected from the Kyoto 2006+ dataset with a detection ratio of 91.60%. The authors compared their approach with previous work and reported the best overall detection ratio of 93.28%. The major disadvantage of Naive Bayes is that it assumes the features are mutually independent of one another. Mutually independent features are hard to obtain in real-world problems.
Dash et al. [30] report two new hybrid intrusion detection methods, GS and GSPSO, the latter being a combination of the gravitational search and particle swarm optimization algorithms. The method involves search agents that interact with each other through gravitational force, and their performance is assessed by their mass. The combined approach is used to train an ANN, giving models such as GS-ANN and GSPSO-ANN. A random selection of 10% of the features is used for training, while 15% is used for testing, and the approach is applied successfully to intrusion detection. The authors do not apply any feature selection technique. The KDD-Cup '99 dataset was used for calculating the metrics. The dataset was normalized in MATLAB to obtain a uniform distribution. An average detection ratio of 95.26% was achieved. The gradual movement of the search agents drives the algorithm towards a relevant solution, but its major weakness is that the speed of convergence slows down in later stages and it tends to get trapped in a local optimum.
Yao et al. [31] introduced a hybrid framework for IDS. The K-means algorithm is employed for clustering. In the classification phase, several supervised machine learning algorithms (SVM, ANN, DT, and RF) are compared on different parameters. The supervised learning algorithm has different parameters for different kinds of attacks (DoS, U2R, Probe, and R2L). FIMS is applied as a feature selection technique. The proposed approach obtained an accuracy of 96.70% on the KDD-Cup '99 dataset. The drawback of the FIMS approach is that the correlation between the candidate features and their class is not considered.
Suad and Fadl [32] introduced an IDS model applying the machine learning algorithm to the big data environment. This paper employs a Spark-Chi-SVM model. ChisqSelector is applied as a feature selection method, and an IDS model is constructed by applying the SVM as a classifier. The comparison is done with the Spark-Chi-SVM classifier and Chilogistic-regression classifier. The KDD-Cup '99 dataset is used for the metrics of the evaluation process. The result shows that the Spark-Chi-SVM model shows good performance having an AUROC of 99.55% and an AUPR of 96.24%. The disadvantage of ChisqSelector is having a larger sensitiveness towards the sample size. However, when the sample size increases, the total differences become smaller than the predicted value.
Ijaz et al. [33] introduce a genetic algorithm, which is based on vectors. In this technique, vector chromosomes are applied. The uniqueness of this algorithm is that it shows the chromosomes as a vector and training data as metrics. It grants multiple pathways to have a fitness function. Three feature selection techniques are chosen: forward feature selector algorithm (FFSA), linear correlation feature selector (LCFS), and modified mutual information feature selector (MMIFS). The novel algorithm is tested in two datasets (CDU-13 and KDD-Cup '99). Performance metrics demonstrate that the vector genetic algorithm has a high detection ratio of 99.8% and a low false positive rate of 0.17% on the denial of service (DoS) attack. However, the authors do not evaluate the U2R, Probe, and R2L attacks which are considered important metrics in the IDS.
Alauthaman et al. [34] proposed an approach to peer-to-peer bot detection built on a feed-forward neural network in conjunction with the DT. CART is applied as a feature selection approach to obtain the significant features. Network traffic reduction techniques were applied by using six rules to pick the most relevant features. Twenty-nine features are selected from the six rules. The proposed approach obtained an accuracy of 99.20% and a detection ratio of 99.08%. The disadvantage of using CART is that the decision tree may not be stable and that CART splits the variables one by one.
Venkataraman and Selvaraj [35] report an efficient hybrid feature selection structure for the classification of the data. For classification purposes, symmetrical uncertainty (SU) is applied to find the relevant features. Moreover, a genetic algorithm (GA) is applied to search for the merit subset with higher accuracy. The authors combined the two as SU-GA, a hybrid feature selection approach. MATLAB and Weka tools are used for evaluation. Different classification algorithms (KStar, J48, NB, SMO, DT, JRIP, Multilayer Perceptron, and Random Forest) are used to classify different attacks. The average learning accuracy is highest with Multilayer Perceptron and SU-GA, at 86.0%. The major drawback of a genetic algorithm is that it may be computationally expensive, as the model must be trained to evaluate each candidate. GA is stochastic, so it may require a longer time to converge.
Kumar and Kumar [36] introduce an intelligent hybrid NIDS model. The model integrates a multilayer perceptron, a fuzzy logic controller, an adaptive neuro-fuzzy inference system, and a neuro-fuzzy genetic component. The authors applied fuzzy logic as a feature selection method. The proposed system has three key elements, the analyzer, collector, and predictor modules, which gather and filter network traffic, classify the data, and prepare the final decision on the actual attack. The experiment, run in MATLAB on the KDD-Cup '99 dataset, achieves a true attack detection accuracy of up to 99% with a false alarm rate of 1%. The disadvantage of fuzzy logic is that the results are based on assumptions, and for this reason, the accuracy is sometimes poor.
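Symmetrical uncertainty, used above to rank features by relevance, is commonly defined as SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), where H is entropy and IG is information gain. The sketch below computes it for discrete values using that standard definition; it is not taken from the cited work, and the sample feature and label values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence[Hashable],
                            y: Sequence[Hashable]) -> float:
    """SU in [0, 1]: 0 for independent variables, 1 for fully dependent ones."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy      # mutual information IG(X; Y)
    return 2.0 * info_gain / (h_x + h_y)


# Hypothetical discrete feature values and class labels.
labels = ["dos", "dos", "normal", "normal"]
su_informative = symmetrical_uncertainty([1, 1, 0, 0], labels)
su_independent = symmetrical_uncertainty([0, 1, 0, 1], labels)
```

Continuous features would need to be discretized before applying this measure; the normalization by H(X) + H(Y) compensates for information gain's bias towards features with many values.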
Cavusoglu et al. [37] applied the hybrid approach for IDS using machine learning techniques. k-nearest neighbor and Naive Bayes algorithms are used for classification purposes, while the random forest algorithm is used as a classifier. The author applied two feature selection techniques called the CfsSubsetEval and WrapperSubsetEval approach. J48 algorithm is applied in conjunction with WrapperSubsetEval for selecting accurate attributes. For the evaluation of metrics, the NSL-KDD dataset is applied. The overall accuracy of 99.86% is obtained on all types of attacks.
Saxena et al. [38] implemented a DBSCAN-based hybrid technique for obtaining the high-quality feature subsets for IDS. DBSCAN is employed as a method for eliminating noise from data. For grouping data, K-means clustering is proposed. The SMO classifier is applied for classification purposes. The KDD-Cup '99 dataset is applied for evaluation purposes with reduced attributes. The proposed approach, DBKSMO, achieved an accuracy of about 98%. Weka and MATLAB tools are applied for the execution of the results. However, the major disadvantage of DBSCAN is that whenever there is a cluster having variations in density or the clusters having similar variation, its performance declines, the major reason being the setting of ε (distance threshold), and minimum points for determining the neighborhood points will change from clusters to clusters, whenever density changes. This problem exists for high-dimensional data, as the ε (distance threshold) becomes difficult to examine.
Kambattan and Rajkumar [39] introduced effective IDS, which employs a feature selection technique named IFLFSA to select the finest reduced features that are effective for analyzing the attacks. To identify the outliers from the dataset, the EWOD approach is utilized. An intelligent layered technique is employed for efficient classification. For experimental purposes, the KDD-Cup '99 dataset is applied. The comprehensive detection rate wraps the detection rate on four types of attacks, namely, Probe, DoS, U2R, and R2L. The detection rate of the proposed system is achieved at a rate of 99.45%. The weakness of using intelligent agents is that whenever the global constraints are applied, the intelligent agent fails to deliver appropriately. Each agent is more effective in dealing individually with the main or central controller. The agents make the decisions based on locally acquired knowledge; whenever there is global knowledge available, the agents are missing the major available knowledge globally.
Kar et al. [40] utilize the decision tree algorithm called ID3 which is applied for the classification of the data into its corresponding classes. To designate the class labels to its unexplored data point on its class labels to the k-nearest point, the k-NN approach is applied. Isolation forest is introduced to isolate the anomaly against normal instances. The suggested approach HFA has applied to the NSL-KDD and KDD-Cup '99 dataset. The metrics on the KDD-Cup '99 dataset obtained the ACC of 96.92%, DR of 97.20%, and FPR of 7.49%. The proposed algorithm performance with the NSL-KDD dataset has an ACC of 93.95%, DR of 95.5%, and FPR of 10.34%. However, the main drawback of applying the k-NN is that whenever the size of the variables increases, the k-NN finds it difficult in predicting the output of the new data positions. On the other side, the k-NN does well with the variables having smaller numbers.
Mishra et al. [41] applied the BFS-NB hybrid structure in IDS. This paper proposes the best first search technique for dimensionality reduction, employed as the attribute selection technique. A Naïve Bayes classifier is applied for classification and to maximize the accuracy of detecting intrusions. The BFS-NB algorithm is analyzed with the KDD dataset gathered from the US Air Force. The classification accuracy of BFS-NB is 93%, while a sensitivity of 97% is achieved. The major disadvantage of Naive Bayes is that it assumes the features are mutually independent of one another.
Dutta et al. [42] introduced a hybrid model for improving the classification metrics in a NIDS. The work applies a deep neural network to enhance classification accuracy. Furthermore, a classical autoencoder is used as a feature subset selection technique. The efficiency of the proposed technique is evaluated with the UNSW-NB15 dataset. The proposed architecture obtains a precision of 92.08%, a recall of 90.64%, an accuracy of 91.29%, an F-measure of 91.35%, and an FPR of 0.805. The deep neural network has activation functions and multiple layers that produce nonconvex shapes. The drawback is that a deep neural network is likely to introduce a complex error space, requiring substantial hyperparameter tuning to reach a small-error region in which the model is useful. Moreover, training is very slow because many hyperparameters must be tuned.
Latah and Toker [43] introduce an efficient flow-based multilevel hybrid intrusion detection system. The authors apply k-NN, H-ELM, and ELM for classification purposes, and the SDN controller is used as a feature selection method. The proposed approach obtains an accuracy of 84.29%, an FPR of 6.3%, a precision of 94.18%, a recall of 77.18%, and an F-measure of 84.83%. However, the disadvantage of k-NN is that it does not handle large and high-dimensional datasets well. Furthermore, k-NN is sensitive to noise in the dataset.
Sumaiya Thaseen et al. [44] applied the integrated technique CFS + ANN to improve the classification accuracy. CFS is applied as a feature selection approach for selecting the best feature subsets, while the ANN is employed as a classifier. The UNSW-NB15 and NSL-KDD datasets are used for evaluation. An accuracy of 98.45%, a sensitivity of 92.94%, a specificity of 94.38%, and an execution time of 500 seconds are obtained on the NSL-KDD dataset. For the UNSW-NB15 dataset, an accuracy of 96.44%, a sensitivity of 50.4%, a specificity of 98.4%, and an execution time of 1023 seconds are achieved. The major disadvantage of ANN is that it takes a long time to train.
Safaldin et al. [45] applied the improved binary gray wolf optimizer as a feature selection method and support vector machine for classification in an IDS in a wireless sensor network. The proposed approach attains an accuracy of 96%, FPR of 0.03, a detection rate of 0.96, and an execution time of 69.6 h. The choosing of a good kernel function is hard which is the major disadvantage of the SVM classifier. Moreover, SVM takes a longer time in training the large datasets, and to store all the support vectors, the memory consumption is extensive.
Vallathan et al. [46] introduce a suspicious action detection system based on a deep learning approach in IoT surroundings. Unexpected activities in the footage of the N/W surveillance devices are predicted with the help of deep learning approaches and RFKD. For classification purposes, a multiclassifier approach is used, while a DNN is used for training and learning the data. Moreover, the kernel density approach is applied for prediction and clustering of data. The proposed approach uses the basic merge-sort tree as a feature subset selection approach. For evaluation purposes, HHAR datasets are used. The proposed approach obtained an accuracy of 98.4%, a specificity of 99.8%, and a sensitivity of 96.02% on the HHAR dataset. However, the main drawback of the neural network is that training is very slow because many hyperparameters must be tuned.
Table 1 depicts the taxonomy of the latest hybrid IDS techniques with their various feature selection approaches. When the literature studies are analyzed, most of them do not classify the dataset according to attack types and attack names, which prevents the assessment of individual attacks on the various networks. Distinct attacks may have peculiar connections: some attacks, such as R2L and U2R, may have very few N/W connections, while other attacks, such as Probe and DoS, may have a large number of N/W connections, or an attack can be a combination of any of them. The attack names found in the attack types help in predicting the vulnerability of individual attacks in various networks. Moreover, a feature selection approach that produces high-merit and accurate feature subsets by applying machine learning techniques is not utilized. Furthermore, performance metrics such as precision, MCC, ROC area, PRC area, kappa statistic, MAE, RAE, RMSE, and RRSE, which are considered important metrics in model predictability, are not utilized in the existing literature.
Due to the reviewed problem in the literature studies, a novel hybrid network IDS named NID-Shield has been introduced that employs a distinct machine learning and efficient hybrid feature subset selection approach called CAPPER that is the sequence of the CFS and Wrapper method. Moreover, the hybrid NID-Shield NIDS classifies the dataset according to the various attack names and their types found in the dataset.
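The CFS stage of such a sequence is usually driven by Hall's merit heuristic, Merit_S = k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The sketch below assumes that standard formula; this section does not spell out how CAPPER itself computes merit or how the Wrapper stage is configured, and the feature names and correlation values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Tuple


def cfs_merit(subset: Iterable[str],
              feature_class_corr: Dict[str, float],
              feature_feature_corr: Dict[Tuple[str, str], float]) -> float:
    """CFS merit of a subset; pair keys must be alphabetically ordered tuples."""
    features = sorted(subset)
    k = len(features)
    if k == 0:
        return 0.0
    # Mean feature-class correlation.
    r_cf = sum(abs(feature_class_corr[f]) for f in features) / k
    # Mean feature-feature correlation over all pairs in the subset.
    pairs = list(combinations(features, 2))
    r_ff = (sum(abs(feature_feature_corr[p]) for p in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical correlations: "a" and "b" are near-duplicates, "d" is not.
class_corr = {"a": 0.8, "b": 0.8, "d": 0.8}
pair_corr = {("a", "b"): 0.9, ("a", "d"): 0.1, ("b", "d"): 0.1}

merit_single = cfs_merit({"a"}, class_corr, pair_corr)
merit_redundant = cfs_merit({"a", "b"}, class_corr, pair_corr)
merit_complementary = cfs_merit({"a", "d"}, class_corr, pair_corr)
```

With equal class correlations, the subset whose members are weakly correlated with each other scores higher than the subset of near-duplicates, which is how the filter penalizes redundancy before any classifier-based wrapper evaluation.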

2.2. Advantages of Hybrid NIDSs. This section introduces the problems of the existing IDS approaches based on anomaly and signature detection and explains the advantages of hybrid network intrusion detection systems.
Cybersecurity ventures [47] in the report estimate that the damages arising due to cybercrime in 2025 will increase to $10.5 trillion annually as compared to $3 trillion in 2015. Furthermore, there is a prediction of nearly 7.5 billion active internet users by the end of 2030 worldwide and spending on cybersecurity aggregately surpasses $1 trillion approximately in the coming five years globally.
Despite having enormous financing in the field of IDSs, the losses brought by the intrusions are soaring at an alarming rate leading to enormous debt revenues to the organizations. Considering the efficiency of the IDS, there should be an analytical and stringent proceeding to be acclimated so that network susceptibilities can be classified in a precise and accurate fashion. In past decades, the IDS has been the blocking source for ever-growing intrusion violations and it is utilized as a primary prevention method against computer attacks, safeguarding networks and computer systems. IDS employs statistical techniques, logical operation, and machine learning approaches to analyze distinct kinds of network behaviors [48]. Although present-day IDSs are certainly effective and pursue upgrades, they still develop numerous false alarm rates and fail to analyze the unidentified attacks. Utmost IDSs rely upon inappropriate and redundant inferior level network data to observe cyber intrusions [49]. At two layers of supervision, the existing intrusion detection approaches work to counter the cyberattacks, the host, and the network level. NIDS audits the details of N/W connections to identify the cyberattacks. Contrarily, HIDS scans the workstations' stature and internals of the computing structure utilizing definitive IDS techniques so that at the host level, the potential intrusions can be detected. NIDS is the operating system and platform-independent that does not require any modification when NIDS operates. This makes NIDS more scalable and robust compared with HIDS.
Machine learning analysts classify IDSs into three broad categories: anomaly-based, signature-based, and hybrid [50]. An anomaly-based IDS creates new activity profiles each time and detects outliers that deviate from these profiles. It depends on analytical methods to build an attack prediction model. Its main strength is that it can recognize attacks that have no predefined signature. However, a major weakness lies in the difficulty of creating new activity profiles every time. Moreover, a deviation from the profiles is not always an attack. Failing to define the boundaries of new activity leads to new, legitimate activity being falsely predicted as an attack, possibly resulting in a high false-positive rate. Signature-based intrusion detection systems evaluate the resemblance between the events under scrutiny and known attack patterns. If a previously established pattern is recognized, an alarm is triggered. Among signature-based IDSs, SNORT [51] is one of the most widely preferred and consistently adopted tools. SNORT performs content searching, content matching, and real-time traffic analysis to recognize attacks using precise predefined signatures. Although these systems are reliable in detecting known attacks, they are unable to detect unknown attacks.
The hybrid IDS integrates the anomaly and signature detection approaches to detect attacks. However, the computational expense of running both anomaly and signature detection on the network connections is the major drawback of hybrid approaches. Anomaly and signature IDSs were the preferred methods in organizations until various weaknesses were observed, which led to the development of hybrid intrusion detection systems. Furthermore, as Table 1 shows for hybrid network IDSs, most of the literature studies do not classify the dataset according to attack types and their names, which makes it difficult to predict individual attacks on different networks. To solve this problem, a novel hybrid NIDS called NID-Shield is proposed in this manuscript that classifies a dataset according to different attack types.

Machine Learning Algorithms Used in This Study.
Researchers have introduced distinct machine learning algorithms, such as neural networks [52], decision trees [53], k-nearest neighbor [54], and support vector machines [55], to learn from datasets. Depending on the structure of the dataset, particular algorithms apply distinct methods to achieve higher performance, and the relevant approach may be chosen according to the form of the dataset [56]. In this study, the machine learning algorithms Naive Bayes, random forest, and J48 (C4.5) are applied to analyze the outcome of feature selection and to train the classifier. These algorithms are prominent in the area of machine learning and have proven appropriate for this purpose.
2.3.1. Random Forest. Random forest [57] is a combination of tree predictors in which every tree depends on the values of a random vector that is sampled independently and with the same distribution for all trees in the forest. As the number of trees in the forest grows, the generalization error converges to a limit. The generalization error of a forest depends on the strength of the individual trees and on the correlation between them. To increase accuracy, the random forest uses randomly selected inputs, or combinations of inputs, at every node. This decreases the correlation between the trees while keeping the forest effective. At every node, a small number of the input variables is selected at random as candidate features for the split. Each tree is grown with the same procedure as the CART [58] approach, and the split to be developed is determined by the Gini index. Random forest applies bagging [59] in addition to random feature selection: a new training set is drawn from the original training set with replacement, and a tree is then grown on the new training set with random feature selection. The trees in a random forest are not pruned; they are grown to full size.
Employing bagging has two main benefits. First, it increases accuracy when random features are used. Second, bagging provides ongoing estimates of the generalization error of the combined ensemble of trees, as well as estimates of the strength and correlation. These estimates are made out-of-bag [60]. An out-of-bag estimate combines only about one-third as many classifiers as the ongoing main combination. Because the error rate declines as the number of combined classifiers increases, the out-of-bag estimate tends to overestimate the current error rate; to obtain an unbiased estimate, it is necessary to run past the point where the error converges. In cross-validation, bias is likely to be present and its extent is unknown, whereas the out-of-bag estimate is free from bias. Each tree in the random forest is grown on about two-thirds of the training data, and the remaining one-third is used for testing; this one-third of the training data is the out-of-bag data. Because the random forest does not prune its trees, it is fast. Moreover, because it constructs multiple trees, the random forest performs reasonably well as further trees are added and typically achieves higher performance than other decision tree methods.
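The bagging and out-of-bag ideas described above can be illustrated with a minimal standard-library Python sketch. It is not the Weka random forest used in this study; the function names (`gini_index`, `bootstrap_sample`, `random_feature_subset`), the sample size of 10,000 rows, and the fixed seed are assumptions made only for illustration. The sketch shows the Gini index used to score a split, a bootstrap sample drawn with replacement, and the fact that the rows never drawn (the out-of-bag rows) make up roughly one-third of the data.

```python
import random
from collections import Counter


def gini_index(labels):
    """Gini impurity of a list of class labels: 1 - sum over classes of p_c^2."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def bootstrap_sample(n_rows, rng):
    """Draw n_rows indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in chosen]
    return in_bag, out_of_bag


def random_feature_subset(n_features, k, rng):
    """Pick k distinct candidate features to examine at one tree node."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
in_bag, oob = bootstrap_sample(10_000, rng)
# Fraction of rows left out of the bootstrap sample; it approaches 1/e (about 0.368)
oob_fraction = len(oob) / 10_000
```

A full random forest would grow one unpruned tree per bootstrap sample, calling something like `random_feature_subset` at every node and choosing the candidate split with the lowest weighted Gini index.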

2.3.2. Naive Bayes. Naive Bayes [61] is a probabilistic classifier based on Bayes' theorem with a strong (naive) assumption of independence between its features. Naive Bayes represents knowledge in probabilistic form. In mathematical terms, Bayes' theorem is stated as

$$P(R \mid S) = \frac{P(S \mid R)\,P(R)}{P(S)}, \tag{1}$$

where R and S are events and P(S) is nonzero. P(R|S) is the posterior probability, that is, the probability of observing the event R given that S is true. P(R) and P(S) are called the prior probabilities of R and S. P(S|R) is called the likelihood, that is, the probability of observing the event S given that R is true. The Naive Bayes version applied in this study is the implementation by [62]. The probabilities of nominal features are estimated from the given data, and a Gaussian distribution is assumed for numeric features. The Bayes rule predicts the most probable class for a given instance based on the entire data distribution. Naive Bayes is easy to interpret when log probabilities are used, since they give an additive scoring scheme that is natural to express. High accuracy can be obtained from the Bayes classifier. The performance of Naive Bayes improves considerably when redundant features are eliminated, as discussed by Langley and Sage [63]. Moreover, Naive Bayes can still perform very well when moderate dependencies exist in the data, as discussed by Domingos and Pazzani [64]. Naive Bayes needs very little execution time for training.
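To make the role of log probabilities concrete, the following is a minimal standard-library sketch of a Naive Bayes classifier for nominal features. It is not the implementation of [62] used in this study: the function names, the six toy rows (a protocol and a flag value per connection), and the add-one (Laplace) smoothing are assumptions introduced only for illustration. Each class score is the log prior plus the sum of the per-feature log likelihoods, which is the additive form that follows from Bayes' theorem under the independence assumption.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            values_seen[j].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior, log P(class)
        for j, value in enumerate(row):
            # Add-one smoothing is an assumption of this sketch, not of the paper
            numerator = value_counts[(label, j)][value] + 1
            denominator = count + len(values_seen[j])
            score += math.log(numerator / denominator)  # log likelihood term
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (protocol_type, flag) per connection
rows = [("tcp", "S0"), ("tcp", "S0"), ("icmp", "SF"),
        ("tcp", "SF"), ("udp", "SF"), ("tcp", "SF")]
labels = ["attack", "attack", "attack", "normal", "normal", "normal"]
model = train_naive_bayes(rows, labels)
```

With this toy data, a connection with values `("tcp", "S0")` scores higher for the attack class, and `("udp", "SF")` scores higher for the normal class.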
2.3.3. J48 (C4.5) Decision Tree Generator. The C4.5 [65] decision tree is applied in this study. C4.5 is a descendant of the ID3 algorithm and is known as J48 in the Weka library. C4.5 constructs decision trees top-down and then prunes them. The tree is constructed by finding the feature with the best discriminating characteristics, which is tested at the root node of the tree. The nodes of the tree correspond to features, the branches to feature values, and the leaves to classes. To classify a new instance, one tests the features at the nodes of the tree and follows the branches corresponding to the values observed in the instance. The process terminates when a leaf is reached, and the class of that leaf is assigned to the instance. C4.5 constructs the decision trees with a greedy approach based on information-theoretic measures. To select the attribute for the tree root, the algorithm splits the training instances into subsets corresponding to the values of the attribute. If the entropy of the class labels in these subsets is lower than the entropy of the class labels in the entire training dataset, information has been gained by splitting on the attribute. C4.5 applies the gain ratio criterion to select the attribute. Subtrees are constructed by applying the algorithm recursively, and the algorithm terminates when a subset contains instances of a single class. The main distinction between C4.5 and ID3 is that C4.5 prunes its decision trees; pruning simplifies the trees and is likely to reduce overfitting to the training data.
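The entropy comparison and the gain ratio criterion described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under the usual textbook definitions (information gain divided by split information), not the J48 code in Weka; the function names are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of the value distribution of a list."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)
    total = len(labels)
    # Weighted entropy of the class labels after the split
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0
```

An attribute whose values separate the classes perfectly receives a high gain ratio, while an attribute whose subsets have the same class mixture as the full training set receives a gain ratio of zero.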
C4.5 prunes by using the upper bound of a confidence interval on the resubstitution error. A node is replaced by its best leaf when the estimated error of the leaf is within one standard deviation of the estimated error of the node. C4.5 is considered an efficient algorithm whenever the

The Proposed Hybrid NID-Shield Network Intrusion Detection System
This section introduces the techniques applied by the hybrid NID-Shield NIDS. Data preprocessing is performed by applying transformation and normalization operations to the datasets; then, an effective hybrid feature subset selection technique called CAPPER is applied to obtain accurate, high-merit feature subsets. Finally, the hybrid NID-Shield NIDS is presented as a whole.

Data Preprocessing.
In data preprocessing, transformation and normalization operations are performed on the NSL-KDD 20% dataset. Preprocessing helps to expose the underlying structure of the data to the learning algorithm and, in turn, may result in better predictive performance.

Data Transformation.
In the transformation operation, nominal values are converted to numeric values. Intrusion detection is treated as a classification problem, and some classification approaches cannot handle nominal features [66]. In the NSL-KDD 20% dataset, the attributes protocol_type, service, and flag are transformed from nominal to numeric values, so that the final NSL-KDD 20% dataset contains only numeric values for the classification process.
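As an illustration of the nominal-to-numeric transformation, the sketch below maps each distinct nominal value to an integer code. The text does not state which encoding scheme was used, so integer coding in first-seen order is an assumption of this example, as are the function name `encode_nominal` and the sample protocol values.

```python
def encode_nominal(column):
    """Map each distinct nominal value to an integer code (first-seen order)."""
    codes = {}
    encoded = []
    for value in column:
        if value not in codes:
            codes[value] = len(codes)  # next unused integer
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical sample of the protocol_type attribute
encoded, codes = encode_nominal(["tcp", "udp", "tcp", "icmp"])
```

The returned `codes` dictionary should be kept so that the same mapping can be applied to the test data.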

Data Normalization.
Data normalization is an essential step, specifically in the area of classification. In linear classification approaches, the instances are viewed as points in a multidimensional space. Without normalization, some objective functions do not work properly because of the wide variation in the raw data. For example, if one feature has a wide range of values, the distance between points is dominated by that feature. Thus, the numeric features need to be normalized so that every feature contributes approximately proportionately to the final distance. Normalization can therefore bring a significant improvement in accuracy and speed. In this study, the min-max normalization approach is applied to the dataset. Min-max normalization is given as

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{2}$$

where x is the original feature value, x_min and x_max are the minimum and maximum values of that feature, and x' is the normalized value. The min-max normalization technique linearly scales each feature to the interval [0, 1]. The rescaling is performed by shifting every feature value so that the minimum value becomes 0 and then dividing by the new maximum value, which is the difference between the original maximum and minimum values, as in equation (2).
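The min-max normalization can be written as a short Python function. This is a minimal sketch rather than the preprocessing code used in the study; the function name and the handling of a constant feature (mapped to 0.0 to avoid division by zero) are assumptions made for this example.

```python
def min_max_normalize(column):
    """Linearly rescale a list of numeric values to the interval [0, 1]."""
    lowest, highest = min(column), max(column)
    span = highest - lowest  # the new maximum after shifting the minimum to 0
    if span == 0:
        # Constant feature: mapping to 0.0 is a choice made for this sketch
        return [0.0 for _ in column]
    return [(x - lowest) / span for x in column]


scaled = min_max_normalize([2, 4, 10])  # [0.0, 0.25, 1.0]
```

In practice, the minimum and maximum would be computed on the training data and reused for the test data so that both are scaled consistently.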

Feature Selection Approaches.
A hybrid feature subset selection approach named CAPPER [67] is employed, which combines the feature subsets obtained from the CFS and Wrapper feature subset selection methods. This section introduces the CAPPER approach.

Correlation-Based Feature Subset Approach.
CFS is a filter method that uses a correlation-based heuristic to evaluate feature subsets and ranks the subsets accordingly. The evaluation is biased toward subsets whose features are highly correlated with the class and uncorrelated with each other. The approach ignores irrelevant features, which have low correlation with the class. Redundant features are screened out, because they are highly correlated with one or more of the remaining features. CFS accepts a feature to the extent that it predicts classes in areas of the instance space that are not already predicted by other features. The merit of a feature subset is given by

$$F_S = \frac{m\, n_{cf}}{\sqrt{m + m(m-1)\, n_{ff}}}, \tag{3}$$

where F_S is the heuristic merit of the feature subset S containing m features, n_ff is the average feature-feature intercorrelation, and n_cf is the mean feature-class correlation. The search of the subset space is performed with a best-first approach. Equation (3) yields a high-quality feature subset, which reduces the dimensionality of the testing and training data. The numerator of equation (3) indicates how well the feature set predicts the class, and the denominator indicates the redundancy among the features.
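The behavior of the CFS merit can be checked numerically with a small sketch. It assumes the standard CFS heuristic, F_S = m n_cf / sqrt(m + m(m - 1) n_ff), with the symbols defined above; the function name `cfs_merit` and the sample correlation values are hypothetical.

```python
import math


def cfs_merit(m, mean_class_corr, mean_feature_corr):
    """Heuristic merit of a subset of m features.

    mean_class_corr   -- mean feature-class correlation (n_cf)
    mean_feature_corr -- average feature-feature intercorrelation (n_ff)
    """
    numerator = m * mean_class_corr
    denominator = math.sqrt(m + m * (m - 1) * mean_feature_corr)
    return numerator / denominator


# Four features that are uncorrelated with each other
independent = cfs_merit(4, 0.5, 0.0)
# Four features with the same class correlation but fully redundant
redundant = cfs_merit(4, 0.5, 1.0)
```

With the same feature-class correlation, the subset of mutually uncorrelated features receives the higher merit, which reflects the preference of CFS for non-redundant features.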

Wrapper Subset Selection Approach.
In the Wrapper approach, feature subset selection is performed with the help of an induction algorithm. The feature subset space is searched with backward elimination and forward selection methods. Backward elimination begins with the complete feature set and removes those features that degrade the performance. Forward selection begins with an empty feature set and adds features that improve the performance. The accuracy of the induction algorithm is estimated by k-fold cross-validation, which is also called an out-of-sample test or rotation estimation. The original sample S is split into k folds S_1, S_2, ..., S_k of approximately equal size; at each time t ∈ {1, 2, ..., k}, the induction algorithm is trained on S \ S_t and tested on S_t, so that it is trained and tested k times. The cross-validation accuracy estimate is the total number of correct classifications divided by the total number of instances in the dataset. Let S_(i) be the testing set that contains the instance p_i = <v_i, q_i>; then, the cross-validation accuracy estimate is obtained as

$$acc_{cv} = \frac{1}{N} \sum_{\langle v_i, q_i \rangle \in S} \delta\big(I(S \setminus S_{(i)}, v_i),\, q_i\big), \tag{4}$$

where N is the number of instances in S, I(S \ S_(i), v_i) is the label that the induction algorithm I assigns to v_i after being trained on S \ S_(i), and δ(a, b) equals 1 if a = b and 0 otherwise. The best-first approach is applied as the search technique; the best-first search usually terminates upon arriving at the goal. The accuracy estimate is obtained from equation (4). By combining the feature subsets from the Wrapper and CFS approaches, CAPPER attains accurate and high-quality feature subsets.
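The wrapper idea of scoring candidate subsets with an accuracy estimate can be sketched as follows. For brevity, this example uses plain greedy forward selection instead of the best-first search named in the text, and the `evaluate` callback stands in for a cross-validated accuracy estimate of the induction algorithm; both the function names and the toy scoring function are assumptions for illustration only.

```python
def forward_selection(n_features, evaluate):
    """Greedy forward selection; `evaluate(subset)` returns an accuracy estimate."""
    selected = []
    best_score = evaluate(selected)
    improved = True
    while improved:
        improved = False
        best_candidate = None
        for feature in range(n_features):
            if feature in selected:
                continue
            score = evaluate(selected + [feature])
            if score > best_score:  # keep the best improving candidate
                best_score, best_candidate, improved = score, feature, True
        if improved:
            selected.append(best_candidate)
    return selected, best_score


# Hypothetical stand-in for a cross-validated accuracy: features 0 and 2 help,
# and every added feature costs a small penalty.
USEFUL = {0: 0.3, 2: 0.2}


def toy_evaluate(subset):
    return 0.5 + sum(USEFUL.get(f, 0.0) for f in subset) - 0.01 * len(subset)


subset, score = forward_selection(4, toy_evaluate)
```

With the toy scoring function, the search first adds feature 0, then feature 2, and stops because no remaining feature improves the estimate.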

Ensemble Learning.
Ensemble learning [68] was initially developed in automated decision-making systems to reduce variance and thus increase accuracy. Ensemble learning techniques effectively address machine learning problems such as error correction, confidence estimation, missing features, and incremental learning. Ensemble learning is widely used in pattern recognition, artificial intelligence, machine learning, data mining, and neural networks, and it has proved its efficiency and functionality in a wide range of real-world problems.
Ensemble learning combines various base learners, or weak learners, and integrates them into a strong learner. Its advantage is that it increases the accuracy of the weak learners, so that the overall accuracy of the classifier on the training datasets is higher than that of a single base learning algorithm.
3.3.1. Stacking. In stacking [69], the base classifiers produce a new dataset from the original dataset. If the base classifiers generate this new dataset from the same instances on which they were trained, there is a high risk that the data is overfitted; for this reason, the new dataset needs to be built from instances that were held out from training. Cross-validation is suggested for generating the new instances from the base classifiers; in addition, the set of features for the new training dataset and the type of learning algorithm for the meta-learner have to be considered. Distinct learning algorithms are applied to obtain the base learners. The new dataset is then used to train the meta-learner. Stacking thus combines numerous machine learning approaches.
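The construction of the meta-level dataset can be sketched as follows. This is a simplified illustration, not the Weka stacking implementation used in this study: the interleaved fold assignment, the function name `out_of_fold_predictions`, and the two toy base learners are assumptions. Each base learner predicts only instances it did not see during training, and these held-out predictions become the input features for the meta-learner.

```python
from collections import Counter


def out_of_fold_predictions(rows, labels, k, base_learners):
    """Build the meta-level dataset: one held-out prediction per base learner."""
    n = len(rows)
    meta_rows = [[None] * len(base_learners) for _ in range(n)]
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]   # interleaved folds
        train = [i for i in range(n) if i % k != fold]
        for j, (fit, predict) in enumerate(base_learners):
            model = fit([rows[i] for i in train], [labels[i] for i in train])
            for i in test:
                meta_rows[i][j] = predict(model, rows[i])
    return meta_rows


# Two hypothetical toy base learners, each given as a (fit, predict) pair
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]  # most frequent training label


def predict_majority(model, row):
    return model


def fit_threshold(rows, labels):
    return 0  # fixed rule; nothing is learned in this toy learner


def predict_threshold(model, row):
    return "attack" if row[0] > model else "normal"


rows = [[0], [1], [0], [1], [0], [1]]
labels = ["normal", "attack", "normal", "attack", "normal", "attack"]
meta_rows = out_of_fold_predictions(
    rows, labels, 3,
    [(fit_threshold, predict_threshold), (fit_majority, predict_majority)],
)
```

A meta-learner would then be trained on `meta_rows` with the original `labels` as targets.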

k-Fold Cross-Validation.
Researchers frequently refer to cross-validation techniques as a train/test holdout approach. In k-fold cross-validation [70], the procedure is repeated on the dataset k times. The dataset is split into k parts; in every round, one part is used for validation, and the remaining k − 1 parts are merged into a training subset for fitting the model. In k-fold cross-validation, the complete dataset is used for both training and testing; the main idea of the technique is to reduce pessimistic bias by using more of the data for training rather than setting aside a large, separate test set. The test folds do not overlap, so every sample is used for validation exactly once. The value of k must be chosen carefully to avoid high bias in the model. Usually, k = 10 is chosen, as various experimental results show that the estimate has small bias and low variance with this value. The results of the k rounds are then combined or averaged to produce a single estimate.
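The k-fold procedure can be sketched in standard-library Python. The sketch splits the indices into k contiguous, non-overlapping folds without shuffling or stratification, which is a simplification; the function names and the majority-class toy learner are assumptions for illustration. The accuracy is computed as the number of correct test predictions over all folds divided by the total number of instances.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k non-overlapping folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(rows, labels, k, fit, predict):
    """Correct test predictions over all folds divided by the total instances."""
    correct = 0
    for train, test in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, rows[i]) == labels[i] for i in test)
    return correct / len(rows)


# Hypothetical toy learner that always predicts the majority training class
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


rows = [[i] for i in range(10)]
labels = ["normal"] * 8 + ["attack"] * 2
accuracy = cross_val_accuracy(rows, labels, 5, fit_majority, predict_majority)
```

On this toy data the majority-class learner is correct on the four folds that contain only normal instances and wrong on the fold that contains the two attack instances, so the estimate is 0.8.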

The Proposed Hybrid NID-Shield Network Intrusion Detection System Using Hybrid Feature Selector.
The primary design idea behind the hybrid NID-Shield is the classification of datasets according to different attack types. The advantage of classifying the dataset according to attack types is that it can find a set of arbitrary features. Moreover, the attack names found in the attack types help in predicting the vulnerability of individual attacks in various networks. Distinct machine learning algorithms are analyzed for the individual attack types. The machine learning algorithms with high accuracy and low FPR are selected for the different attack types and applied in the design of the hybrid NID-Shield NIDS. The hybrid NID-Shield NIDS applies the hybrid approach called CAPPER for selecting the optimal feature subsets. The CAPPER approach combines the optimal feature subsets from the CFS and Wrapper feature subset selection methods. From the CFS approach, a superior feature subset is obtained that is free of irrelevant and redundant features. The Wrapper method uses induction learning algorithms to attain a highly accurate feature subset. By combining the filter and wrapper approaches, high-merit and accurate feature subsets are obtained, which are then applied for training and testing purposes.
In the design of the hybrid NID-Shield NIDS, single and ensemble learning algorithms are used together so that a high performance rate and a lower FPR can be achieved. Testing was performed with single and ensemble learning algorithms, and ensemble learning was found to achieve high performance where the NSL-KDD 20% dataset has few samples for some of its attack types. The highest-performing classifier is determined for each attack type, and k-fold cross-validation is used to assess classifier performance. Figure 1 depicts the simple block diagram of the hybrid NID-Shield network intrusion detection system, and Figure 2 displays the architecture of the hybrid NID-Shield NIDS.

Dataset Characteristics
To evaluate the performance of the hybrid NID-Shield NIDS, two contemporary datasets, UNSW-NB15 and NSL-KDD 20%, are used. These cybersecurity datasets are high-dimensional and class-imbalanced [71]. In the NSL-KDD dataset, denial of service (DoS) has a prevalence of around 36%, while for other attack types such as Remote to Local (R2L) and User to Root (U2R) the prevalence is less than 1%. This shows that NSL-KDD is a highly imbalanced dataset. In the UNSW-NB15 dataset, the normal class frequency is about 32%, while the frequencies of the attack types are low and differ widely. For example, the frequencies of the Worms and Exploits attacks differ by a factor of around 257. This shows that UNSW-NB15 is also a highly imbalanced dataset.

NSL-KDD Dataset.
The KDD-Cup '99 dataset was derived from the DARPA 98 intrusion detection system evaluation program and is widely used in the IDS domain, but its main disadvantage is that it contains many duplicate and redundant records. Duplicate records amount to 75%, and redundant records amount to 78%. This duplication and redundancy hinders the classification of the remaining records [72]. The NSL-KDD dataset was proposed [73] as a replacement; it contains no duplicate or redundant records in its testing and training data [74], which removes an inherent issue of the KDD-Cup '99 dataset. The number of records selected from each difficulty level group is inversely proportional to the percentage of records in the original KDD dataset. As a result, the classification rates of distinct machine learning approaches vary over a wider range, which makes an accurate evaluation of the different learning approaches more effective. The number of records in the training and testing sets is reasonable, which makes it affordable to conduct the experiments on the complete set without the need to randomly select a small portion.
(iii) User to root (U2R). The intruder tries to gain root access or administrator privileges on the system, for example by sniffing passwords. The attacker then looks for vulnerabilities in the system to obtain administrator authorization. In U2R, there are 13449 normal instances and 11 attack instances with three attack names, namely, loadmodule, buffer_overflow, and rootkit.
(iv) Remote to local (R2L). The intruder, who does not have the necessary legal privileges to access a remote machine, attempts to gain a connection to it. The attacker then exploits vulnerabilities of the remote system and tries to gain access rights to the remote machine. There are 13449 normal instances and 209 attack instances in this dataset, with eight attack names, namely, ftp_write, guess_passwd, multihop, phf, imap, warezclient, spy, and warezmaster.

4.2. UNSW-NB15 Dataset.
The UNSW-NB15 [75] dataset was generated with the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security. There are 2,540,044 records in the dataset. A part of the dataset is further divided into training and testing sets. There are 82,332 records in the testing set and 175,341 records in the training set, containing normal and attack instances. There are 45 features in this dataset, provided in a clean format, including class and label. Moreover, there are nine attack types in the UNSW-NB15 dataset, namely, DoS, Analysis, Backdoor, Exploits, Fuzzers, Generic, Worms, Shellcode, and Reconnaissance, in addition to the Normal class. Table 5 shows the total instances in the UNSW-NB15 training and testing datasets, and Table 6 lists the features of the UNSW-NB15 dataset.
For the evaluation of the proposed approach, the machine learning workbench Weka 3.8 [76] is used. In Weka, the Wrapper approach, the CFS approach, and classifier algorithms such as J48, Naive Bayes, and random forest are implemented in Java. The code was run in NetBeans 8.0.2 on an Intel i3 8100 processor at 2.20 GHz with 4.00 GB of RAM.

Performance Metrics
This section presents the performance evaluation metrics used to validate the results. Researchers use counts such as false negatives (FN), true negatives (TN), true positives (TP), and false positives (FP) [77] to justify their results.
Definition 1 (confusion matrix). Also called the error matrix, it relates the actual classes to the predicted classes. It is the basis for calculating precision, recall, accuracy, specificity, AUC, and the ROC curve. The confusion matrix makes it possible to visualize the efficiency of an algorithm on the testing dataset and is commonly used to describe classifier performance. Table 7 shows the confusion matrix.
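As an illustration of how the metrics named above are derived from the four confusion matrix counts, the following sketch uses the standard definitions for the binary case. The function name and the example counts are hypothetical and are not results from this study; the sketch also assumes that none of the denominators is zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from the counts of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called the TP rate or detection rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn),          # false-positive rate
        "specificity": tn / (tn + fp),  # equals 1 - FPR
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: 100 attack instances and 900 normal instances
metrics = binary_metrics(tp=90, fp=10, tn=890, fn=10)
```

For the multiclass results reported per attack name, the same formulas are applied to each class in a one-versus-rest fashion.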
Definition 11 (kappa statistic). It measures the agreement between the observed and predicted values of the dataset, corrected for the agreement that occurs by chance. It is calculated by

$$\kappa = \frac{p_0 - p_e}{1 - p_e},$$

where p_0 is the relative observed agreement between the estimates and p_e is the hypothetical probability of chance agreement.
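The kappa statistic can be computed directly from a confusion matrix, taking the usual form (p_0 - p_e) / (1 - p_e). The sketch below is a minimal illustration; the function name, the row and column convention (rows are actual classes, columns are predicted classes), and the example counts are assumptions and not results from this study.

```python
def kappa_statistic(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of instances on the diagonal
    p_0 = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals, summed over classes
    p_e = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / total ** 2
    return (p_0 - p_e) / (1 - p_e)


# Hypothetical 2 x 2 confusion matrix
kappa = kappa_statistic([[20, 5], [10, 15]])
```

For the example matrix, the observed agreement is 0.7 and the chance agreement is 0.5, which gives a kappa of 0.4.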
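Definition 12 (mean absolute error). It is the average of the magnitudes of the individual errors, that is, the mean of the absolute errors. It is calculated as

$$\mathrm{MAE} = \frac{|p_1 - a_1| + \cdots + |p_n - a_n|}{n},$$

where p_1, ..., p_n are the values predicted on the n test instances and a_1, ..., a_n are the actual values.
Definition 13 (root mean-squared error). The RMSE measures the differences between the observed values and the values predicted by a model. It is given by

$$\mathrm{RMSE} = \sqrt{\frac{(p_1 - a_1)^2 + \cdots + (p_n - a_n)^2}{n}},$$

where p_1, ..., p_n are the values predicted on the n test instances and a_1, ..., a_n are the actual values.
Definition 14 (relative absolute error). The total absolute error is normalized by the total absolute error of a simple predictor that always predicts the average value. It is calculated as

$$\mathrm{RAE} = \frac{|p_1 - a_1| + \cdots + |p_n - a_n|}{|a_1 - \bar{a}| + \cdots + |a_n - \bar{a}|},$$

where p_1, ..., p_n are the values predicted on the n test instances, a_1, ..., a_n are the actual values, and \bar{a} is the mean of the actual values.
Definition 15 (root relative squared error). It normalizes the total squared error by dividing it by the total squared error of the simple predictor and takes the square root. It is obtained from

$$\mathrm{RRSE} = \sqrt{\frac{(p_1 - a_1)^2 + \cdots + (p_n - a_n)^2}{(a_1 - \bar{a})^2 + \cdots + (a_n - \bar{a})^2}},$$

where p_1, ..., p_n are the values predicted on the n test instances, a_1, ..., a_n are the actual values, and \bar{a} is the mean of the actual values.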
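The four error measures in Definitions 12 to 15 can be computed together, as in the following sketch. It assumes the usual definitions, in which the relative measures are normalized by the error of a simple predictor that always predicts the mean of the actual values; the function name and the four sample values are hypothetical.

```python
import math


def error_metrics(predicted, actual):
    """MAE, RMSE, RAE, and RRSE for paired predicted and actual values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    sq_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    # Errors of the simple predictor that always outputs the mean actual value
    abs_baseline = sum(abs(a - mean_actual) for a in actual)
    sq_baseline = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / abs_baseline,
        "rrse": math.sqrt(sum(sq_errors) / sq_baseline),
    }


# Hypothetical predictions with one error in four instances
errors = error_metrics(predicted=[1, 0, 1, 1], actual=[1, 0, 0, 1])
```

A relative error below 1 (often reported as a percentage below 100%) indicates that the model does better than the simple mean predictor.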
Definition 16 (AUC and ROC). The ROC curve shows how the detection rate changes as the internal decision threshold is varied to produce a high or low FPR. The AUC is the area under the ROC curve; the larger the AUC value, the better the performance of the classifier.
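One way to compute the AUC without drawing the ROC curve is to use its rank interpretation: the AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative instance, with ties counted as one half. The sketch below follows this interpretation; the function name, the label coding (1 for attack, 0 for normal), and the sample scores are assumptions for illustration.

```python
def auc(scores, labels):
    """AUC as the share of positive-negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for two attack and two normal instances
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0])
```

A perfect ranking gives an AUC of 1.0, and in the second example three of the four positive-negative pairs are ranked correctly, which gives 0.75.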

Performance Evaluation with NSL-KDD 20% according to Attack Types.
This section evaluates the DoS, Probe, U2R, and R2L attack types of the NSL-KDD 20% dataset. The NID-Shield NIDS is assessed with J48 as the attribute selection algorithm, and the selected attributes are then evaluated with a machine learning algorithm as the classifier.

Evaluation of DoS Attack with Normal and Attack Instances on Hybrid NID-Shield NIDS
(1) DoS Attack Evaluated with Hybrid NID-Shield NIDS. The following algorithms were applied for the evaluation of feature subsets: attribute evaluator: CAPPER; attribute evaluator algorithm: J48; search method: best first; classifier evaluator: random forest.

The CAPPER feature selector obtains the highest-merit and accurate feature subsets from the combination of the CFS and Wrapper approaches. Table 8 shows the metrics of the DoS attack with its attack names classified individually. The DoS dataset contains six attacks, namely, neptune, back, land, smurf, pod, and teardrop, together with the normal instances. Figure 3 shows that the NID-Shield NIDS achieved an accuracy of 100% on the normal instances and on the attack names land, back, teardrop, and neptune, while on pod and smurf it achieved an accuracy of 94.7% and 99.6%, respectively. The weighted average accuracy over the normal instances and all attack names is 100%. Figure 4 shows that the NID-Shield NIDS achieved a TP rate of 1.000 on the normal instances and on land, back, teardrop, and neptune, while on pod and smurf it achieved a TP rate of 0.947 and 0.996, respectively. The weighted average TP rate over the normal instances and all attack names is 1.000. Figure 5 depicts the FP rate: the NID-Shield NIDS achieves an FP rate of 0.000 on the normal instances and on all attack names. Figure 6 illustrates the precision: the NID-Shield NIDS obtained a precision of 1.000 on the normal instances and on neptune, back, land, and teardrop, while it obtained a precision of 0.998 and 0.973 on smurf and pod, respectively. The weighted average precision over the normal instances and all attack names is 1.000.
Figure 7 depicts the recall: the NID-Shield NIDS achieves a recall of 1.000 on the normal instances and on the attack names neptune, land, back, and teardrop, while it achieves a recall of 0.996 and 0.947 on smurf and pod, respectively. The weighted average recall over the normal instances and all attack names is 1.000. Figure 8 depicts the F-measure: the NID-Shield NIDS achieves an F-measure of 1.000 on the normal instances and on neptune, land, back, and teardrop, while for smurf and pod it obtains an F-measure of 0.997 and 0.960, respectively. The weighted average F-measure over the normal instances and all attack names is 1.000. For the MCC, the NID-Shield NIDS achieves 1.000 on the normal instances and on neptune, back, land, and teardrop, while for smurf and pod it obtains an MCC of 0.997 and 0.960, respectively. The weighted average MCC over the normal instances and all attack names is 1.000.
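The reported F-measures for smurf and pod can be cross-checked against the reported precision and recall values, since the F-measure is the harmonic mean of the two. The short sketch below performs this check; the function name is an assumption, and the inputs are the precision and recall values quoted in the text for the DoS evaluation.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the DoS evaluation
pod = round(f_measure(0.973, 0.947), 3)    # 0.96
smurf = round(f_measure(0.998, 0.996), 3)  # 0.997
```

Both values agree with the F-measures of 0.960 and 0.997 reported for pod and smurf.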
For the ROC area, the NID-Shield NIDS achieves 1.000 on the normal instances and on all attack names. For the PRC area, the NID-Shield NIDS obtained 1.000 on the normal instances and on the attack names land, back, teardrop, smurf, and neptune, and 0.997 on pod. The weighted average PRC area over the normal instances and all attack names is 1.000.

Evaluation of Probe Attack with Normal and Attack Instances on Hybrid NID-Shield NIDS
(1) Probe Attack Evaluated with Hybrid NID-Shield NIDS. The following algorithms were applied for the evaluation of feature subsets: attribute evaluator: CAPPER; attribute evaluator algorithm: J48; search method: best first; classifier evaluator: random forest.
The CAPPER evaluated subsets are as follows: 2, 3, 4, 12, 24, 27, 29, 31, 32, 35, 36, 37, and 40. In this section, the Probe attack is evaluated with the hybrid NID-Shield NIDS on the Probe attack dataset. Stacking is applied to further improve the metrics; the stacked ensemble uses the random forest and Naive Bayes as base classifiers. Table 9 shows the Probe attack evaluation metrics without the stacked ensemble, and Table 10 shows them with the stacked ensemble. A considerable improvement in the FP rate is observed when the NID-Shield NIDS is evaluated with the stacked ensemble. The Probe attack comprises four attack names, namely, portsweep, satan, ipsweep, and nmap, together with the normal instances. Figure 9 shows that the NID-Shield NIDS achieved an accuracy of 99.90% on the normal instances and of 99.7%, 97.7%, 99.3%, and 96.3% on portsweep, satan, ipsweep, and nmap, respectively. Overall, the weighted average accuracy on normal and all attack names is 99.7%. Figure 10 shows that the NID-Shield NIDS achieved a TP rate of 0.999 on the normal instances and of 0.997, 0.977, 0.993, and 0.963 on portsweep, satan, ipsweep, and nmap, respectively. Overall, the weighted average TP rate on normal and all attack names is 0.997. Figure 11 depicts the FP rate of the NID-Shield NIDS on normal instances and attack names. The NID-Shield NIDS achieves an FP rate of 0.000 on portsweep and satan and of 0.001 on each of ipsweep and nmap. For the normal instances, an FP rate of 0.010 is achieved. Figure 12 depicts the precision of the NID-Shield NIDS on normal instances and attack names. The NID-Shield NIDS achieves a precision of 0.998 on normal instances and of 1.000, 0.993, 0.987, and 0.967 on portsweep, satan, ipsweep, and nmap, respectively. Figure 13 depicts the recall of the NID-Shield NIDS on normal instances and attack names. The normal instances achieve a recall of 0.999, while portsweep, satan, ipsweep, and nmap achieve a recall of 0.997, 0.977, 0.993, and 0.963, respectively.
Overall, the weighted average recall on normal and all attack names is 0.997. Figure 14 illustrates the F-measure of the NID-Shield NIDS on normal instances and attack names. The NID-Shield NIDS achieves an F-measure of 0.999 on normal instances and of 0.998, 0.985, 0.990, and 0.965 on portsweep, satan, ipsweep, and nmap, respectively. Overall, the weighted average F-measure on normal and all attack names is 0.997. For the MCC, the NID-Shield NIDS achieves 0.990 on normal instances and 0.998, 0.984, 0.990, and 0.964 on portsweep, satan, ipsweep, and nmap, respectively. Overall, the weighted average MCC on normal and all attack names is 0.990. For the ROC area, the NID-Shield NIDS obtains 0.999 on normal instances and 1.000, 0.995, 0.999, and 0.997 on portsweep, satan, ipsweep, and nmap, respectively. Overall, the weighted average ROC area is 0.999. For the PRC area, the NID-Shield NIDS achieves 1.000 on normal instances and 1.000, 0.990, 0.994, and 0.986 on portsweep, satan, ipsweep, and nmap, respectively. Overall, the weighted average PRC area is 0.999.

Evaluation of U2R Attack with Normal and Attack Instances on Hybrid NID-Shield NIDS
(1) U2R Attack Evaluated with Hybrid NID-Shield NIDS. The following algorithms were applied for evaluation of feature subsets: attribute evaluator: CAPPER, attribute evaluator algorithm: J48, search method: best first, classifier evaluator: random forest.
The CAPPER evaluated subsets are as follows: 3, 4, 6, 9, 10, 13, 14, 17, 18, 33, and 36. In this section, the U2R attack is evaluated by the hybrid NID-Shield NIDS on the U2R attack dataset. Table 11 shows the metrics of the U2R attack for its three attack names, namely, buffer_overflow, loadmodule, and rootkit. Figure 15 shows that the NID-Shield NIDS achieved an accuracy of 100% on the normal instances and all attack names. Figure 16 shows that the NID-Shield NIDS achieved a TP rate of 1.000 on the normal instances and all attack names. Figure 17 depicts the FP rate of the NID-Shield NIDS on normal instances and attack names. Figure 18 depicts the precision of the NID-Shield NIDS on normal instances and attack names; the NID-Shield NIDS achieves a precision of 1.000 on all normal instances and attack names. Figure 19 depicts the recall of the NID-Shield NIDS; the normal instances and all attack names achieve a recall of 1.000. Figure 20 illustrates the F-measure of the NID-Shield NIDS; it achieves an F-measure of 1.000 on normal instances and attack names.
For the MCC, the NID-Shield NIDS achieves 1.000 on normal instances and attack names. For the ROC area and the PRC area, the NID-Shield NIDS achieves an overall 1.000 on normal instances and all attack names.

Evaluation of R2L Attack with Normal and Attack Instances on Hybrid NID-Shield NIDS
(1) R2L Attack Evaluated with Hybrid NID-Shield NIDS. The following algorithms were applied for evaluation of feature subsets: attribute evaluator: CAPPER, attribute evaluator algorithm: J48, search method: best first, classifier evaluator: random forest.
The CAPPER evaluated subsets are as follows: 4, 5, 6, 10, 11, 17, 22, 31, 32, 33, 36, and 38. In this section, the R2L attack is evaluated by the hybrid NID-Shield NIDS approach on the R2L attack dataset. Table 12 shows the evaluation metrics of the R2L attack. The R2L attack comprises eight attack names, namely, ftp_write, guess_passwd, phf, imap, warezmaster, multihop, warezclient, and spy, together with the normal instances. Figure 21 shows that the NID-Shield NIDS achieved an accuracy of 100% on the normal instances and on ftp_write, guess_passwd, phf, imap, warezmaster, and multihop, and an accuracy of 97.4% and 91.7% on warezclient and spy, respectively. Overall, the weighted average accuracy on normal and all attack names is 99.99%. Figure 22 shows that the NID-Shield NIDS achieved a TP rate of 1.000 on the normal instances and on ftp_write, guess_passwd, phf, imap, warezmaster, and multihop, and a TP rate of 0.974 and 0.917 on warezclient and spy, respectively. Overall, the weighted average TP rate on normal and all attack names is 0.999. Figure 23 depicts the FP rate of the NID-Shield NIDS on normal instances and attack names. The NID-Shield NIDS achieves an FP rate of 0.000 on all attack names and of 0.019 on the normal instances. Overall, a weighted average FP rate of 0.019 is obtained on normal and attack names. Figure 24 depicts the precision of the NID-Shield NIDS on normal instances and attack names.
The NID-Shield NIDS achieved a precision of 1.000 on normal instances and on guess_passwd, ftp_write, multihop, warezmaster, and spy, and a precision of 0.875, 0.900, and 0.974 on imap, phf, and warezclient, respectively. Overall, a weighted average precision of 0.999 is achieved on normal and attack names. Figure 25 depicts the recall of the NID-Shield NIDS on normal instances and attack names. The normal instances and the attack names guess_passwd, ftp_write, imap, phf, multihop, and warezmaster achieve a recall of 1.000; for warezclient and spy, the recall equals the TP rate reported above, that is, 0.974 and 0.917, respectively. Figure 26 depicts the F-measure of the NID-Shield NIDS on normal instances and attack names. The NID-Shield NIDS achieves an F-measure of 1.000 on normal instances and on guess_passwd, ftp_write, multihop, and warezmaster, and an F-measure of 0.974, 0.957, 0.947, and 0.933 on warezclient, spy, phf, and imap, respectively.
Overall, the weighted average F-measure on normal and all attack names is 0.999. For the MCC, the NID-Shield NIDS achieves 0.978 on normal instances, 1.000 on guess_passwd, ftp_write, multihop, and warezmaster, and 0.935, 0.949, 0.974, and 0.957 on imap, phf, warezclient, and spy, respectively. Overall, the weighted average MCC on normal and attack names is 0.978. For the ROC area, the NID-Shield NIDS achieves 1.000 on normal instances and attack names. For the PRC area, the proposed NID-Shield NIDS achieves 1.000 on normal instances and on guess_passwd, ftp_write, phf, multihop, and warezmaster, and 0.999, 0.982, and 0.969 on warezclient, imap, and spy, respectively. Overall, a weighted average PRC area of 1.000 is obtained by the NID-Shield NIDS for all normal instances and attack names.
The CAPPER evaluated subsets for Generic attack are as follows: 2, 3, 7, 8, 9, 25, 31, 39, and 40. In this section, the UNSW-NB15 dataset attack is evaluated by the hybrid NID-Shield NIDS approach on the UNSW-NB15 testing dataset. Table 13 illustrates the evaluation metrics of the UNSW-NB15 normal and attack instances. The UNSW-NB15 dataset contains nine attack names, namely, Backdoor, Reconnaissance, Exploits, DoS, Fuzzers, Analysis, Worms, Generic, and Shellcode, together with the normal instances.
Figure 27 shows that the NID-Shield NIDS achieved an accuracy of 100% on the normal instances and the Worms attack, while for Backdoor, Reconnaissance, Exploits, DoS, Fuzzers, Analysis, Generic, and Shellcode, it achieved an accuracy of 99.71%, 99.45%, 98.70%, 99.10%, 90.14%, 99.20%, 99.70%, and 99.61%, respectively. Overall, the weighted average accuracy on normal and all attack names is 99.89%. Figure 28 shows that the NID-Shield NIDS achieved a TP rate of 1.000 on the normal instances and the Worms attack, while for Backdoor, Reconnaissance, Exploits, DoS, Fuzzers, Analysis, Generic, and Shellcode, it achieved a TP rate of 0.997, 0.994, 0.987, 0.991, 0.901, 0.992, 0.997, and 0.996, respectively. Overall, the weighted average TP rate on normal and attack names is 0.998. The weighted average precision on normal and all attack names is 0.999. Figure 31 depicts the recall of the NID-Shield NIDS on normal instances and attack names. The NID-Shield NIDS achieved a recall of 1.000 on the normal instances and the Worms attack, while for Backdoor, Reconnaissance, Exploits, DoS, Fuzzers, Analysis, Generic, and Shellcode, it achieved a recall of 0.999, 0.998, 0.982, 0.991, 0.941, 0.993, 0.998, and 0.999, respectively. Overall, the weighted average recall on normal and all attack names is 0.998.
Figure 32 shows the F-measure of the NID-Shield NIDS on normal instances and attack names. The NID-Shield NIDS achieved an F-measure of 1.000 on the normal instances and the Worms attack, while for Backdoor, Reconnaissance, Exploits, DoS, Fuzzers, Analysis, Generic, and Shellcode, it achieved an F-measure of 0.999, 0.997, 0.982, 0.997, 0.962, 0.996, 0.999, and 0.997, respectively. Overall, the weighted average F-measure on normal and all attack names is 0.997.
For the MCC, the NID-Shield NIDS achieved 1.000 on the normal instances and the Worms attack, while for Backdoor, Reconnaissance, Exploits, DoS, Fuzzers, Analysis, Generic, and Shellcode, it achieved an MCC of 0.999, 0.995, 0.993, 0.997, 0.972, 0.998, 0.997, and 0.997, respectively. Overall, the weighted average MCC on normal and all attack names is 0.992. The NID-Shield NIDS achieves a ROC area and a PRC area of 1.000 on normal and attack instances.
Table 14 compares the hybrid NID-Shield NIDS with the existing approaches in the literature. The details of the existing approaches are shown in Table 1. For this comparison, the proposed hybrid NID-Shield NIDS is evaluated on the attack names of the UNSW-NB15 dataset, and on the NSL-KDD 20% dataset the overall performance metrics are considered across the Probe, DoS, R2L, and U2R attack types and their attack names. The NID-Shield NIDS achieves an accuracy of 99.89% on the UNSW-NB15 dataset and an overall accuracy of 99.90% on the NSL-KDD dataset, which is the highest among all approaches compared. For the TP rate, the NID-Shield NIDS obtained an overall TPR of 0.999 on the NSL-KDD 20% dataset and 0.9998 on the UNSW-NB15 dataset, which is the best among all approaches compared. When the FPR is comprehensively evaluated, the approach proposed by Cavusoglu achieves the overall best FPR of 0.000035, ahead of the NID-Shield NIDS.
To give insight into these results, the combination of CAPPER and the random forest is the most likely explanation for the high metrics obtained on both datasets. CAPPER is an effective feature subset selection technique that obtains accurate and high merit feature subsets from the CFS and Wrapper methods.
CFS searches the space of feature subsets with the best first search method and calculates the feature-class and feature-feature correlations using measures based on conditional entropy. The merit of a subset is measured by equation (3), and selecting high merit subsets greatly aids the dimensionality reduction of both the testing and training data. In Wrapper, the feature subset search is also executed by the best first search approach. At each iteration, best first search expands the node with the maximal estimated accuracy and creates its successors. The induction algorithm itself is used to evaluate candidate feature subsets. The induction algorithm is run k times; each time, the training set uses k − 1 partitions, while the test set uses the remaining partition. Fivefold cross-validation is applied as the subset evaluation approach. The estimation of the accuracy is obtained by equation (4). In this way, the Wrapper approach applies the machine learning algorithms directly to obtain the most accurate feature subsets. The accurate and high merit feature subsets obtained by CFS and Wrapper are then combined to obtain the reduced dataset.
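The two scoring steps described above can be sketched in Python. The sketch assumes that equation (3) is the standard CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation, and that equation (4) is a plain k-fold accuracy estimate; neither equation appears in this section, so both forms are assumptions. All function names, the interleaved fold assignment, and the toy majority-class learner are hypothetical.

```python
from collections import Counter
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset under the standard CFS heuristic (assumed form).

    feature_class_corr: one feature-class correlation per feature in the subset.
    feature_feature_corr: one correlation per pair of features in the subset.
    """
    k = len(feature_class_corr)
    mean_cf = sum(feature_class_corr) / k
    mean_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
               if feature_feature_corr else 0.0)
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)


def k_fold_indices(n, k):
    """Yield (train, test) index lists; each partition is the test set once."""
    folds = [list(range(start, n, k)) for start in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


def cv_accuracy(labels, k, fit_predict):
    """Fraction of instances classified correctly while they were held out."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        predictions = fit_predict(train, test)
        correct += sum(p == labels[i] for i, p in zip(test, predictions))
    return correct / len(labels)


# Hypothetical data: ten labels and a majority-class "induction algorithm".
labels = ["normal"] * 8 + ["attack"] * 2


def majority_class(train, test):
    most_common = Counter(labels[i] for i in train).most_common(1)[0][0]
    return [most_common] * len(test)


print(cfs_merit([0.6, 0.6], [0.0]))   # two uncorrelated features
print(cfs_merit([0.6, 0.6], [1.0]))   # two fully redundant features
print(cv_accuracy(labels, 5, majority_class))
```

Under this form of the merit, a second feature that is fully redundant with the first adds nothing, while an uncorrelated one raises the score, which is why CFS favors subsets that are predictive of the class but not of each other. In a wrapper search, a function like `cv_accuracy` would be called once per candidate subset, with the real induction algorithm in place of the toy learner.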
The random forest is considered more efficient than the other classifiers. The foremost reason for its high accuracy is its use of bagging. Bagging has two main benefits. Firstly, it increases accuracy when random features are used. Secondly, it provides estimates of the generalization error of the combined ensemble of trees, as well as estimates of the strength of the trees and the correlation between them. These estimates are computed out-of-bag. An out-of-bag estimate combines only about one-third as many classifiers as the full ensemble does. Since the error rate declines as the number of combined classifiers increases, the out-of-bag estimate tends to overestimate the current error rate; hence, it is necessary to run past the point where the error converges. In cross-validation, bias is likely to be present and its extent is unknown, whereas the out-of-bag estimate is free from bias. Each tree is grown on roughly two-thirds of the training data, and the remaining one-third is used for testing. This held-out one-third is the out-of-bag data. The random forest does not prune its trees, which helps it run fast and perform well. Moreover, because it builds multiple trees, the random forest continues to perform reasonably well as trees are added, and it achieves higher performance than the other decision tree methods.
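The "about one-third" figure follows from bootstrap sampling: when n instances are drawn with replacement from a training set of size n, a given instance is left out with probability (1 − 1/n)^n, which approaches 1/e, roughly 0.368, as n grows. The short simulation below illustrates this; the function name, sample size, and number of trees are hypothetical choices, and the code does not train a forest.

```python
import random


def out_of_bag_fraction(n_samples, n_trees, seed=0):
    """Average share of training instances left out of each bootstrap sample."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_trees):
        # One bootstrap sample: n_samples draws with replacement.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        fractions.append(1 - len(in_bag) / n_samples)
    return sum(fractions) / n_trees


simulated = out_of_bag_fraction(n_samples=1000, n_trees=50)
expected = (1 - 1 / 1000) ** 1000   # close to 1/e
print(round(simulated, 3), round(expected, 3))
```

Each instance is therefore out-of-bag for roughly a third of the trees, so it can be scored by those trees alone without setting aside a separate test set.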