– Maintenance and Reliability

Highlights

• Proposed a swarm-intelligence-based PrGWO resulting in an optimal feature set.
• Improved the classification of individual classes present in the datasets.
• Improved the overall accuracy for two of the datasets, with the third very close to the best.
• Proposed a new fitness function resulting in inclusive performance measurement.

Abstract

For mitigating and managing the risk of failures due to Internet of Things (IoT) attacks, many Machine Learning (ML) and Deep Learning (DL) solutions have been used to detect attacks, but most suffer from the problem of high dimensionality. The problem is even more acute for resource-starved IoT nodes working with high-dimensional data. Motivated by this problem, the present work proposes a priority-based Gray Wolf Optimizer (PrGWO) for effectively reducing the input feature vector of the dataset. At each iteration, all the wolves leverage the relative importance of their leader wolves' position vectors to update their own positions. A new, inclusive fitness function is also proposed which incorporates all the important quality metrics along with the accuracy measure. In a first, SVM is used to initialize the proposed PrGWO population and kNN is used as the fitness wrapper technique. The proposed approach is tested on the NSL-KDD, DS2OS and BoTIoT datasets, and the best accuracies are found to be 99.60%, 99.71% and 99.97% with 12, 6 and 9 features respectively, which is better than most existing algorithms.

Introduction
Internet of Things (IoT) applications have been ever-increasing since the technology's inception, owing to widespread use in smart buildings, smart vehicles, smart highways and wireless sensor networks, and this growth is expected to continue exponentially in the coming years [27]. The network layer is particularly vulnerable because attackers can launch attacks at several locations. Well-known attacks aimed at the network layer are categorized into Denial of Service (DoS) attacks, User to Root (U2R) attacks, etc. [8]. Apart from the network layer, attacks may also be directed at the application layer or other layers. Over the years, research efforts have aimed at improving classification algorithms and thereby Intrusion Detection Systems (IDS); however, many problems remain to be adequately addressed in order to make such systems robust.

Problems and motivation:
1. Developing a feature selection technique to find the most optimal reduced feature vector. An optimized feature vector is necessary to a) reduce the training time of the model and b) test the incoming traffic in real time.
2. Developing an IDS that can improve classification accuracy, detection rate, false positive rate and other important performance metrics using this optimal reduced feature vector.

Objective:
Motivated by the problems listed above, the authors intend to devise an IDS technique/methodology to reduce and find the optimal feature vector giving maximum fitness in terms of accuracy, Detection Rate (DR) and False Positive Rate (FPR) for a given traffic/dataset. The present problem can thus be conceived as a multi-objective optimization problem. This optimal feature vector can then be used for classification on the IoT nodes. Based on the trained model, the IoT nodes can then test incoming traffic in real time using this optimal reduced feature set (instead of considering the entire feature set).

Contribution:
1. A priority-based Gray Wolf Optimizer with SVM and kNN (PrGWO-SK) is proposed, in which the Support Vector Machine (SVM) along with the first-layer fitness function is used to find the reduced feature vector, which then acts as the feeder vector for initializing the wolves/particles of the Priority-based Gray Wolf Optimizer (PrGWO).
2. PrGWO is proposed, wherein the intra-group priority of the different leader hunting wolves is used by the other wolves to update their own positions during each iteration. The leader wolves use the same concept in a modified form to update their own positions.
3. Two fitness functions are proposed for use in the two different layers. The first-layer fitness function focuses solely on accuracy, while the second-layer fitness function balances the requirement of accuracy against other metrics like detection rate, false positive rate and length of the feature vector.
4. The proposed approach was evaluated on the NSL-KDD dataset and the newly captured DS2OS and BoTIoT datasets using various metrics (Accuracy, FPR, Recall, Precision and F1-score), and the results were found to be encouraging.

Related work
SVM has been used to implement a security mechanism in vehicular ad hoc networks; to reduce the training time, a subsampling technique was used to filter out the less useful data, thus validating the utility of SVM in different applications. In the literature, extensive research has been done in the field of metaheuristic algorithms for traversing the search space and converging to a solution. Tama et al. [35] used an ensemble technique to classify the data; feature selection was accomplished through Ant Colony Optimization and PSO. However, the work suffers from methodological complexity due to the large number of algorithms used. Kunhare et al. [19] used the random forest algorithm for feature selection coupled with the PSO algorithm. The accuracy and number of features were optimally generated; however, the comparison with other work was not extensive. Wei et al. [41] used the Jaccard fitness function for evaluating the optimality of the feature set. The accuracy was measured to be very good, but the number of features increased. J. Gu et al. [9] used the concept of Naive Bayes to enhance the differences in the feature values, enabling SVM to clearly distinguish between normal and attack types. Though the accuracy improved, the number of features was neglected.
T. Wisanwanichthan et al. [42] used a double-layered approach, i.e. SVM and Naive Bayes, to classify the data. Their work emphasized R2L and U2R, but the overall accuracy suffered. Inspired by the swarm intelligence concept, Mirjalili S. et al. [21] developed the Gray Wolf Optimization algorithm, which combines multiple greedy best solutions to update the subsequent solutions; the optimal solution is reached by calculating the fitness function. E. Emary et al. [5] proposed a binary version of the Gray Wolf Optimizer for feature selection. The proposed approach was validated over a number of datasets; however, no experiments were performed on the NSL-KDD dataset. M. Safaldin et al. [28] used five leader wolves instead of four to guide the new positions of all wolves. Their paper discussed the enhancement ratio concept and compared results according to population size. Apart from the above-mentioned works, some of the important related work is depicted in Table 1.

In the present work, the NSL-KDD, DS2OS and BoTIoT [23] datasets are used. NSL-KDD is a legacy and widely used network dataset for intrusion detection work, making it possible to compare the present work with others' work in a comprehensive way. The DS2OS dataset is a collection of traces captured at the application layer in an IoT environment, while the BoTIoT dataset is a relatively recent dataset captured using a designed network environment. Testing the proposed approach on three different datasets, each with its own unique characteristics, helped validate the proposed work. Secondly, the dataset is pre-processed using the techniques of section 3.2 to convert it into a form acceptable to the algorithm. Thirdly, the dataset is divided into 80% training and 20% testing subsets through random selection. Fourthly, the data is passed to the first layer (SVM with Radial Basis Function (RBF) kernel), which uses the first-layer fitness function to generate a reduced feature set.
The SVM with RBF kernel has the advantage of separating non-linearly separable, closely related classes in the dataset. The reduced feature set is passed to the second layer, which uses it to initialise the population of search agents of the proposed Algorithm 2 (these initialized search agents represent distinct feature sets). Initialization of search agents is an important step in swarm intelligence algorithms, and this work focuses on this aspect specifically. To achieve this, a two-layered model is used in PrGWO-SK.

Proposed PrGWO-SK
Here the first layer exclusively considers accuracy as the basis for classification, as it is one of the most important metrics in the context of a security mechanism. Thus the objective of the first layer is solely to provide a good solution for initialization of the swarm population.
In the second layer, the focus is not only on accuracy but also on other performance metrics, especially the length of the finally selected feature vector. Thus, the second layer seeks to optimize the length of the feature vector along with accuracy and other important measures. To reiterate, this can be conceived as a multi-objective optimization problem.
The algorithm then uses these initialized search agents coupled with kNN as the wrapper technique for generating classification results. kNN is used for its ability to evolve with new data points, i.e. new attack and normal access types, as it does not involve explicit function generation; thus it can give flexible non-linear decision boundaries in a simple way. The relative performance of these distinct feature sets is evaluated through the second-layer fitness function. Accordingly, the four top search agents are assigned as leader wolves. In each subsequent iteration, the position of each search agent is updated under the guidance of the assigned leader wolves (the four best search agents of the previous iteration). At the end of the last iteration, the feature set corresponding to the best search agent is assigned as the optimal feature vector and passed to the real-world classifier for detection.
Algorithm 1 gives a step-by-step account of PrGWO-SK. Notations used in the algorithm are specified below: W_i (i = 1..N) denotes the wolves, FNR stands for False Negative Rate and TNR for True Negative Rate; the rest of the notations are defined in the algorithm itself. In the first phase/layer, data preprocessing is done using the techniques of section 3.1. The number of wolves participating in the algorithm is also specified as a parameter. This population of wolves is initialized randomly, and Algorithm 2 is called with the following parameters: the initialized population vector, fitness function 1 and SVM as the wrapper method. Algorithm 1 stores the returned reduced feature vector. In the second layer/phase, the feature vector returned by the first layer is used to initialize the population of wolves: all features of the returned feature vector are mandatorily selected, union a random selection over the rest of the features. Once again Algorithm 2 is called, this time with the newly initialized population, fitness function 2 and kNN as the wrapper method. The feature vector returned by Algorithm 2 is stored as the optimal feature vector and given to the real-world classifier, which classifies using only the features present in this optimal feature vector.
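The two-phase flow just described can be condensed into the following sketch. This is an illustration, not the authors' exact implementation: the helper names (`run_prgwo`, `fitness_1`, `fitness_2`) and the placeholder fitness scores are hypothetical, and in the real algorithm the two fitness functions would train SVM(RBF) and kNN wrappers on the masked features as the text describes.

```python
import random

def run_prgwo(population, fitness_fn):
    """Stand-in for Algorithm 2: returns the feature mask with best
    (lowest) fitness. The wolf-update iterations are abstracted away."""
    return min(population, key=fitness_fn)

def prgwo_sk(n_features, n_wolves=7):
    # Phase 1: random binary masks, SVM wrapper, accuracy-only fitness.
    pop1 = [[random.randint(0, 1) for _ in range(n_features)]
            for _ in range(n_wolves)]
    def fitness_1(mask):   # would train SVM(RBF) on the masked features
        return -sum(mask)  # placeholder score only
    reduced = run_prgwo(pop1, fitness_1)

    # Phase 2: every feature kept by phase 1 is selected mandatorily,
    # union a random pick over the remaining features.
    pop2 = [[1 if kept else random.randint(0, 1) for kept in reduced]
            for _ in range(n_wolves)]
    def fitness_2(mask):   # would train kNN and combine Acc, DR, FPR, |mask|
        return sum(mask)   # placeholder score only
    return run_prgwo(pop2, fitness_2)

mask = prgwo_sk(n_features=41)  # 41 independent features in NSL-KDD
print(len(mask))
```

The returned mask plays the role of the optimal feature vector handed to the real-world classifier.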

Preparing dataset and preprocessing
NSL-KDD: Published in 2009, it is an enhanced version of the KDD CUP'99 dataset. Before the release of NSL-KDD, several researchers used the KDD CUP'99 dataset, but KDD CUP'99 has many duplicates, making the dataset redundant and biased towards some of the attacks. NSL-KDD has 41 independent features and one dependent feature. The 41 independent features can broadly be classified as basic features, content features and traffic features. The dependent feature is categorised into 39 attack types; broadly, the attack types can be categorised into DoS, U2R, Remote to Local (R2L) and Probe types. Additionally, the type 'normal' denotes the normal class. The approximate distribution of the various attack and normal types is: Normal=53.45%, Probe=9.25%, DoS=36.45%, U2R=0.04% and R2L=0.78%.
DS2OS: Published in 2018, this open-source dataset was obtained via Kaggle. The dataset was created using the Distributed Smart Space Orchestration System (DS2OS) in a virtual IoT environment; the entire virtual architecture is a collection of many micro-services in an IoT context. The DS2OS dataset has 12 independent features, the majority having non-numerical values. The attack types can be categorised into 7 types: DoS, Malicious Control (MC), Malicious Operations (MO), Probe, Scan, Spy and Wrongsetup (WS).
The approximate distribution is: Normal=97.2%, DoS=1.59%, MC=0.25%, MO=0.23%, Probe=0.09%, Scan=0.43%, Spy=0.14% and WS=0.03%.
BoTIoT: Published in 2019, the BoT-IoT dataset was created using a realistic network configuration built at the UNSW Canberra Cyber Range Centre. A workable substitute for IoT solutions, this dataset is produced using the Message Queuing Telemetry Transport (MQTT) protocol, which connects machine-to-machine interactions. In total there are 43 independent features, while the attack categories present in the BoTIoT dataset are 4 in number: DoS, Distributed Denial of Service (DDoS), Reconnaissance and Theft. The approximate distribution of attack and normal types is: Normal=0.012%, DDoS=52.5%, DoS=44.9%, Recon=2.4% and Theft=0.002%.
As discussed before, all datasets are preprocessed before they can be used in the actual ML algorithm. The techniques described below are used to transform them into a form amenable to classification algorithms.
Handling missing values: Very often a dataset contains a few missing values, which affect the classification model's performance. NSL-KDD and BoTIoT have no missing values, while for the DS2OS dataset the missing values are replaced with the term "missing".
Feature mapping: A few independent features like protocol type, flag and service are of nominal type and need to be converted into numerical type. In the literature, approaches like one-hot encoding have been used. However, one-hot encoding makes the dataset much more sparse, thereby increasing the computational complexity. Therefore, in the present work ordinal encoding is used for converting nominal into numerical values. For example, for the protocol_type attribute the original values are converted as: TCP=1, UDP=2, ICMP=3. The same technique is used for the other features.
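The ordinal-encoding step can be illustrated with the protocol_type mapping quoted above; the dictionary-based approach here is one simple way to realize it (a library encoder would work equally well).

```python
# Ordinal encoding of a nominal feature: TCP=1, UDP=2, ICMP=3,
# as given for the protocol_type attribute in the text.
protocol_map = {"tcp": 1, "udp": 2, "icmp": 3}

rows = [{"protocol_type": "tcp"},
        {"protocol_type": "icmp"},
        {"protocol_type": "udp"}]
encoded = [protocol_map[r["protocol_type"]] for r in rows]
print(encoded)  # [1, 3, 2]
```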
Feature normalization: Feature normalization or scaling is required to bring the values of the features to a uniform level. In the absence of feature normalization, features with higher values tend to dominate the trained model parameters. In the present work, all the independent features are scaled using the min-max scaling technique:

f_norm = (f − min_f) / (max_f − min_f)

Here, f is the feature being normalized, max_f is the feature's maximum value and min_f its minimum value present in the dataset. After scaling, the next step is to split the entire dataset into training and test subsets. A random selection technique is used for dividing the entire dataset in an 80:20 ratio. The algorithm was repeated for 10 runs with different train:test subsets using randomization; this is required to avoid chance selection bias in the results.
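A minimal sketch of the scaling and splitting steps, assuming plain Python lists for the data:

```python
import random

def min_max_scale(values):
    """f_norm = (f - f_min) / (f_max - f_min), the scaling used above."""
    f_min, f_max = min(values), max(values)
    return [(v - f_min) / (f_max - f_min) for v in values]

def split_80_20(rows, seed=None):
    """Random 80:20 train/test split; using a fresh seed per run mirrors
    the 10 randomized runs used to avoid chance selection bias."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]

print(min_max_scale([0, 5, 10]))  # [0.0, 0.5, 1.0]
train, test = split_80_20(list(range(100)), seed=1)
print(len(train), len(test))      # 80 20
```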

Input: the independent features and the label d in the dataset D.
Output: the set of optimal features, together with the fitness value fit, Acc, DR, FPR, FNR, TNR, Precision and F1-score.

Proposed PrGWO
Background: The Gray Wolf Optimizer (GWO) [21] is a metaheuristic that searches the solution space for optimal solutions. Notionally, a pack of leader wolves guides the follower pack towards the prey's location before hunting it down. The leader pack consists of the three best solution vectors, called the alpha, beta and delta wolves. All wolves start from random solution vectors but move towards the optimal solution vector with the help of their leader wolves. The update equations for all wolves were given as (reconstructed here in the standard GWO form):

D_alpha = |C_1 · X_alpha − X(t)|, X_1 = X_alpha − A_1 · D_alpha

and analogously for the beta and delta wolves, with the new position

X(t+1) = (X_1 + X_2 + X_3) / 3

To adapt this continuous-valued algorithm to the feature selection task, where each feature is represented by a binary variable in {0, 1}, the algorithm was modified [5] by mapping the continuous values to binary values. In the modified scenario the update equation was written as

x_d(t+1) = Crossover(x_1, x_2, x_3)

where the crossover selects each bit stochastically from one of the three binary leader-derived vectors. The same equations (6)-(9) are used for the other leader wolves (beta, delta and omega) with the relevant substitutions.
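The equal-weight stochastic crossover of the binary GWO can be sketched as follows. This is an illustrative reading of the cited method [5], in which each dimension of the updated position is copied from one of the three leaders with probability roughly 1/3 each:

```python
import random

def bgwo_crossover(x_alpha, x_beta, x_delta, rng=random):
    """Equal-weight stochastic crossover: every dimension of the new
    binary position is copied from alpha, beta or delta with
    probability 1/3 each."""
    new = []
    for a, b, d in zip(x_alpha, x_beta, x_delta):
        r = rng.random()
        if r < 1 / 3:
            new.append(a)
        elif r < 2 / 3:
            new.append(b)
        else:
            new.append(d)
    return new

child = bgwo_crossover([1, 1, 1], [0, 0, 0], [1, 0, 1], random.Random(0))
print(len(child), all(bit in (0, 1) for bit in child))
```

The uniform 1/3 weighting here is exactly what PrGWO replaces with priority-based weights below.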
PrGWO: In PrGWO, firstly, instead of only three leaders, four active leader wolves are used. This provides each non-leader wolf with more variety: an increased number of leader position vectors is available for updating its own position vector.
Secondly, the relative importance of the four individual leaders is incorporated into the updating of the positions of all the other non-leader wolves. For instance, the alpha wolf is the best leader among all leader wolves, hence it needs to be given a higher weight of influence than the other three; similarly, the beta, delta and omega wolves need to be given weightage in order of their importance. Unlike the present paper, the original GWO and bGWO gave equal weight of influence to all the leader wolves, disregarding the fact that the alpha wolf is the most important leader in terms of holding the best position, followed by the beta, delta and omega wolves respectively. In the proposed algorithm, a different impact factor is assigned to each of the alpha, beta, delta and omega wolves.
The mathematical model used in the proposed algorithm is, with r a random number generated in the range [0, 1] (Eq. 10, reconstructed):

W_i(iter+1) = update w.r.t. W_alpha if 0 ≤ r < 0.50; w.r.t. W_beta if 0.50 ≤ r < 0.75; w.r.t. W_delta if 0.75 ≤ r < 0.90; w.r.t. W_omega if 0.90 ≤ r ≤ 1

Here ω_i (i = 1...4) are binary vectors and W_i(iter+1) are the updated position vectors of the individual non-leader wolves. In equation 10, different weights of influence have been assigned to the different leader wolves based on their relative importance. For example, if the random number is less than 0.5, the update of a non-leader wolf is based on the alpha wolf; this weight of influence is reduced for the beta, delta and omega wolves to 0.25, 0.15 and 0.10 respectively. This is in contrast to the original bGWO, where the weight of influence was uniformly distributed as 0.33 over all the leader wolves. Thirdly, for the alpha, beta, delta and omega wolves (the leaders guiding the other wolves), a different criterion for updating their own positions is proposed. Each leader wolf updates its position relative to its own, its predecessors' and its immediate successor's. This ensures that wolves already holding better positions are not deviated under the influence of less efficient successor leader wolves, except the immediate successor. The immediate successor is required because, in its absence, the alpha wolf's position would never be updated and exploration of the search space would be restricted. Accordingly, the alpha wolf's position is updated relative to its own and the beta wolf's position; the beta wolf's position with respect to the alpha, beta and delta wolves' positions; the delta wolf's position with respect to the positions of the alpha, beta, delta and omega wolves; while for the omega wolf it is with respect to all the leader wolves. For the updating of the alpha, beta and delta wolves, equations 15-17 were used.
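The priority rule for non-leader wolves can be sketched as follows. The actual crossover with the chosen leader is abstracted away, and the frequency check at the end is purely illustrative:

```python
import random

def choose_leader(r):
    """Priority-weighted leader choice for a non-leader wolf: weights
    0.50/0.25/0.15/0.10 for alpha/beta/delta/omega respectively,
    realised as interval checks on a uniform draw r in [0, 1)."""
    if r < 0.50:
        return "alpha"
    elif r < 0.75:
        return "beta"
    elif r < 0.90:
        return "delta"
    return "omega"

# Sanity check: over many draws, alpha should guide about half the updates.
counts = {"alpha": 0, "beta": 0, "delta": 0, "omega": 0}
rng = random.Random(42)
for _ in range(100_000):
    counts[choose_leader(rng.random())] += 1
print(max(counts, key=counts.get))  # alpha dominates
```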

The omega wolf can also be updated similarly to the delta wolf using equation 17. From equations 18-20 it can be seen that for the update of the alpha wolf's position, a ratio of 0.66:0.33 in terms of weight of influence is used between the alpha and beta wolves. Similarly, for the beta wolf's update the ratio used is 0.5:0.3:0.2, and for the delta wolf's update the weights of influence used are 0.5:0.25:0.15:0.10. Fig. 2 depicts PrGWO in graphical form, while an algorithmic depiction is presented as Algorithm 2. In this algorithm, the N initialized feature vectors corresponding to N wolves are used to create N models. These models are then applied to the test data to classify it. Based on the classification report and the fitness function, the fitness value of each wolf's feature vector is calculated. The fitness values are sorted in ascending order, and the position/feature vector corresponding to the best fitness value is assigned as the alpha position; similarly the second best is assigned as beta, the third best as delta and the fourth as omega. In the subsequent iteration these values (alpha, beta, delta and omega) are used to update the position vectors of the other wolves and their own, using the equations listed in the algorithm. Again the fitness value of each wolf's updated feature vector is calculated, and based on these newly sorted fitness values, reassignment of alpha, beta, delta and omega is done. The cycle is repeated till the last iteration, at the end of which the alpha position vector is returned as the optimal reduced feature vector.
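The leader self-update ratios quoted above can be sketched as a weighted guide selection. This is an assumed reading of equations 18-20, with the actual update via the chosen guide abstracted away:

```python
# Influence ratios for each leader's own update, as quoted in the text:
# alpha uses 0.66:0.33 over (alpha, beta); beta uses 0.5:0.3:0.2 over
# (alpha, beta, delta); delta uses 0.5:0.25:0.15:0.10 over all four.
LEADER_WEIGHTS = {
    "alpha": [("alpha", 0.66), ("beta", 0.33)],
    "beta":  [("alpha", 0.50), ("beta", 0.30), ("delta", 0.20)],
    "delta": [("alpha", 0.50), ("beta", 0.25), ("delta", 0.15), ("omega", 0.10)],
}

def pick_guide(leader, r):
    """Pick the guiding wolf for `leader` from a uniform draw r in [0, 1)."""
    acc = 0.0
    for name, w in LEADER_WEIGHTS[leader]:
        acc += w
        if r < acc:
            return name
    return LEADER_WEIGHTS[leader][-1][0]

print(pick_guide("alpha", 0.10))  # alpha (r < 0.66)
print(pick_guide("beta", 0.95))   # delta (r beyond 0.50 + 0.30)
```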

Proposed fitness function
The fitness function has an important role in determining the optimal number of features for use in classification algorithms like SVM and kNN without compromising performance metrics. For systematic initialization of the Gray wolves in the second layer, SVM is used as the wrapper method in the first layer. In this first layer, the accuracy measure alone is used for obtaining the reduced feature set. This ensures that the most important measure of efficiency is accounted for exclusively in the first layer, the other measures being accommodated in the second layer later on without sacrificing accuracy.
Thus the fitness function used in the first layer is:

fit_1layer = cor / tot

where fit_1layer is the fitness function of the first layer, cor is the number of correctly predicted instances and tot is the total number of instances. In the second layer, where PrGWO along with kNN is used, other important metrics like detection rate, false positive rate and number of features need to be incorporated to make the fitness function truly inclusive. However, recognizing accuracy as the most important measure for ensuring good security, here again a fitness model is proposed which gives higher weightage to accuracy. The proposed fitness function, fit_2layer, is expressed in terms of Tpos (true-positive cases), Fneg (false negatives), Fpos (false positives), Tneg (true negatives) and a last term Nfeat, the number of features; multiplying Nfeat by the factor b is necessary to scale this term to the level of the other terms in the expression. Lastly, the goal of the algorithm is to minimize the fitness function: the wolf with the least fitness value is assigned as the alpha wolf, and its position in the space is the best feature set.
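The two fitness functions can be sketched as follows. The first-layer accuracy formula follows the text; in the second-layer function, the relative weights w_acc, w_dr and w_fpr are illustrative assumptions, with only the higher weight on accuracy, the minimization goal and the b·Nfeat scaling taken from the paper:

```python
def fitness_layer1(correct, total):
    """First layer: accuracy only, fit = cor / tot."""
    return correct / total

def fitness_layer2(tpos, fneg, fpos, tneg, n_feat,
                   w_acc=0.7, w_dr=0.1, w_fpr=0.1, b=0.0001):
    """Second layer (minimized): trades accuracy against DR, FPR and
    feature-vector length. Weight values here are assumptions."""
    acc = (tpos + tneg) / (tpos + fneg + fpos + tneg)
    dr = tpos / (tpos + fneg)    # detection rate (recall)
    fpr = fpos / (fpos + tneg)   # false positive rate
    # Lower is better: penalize error, FPR and length; reward DR.
    return w_acc * (1 - acc) + w_fpr * fpr + w_dr * (1 - dr) + b * n_feat

print(round(fitness_layer1(996, 1000), 3))  # 0.996
better = fitness_layer2(990, 10, 5, 995, 10)
worse = fitness_layer2(900, 100, 80, 920, 30)
print(better < worse)  # True: the stronger, shorter solution scores lower
```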

Performance metrics used
The present work was tested using various quality parameters based on the confusion-matrix counts. True positive signifies that the data point was predicted as malicious and actually was malicious. False positive signifies that the data point was predicted as malicious but actually was not. True negative signifies that the data point was predicted as not malicious and actually was not malicious. False negative signifies that the data point was predicted as not malicious but was actually malicious.
Detection Rate (DR), also known as recall and TPR, is the ratio of the number of times the classifier predicted a 'positive' result to the number of times positive results should have been predicted.
Precision (PR) measures the correctness of the results obtained: true positives over true positives and false positives combined.
F1-score is used when the class distribution is not balanced. It is the harmonic mean of detection rate and precision.
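The metrics described above can be computed directly from the confusion-matrix counts; these are the standard definitions matching the verbal descriptions:

```python
def metrics(tp, fp, tn, fn):
    """Standard intrusion-detection metrics from confusion counts."""
    dr = tp / (tp + fn)              # Detection rate / recall / TPR
    fpr = fp / (fp + tn)             # False positive rate
    precision = tp / (tp + fp)       # TP over predicted positives
    f1 = 2 * precision * dr / (precision + dr)  # harmonic mean
    return dr, fpr, precision, f1

dr, fpr, precision, f1 = metrics(tp=90, fp=10, tn=880, fn=20)
print(round(dr, 3), round(fpr, 4), round(precision, 2))  # 0.818 0.0112 0.9
```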

Experimental results and discussion:
Experiments were performed using MATLAB 2021 and the Python programming language (Anaconda distribution) on an Intel Core i7 (10th generation) machine.

Parameters used:
Various parameter combinations were tested experimentally to find the best one; Table 2 illustrates these parameters. The algorithm was run 10 times; the number of wolves was taken as 5, 7 and 10; the number of iterations within each run was 25; SVM was used with the RBF kernel; two layers were used with different fitness functions; and the values of b were taken as 0.001 and 0.0001. The reason for taking different values of b is to give different weightage to the number of features in the fitness function. Table 3 depicts the variation in results due to the size of the population; in the table, NoF stands for number of features. Here 3 values of N were taken: 5, 7 and 10 wolves. The results were compared not only with regard to DR, FPR, F1-score etc. but also accuracy and the length of the feature set finally selected. The results show that when the size of the population was 7, the length of the feature set was least and the accuracy maximum; only two parameters, FPR and TNR, were found to be better for N=10. Similarly, the 'k' value in kNN was experimented with for k=1, 3 and 5. Table 4 shows the variation in results with the change in k value. For k=1, the experiments yielded the best results in terms of accuracy, i.e. 99.60%, along with the other metrics. However, the best length of feature vector, i.e. 8, was found for k=3, though the accuracy and other metrics were not better than for k=1.
Hence, based on the experimental results, it is appropriate to remark that a cost-benefit analysis in terms of accuracy versus number of features needs to be done while choosing the value of k for real-world scenarios.

Performance of PrGWO-SK for different classes:
The performance of the present approach was tested on the different classes of each dataset. Table 5 shows the Detection rate, False positive rate, F1-score, Precision etc. for all the classes of NSL-KDD. The experimental results show the DR to be highest for DoS attacks at 99.89%, while for R2L and U2R the DR was the least among all classes. However, the False positive rate was found to be the best for these two classes, viz. 0.036% and 0.043%. Considering precision, the two classes Normal and DoS perform to the extent of 99.58% and 99.89%. As regards the DS2OS dataset, table 6 depicts the classification results for the various classes present in the dataset. Most of the performance metrics show the best results for the Spying class. In the case of FPR and TNR, both the Wrongsetup and Spying classes show results of 0 and 100% respectively, i.e. no false alarm is generated for the Wrongsetup and Spying classes. However, the algorithm performs worst for the Scan class, where the Detection rate was measured as 98.73%, in contrast to the other classes, where the DR was measured at more than 99%. Similar to DS2OS and NSL-KDD, the class-wise performance on the BoTIoT dataset was also evaluated. From table 7 it can be seen that for DDoS the algorithm performs best, with DR at 99.98%, FPR at 0.0001%, precision at 99.92% and F1-score at 99.95%. The DoS class shows results very close to the DDoS class. The least Detection rate was recorded for the class 'Theft' at 96.2%, which shows the difficulty in correctly predicting this class of attack. Table 8 depicts the various performance measures iteration-wise for b=0.0001. For this value of b, the algorithm focuses more on achieving optimal accuracy, even at the cost of increasing the length of the feature vector.

Convergence of algorithm:
Similarly, table 9 depicts the experimental results for b=0.001. Here the focus shifts more towards optimizing the length of the feature vector.

Consistency of experimental results through box plots
The values shown in the tables in this paper are the results of the best run among the 10 runs of the algorithm. However, an algorithm's robustness and efficiency can be measured by its consistency over several runs. Fig. 6 graphically depicts the median and the interquartile range of the different runs of experimental results for the NSL-KDD dataset.

For measuring the effectiveness of any algorithm, it is important to compare it with other existing algorithms in terms of common performance metrics. Tables 10, 11 and 12 show the class-wise performance of the proposed algorithm compared to other algorithms. Table 10 shows that the proposed approach outperforms the others for the 'Normal' and 'DoS' classes, giving 99.76% and 99.89%, while for the other classes different algorithms perform better. Table 11 shows that the proposed approach performs best for the DoS and MC attacks: for DoS it gives a DR of 99.45%, much better than the second best of 66%, and for MC it gives 99.6%. Similarly, table 12 shows that the proposed approach performs best in detecting the 'Theft' attack type, giving a DR of 96.2% against a second best of 93%.

Comparison in terms of Accuracy and Length of feature set of different algorithms with proposed algorithm:
Tables 13, 14 and 15 show the comparison of the proposed approach with others' work in terms of accuracy, number of features and their ratio. Table 13 shows that in terms of accuracy the two best-performing algorithms are NNIA+GHSOM-pr (99.47%) and DT-EnSVM2 (99.41%); however, the proposed approach outperforms these efficient algorithms, measuring 99.60%. In terms of length of feature vector, the best-performing algorithm is RF+PSO, giving an output feature vector of length 10. In this case also the proposed work outperforms RF+PSO, giving a vector of length 8; however, the accuracy then reduces to 99.36%. Table 14 shows that most of the algorithms gave accuracy close to 99.43% with the length of the feature vector as low as 6. Though PrGWO-SK was not able to reduce the length of the feature vector below 6, the accuracy showed substantial improvement, measuring 99.71%.
Similarly, table 15 shows the results for the BoTIoT dataset. Here the best accuracy, 99.99%, was achieved by Kumar et al. with a feature length of 10, while the shortest feature length, 8, was achieved by Soe et al., though with an accuracy of only 99.1%. The proposed approach achieved an accuracy of 99.97% with a feature-vector length of 9. Thus the proposed approach was able to outperform the first algorithm in terms of length without compromising much on accuracy, and the second algorithm in terms of accuracy by a substantial margin.

Conclusions and limitations
This paper aimed to find a reduced optimal feature vector of a traffic dataset so as to reduce the computational complexity by removing the curse of data dimensionality. A two-layered structure was used: the first layer employed the SVM technique, while the second layer used kNN as the wrapper technique. For searching the solution space, a swarm-intelligence-based modified form of GWO, called PrGWO, was proposed along with two different fitness functions. The effectiveness of the experimental results was established through the metrics Accuracy, Detection rate, FPR, TPR and Precision. Through extensive experiments it was found that the proposed methodology and algorithm performed better for several individual classes: Normal and DoS for NSL-KDD, DoS and MC for DS2OS, and Theft for BoTIoT. In terms of overall accuracy, PrGWO-SK performed better for NSL-KDD while the length of the feature vector was also reduced. For DS2OS, though the length of the feature vector could not be reduced, a notable increase in accuracy was witnessed. In the case of BoTIoT, the combination of accuracy and length of feature vector was effectively optimal. Although the present work performs better as mentioned above, it has shown limitations in detecting classes like U2R and R2L, where the DR was not found to be better than some of the existing algorithms. Hence, as future work, classification of classes like U2R and R2L can be pursued further. Moreover, the authors intend to take up parameter tuning as a further research objective.