A New Design of Custom Optimized CNN-LSTM Assists to Detect Network Anomaly Using Categorical Data

In traditional intrusion detection models, system effectiveness depends entirely on the training dataset and on feature selection. Feature selection is labour-intensive and relies mainly on expert knowledge. Moreover, the training dataset contains highly imbalanced data, which biases the model. Here, an automatic approach is introduced to correct these deficiencies. In this paper, the author proposes a novel network anomaly detection (NID) model built using categorical data. The model is a modified deep neural network designed primarily to detect anomalies within the network. A custom CNN-LSTM with Harris Hawks Optimization (named custom optimized CNN-LSTM) is designed as a new classifier that detects anomalies from word-cloud text data and distinguishes them effectively. The experimental results show that the proposed method achieves promising performance for network anomaly detection.


I. INTRODUCTION
Nowadays, the increasing availability of network technologies is heavily exploited by spammers, cyberattackers and intruders. This major threat has become an unavoidable challenge for network security, and network anomaly detection has therefore become an important defence system for addressing it [1]. An anomaly is an undesirable behaviour, and anomaly detection aims to find such abnormalities within the network. Across applications, anomalous patterns are referred to by different names, such as faults, noise, outliers, defects, damage, peculiarities and errors [2]. Anomaly detection is regarded as a significant advance in many areas and is an active research topic in fields such as biosurveillance, industrial process control, data entry errors, unauthorized access, fraud detection and fault diagnosis [3,4]. The major goal of a network intrusion detection system (NIDS) is to distinguish abnormal activities from the normal behaviour of the network. Anomaly-based detection is flexible enough to detect newly emerging threats, but it is also prone to a high false alarm rate [5]. In addition, anomaly detection techniques suffer from other complications, such as single points of failure, limited speed and scalability, and low detection rates.
To overcome these issues, network intrusion detection systems based on machine learning have been introduced to enhance system performance. Machine learning techniques generally fall into three categories: supervised, weakly supervised and unsupervised. These are the predominant learning approaches used to design IDSs in the literature [6][7][8][9][10]. Traditional machine learning models such as Random Forest, Naïve Bayes and Support Vector Machine have been widely adopted for anomaly detection, but their performance is limited on imbalanced datasets. More recently, network security problems have been addressed with deep neural networks. Multi-layer deep architectures such as the convolutional neural network (CNN), the deep belief network (DBN), auto-encoders and recurrent neural networks (RNNs) such as LSTM provide promising solutions in multiple disciplines [11]. Nevertheless, more robust deep learning techniques are still required to classify network traffic accurately. To detect network anomalies, this work uses a deep CNN architecture combined with metaheuristic optimization, pre-trained on categorical data. The CNN is a well-established classifier that can be applied to both balanced and imbalanced datasets. In data analysis applications, CNNs are effective at separating data into normal and abnormal classes when a large number of training samples is available. Despite these benefits, CNNs suffer from time-consuming processing, especially for patch-wise feature extraction [12]. Long short-term memory (LSTM), an extension of the RNN, models time-series dependencies and is combined with the CNN model to convey temporal features. In this research work, a custom optimized CNN-LSTM with the Harris hawks optimization technique is employed to design the intrusion detection model. The model discriminates the data effectively using properly extracted features. Finally, its performance on the categorical data is compared with several state-of-the-art techniques.
II. RELATED WORKS
Several survey papers have reviewed machine learning methods for identifying anomalies (outliers). These articles are discussed as follows. Laskov et al. [13] evaluated the accuracy of both supervised and unsupervised machine learning approaches for identifying unknown attacks. Ghorbani et al. [14] surveyed network intrusion detection using supervised and unsupervised techniques. Mahbod [15] and D. K. Bhattacharyya et al. [16] presented anomaly-based IDSs and compared their performance with techniques such as SVM and Naïve Bayes on the NSL-KDD dataset. Sandhya Peddabachigari et al. [17] presented a hybrid IDS that reduces computational complexity and improves detection performance. It uses a hybrid intelligent system combining a decision tree with an SVM (DT-SVM), which achieves better generalization. Their results show that the technique identifies R2L and Probe attacks well, but performs less well on DoS and U2R attacks. Yasser Yasami et al. [18] proposed an unsupervised classifier that distinguishes normal from anomalous activities using the ID3 decision tree and K-means clustering. The training samples are partitioned into clusters and an individual decision tree is built for each cluster, which improves classification accuracy, but the approach is restricted to a specific database. Dewan et al. [19] presented a new learning algorithm for NIDS based on Naïve Bayes and the ID3 algorithm. The technique was evaluated on the KDD99 benchmark dataset with 5 classes and improved the false positive rate, although the false alarm rate still needs improvement for most user attacks. Roshan Chitrakar et al. [20] presented another hybrid anomaly-based IDS built on enhanced data distribution. For better classification, the technique combines K-Medoids clustering with a Naïve Bayes classifier. In that study, the technique improved detection capability and the false positive rate compared with K-means clustering, because K-means is highly sensitive to outliers: extreme values distort the mean of each group when the data are partitioned. Instead of the mean, K-Medoids uses a medoid, an actual data point at the centre of each cluster, to group similar behaviour reliably; the clustered data are then categorized into different attack classes with the Naïve Bayes classifier. The shortcomings of this data distribution model are that it cannot predict data from different environments and that it requires more time as the data size grows exponentially. To address the time complexity, Chitrakar et al. [21] replaced the Naïve Bayes classifier with a support vector machine combined with K-Medoids clustering. However, this technique reduces time complexity only for small datasets, not for large ones. Dino et al. [22] presented a semi-supervised outlier detection scheme for categorical data. It uses Distance Learning of Categorical Attributes (DILCA) and achieves reasonable results. The work is extended toward a unified framework that handles both ordinal and nominal attributes and adds a new active learning approach. Ashima et al. [23] designed a host-based IDS that stacks a CNN with a Gated Recurrent Unit (GRU). It reduces training time and improves intrusion detection performance.
The rest of this paper is organized as follows. Section III briefly describes the methodology of the proposed model. Sections IV and V present the simulation results and the conclusion, respectively.

III. METHODOLOGY
To design an accurate IDS, the benchmark NSL-KDD dataset is used for analysis. The model is built in four stages: data pre-processing, feature extraction, classification by the custom optimized CNN-LSTM, and model evaluation. An anomaly detection system is an essential tool for detecting abnormalities in network data. Such data contain mostly normal records and very few abnormal ones [24]; undetected abnormal records lead to security and data-integrity problems in the network, so an NIDS is necessary to address these issues. The input dataset comprises both numerical and textual (categorical) content. Initially, the training and test data are loaded into the system to detect intrusions in the cloud environment. Before the input is provided to the custom optimized CNN-LSTM, feature analysis is performed to extract and learn the features. Infrequent entries in the dataset are removed as unused data, and the remainder is sorted into a dataset of reasonable size. Feature selection is therefore essential: it extracts the most useful information, improving learning accuracy and reducing computational complexity. From the partitioned table, the text data for each label are extracted and visualized as a word cloud. The extracted text is then pre-processed: it is converted to lowercase and punctuation is removed. Next, the words are encoded, and each document is padded or truncated so that all documents have the same length. Each document is thus converted into a sequence of numeric indices, and the converted data of the important features are forwarded to the next stage.
Algorithm: Algorithm for proposed IDS model
Input: intrusion dataset
Output: classification of intrusion
1. Partition the dataset for training and testing
4. Extract the text data into a table
5. Create a word cloud chart from the table
6. Pre-process the training and testing data
7. Create a word embedding
8. Convert the documents into image indices
9. Set the optimizer threshold value to 1
14. Set the initial learn rate to 1
15. Test the custom optimized CNN-LSTM network
16. Convert the text documents to sequences
17. Classify the documents
18. End.
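The text pre-processing and encoding described above (lowercasing, removing punctuation, encoding words as indices, and padding or truncating every document to one length) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the reserved indices for padding (0) and unknown words (1), and padding at the end of the sequence are all assumptions made for the example.

```python
import re

PAD_ID = 0  # assumed index reserved for padding
UNK_ID = 1  # assumed index reserved for words not seen when the vocabulary was built

def preprocess(doc):
    """Lowercase a document, replace punctuation with spaces, and split into words."""
    doc = doc.lower()
    doc = re.sub(r"[^\w\s]", " ", doc)  # \w keeps letters, digits and '_' (e.g. ftp_data)
    return doc.split()

def build_vocab(tokenized_docs):
    """Assign each distinct word an integer index, starting after the reserved ids."""
    vocab = {}
    for tokens in tokenized_docs:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab

def to_sequence(tokens, vocab, seq_len):
    """Map words to indices, then truncate or right-pad to exactly seq_len entries."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:seq_len]
    return ids + [PAD_ID] * (seq_len - len(ids))

docs = ["TCP, http SF", "udp private."]  # hypothetical text records
tokenized = [preprocess(d) for d in docs]
vocab = build_vocab(tokenized)
sequences = [to_sequence(t, vocab, seq_len=4) for t in tokenized]
print(sequences)  # [[2, 3, 4, 0], [5, 6, 0, 0]]
```

Fixing the sequence length this way gives every document the same shape, which a sequence input layer with a fixed input size expects.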

A. Custom optimized CNN-LSTM
The custom optimized CNN-LSTM is a hybrid approach that combines a convolutional neural network with long short-term memory (LSTM) and is optimized using the Harris hawks algorithm. This new approach is proposed for NIDS.
This section first gives a general view of the optimization technique, the convolutional neural network and long short-term memory, and then describes the proposed model. The block diagram of the proposed model is shown in Figure 1.
B. Harris Hawks Optimization
In the exploration phase, each hawk updates its position as

$$Z(iter+1) = \begin{cases} Z_{rand}(iter) - r_1 \left| Z_{rand}(iter) - 2 r_2 Z(iter) \right|, & q \ge 0.5 \\ \left( Z_{rabbit}(iter) - Z_{mean}(iter) \right) - r_3 \left( lb + r_4 (ub - lb) \right), & q < 0.5 \end{cases} \qquad (1)$$

$$Z_{mean}(iter) = \frac{1}{n} \sum_{j=1}^{n} Z_j(iter)$$

where Z_mean is the mean location of the hawks, n is the number of hawks and Z_j is the position of the j-th hawk. Z(iter) and Z_rabbit(iter) are the current positions of a hawk and of the rabbit, Z_rand(iter) is a randomly selected hawk, Z(iter+1) is the hawk's position in the next iteration, r_1, …, r_4 and q are random numbers in the interval [0, 1], and lb and ub are the lower and upper bounds of the variables. The transition from exploration to exploitation is governed by the escaping energy (strength) S of the rabbit, obtained with equation (2):

$$S = 2 S_0 \left( 1 - \frac{iter}{T} \right) \qquad (2)$$

where S_0 is the initial strength of the rabbit, which changes randomly in the interval (-1, 1), and T is the maximum number of iterations. If the prey is energetic, S_0 increases from 0 to 1; otherwise, it decreases from 0 to -1. In the exploitation phase, the hawks attack their prey using one of four strategies. Here, r is the chance that the rabbit successfully escapes before the pounce. While being chased, the prey tries to escape, which significantly reduces its energy; at that moment, the hawks capture it effortlessly through a besiege. The four pouncing strategies of the exploitation phase are soft besiege, hard besiege, soft besiege with progressive rapid dives and hard besiege with progressive rapid dives [31]. A soft besiege occurs when |S| ≥ 0.5 and a hard besiege when |S| < 0.5. Together, r and |S| determine which of the four strategies is applied, that is, whether or not the prey escapes from the hawks.
Soft besiege and hard besiege: these apply when r ≥ 0.5 and |S| ≥ 0.5, and when r ≥ 0.5 and |S| < 0.5, respectively. The prey tries to escape but cannot; the hawks encircle it softly or hardly and then make a surprise pounce. The corresponding position updates are modeled as,

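$$v = Z_{rabbit}(iter) - S \left| J \, Z_{rabbit}(iter) - Z(iter) \right|$$
$$u = v + M \times LF(d)$$
$$Z(iter+1) = \begin{cases} v, & f(v) < f(Z(iter)) \\ u, & f(u) < f(Z(iter)) \end{cases}$$
with J the random jump strength of the rabbit and f(·) the fitness function;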
where M is a random vector of size 1 × d, v is the soft-besiege move that the hawks evaluate next, and u models the irregular movement of the rabbit using the Lévy flight (LF) approach. Hard besiege with progressive rapid dives: when r < 0.5 and |S| < 0.5, the prey does not have enough energy to escape, and the hawks perform a hard besiege before the surprise pounce. This strategy is similar to the soft besiege with progressive rapid dives; the only difference is that the hawks try to reduce the distance between their average location and the escaping prey.
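To make the update rules of this section concrete, the following is a minimal NumPy sketch of the standard Harris hawks optimization loop used as a generic minimizer. It is not the authors' code: the paper does not state how HHO is coupled to the network, so the function names, the population size, the jump strength J = 2(1 - r), and the Mantegna-style Lévy flight with β = 1.5 are all assumptions. The rapid-dive branch uses the hawk's own position for the soft variant and the mean hawk position Z_mean for the hard variant, as the text describes.

```python
import math
import numpy as np

def levy_flight(dim, rng, beta=1.5):
    """Levy-flight step via Mantegna's algorithm (an assumed, commonly used choice)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return 0.01 * u / np.abs(v) ** (1 / beta)

def hho(fitness, dim, lb, ub, n_hawks=20, max_iter=200, seed=0):
    """Minimise `fitness` over [lb, ub]^dim with a basic Harris hawks optimiser (sketch)."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(lb, ub, (n_hawks, dim))   # hawk positions
    rabbit, rabbit_fit = Z[0].copy(), np.inf  # best solution found so far (the prey)

    def update_rabbit():
        nonlocal rabbit, rabbit_fit
        for z in Z:
            f = fitness(z)
            if f < rabbit_fit:
                rabbit, rabbit_fit = z.copy(), f

    for it in range(max_iter):
        update_rabbit()
        Z_mean = Z.mean(axis=0)
        for j in range(n_hawks):
            S = 2 * rng.uniform(-1, 1) * (1 - it / max_iter)  # escaping energy of the rabbit
            if abs(S) >= 1:  # exploration phase
                q, r1, r2, r3, r4 = rng.random(5)
                if q >= 0.5:
                    Z_rand = Z[rng.integers(n_hawks)].copy()
                    Z[j] = Z_rand - r1 * np.abs(Z_rand - 2 * r2 * Z[j])
                else:
                    Z[j] = (rabbit - Z_mean) - r3 * (lb + r4 * (ub - lb))
            else:  # exploitation phase: the four besiege strategies
                r = rng.random()                # chance of a successful escape
                J = 2 * (1 - rng.random())      # assumed random jump strength
                if r >= 0.5 and abs(S) >= 0.5:  # soft besiege
                    Z[j] = (rabbit - Z[j]) - S * np.abs(J * rabbit - Z[j])
                elif r >= 0.5:                  # hard besiege
                    Z[j] = rabbit - S * np.abs(rabbit - Z[j])
                else:                           # soft / hard besiege with rapid dives
                    ref = Z[j] if abs(S) >= 0.5 else Z_mean
                    v = rabbit - S * np.abs(J * rabbit - ref)
                    u = v + rng.random(dim) * levy_flight(dim, rng)
                    if fitness(v) < fitness(Z[j]):
                        Z[j] = v
                    elif fitness(u) < fitness(Z[j]):
                        Z[j] = u
            Z[j] = np.clip(Z[j], lb, ub)  # keep every hawk inside [lb, ub]
    update_rabbit()  # account for the moves made in the last iteration
    return rabbit, rabbit_fit

best, best_fit = hho(lambda x: float(np.sum(x ** 2)), dim=5, lb=-10.0, ub=10.0)
print(best_fit)
```

The usage line minimises a simple sphere function as a stand-in objective; in a network setting the fitness would instead be some validation loss, which is an assumption not specified in the paper.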

C. Convolutional Neural Network
The CNN is a neural architecture with multiple hidden layers that extract features between the input and output layers. Deeper feature extraction requires more hidden layers, which generally produces better results. The network consists of a convolutional layer, an activation layer, a pooling layer and a fully connected layer. The convolutional layer uses parameters such as the kernel, padding and stride to create a feature map; its output is produced by convolving the input with the kernel. The ReLU activation layer then introduces nonlinearity into the feature map. Next, the pooling layer downsamples its input to reduce the number of parameters. Finally, the fully connected (FC) layer collects the feature maps from the previous layer to classify the data.
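The four CNN stages can be illustrated with a small single-channel NumPy sketch. It is a didactic forward pass only: one filter, no training, and max pooling are assumptions, and the function names are hypothetical rather than taken from the paper. As in most deep learning libraries, the "convolution" is implemented as cross-correlation.

```python
import numpy as np

def conv2d(x, kernel, stride=1, padding=0):
    """Single-channel 2-D convolution (cross-correlation) with zero padding."""
    x = np.pad(x, padding)
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    """Element-wise ReLU: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(features, weights, bias):
    """Flatten the pooled maps and apply an affine map to get class scores."""
    return weights @ features.ravel() + bias

x = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])
fmap = relu(conv2d(x, kernel))  # 5x5 feature map; every entry is 7.0 for this input
pooled = max_pool(fmap)         # 2x2 after 2x2 max pooling
scores = fully_connected(pooled, np.ones((3, 4)), np.zeros(3))
print(fmap.shape, pooled.shape, scores)
```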

D. Long-short term memory
LSTM is an extension of the RNN with a chain-like structure of repeating modules. It can learn long-term dependencies through its feedback mechanism. The main purpose of this architecture is to forget information from, or add information to, the cell state, which gives it both short-term and long-term memory. An LSTM layer consists of one memory state and three main gates (input, forget and output), as shown in Figure 2. To sustain long-term dependencies, the gate mechanism of the LSTM network decides which information to keep or discard [27].
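The gates and states are computed as

$$f_t = \sigma\left( w_f \cdot [h_{t-1}, x_t] + b_f \right)$$
$$i_t = \sigma\left( w_i \cdot [h_{t-1}, x_t] + b_i \right)$$
$$\hat{C}_t = \tanh\left( w_C \cdot [h_{t-1}, x_t] + b_C \right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \hat{C}_t$$
$$o_t = \sigma\left( w_o \cdot [h_{t-1}, x_t] + b_o \right)$$
$$h_t = o_t \odot \tanh(C_t)$$

where f_t, i_t and o_t are the forget, input and output gate layers, respectively, σ is the sigmoid function, w and b are the weights and biases of each gate, h_{t-1} is the hidden state at time t - 1, x_t is the input at time t, and C_t is the memory state, which takes in the new candidate information Ĉ_t. The resulting hidden and memory states are passed on as inputs to the next LSTM layer.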
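A single LSTM time step following the gate computations described above can be sketched in NumPy as below. The stacked weight layout (one matrix producing all four gate pre-activations from the concatenation [h_{t-1}, x_t]) and the helper names are assumptions for illustration; the paper does not specify an implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One LSTM time step.

    w has shape (4H, H + D) and b has shape (4H,); rows are stacked as
    [forget, input, candidate, output] blocks (an assumed layout).
    """
    H = h_prev.shape[0]
    z = w @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])             # forget gate: what to drop from the memory state
    i_t = sigmoid(z[H:2 * H])         # input gate: how much new information to add
    c_hat = np.tanh(z[2 * H:3 * H])   # candidate memory content
    o_t = sigmoid(z[3 * H:4 * H])     # output gate
    c_t = f_t * c_prev + i_t * c_hat  # updated memory state
    h_t = o_t * np.tanh(c_t)          # new hidden state
    return h_t, c_t

def lstm_forward(xs, w, b, hidden_size):
    """Run the cell over a sequence xs of shape (T, D); return all hidden states."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, w, b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
D, H, T = 3, 4, 5
w = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
hs = lstm_forward(rng.normal(size=(T, D)), w, b, H)
print(hs.shape)  # (5, 4)
```

Because the output gate lies in (0, 1) and tanh in (-1, 1), every hidden-state entry stays strictly between -1 and 1.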

E. Proposed custom optimized CNN-LSTM
This subsection describes the layers of the custom optimized CNN-LSTM model in detail. The model is a deep learning network composed of a sequence input layer, a 2-D convolution layer, batch normalization, a ReLU layer, two LSTM layers with dropout layers, a fully connected layer, a softmax layer and a classification layer. The layers operate sequentially, each taking the output of the previous layer as its input. First, the extracted text document, represented as a sequence of indices, is fed to the convolutional layer. Each layer contains several neurons that convolve the input with the layer's filters. The convolution slides over the input regions to create several feature maps; for each region, the computation depends on parameters such as filter size, padding and stride, which move the filter vertically and horizontally across the input to generate the feature maps. To improve convergence speed, batch normalization normalizes each mini-batch of training samples and is placed between the convolutional layer and the ReLU activation.
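Batch normalization as described above can be sketched as follows for a mini-batch of feature vectors. The sketch shows training-mode behaviour only, with statistics computed from the current batch; deep learning frameworks additionally keep running averages for inference, which this hypothetical helper omits. The learnable scale `gamma` and shift `beta` are the usual parameters of the technique.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the batch axis, then scale and shift.

    x: array of shape (batch_size, num_features); gamma, beta: shape (num_features,).
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, (almost) unit variance per feature
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # hypothetical mini-batch
out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))
```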
In the optimized layer, Harris hawks optimization is deployed as an activation function, replacing ReLU. The optimization technique is defined within the predict function so that each layer is activated properly. In the predict function, the outputs of the neurons (1 to m) of the previous layer propagate to the neurons (1 to n) of the next layer. A dropout layer is placed between the two LSTM layers, and another dropout layer follows the second LSTM layer. With its special memory state, the LSTM can retain long temporal dependencies for better feature extraction. The first LSTM layer (LSTM1) uses 125 hidden units, the second (LSTM2) uses 100 hidden units, and the dropout rate is 0.2. Adding dropout after the LSTM layers discourages the network from memorizing the training samples and thus reduces overfitting. Finally, the output of each neuron is connected to a single flattened layer, the fully connected layer, which adjusts its weights and biases to fit the categorical samples. The softmax activation layer then maps its real-valued input vector to values between 0 and 1, so that they can be interpreted as class probabilities.
The softmax function is given by

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$$

where k is the number of classes and z_i is the i-th element of the output vector of the dense layer. Finally, the classification layer takes the probabilities from the softmax layer to produce the resulting output.
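The softmax step and the classification layer's final decision can be sketched as follows. Taking the arg-max as the predicted class and using cross-entropy as the per-sample training loss are standard choices assumed here; the paper does not spell them out, and the scores below are hypothetical.

```python
import numpy as np

def softmax(z):
    """Map a score vector z to probabilities that are positive and sum to 1."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow; the result is unchanged
    return e / e.sum()

def cross_entropy(probs, label):
    """Loss for one sample whose true class index is `label`."""
    return -np.log(probs[label])

scores = np.array([2.0, 1.0, 0.1])  # hypothetical dense-layer outputs for k = 3 classes
probs = softmax(scores)
predicted = int(np.argmax(probs))   # classification layer: most probable class
print(probs.round(3), predicted)    # [0.659 0.242 0.099] 0
```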
During training, the validation accuracy and loss are computed at regular intervals so that the performance of the network can be evaluated properly.

IV. EXPERIMENTAL RESULT AND DISCUSSIONS
Network anomalies are classified using the custom optimized CNN-LSTM model. In this experiment, the categorical text documents of the NSL-KDD dataset are used to distinguish normal from abnormal data. The proposed technique is therefore trained on documents converted into sequences of indices, and its accuracy and loss are validated. For training, the categorical data are partitioned into training, testing and validation data, and the protocol types and numbers of attacks are tabulated in Table 1. During training, the accuracy of the proposed technique improves at every iteration while the loss gradually decreases. Compared with other state-of-the-art techniques, the proposed approach achieves better performance. The training progress of the custom optimized CNN-LSTM on the NSL-KDD dataset is depicted in Figure 6.
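A partition into training, validation and test sets can be sketched as below. Because the dataset is dominated by normal records, the sketch splits each class separately so that rare attack classes appear in every subset. The 70/15/15 proportions, the fixed seed, the stratification itself and the function name are illustrative assumptions, not the split used in the experiments.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, train=0.70, val=0.15, seed=0):
    """Split (sample, label) pairs class by class so each subset keeps the class mix."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append((s, y))
    rng = random.Random(seed)
    train_set, val_set, test_set = [], [], []
    for items in by_class.values():
        rng.shuffle(items)
        n_train = round(len(items) * train)  # round() guards against float error, e.g. 80 * 0.7
        n_val = round(len(items) * val)
        train_set += items[:n_train]
        val_set += items[n_train:n_train + n_val]
        test_set += items[n_train + n_val:]
    return train_set, val_set, test_set

samples = list(range(100))
labels = ["normal"] * 80 + ["dos"] * 20  # hypothetical imbalanced labels
tr, va, te = stratified_split(samples, labels)
print(len(tr), len(va), len(te))
```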

Figure 6. Training progress of the proposed technique
Validation loss is another indicator: for the proposed technique it is 3.2982% at the 1st iteration, 2.2795% at the 50th iteration, 0.4176% at the 100th iteration and 0.4639% at the 150th iteration. In terms of loss, the proposed technique achieves better results than the existing CNN [28], LSTM [29] and CNN+LSTM [30] techniques. The validation accuracy, mini-batch loss and validation loss of the proposed and existing techniques are compared in Table 2. The performance of the proposed and existing techniques is further evaluated to demonstrate the effectiveness of the system, using accuracy, sensitivity, precision, specificity and F-score. Precision is the proportion of correctly predicted normal data among all data predicted as normal, as expressed in eqn (16). Sensitivity, or recall, is the ratio of correctly predicted normal data to all normal data in the dataset, as in eqn (17). Specificity is the proportion of abnormal data correctly predicted as abnormal, as formulated in eqn (18). Accuracy is the ratio of correctly predicted normal and abnormal data to the total data, as in eqn (15). F-score combines recall and precision into a single measure, as expressed in eqn (19).
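The five measures can be computed from the confusion counts as in the following sketch. It treats normal as the positive class, matching the definitions of precision and sensitivity given above, counts any non-normal label as abnormal, and uses the balanced F1 form of the F-score; these choices are our reading of the text rather than a reproduction of eqns (15) to (19), and the labels in the example are hypothetical.

```python
def detection_metrics(y_true, y_pred, positive="normal"):
    """Accuracy, precision, sensitivity, specificity and F-score (normal vs. abnormal).

    Any label other than `positive` is treated as abnormal, so a DoS record
    predicted as Probe still counts as a correctly detected abnormal record.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

y_true = ["normal", "normal", "dos", "probe", "normal"]
y_pred = ["normal", "dos", "dos", "normal", "normal"]
m = detection_metrics(y_true, y_pred)
print(m)
```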
Figure 7 shows the performance measures of the proposed and existing techniques. The proposed technique performs better than the other approaches. The precision is 0.6201 for LSTM, 0.4522 for CNN, 0.0581 for CNN+LSTM and 0.6202 for the proposed method. The proposed technique improves accuracy by 36%, 40.64% and 35.89% compared with LSTM, CNN and CNN+LSTM, respectively.

Fig. 3. Data distribution from NSL-KDD dataset
For intrusion detection in the cloud environment, the NSL-KDD data contain several types of attacks along with a majority of normal data. The attacks in the categorical dataset are grouped as classified events, as depicted in Figure 3. Before the process starts, the dataset is partitioned into training and testing samples. Only the frequent values of the important features are then extracted and stored in a word cloud; the word cloud of the text data is depicted in Figure 4. It holds the protocol types tcp, udp and icmp. The services provided for the corresponding protocols are likewise visualized in a word cloud, as shown in Figure 5. Pre-processing of the text is the next phase, which removes unwanted distortion from the documents.
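A word cloud sizes each term by how often it occurs, so the counts behind charts like those for protocol types and services can be sketched with `collections.Counter`. The records are hypothetical, and the field names `protocol_type` and `service` follow the NSL-KDD column naming; rendering the image itself would need a plotting library and is not shown.

```python
from collections import Counter

def term_frequencies(records, field, top_k=None):
    """Count how often each value of a categorical field occurs (case-insensitive)."""
    counts = Counter(str(rec[field]).lower() for rec in records)
    return counts.most_common(top_k)  # most frequent first; top_k=None returns all

records = [  # hypothetical rows
    {"protocol_type": "tcp", "service": "http"},
    {"protocol_type": "tcp", "service": "private"},
    {"protocol_type": "udp", "service": "domain_u"},
    {"protocol_type": "tcp", "service": "http"},
    {"protocol_type": "icmp", "service": "ecr_i"},
    {"protocol_type": "udp", "service": "private"},
    {"protocol_type": "TCP", "service": "http"},
]
print(term_frequencies(records, "protocol_type"))  # [('tcp', 4), ('udp', 2), ('icmp', 1)]
print(term_frequencies(records, "service", top_k=2))  # [('http', 3), ('private', 2)]
```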

Fig. 7. Performance measure using NSL-KDD dataset
V. CONCLUSION
This study introduced a new deep network, the custom optimized CNN-LSTM, for detecting anomalies within the network. It connects LSTM and Harris hawks optimization with a convolutional neural network, which learns temporal parameters and activates each layer properly to improve system performance. To enhance the generalization capability of the proposed system, the NSL-KDD dataset is used for the investigation. In the simulations, the proposed custom CNN-LSTM with Harris hawks optimization classified the categorical data with a validation accuracy of 98.24%. The performance metrics confirm the effectiveness of the system: the proposed technique improves accuracy by 36%, 40.64% and 35.89% compared with the existing LSTM, CNN and CNN+LSTM techniques, respectively.

Table 1.
NSL-KDD dataset used for analysis

Table 2.
Accuracy and error-rate values for different techniques