Network security situation assessment based on dual attention mechanism and HHO-ResNeXt

The traditional convolutional neural network (CNN) has a limited receptive field and cannot accurately identify the importance of each channel, which makes it difficult to handle increasingly complex network security problems. To address these limitations, this paper combines ResNeXt with the Efficient Channel Attention (ECA) module and the Contextual Transformer (CoT) block to construct a model for assessing network conditions. The optimal hyperparameters of the model are selected by the Harris Hawks Optimization (HHO) algorithm. The model obtains the importance of each channel and assigns weights accordingly while making full use of the rich contexts among neighbour keys, which effectively enhances the convolutional neural network. Furthermore, this paper calculates the network security situation value (NSSV) of the adopted datasets based on attack impact. Finally, experiments on two cybersecurity datasets show that the comprehensive performance of the model on accuracy, precision and F-score, as well as on network security situation assessment, is superior to that of the other models.


Introduction
With the rapid development of technologies such as artificial intelligence, cloud computing, the Internet of Things (IoT) and 5G, network and information technology is increasingly important to the economy, society, and national defense. The increasingly complex network environment requires more effective cybersecurity solutions to protect systems and networks against cyber threats. Against this background, network security situation awareness (NSSA) technology came into being, which can address network threats more proactively, quickly, and accurately. NSSA can integrate the security events detected by various network components and perceive the network security status and risks in real time. It is a new way to solve problems in the field of network security.
The notion of situation awareness (SA) was put forward by Endsley (1988). The concept of SA was first applied to the field of network security by Bass and Gruber (1999), and the notion of SA for an intrusion detection framework based on multi-sensor integration was proposed by Bass (2000). Since Bass put forward the notion of NSSA, more and more scholars have carried out further research on it. Gong et al. (2017) discussed the mission scope of NSSA, redefined the concept of NSSA, and expounded the status of research and some existing problems of NSSA. Wu and Yang (2022) summarised the meaning of NSSA. Jiang et al. (2022) systematically reviewed and analysed the levels of cyber situation awareness supported by visualisation, while examining the maturity, challenges, and practices of cyber situation awareness visualisation in preparation for a comprehensive analysis of the current state of cyber situation awareness in an organisational context. Based on results highlighting the important concerns in cyber situation awareness visualisation, they recommended a list of future research directions. Nazir and Han (2022) introduced the origin, concept, aim and characteristics of NSSA, discussed the current application research of NSSA in the fields of security, transmission, survivability, and system assessment, and outlined the future development direction of NSSA. Based on the division of multiple conceptual models, NSSA mainly has three core functions, namely network security situation element extraction, network security situation assessment and network security situation prediction.
CNNs have excellent learning ability and adaptability, so many fields adopt CNNs to solve difficult problems. In recent years, the damage caused by cyber-attacks has become more and more serious, and the attack methods have become more and more stealthy. Using CNNs to mitigate cybersecurity risks is therefore of great significance. However, due to its limited receptive field and its inability to accurately identify the importance of each channel, the traditional CNN cannot quickly and accurately detect increasingly stealthy cyber-attacks. Therefore, this paper combines ResNeXt (Xie et al., 2017) with the CoT block (Li et al., 2022) and the ECA module (Wang et al., 2020) to construct a model, and uses HHO (Heidari et al., 2019) to select its hyperparameters. The resulting model can accurately identify the importance of each channel and make full use of the rich contexts among neighbour keys, which enables CNNs to better assess network conditions. Since our model can effectively protect various networks, our method can also be applied to fields such as artificial intelligence, cloud computing, the IoT, and 5G, where rapid development has introduced many potential network vulnerabilities and threats.
This paper is derived and expanded from our conference paper with the approval of the editors (Zhao et al., 2022). Compared with the conference paper, this paper adds experiments on the NSL-KDD dataset. In addition, since the selected datasets have no NSSV, this paper adopts a method based on attack impact to calculate the NSSV of the two datasets. Experiments show that our model has the best fit with the actual situation value.
The contributions of this paper are summarised as follows:
(1) This paper discusses and analyses the existing methods in the field of NSSA and gives a full account of the research status of NSSA. Compared with existing methods, this paper innovatively combines ResNeXt with the dual attention mechanism to effectively identify the importance of each channel and expand the receptive field, enhancing the role of CNNs in the field of NSSA.
(2) Because the datasets used in this paper have no NSSV, an NSSV calculation method based on attack impact is adopted to calculate the NSSV of the datasets.
(3) Experiments on the NSL-KDD dataset and the UNSW-NB15 dataset show that the comprehensive performance of the model on network security situation assessment is superior to that of other models.

Related work
Network security situation assessment can accurately assess the network status, prevent network security risks, and ensure the normal operation of the network. Dempster-Shafer theory was used by Zheng (2020) to improve the stability of assessing network conditions. The advantage of this model is that less prior knowledge is required during training; however, more training time is consumed due to the inference process. In response to the security concerns raised by the increasing reliance on the Internet of Things (IoT), Akwetey and Danquah (2022) assessed Cyber Situational Awareness (CSA) elements to predict cyberattacks in the power generation sector. This method is of great significance for predicting network attacks.
In recent years, more and more studies have applied machine learning and deep learning to network security to solve the problems encountered in NSSA. Cao et al. (2021) applied ensemble learning to cybersecurity and used the Random Forest (RF) algorithm to identify attacks in the network. However, the training time and memory footprint of the model grow with the number of decision trees (DT). Ke et al. (2021) proposed the I-ABC-SVM algorithm to predict the network condition and improve model performance. They optimised the parameters of the support vector machine (SVM) through an improved artificial bee colony (I-ABC) algorithm. The advantages of Ke's method are fast training and a small memory footprint, but its prediction accuracy is relatively low. To ensure network security, Yang et al. (2021) applied adversarial deep learning to NSSA. Their model identifies network attacks more accurately and can evaluate network conditions more comprehensively and flexibly. To improve the network environment, Samuel (2021) applied the DT-ANN model to network security. The method uses the DT to extract useful features and feeds these features into an artificial neural network (ANN). Zhang, Xie, et al. (2019) used a deep convolutional neural network (DCNN) to enhance the accuracy of identifying attack behaviours in the network. However, the traditional CNN may lose some information during learning because of its limited receptive field, so the improvement in accuracy is limited. Yuan (2021) applied the PSO-RBF model to the field of network security, in which the Radial Basis Function (RBF) neural network is optimised by particle swarm optimisation (PSO). Zhang, Zhang, et al. (2019) applied an improved CNN with depthwise separable convolutions to the NSSA domain. Depthwise separable convolutions convert a traditional convolution into several smaller convolutions to reduce the number of parameters. However, since the model has only a few convolutional layers, it is difficult to obtain comprehensive feature information. Yang et al. (2022) proposed a model combining a bidirectional gated recurrent unit (BiGRU), the attention mechanism, and a parallel feature extraction network (PFEN). The model can extract key data from different network attack behaviours and can effectively and comprehensively evaluate network conditions. Different from the above methods, this paper combines ResNeXt with the dual attention mechanism to effectively identify the importance of each channel and make full use of the rich contexts among neighbour keys, which optimises the CNN and strengthens its ability to protect the network.

The ResNeXt block
ResNeXt is a variant of ResNet (He et al., 2016), so it can stack many convolutional layers without suffering from degradation. Because every branch shares the same topology, the number of hyperparameters is reduced while high accuracy is maintained. The output of the ResNeXt block is:
$$y = x + \sum_{i=1}^{G} L_i(x) \tag{1}$$
where $L_i$ represents the topology of the $i$-th branch, $x$ indicates the input, $y$ refers to the output, and $G$ denotes the number of branches. Equivalent building blocks of ResNeXt are shown in Figure 1.

Efficient channel attention (ECA) module
The SE module (Hu et al., 2018) can calculate the weight of each feature channel, recognising that the importance of each channel is different. However, the fully connected (FC) layers of the SE module make the model more complex, and the dimensionality reduction and subsequent dimensionality increase in the SE module may have negative effects. Therefore, the ECA module was proposed. This module reduces the number of parameters and makes the model less complex by avoiding dimensionality reduction and increase, which makes the channel attention mechanism more effective. The specific structure of the ECA module is presented in Figure 2.
The core part of the ECA module can be implemented by a fast one-dimensional (1D) convolution of size $k$. The convolution kernel size $k$ denotes the number of neighbours participating in the attention prediction of a channel. The ECA module can adaptively determine a suitable $k$ without manual tuning, where $k$ is related to the channel dimension. The specific formula for calculating $k$ is as follows:
$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd} \tag{2}$$
where $C$ represents the channel dimension and $|t|_{odd}$ indicates the odd number nearest to $t$. $b$ and $\gamma$ are usually set to 1 and 2, respectively.
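As a rough illustration of the adaptive kernel-size rule, the following Python sketch assumes the commonly used form $k = |\log_2(C)/\gamma + b/\gamma|_{odd}$ with $\gamma = 2$ and $b = 1$. The function name `eca_kernel_size` is hypothetical, and mapping to an odd number by truncating and stepping even values up by one is only one possible convention, not necessarily the one used in this paper.

```python
import math


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1D-convolution kernel size for a given channel dimension."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    # Assumed convention: keep t if it is odd, otherwise step up to the next odd value.
    return t if t % 2 == 1 else t + 1


# Wider layers get a slightly larger neighbourhood of interacting channels.
for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))
```

Under these assumptions, the kernel size grows only logarithmically with the channel dimension, so the attention module stays lightweight even in wide layers.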

Contextual transformer (CoT) block
Since traditional CNNs can only model local information, they are weak at long-distance modelling and perception, whereas the Transformer has great advantages in long-distance modelling. However, the self-attention structure in the original Transformer calculates the attention matrix only from the interaction of each query and key, and it cannot make full use of the rich contexts among neighbour keys. To alleviate this problem, the CoT block was proposed. It explores the contextual information among neighbour keys to improve self-attention learning efficiently, which improves the expressive ability of the output features. The CoT block is presented in Figure 3. In Figure 3, the size of the input feature map $X$ is $H \times W \times C$. The queries, keys and values are set to $Q = X$, $K = X$ and $V = XW_v$, respectively. First, the CoT block applies a $k \times k$ group convolution over all the neighbour keys within each $k \times k$ spatial grid to obtain the contextualised keys $K^1 \in \mathbb{R}^{H \times W \times C}$. After that, the contextualised keys $K^1$ and the queries $Q$ are concatenated, and the attention matrix $A$ is obtained through two consecutive convolutions with a kernel size of 3 × 3:
$$A = [K^1, Q] W_\theta W_\delta \tag{3}$$
where $W_\theta$ and $W_\delta$ denote the two consecutive convolutions. Next, the attended feature map $K^2$ (also known as the dynamic context) is obtained by aggregating the values with the contextualised attention matrix $A$:
$$K^2 = V \otimes A \tag{4}$$
where $\otimes$ denotes the local matrix multiplication operation that measures the pairwise relations between each query and the corresponding keys within the local $k \times k$ grid in space.
Finally, the static context $K^1$ and the dynamic context $K^2$ are fused by the CoT block to produce the output.

Harris hawks optimisation (HHO)
HHO originates from the cooperative group behaviour of Harris' hawks when hunting. It is an excellent swarm intelligence algorithm with the advantages of requiring few control parameters and having strong global search ability.

Exploration phase
At this stage, all Harris' hawks are in a waiting state and search for prey in random places according to two strategies: (a) when q < 0.5, the hawk's perching location is related to the position of the prey and of the other members of the population; (b) when q ≥ 0.5, the hawk perches at a random location within the population's range.
These two strategies can be formulated as:
$$X(t+1) = \begin{cases} X_{rand}(t) - r_1 \left| X_{rand}(t) - 2 r_2 X(t) \right|, & q \ge 0.5 \\ \left( X_{rabbit}(t) - X_m(t) \right) - r_3 \left( lb + r_4 (ub - lb) \right), & q < 0.5 \end{cases}$$
where $ub$ and $lb$ indicate the upper and lower bounds of the search space, $X_{rand}(t)$ represents the position of a randomly selected individual, $X_m(t)$ indicates the average location of all members of the population, $X_{rabbit}(t)$ represents the position of the prey, $t$ denotes the current iteration, $X(t)$ and $X(t+1)$ represent the location of the hawk in the current and the next iteration, and $r_1$, $r_2$, $r_3$, $r_4$ and $q$ are random numbers inside (0,1).
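The two exploration strategies can be sketched in Python as follows. This is a minimal illustration that assumes the standard HHO update rules from Heidari et al. (2019); the function name `exploration_step`, the list-based position format and the final clipping to the search bounds are assumptions made for the example, not details taken from this paper.

```python
import random


def exploration_step(x, x_rand, x_rabbit, x_mean, lb, ub, rng=random):
    """Return the next position of one hawk during the exploration phase."""
    q, r1, r2, r3, r4 = (rng.random() for _ in range(5))
    if q >= 0.5:
        # Strategy b: perch relative to a randomly chosen member of the population.
        candidate = [xr - r1 * abs(xr - 2.0 * r2 * xi) for xi, xr in zip(x, x_rand)]
    else:
        # Strategy a: perch relative to the prey and the population mean.
        candidate = [(xp - xm) - r3 * (lb + r4 * (ub - lb))
                     for xp, xm in zip(x_rabbit, x_mean)]
    # Assumption: clip to the search bounds so the position stays feasible.
    return [min(max(v, lb), ub) for v in candidate]


rng = random.Random(0)
hawks = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(5)]
x_mean = [sum(h[d] for h in hawks) / len(hawks) for d in range(3)]
x_rabbit = hawks[0]  # pretend the first hawk currently holds the best position
new_pos = exploration_step(hawks[1], rng.choice(hawks), x_rabbit, x_mean, -5.0, 5.0, rng)
print(new_pos)
```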

Transition from exploration to exploitation
According to the escape energy of the rabbit, the Harris' hawks can switch between different attack modes. As the rabbit escapes, its escape energy $E$ decreases as:
$$E = 2 E_0 \left( 1 - \frac{t}{T} \right)$$
where $T$ denotes the maximum number of iterations, $E_0$ indicates the initial value of the escape energy of the rabbit, and $E$ represents the escape energy of the rabbit.
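A short Python sketch of the escape-energy schedule is given below. It assumes the usual linear decay $E = 2E_0(1 - t/T)$ and, as in the original HHO formulation, that $E_0$ is redrawn uniformly from (-1, 1) at every iteration; the function name `escape_energy` is hypothetical.

```python
import random


def escape_energy(e0, t, max_iter):
    """Escape energy of the prey at iteration t (linear decay towards zero)."""
    return 2.0 * e0 * (1.0 - t / max_iter)


max_iter = 10  # the same iteration budget as the HHO search reported in this paper
rng = random.Random(1)
for t in range(max_iter):
    e0 = rng.uniform(-1.0, 1.0)  # assumption: E0 is redrawn in (-1, 1) each iteration
    print(t, round(escape_energy(e0, t, max_iter), 3))
```

Because the envelope $2(1 - t/T)$ shrinks with $t$, large values of $|E|$ become impossible late in the search, which pushes the hawks towards the besiege strategies.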

Exploitation phase
Assuming that r is the escape probability of the rabbit, according to the escape behaviour of the rabbit and the capture strategy of the Harris' hawks, r and E will have different values and cause the hawks to adopt different attack strategies.

Soft besiege
The soft besiege is selected by the Harris' hawks when r ≥ 0.5 and |E| ≥ 0.5, and the attack behaviour can be expressed as:
$$X(t+1) = \Delta X(t) - E \left| J X_{rabbit}(t) - X(t) \right|$$
$$\Delta X(t) = X_{rabbit}(t) - X(t)$$
where $\Delta X(t)$ indicates the difference between the prey's position and the hawk's current position in iteration $t$, $r_5$ is a random number within (0,1), and $J = 2(1 - r_5)$ represents the jump strength of the rabbit throughout the escaping procedure.

Hard besiege
The Harris' hawks select the hard besiege when r ≥ 0.5 and |E| < 0.5. In this case, the position of the hawk is updated as follows:
$$X(t+1) = X_{rabbit}(t) - E \left| \Delta X(t) \right|$$
where $\Delta X(t) = X_{rabbit}(t) - X(t)$.
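The soft and hard besiege updates can be sketched in Python as below. The sketch assumes the standard HHO forms, namely $X(t+1) = \Delta X(t) - E|J X_{rabbit}(t) - X(t)|$ for the soft besiege and $X(t+1) = X_{rabbit}(t) - E|\Delta X(t)|$ for the hard besiege, with $\Delta X(t) = X_{rabbit}(t) - X(t)$; the function names are hypothetical.

```python
import random


def soft_besiege(x, x_rabbit, e, rng=random):
    """Soft besiege: the prey still has energy, so the hawk encircles it loosely."""
    j = 2.0 * (1.0 - rng.random())  # jump strength J = 2 * (1 - r5)
    return [(xr - xi) - e * abs(j * xr - xi) for xi, xr in zip(x, x_rabbit)]


def hard_besiege(x, x_rabbit, e):
    """Hard besiege: the prey is exhausted, so the hawk closes in on it directly."""
    return [xr - e * abs(xr - xi) for xi, xr in zip(x, x_rabbit)]


print(hard_besiege([0.0, 0.0], [1.0, 2.0], 0.25))  # [0.75, 1.5]
print(soft_besiege([0.0, 0.0], [1.0, 2.0], 0.6, random.Random(0)))
```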

Soft besiege with progressive rapid dives
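The soft besiege with progressive rapid dives is selected by the Harris' hawks when r < 0.5 and |E| ≥ 0.5, and the attack strategy can be expressed as follows:
$$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X(t) \right|$$
If this movement brings worse results after updating the location, another attack strategy is executed:
$$Z = Y + S \times LF(D)$$
where $S$ is a random vector of size $1 \times D$ and $D$ represents the dimension of the problem. $LF$ indicates the Levy flight function, which can be expressed as:
$$LF(x) = 0.01 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \quad \sigma = \left( \frac{\Gamma(1+\beta) \times \sin\left(\frac{\pi\beta}{2}\right)}{\Gamma\left(\frac{1+\beta}{2}\right) \times \beta \times 2^{\frac{\beta-1}{2}}} \right)^{1/\beta}$$
where $\beta$ is usually set to 1.5 and $u$, $v$ are random numbers within (0,1). Hence, the final attack strategy is as follows:
$$X(t+1) = \begin{cases} Y, & F(Y) < F(X(t)) \\ Z, & F(Z) < F(X(t)) \end{cases}$$
where $F$ denotes the fitness function.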
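For illustration, the Levy flight step can be sketched in Python as follows. The sketch assumes the form commonly used with HHO, $LF = 0.01 \times u\sigma / |v|^{1/\beta}$ with $\sigma$ computed from the gamma function, and follows the text in drawing $u$ and $v$ uniformly from (0,1); the function names `levy_sigma` and `levy_flight` are hypothetical.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Scale factor sigma of the Levy flight for a given beta."""
    numerator = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    denominator = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (numerator / denominator) ** (1.0 / beta)


def levy_flight(dim, beta=1.5, rng=random):
    """Draw one Levy-flight step for each of the dim problem dimensions."""
    sigma = levy_sigma(beta)
    steps = []
    for _ in range(dim):
        u = rng.random()
        v = rng.random() or 1e-12  # guard against a zero denominator
        steps.append(0.01 * u * sigma / abs(v) ** (1.0 / beta))
    return steps


print(round(levy_sigma(1.5), 4))
print(levy_flight(4, rng=random.Random(0)))
```

Most steps drawn this way are small, but the division by $|v|^{1/\beta}$ occasionally produces a long jump, which is what allows a dive to escape a poor local region.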

Hard besiege with progressive rapid dives
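The hard besiege with progressive rapid dives is selected to capture the prey when r < 0.5 and |E| < 0.5, and the hawks' subsequent attack behaviour is determined by:
$$X(t+1) = \begin{cases} Y, & F(Y) < F(X(t)) \\ Z, & F(Z) < F(X(t)) \end{cases} \tag{15}$$
where $F$ denotes the fitness function, and $Y$ and $Z$ are obtained by the following formulas (16) and (17):
$$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X_m(t) \right| \tag{16}$$
$$Z = Y + S \times LF(D) \tag{17}$$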
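The choice among the four exploitation strategies depends only on $r$ and $|E|$, which the following Python sketch makes explicit. It assumes, as in the original HHO formulation, that the hard besiege with progressive rapid dives corresponds to r < 0.5 and |E| < 0.5; the function name `select_attack_strategy` is hypothetical.

```python
def select_attack_strategy(r: float, e: float) -> str:
    """Map the escape probability r and escape energy e to an attack strategy."""
    prey_has_energy = abs(e) >= 0.5
    if r >= 0.5:
        # The prey fails to escape: plain besiege.
        return "soft besiege" if prey_has_energy else "hard besiege"
    # The prey may escape: besiege combined with Levy-flight-based rapid dives.
    if prey_has_energy:
        return "soft besiege with progressive rapid dives"
    return "hard besiege with progressive rapid dives"


for r, e in [(0.7, 0.8), (0.7, 0.2), (0.3, -0.9), (0.3, 0.1)]:
    print(r, e, select_attack_strategy(r, e))
```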

The ECA-ResNeXt block
Embedding the ECA module into the ResNeXt block forms the ECA-ResNeXt block, which is an important part of our model. The details of the ECA-ResNeXt block are shown in Figure 4. The input of the block is X, and the output is X.

The CoTNeXt block
The CoT block can replace the 3 × 3 convolution in the ResNeXt block to form the CoTNeXt block, which is one of the advantages of the CoT block. The CoTNeXt block is the core of our model, and its details are shown in Figure 5. The input of the block is X and the output is Y.

The complete structure of our model
Our model mainly consists of six parts, as shown in Figure 6. The first part is a convolutional layer with a kernel size of 3 × 3. A CoTNeXt block and two CoT blocks constitute the second part. A ResNeXt block and two convolutional layers with a kernel size of 3 × 3 constitute the third part. The ECA-ResNeXt block, which is formed by combining the ResNeXt block with the ECA module, is the fourth part. A ResNeXt block is the fifth part, and the sixth part consists of a global average pooling (GAP) layer and an FC layer. The hyperparameters of the model are listed in Table 1.

Network security situation assessment process
The flow chart of the situation assessment model based on the dual attention mechanism and HHO-ResNeXt is shown in Figure 7.
Specifically, the operation process of our proposed NSSA model can be summarised as the following five steps:
Step 1. Acquire the network security datasets and preprocess them, including numeralisation, normalisation, and data augmentation.

Dataset description
The NSL-KDD dataset (Revathi & Malathi, 2013) solves many of the problems of the KDD-CUP99 dataset (Tavallaee et al., 2009). It is a standard dataset in the field of cybersecurity and is adopted in many security research works. Each record in this dataset consists of 41 features and is labelled as either normal or attack. There are 39 attack types in the dataset, which are divided into four categories. In our experiment, we used KDDTrain+.txt and KDDTest+.txt as the training dataset and the test dataset, respectively. The UNSW-NB15 dataset (Moustafa & Slay, 2015) is an authoritative dataset used by many researchers to study cybersecurity. Each record in the dataset contains 49 features, and there are nine attack types in total. The test dataset and training dataset are UNSW_NB15_testing-set.csv and UNSW_NB15_training-set.csv, respectively.

NSL-KDD Dataset Preprocessing
In this dataset, each piece of data originally has 42 features. Since the 42nd feature indicates the difficulty of correctly classifying each sample, which is irrelevant to this experiment, we deleted it. There are many attack types in this dataset, so a mapping dictionary is established, and the attack types are divided into four categories: DoS, R2L, Probe, and U2R. This paper assigns normal traffic to the Benign category.
There are three categorical features in this dataset, namely protocol_type, service and flag. Since these three features are not numerical, they need to be digitised by one-hot encoding. The number of features in this dataset after one-hot encoding is 122. For the convenience of operation, the feature matrix fed to the network is generally a square matrix. Therefore, this paper expands the number of features after one-hot encoding. Specifically, 22 "0" features are appended after the 122nd feature, so that the number of features reaches 144. When loading the data into the model, we transform each sample into a 12 × 12 feature map.
In the dataset, the value ranges of different features vary widely. To reduce the effect of this difference on the experiment, the features are normalised by min-max normalisation as:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{18}$$
where $x_{max}$ and $x_{min}$ represent the maximum and minimum of each feature, respectively. In addition, because the dataset is extremely unbalanced, this paper adopts the oversampling technique to balance the training dataset.
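The encoding, padding and reshaping steps can be sketched in plain Python as follows. This is only an illustration under simple assumptions: the helper names `one_hot` and `to_square_feature_map` are hypothetical, and the 122-value row is a placeholder rather than a real NSL-KDD record.

```python
def one_hot(value, categories):
    """One-hot encode a categorical value against a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]


def to_square_feature_map(features, side):
    """Zero-pad a flat feature vector and reshape it into a side x side map."""
    target = side * side
    if len(features) > target:
        raise ValueError("too many features for the requested map size")
    padded = list(features) + [0.0] * (target - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


protocols = ["tcp", "udp", "icmp"]
print(one_hot("udp", protocols))  # [0.0, 1.0, 0.0]

encoded_row = [0.5] * 122  # placeholder for one sample after one-hot encoding
feature_map = to_square_feature_map(encoded_row, 12)
print(len(feature_map), len(feature_map[0]))  # 12 12
```

The same helper would turn a 196-feature UNSW-NB15 sample into a 14 × 14 map without any padding, since 196 is already a perfect square.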
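Min-max normalisation of a single feature column can be sketched as below, assuming the usual form $x' = (x - x_{min}) / (x_{max} - x_{min})$. The function name `min_max_normalise` is hypothetical, and mapping a constant column to zeros is an assumption made to avoid division by zero.

```python
def min_max_normalise(column):
    """Scale the values of one feature column into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


print(min_max_normalise([0, 5, 10]))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would typically be computed on the training data and reused for the test data, so that both are scaled consistently.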

UNSW-NB15 dataset preprocessing
This dataset also contains three categorical features, so its processing is similar to that of the NSL-KDD dataset. After this dataset is processed, the number of features becomes 196, and when loading the data into the model, we transform each sample into a 14 × 14 feature map. This paper also normalises the UNSW-NB15 dataset following Equation (18). Because the ratio of the number of each attack type to the total number of samples in the training dataset is larger than that in the test dataset, which affects the model performance, this paper redistributes the training dataset and the test dataset.

Network security situation quantification
The two datasets selected in this paper do not provide actual situation values, so the actual situation values are generated by analysing the impact of the various attacks on the network. Tables 2 and 3 show the impact of each attack type on the network in the two datasets, respectively.
In the NSL-KDD dataset, every 1200 samples form a group, giving a total of 19 groups. The situation value (SV) of each group is calculated as:
$$SV = \sum_{i} X_i N_i \tag{19}$$
where $X_i$ represents the impact of attack type $i$ on network security and $N_i$ represents the number of samples of attack type $i$ in the group. After the situation values are calculated, they are normalised. The calculated actual NSSV of the test dataset is shown in Figure 8. In the UNSW-NB15 dataset, every 3300 samples form a group, giving a total of 25 groups. The situation value SV of each group is also calculated by Equation (19) and then normalised. The calculated actual NSSV of the test dataset is shown in Figure 9.
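The grouping and quantification procedure can be sketched in Python as follows. The sketch assumes that the situation value of a group is the impact-weighted count of its samples, i.e. the sum over attack types of the impact $X_i$ times the number of samples $N_i$, followed by min-max normalisation across groups. The impact weights below are hypothetical placeholders and are not the values of Tables 2 and 3; the function names are also hypothetical.

```python
from collections import Counter


def group_situation_values(labels, impact, group_size):
    """Impact-weighted situation value for each consecutive group of samples."""
    values = []
    for start in range(0, len(labels), group_size):
        counts = Counter(labels[start:start + group_size])
        values.append(sum(impact[label] * n for label, n in counts.items()))
    return values


def normalise(values):
    """Min-max normalise the group situation values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


# Hypothetical impact weights, for illustration only.
impact = {"Benign": 0.0, "DoS": 0.8, "Probe": 0.4}
labels = ["Benign", "DoS", "DoS", "Probe", "Benign", "Benign", "DoS", "Probe", "Benign"]

raw = group_situation_values(labels, impact, group_size=3)
print(raw)
print(normalise(raw))
```

With a group size of 1200 or 3300, the last group may contain fewer samples than the others; the sketch simply treats it as a shorter group.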

Hyperparameters selection
In this experiment, our model is built on the tensorflow-gpu-2.4.0 framework. The Mean Squared Error (MSE) is used as the evaluation function when using HHO to find the optimal parameters. The population size is five, and the maximum number of iterations is set to 10. To improve the optimisation efficiency and let the Harris' hawks search for prey efficiently, a range limit is set for each hyperparameter that needs to be optimised.
The optimal hyperparameters are found by Harris' hawks when MSE reaches the minimum value.The final hyperparameters are shown in Table 1.

Multi-class classification results
To verify the performance of the model, we compare it with the following methods: the random forest algorithm proposed by Cao et al. (2021), the decision tree and artificial neural network proposed by Samuel (2021), the DCNN algorithm proposed by Zhang, Xie, et al. (2019), the improved CNN algorithm proposed by Zhang, Zhang, et al. (2019), ResNet proposed by He et al. (2016), and ResNeXt proposed by Xie et al. (2017). The evaluation indicators selected in this paper are accuracy, precision and F-score. The experimental results of the different models are shown in Table 4.
On both datasets, our model achieves the best comprehensive performance across the three indicators of accuracy, precision and F-score compared with the other models.
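The mean absolute error (MAE), mean square error (MSE) and mean absolute percentage error (MAPE) used to evaluate situation assessment are defined as:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
where $n$ represents the number of samples, and $y_i$ and $\hat{y}_i$ represent the true value and the predicted value of the sample, respectively. The smaller the values of these indicators, the better the model's performance. Figures 10 and 11 show the situation assessment results of each model on the datasets used in this paper.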
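The three error indicators can be computed with a few lines of Python. The sketch below assumes the standard definitions of MAE, MSE and MAPE (the latter expressed in percent); the function names and the toy values are hypothetical, and rejecting zero true values in MAPE is a design choice of this example.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(t == 0 for t in y_true):
        # MAPE divides by the true value, so zero targets must be handled upstream.
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 / len(y_true) * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


actual = [1.0, 2.0, 4.0]      # toy situation values
predicted = [1.5, 2.0, 3.0]   # toy model outputs
print(mae(actual, predicted), mse(actual, predicted), mape(actual, predicted))
```

Since min-max normalised situation values can be exactly zero, a practical evaluation would need to decide how such groups enter the MAPE, for example by excluding them or adding a small offset.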
According to Figures 10 and 11, our model has the highest degree of fit with the actual situation values, while the fitting deviation of the other six models is relatively large. The detailed data of the three evaluation indicators of each model are shown in Table 5.
Table 5 shows the evaluation results of the three indicators. Our model has the smallest MAPE, MSE and MAE on both datasets, which means that it evaluates the network environment most accurately and can therefore help to identify potential hidden dangers. Compared with the other methods, our model has the best comprehensive performance.

Conclusion
This paper proposes a network security situation assessment model based on the dual attention mechanism and HHO-ResNeXt. Our model combines ResNeXt with the ECA module and the CoT block, and its optimal hyperparameters are selected through the HHO algorithm. The model addresses the problems that the traditional CNN cannot obtain the importance of each channel and has a limited receptive field, which considerably improves the CNN's performance.
Experiments on the NSL-KDD dataset and the UNSW-NB15 dataset show that our proposed method has the best comprehensive performance and outperforms the compared methods.
The experiment will be improved in the future in the following directions: (1) Our results are based on simulation experiments, and the method has not yet been applied to a real network. Therefore, our model will be applied to a real environment for further testing in the future. (2) The experiments are conducted on only two datasets, which is not enough. To verify the effectiveness of our model, it is necessary to conduct experiments on more datasets.

Figure 2. The efficient channel attention module.

Figure 6. The complete structure of our model.
Step 2. The obtained cybersecurity dataset is divided into the training dataset and the test dataset.
Step 3. Part of the training dataset and the test dataset are used for the HHO optimisation operation, and the Mean Absolute Error (MAE) is used as the evaluation function during optimisation. After the optimisation, the optimised hyperparameters are input into the model.
Step 4. The preprocessed training dataset is sent to the model built in this paper for training. After each epoch, the test dataset is sent to the model for testing, and the test results are compared with all previous test results. If the current performance is the best so far, the model is saved and training continues. Otherwise, the next epoch starts directly. This is repeated until the preset number of training epochs is reached.
Step 5. Send the test dataset to the best-performing model for prediction, derive the prediction result and calculate the situation value.

Figure 7. The flow chart of the network security situation assessment model.

Figure 8. Actual network security situation values for NSL-KDD dataset.

Figure 10. Situation assessment results on the NSL-KDD dataset.

Table 1. Details of this model.

Table 2. Impact of attacks in the NSL-KDD dataset.

Table 3. Impact of attacks in the UNSW-NB15 dataset.

Table 4. Experiment results of different models.

Table 5. Network security situation assessment results of each model.
To verify the situation assessment ability of the model, three typical evaluation indicators are selected to evaluate each model, namely Mean Absolute Error (MAE), Mean Square Error (MSE) and Mean Absolute Percentage Error (MAPE).