Novel hybrid firefly algorithm: an application to enhance XGBoost tuning for intrusion detection classification

The research proposed in this article presents a novel improved version of the widely adopted firefly algorithm and its application for tuning and optimising XGBoost classifier hyper-parameters for network intrusion detection. One of the greatest issues in the domain of network intrusion detection systems is the relatively high rate of false positives and false negatives. The proposed study addresses this challenge by using an XGBoost classifier optimised with the improved firefly algorithm. Following established practice in the modern literature, the proposed improved firefly algorithm was first validated on 28 well-known CEC2013 benchmark instances, and a comparative analysis with the original firefly algorithm and other state-of-the-art metaheuristics was conducted. Afterwards, the devised method was adopted for XGBoost hyper-parameter optimisation, and the tuned classifier was tested on the widely used NSL-KDD benchmark dataset and the more recent UNSW-NB15 dataset for network intrusion detection. The obtained experimental results indicate that the proposed metaheuristic has significant potential in tackling the machine learning hyper-parameter optimisation challenge and that it can be used for improving the classification accuracy and average precision of network intrusion detection systems.


INTRODUCTION
The firefly algorithm (FA), proposed by Yang (2009), is a swarm intelligence algorithm designed for exploration, exploitation, and local search of solutions, inspired by the social behaviour and flashing activities exhibited by fireflies. The original FA is tested against the updated CEC2013 benchmark function set in this article. This article also presents the performance of the well-known XGBoost classifier, whose parameters have been optimised using the FA for the problem of network intrusion detection system (NIDS) optimisation. NIDS have a simple purpose: to monitor network traffic and detect malicious user activities. They are usually implemented as nodes at strategic points in the network.

BACKGROUND
This section introduces NIDS, the problem of optimisation, in general, and concerning NIDS, and different algorithmic approaches to optimising NIDS network event classification methods. Different machine learning approaches are presented towards the end of the section, leading up to the overview of works related to this problem.

The problem of network intrusion detection
In the last two decades, the web has become the centre stage for many business, social, political and other activities and transactions that all happen on the global network. Endpoints of those network transactions are users, usually located within smaller computer networks, such as companies, small Internet provider sub-networks etc. Therefore, security has become an important issue for the contemporary Internet user, even though different intrusion detection solutions have been around for almost 40 years (Neupane, Haddad & Chen, 2018). Many solutions have been created to protect users from malicious activities and attacks (Patel, Qassim & Wills, 2010; Mugunthan, 2019).
According to Neupane, Haddad & Chen (2018), traditional NIDS come in the forms of firewalls, and statistical detection approaches usually applied either on the transport or the application layers, which require extensive setup, policy configurations etc. More modern systems use sophisticated approaches. According to Sathesh (2019), ML is used to solve the problem of intrusion detection, even though these approaches have different challenges, as reported by Jordan & Mitchell (2015). Regardless, ML approaches are efficient in finding optimal solutions for time-consuming problems, such as training efficient NIDS network event classifiers. Different solutions are based on various types of ML methods, such as artificial neural networks (ANN), evolutionary algorithms (EA), and other supervised and unsupervised learning methods, according to Verwoerd & Hunt (2002).
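When creating and evaluating a NIDS, it is important to measure its performance accurately. For this reason, the previously mentioned false positive (FP) and false negative (FN) measurements are used together with true positive (TP) and true negative (TN) measurements to evaluate the classification accuracy of a NIDS, according to the general formula shown in Eq. (1):
$$\mathit{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
From the values TP, TN, FP and FN, it is also possible to determine the system's sensitivity, specificity, fallout, miss rate, and precision through Eqs. (2)-(6):
$$\mathit{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$
$$\mathit{Specificity} = \frac{TN}{TN + FP} \tag{3}$$
$$\mathit{Fallout} = \frac{FP}{FP + TN} \tag{4}$$
$$\mathit{Miss\ rate} = \frac{FN}{FN + TP} \tag{5}$$
$$\mathit{Precision} = \frac{TP}{TP + FP} \tag{6}$$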
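The confusion-matrix definitions behind these metrics can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy and the five derived rates from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "fallout": fp / (fp + tn),      # false positive rate
        "miss_rate": fn / (fn + tp),    # false negative rate
        "precision": tp / (tp + fp),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, tn=880, fp=20, fn=10)
```

Note that fallout equals 1 − specificity and miss rate equals 1 − sensitivity, so a NIDS that lowers both false positives and false negatives improves all six values at once.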

Optimisation and optimisation algorithms
Optimisation aims to find an optimal or near-optimal solution for a certain problem within the given set of constraints. Many population-based stochastic meta-heuristics were developed for solving the problem of optimisation, according to Beheshti & Shamsuddin (2013).
Non-deterministic polynomial-time-hard problems are hard to solve with traditional deterministic algorithms. They can take a long time to complete on commonly available hardware. Therefore, these solutions are usually impractical.
On the other hand, acceptable solutions to these types of problems can be found using stochastic meta-heuristics, which do not guarantee an optimal solution but deliver acceptable sub-optimal ones in reasonable time-frames, according to Spall (2011). Commonly, these algorithms are labelled as Machine Learning Algorithms (MLA).

Swarm intelligence algorithms
A special type of nature-inspired stochastic meta-heuristic MLA are population-based algorithms, among which are swarm intelligence algorithms (SIA). These algorithms are inspired by different naturally occurring systems, where individual self-organising agents interact with each other and their environment without a centralised governing component. These systems give an impression of globally coordinated behaviour and are able to solve very demanding optimisation problems inexpensively (Mavrovouniotis, Li & Yang, 2017).
The most notable and popular methods that have proven themselves as powerful optimisers with respectable performance include the ant colony optimisation (ACO) introduced by Dorigo, Birattari & Stutzle (2006), artificial bee colony (ABC) proposed by Karaboga & Basturk (2007), particle swarm optimisation (PSO) developed by Kennedy & Eberhart (1995), as well as the FA, introduced by Yang (2009) and used as a foundation for the algorithm proposed in this paper. More recent algorithms that have shown good results include the grey wolf optimiser (GWO) (Mirjalili, Mirjalili & Lewis, 2014), moth search (MS) (Wang, 2018), monarch butterfly algorithm (MBA) (Wang, Deb & Cui, 2019), whale optimisation algorithm (WOA) (Mirjalili & Lewis, 2016), and the Harris hawks optimisation (HHO) (Heidari et al., 2019). Additionally, the differential evolution algorithm (Karaboğa & Okdem, 2004) and the covariance matrix adaptation (Igel, Hansen & Roth, 2007) approaches have also recently exhibited outstanding performance. Recently, algorithms inspired by the properties of mathematical functions have gained popularity in scientific circles, the most notable being the sine-cosine algorithm (SCA), which was proposed by Mirjalili (2016). SCA was also utilised in this research to hybridise the basic FA search.
These algorithms, on their own, have strengths and weaknesses when applied to different problems, and often, they are used to optimise different higher-level models and their hyper-parameters instead of being used to perform classification on their own. This article presents this synthesis of an optimisation algorithm used for hyper-parameter tuning and optimising a higher-order classification system.

METHODS
The original implementation of the FA is shown in this section, followed by descriptions of the known and observed flaws of the original FA. The section then suggests improvements to the original algorithm that address the described flaws.

The original version of the firefly algorithm
Yang (2009) suggested a swarm intelligence system inspired by the fireflies' lighting phenomenon and social behaviour. Because the behaviour of actual fireflies is complicated, the FA metaheuristic was proposed as a model with certain approximations.
The fitness functions are modelled using the firefly's brightness and attraction. In most FA implementations, attractiveness depends on the brightness, which is determined by the objective function's value. For minimisation problems, this relationship is written as Eq. (7) (Yang, 2009), where I(x) is the attractiveness and f(x) is the objective function's value at location x.
The attractiveness of a firefly decreases as the distance from the source of light increases, as expressed by Eq. (8) (Yang, 2009), where I(r) and I_0 are the light intensities at distance r and at the source. When modelling systems where the environment absorbs the light, the FA uses the light absorption coefficient γ. Most FA implementations combine the effects of the inverse square law for distance and γ in the approximate Gaussian form of Eq. (9) (Yang, 2009). As indicated in Eq. (10), each firefly employs an attractiveness β that is proportional to the intensity of the firefly's light and therefore depends on distance; in Eq. (10), β_0 represents the attractiveness at r = 0. However, Eq. (10) is commonly swapped for Eq. (11) (Yang, 2009).
Based on Eq. (11), the original FA moves a random firefly i in iteration t + 1 to a new location x_i in the direction of another firefly j with a greater fitness value according to Eq. (12) (Yang, 2009), where α is the randomisation parameter, κ is a uniformly distributed random number, and r_{i,j} is the distance between fireflies i and j. Values that often give good results for most problems are β_0 = 1 and α ∈ [0, 1]. The distance r_{i,j} is the Cartesian distance calculated by Eq. (13), where D is the number of parameters of a specific problem.
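A single firefly move can be sketched in Python as follows. The sketch assumes the standard FA update of Yang (2009), in which firefly i is pulled towards a brighter firefly j with attractiveness β_0·exp(−γ·r²) and perturbed by α·(κ − 0.5); the function names and default parameter values are illustrative assumptions, not the implementation used in the experiments.

```python
import math
import random


def distance(xi, xj):
    """Cartesian distance between two fireflies over all D components."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def fa_move(xi, xj, alpha=0.5, beta0=1.0, gamma=1.0, rng=random):
    """Move firefly xi towards the brighter firefly xj (assumed standard FA form)."""
    r = distance(xi, xj)
    beta = beta0 * math.exp(-gamma * r * r)  # attractiveness decays with distance
    return [
        a + beta * (b - a) + alpha * (rng.random() - 0.5)  # kappa drawn from U[0, 1)
        for a, b in zip(xi, xj)
    ]
```

With `alpha=0.0` and `gamma=0.0` the move lands exactly on the brighter firefly, which illustrates why the basic search is dominated by exploitation once the random term shrinks.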

Reasons for improvements
The original FA has performed exceptionally for many benchmarks (Yang & He, 2013) and practical problems (Strumberger et al., 2019). However, past research suggests that the original FA has several flaws regarding exploration and an inappropriate intensification-diversification balance (Strumberger, Bacanin & Tuba, 2017; Xu, Zhang & Lai, 2021; Bacanin & Tuba, 2014). The lack of diversity is noticeable in early iterations, when the algorithm cannot converge to optimum search space areas in certain runs, resulting in poor mean values. In such cases, the original FA search technique (Eq. (12)), which mostly performs exploitation, is incapable of directing the search to optimal domains. In contrast, the FA achieves satisfactory results when random solutions happen to be created in optimal or near-optimal areas during the initialisation phase.
An examination of the original FA search equation (Eq. (12)) reveals that it lacks an explicit exploration technique. Some FA implementations employ the dynamic randomisation parameter α, which is gradually reduced from its starting value α_0 to the specified threshold α_min, as shown in Eq. (14), where t and t + 1 are the current and next iterations and T is the maximum iteration count in a single run. As a result, at the start of a run, exploration is prioritised, whereas subsequent iterations shift the balance between intensification and diversification toward exploitation (Wang et al., 2017). However, based on simulations, it is concluded that the use of dynamic α is insufficient to improve FA exploration abilities, and that this technique only partly alleviates the problem.
Past research has shown that FA exploitation abilities are effective in addressing a variety of tasks, and FA is characterised as a metaheuristic with substantial exploitation capabilities (Strumberger, Bacanin & Tuba, 2017; Xu, Zhang & Lai, 2021; Bacanin & Tuba, 2014).

Novel FA metaheuristics
This work proposes an improved FA that tackles the original FA's flaws by using the following procedures:
• A technique for explicit exploration based on the exhaustiveness of the solution;
• The gBest chaotic local search (CLS) approach;
• Hybridisation with SCA search by performing either FA or SCA search at random in each cycle, based on a generated pseudo-random value.
The FA's intensification may be improved further by applying the CLS mechanism, as demonstrated in the empirical portion of this work. Because of the proposed modifications, the novel FA is dubbed chaotic FA with enhanced exploration and SCA search (CFAEE-SCA).

Explicit exploration mechanism
The purpose of this mechanism is to ensure convergence to the best section of the search space early on, while facilitating exploration around the parameter bounds of the current best individual x* later on. Each solution is given an additional attribute, trial. This attribute is incremented whenever the original FA search (Eq. (12)) cannot further improve the solution. When trial reaches a set limit, the individual is swapped for a random one picked from the search space in the same way as in the setup phase:
$$x_{i,j} = l_j + rand \cdot (u_j - l_j) \tag{15}$$
where x_{i,j} is the j-th component of the i-th individual, u_j and l_j are the upper and lower search boundaries of the j-th parameter, and rand is a random number in the range [0, 1], drawn from a uniform distribution.
An exhausted solution is one for which trial exceeds the limit. This term was adapted from the well-known ABC metaheuristic (Karaboga & Basturk, 2008), which has an efficient exploration mechanism (Moradi et al., 2018).
When the algorithm fails to find appropriate areas of the search space, replacing the exhausted solution with a pseudo-random individual improves search performance early on. Later on, this type of substitution wastes function evaluations. As a result, in subsequent iterations, the random replacement technique is replaced by a guided replacement mechanism within the lowest and highest parameter values of the population's solutions:
$$x_{i,j} = Pl_j + rand \cdot (Pu_j - Pl_j) \tag{16}$$
where Pl_j and Pu_j are the lowest and highest values of the j-th component over the whole population P.
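The two replacement mechanisms can be sketched in Python as follows. This is a hedged illustration: the switch between random and guided replacement uses the iteration threshold ψ that the text introduces later, resetting trial to zero after a replacement is an assumption borrowed from ABC-style implementations, and all function names are hypothetical.

```python
import random


def random_replacement(lower, upper, rng=random):
    """Uniform sample inside the search-space bounds l_j and u_j."""
    return [l + rng.random() * (u - l) for l, u in zip(lower, upper)]


def guided_replacement(population, rng=random):
    """Uniform sample inside the population's current per-component bounds."""
    dims = range(len(population[0]))
    p_low = [min(ind[j] for ind in population) for j in dims]
    p_up = [max(ind[j] for ind in population) for j in dims]
    return random_replacement(p_low, p_up, rng)


def replace_exhausted(population, trials, limit, lower, upper, t, psi, rng=random):
    """Replace every solution whose trial counter exceeds the limit (in place)."""
    for i, trial in enumerate(trials):
        if trial > limit:
            if t < psi:  # early phase: explore the whole search space
                population[i] = random_replacement(lower, upper, rng)
            else:        # late phase: stay within the population's bounds
                population[i] = guided_replacement(population, rng)
            trials[i] = 0  # assumption: the counter restarts after replacement
    return population
```

Because the population contracts as the run converges, the guided variant samples from an ever smaller box, which is what keeps late replacements from wasting evaluations on remote regions.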

The gBest CLS strategy
Chaos is responsive to the initial conditions of non-linear and deterministic systems (Alatas, 2010). Chaotic search is more efficient than ergodic search (dos Santos Coelho & Mariani, 2008) because many sequences can be created by modifying the initial values. The literature reports many chaotic maps. After testing, it was determined that the logistic map yields the most favourable results in the case of the suggested FA. The logistic map has been used in a variety of swarm intelligence methodologies so far (Li et al., 2012; Chen et al., 2019; Liang et al., 2020). The logistic map used by the proposed method is defined in K steps as:
$$\sigma_{i,j}^{k+1} = \mu \, \sigma_{i,j}^{k} \left(1 - \sigma_{i,j}^{k}\right), \quad k = 1, 2, \ldots, K \tag{17}$$
where σ_{i,j}^{k} and σ_{i,j}^{k+1} are the chaotic variables for the j-th component of the i-th solution in steps k and k + 1, and μ is a control variable. The initial value satisfies σ_{i,j} ∈ (0, 1) and σ_{i,j} ≠ 0.25, 0.5 and 0.75, and μ is set to 4. This value was determined empirically by Liang et al. (2020).
The proposed method integrates the global best (gBest) CLS strategy. The chaotic search is performed around the x* solution. Equations (18) and (19) define how a new candidate x′* is created from x* in each step k, for every component j of x*. In these equations, σ_j^k is determined by Eq. (17), and λ is the dynamic shrinkage parameter, which depends on FFE (the current number of fitness function evaluations) and maxFFE (the maximum number of fitness function evaluations) according to Eq. (20). Employing the dynamic λ establishes a better exploitation-to-exploration equilibrium around x*: earlier in the execution, a larger search radius around x* is explored, whereas later, fine-tuned exploitation takes over.
When the maximum number of iterations is used as the termination condition, FFE and maxFFE can be substituted with t and T.
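The chaotic search around the global best can be sketched in Python as follows. Only the logistic map with μ = 4 is taken directly from the text. The candidate construction (a convex blend of x* with the chaotic point mapped into the search bounds) and the linear decay of λ are assumptions borrowed from common CLS formulations, and may differ from Eqs. (18)-(20); all function names are hypothetical.

```python
def logistic_map(sigma, mu=4.0):
    """One step of the logistic map for a chaotic variable in (0, 1)."""
    return mu * sigma * (1.0 - sigma)


def shrinkage(ffe, max_ffe):
    """ASSUMED linear decay of lambda from 1 towards 0 as evaluations are spent."""
    return (max_ffe - ffe + 1) / max_ffe


def cls_candidate(best, lower, upper, sigma, lam):
    """ASSUMED candidate: blend x* with the chaotic point mapped into the bounds."""
    return [
        (1.0 - lam) * b + lam * (l + s * (u - l))
        for b, l, u, s in zip(best, lower, upper, sigma)
    ]
```

The excluded starting values can be checked directly: 0.5 maps to 1 and then to 0, where the sequence stays, 0.75 maps to itself, and 0.25 maps to 0.75, so none of them produces a chaotic sequence.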
The CLS strategy attempts to enhance x* in K steps by checking whether x′* achieves greater fitness than x*.
In the SCA search expression of Eq. (22), r_4 is a random value in the range [0, 1]. That equation shows that the algorithm's four main parameters are r_1, r_2, r_3, and r_4. The r_1 parameter determines the region (movement direction) of the next location, which may be inside or outside the area between the destination and the solution. The r_2 parameter specifies the amplitude and direction of the movement (towards the destination or outwards). The r_3 parameter assigns a random weight to the destination in order to reduce (r_3 < 1) or accentuate (r_3 > 1) the impact of the destination in the distance definition. The r_4 parameter is used to alternate between the sine and cosine components.
The SCA search is included in the proposed method in the following fashion. Each cycle generates a pseudo-random number. If the resulting value is greater than 0.5, the FA search is performed. Otherwise, the SCA search described in Eq. (22) is executed.
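The alternation between the two searches can be sketched in Python as follows. The move assumes the standard SCA update of Mirjalili (2016), with r_1 decreasing linearly from a constant a to 0, r_2 drawn from [0, 2π], r_3 from [0, 2] and r_4 from [0, 1]; the constant a = 2 and the function names are assumptions, not values reported in this article.

```python
import math
import random


def sca_move(x, dest, t, T, a=2.0, rng=random):
    """One SCA move of solution x towards the destination (assumed standard form)."""
    r1 = a - t * a / T  # movement range shrinks linearly over the run
    new_x = []
    for xj, pj in zip(x, dest):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        r4 = rng.random()
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)  # r4 picks sine or cosine
        new_x.append(xj + r1 * trig * abs(r3 * pj - xj))
    return new_x


def choose_search(rng=random):
    """Pick the search for the current cycle: FA if the draw exceeds 0.5, else SCA."""
    return "FA" if rng.random() > 0.5 else "SCA"
```

At the last iteration (`t == T`) the factor r_1 is zero and the solution no longer moves, which matches the intended shift from exploration to exploitation.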

Chaotic FA with enhanced exploration and SCA search pseudo-code
A few factors should be examined to efficiently include the exploration mechanism and the gBest CLS approach in the original FA. First, as previously indicated, the random replacement method should be used in the early stages of execution, while the guided one would produce superior outcomes later on. Second, the gBest CLS technique would not produce substantial gains in early iterations, since x* would not yet have converged to the optimal area, and FFEs would be wasted.
The extra control parameter ψ is introduced to govern the behaviour described above. If t < ψ, the exhausted population solutions are replaced randomly (Eq. (15)) without activating the gBest CLS. Otherwise, the guided replacement mechanism (Eq. (16)) is executed and the gBest CLS is activated.
The FA search in the suggested approach uses the dynamic α for fine-tuning, according to Eq. (14). Based on the pseudo-random value, the method alternates between FA and SCA search in each round.
Taking everything above into account, Algorithm 1 summarises the pseudo-code of the proposed CFAEE-SCA.
The flowchart of the proposed CFAEE-SCA algorithm is given in Fig. 1.

Algorithm 1 The CFAEE-SCA pseudo-code
Initialise control parameters N and T
Initialise search space parameters D, u_j and l_j
Initialise CFAEE-SCA parameters γ, β_0, α_0, α_min, K and ψ
Initialise random population P_init = {x_{i,j}}, i = 1, 2, 3, ⋯, N; j = 1, 2, ⋯, D using Eq. (15) in the search space
while t < T do
    Generate pseudo-random value rnd
    Update parameters α and λ using Eqs. (14) and (20)
end while
Return the best individual x* from the population
Post-process results and perform visualisation

The CFAEE-SCA complexity and drawbacks
Because the most computationally costly portion of a swarm intelligence algorithm is the objective evaluation (Yang & He, 2013), the number of FFEs may be used to assess the complexity of the method. The basic FA evaluates objective functions during the initialisation and solution update stages. When updating solutions, the FA utilises one main loop for T iterations and two inner loops that go through N solutions, according to Eq. (12) (Yang & He, 2013).
The basic FA metaheuristic has a worst-case complexity of O(N) + O(N² · T), including the initialisation phase. However, if N is large enough, one inner loop may be used to rank the attractiveness or brightness of all fireflies using sorting algorithms. Complexity in this situation is O(N) + O(N · T · log(N)) (Yang & He, 2013).
Because of the explicit exploration mechanism and the gBest CLS method, the suggested CFAEE-SCA has a higher complexity than the original FA. In the worst-case situation, if limit = 0, all solutions will be replaced in every iteration, and if ψ = 0, the gBest CLS approach will be activated during the whole run. The worst-case CFAEE-SCA complexity is stated under the assumption that K is set to 4. In practice, however, the complexity is substantially lower due to the settings of the limit and ψ control parameters.
The CFAEE-SCA has certain drawbacks compared with the original design, including the use of the new control parameters limit and ψ. However, the values of these parameters may be easily determined by performing empirical simulations. Furthermore, as shown in the next sections, the CFAEE-SCA outperforms the original FA on benchmark tasks and on the XGBoost optimisation challenge from the machine learning domain.

RESULTS OF PROPOSED ALGORITHM AGAINST STANDARD CEC2013 BENCHMARK FUNCTION SET
The CEC2013 benchmark functions suite consists of 28 challenging benchmark function instances belonging to different classes. Functions 1-5 belong to the group of unimodal instances, functions 6-20 are multi-modal instances, while functions 21-28 belong to the composite functions family. The CEC2013 functions list is presented in Table 1. The challenge is to minimise the functions. Each class of functions has its purpose: unimodal benchmarks test exploitation, multi-modal benchmarks target exploration, and the composite benchmarks are utilised to assess the algorithm's overall performance due to their complex nature.
The basic implementation of FA and the proposed CFAEE-SCA algorithms have been validated against five recent cutting-edge metaheuristics tested on the same benchmark function set. The competitor metaheuristics included practical genetic algorithm (RGA) (Haupt & Haupt, 2004), gravitational search algorithm (GSA) (Rashedi & Nezamabadi-pour, 2012), disruption gravitational search algorithm (D-GSA) (Sarafrazi, Nezamabadi-pour & Saryazdi, 2011), clustered gravitational search algorithm (BH-GSA) (Shams, Rashedi & Hakimi, 2015), and attractive repulsive gravitational search algorithm (AR-GSA) (Zandevakili, Rashedi & Mahani, 2019). The introduced CFAEE-SCA method has been tested in the same way as proposed in Zandevakili, Rashedi & Mahani (2019). That publication was utilised to reference the results of other methods included in the comparative analysis. Authors Zandevakili, Rashedi & Mahani (2019) proposed a novel version of the GSA by adding the attracting and repulsing parameters to enhance both diversification and intensification phases. It is worth noting that the authors have implemented all algorithms used by Zandevakili, Rashedi & Mahani (2019) on their own and tested them independently by using the same experimental setup proposed by Zandevakili, Rashedi & Mahani (2019). The novel CFAEE-SCA has been implemented and verified on all 28 benchmark functions with 30 dimensions (D = 30), together with the basic FA implementation. In Tables 2-4, the results of the CFAEE-SCA on CEC2013 instances with 30 dimensions and 51 independent runs for uni-modal, multi-modal and composite functions, respectively, have been evaluated against six other swarm intelligence metaheuristics. As mentioned before, the same simulation conditions were utilised as in (Zandevakili, Rashedi & Mahani, 2019), with the same stop criteria of the number of fitness functions evaluations with the maximum number being 1.00E+05. 
Furthermore, the experiments have been conducted with 50 solutions in the population (N = 50).
Convergence graphs of the proposed CFAEE-SCA method for two unimodal, four multimodal and two composite functions that were chosen as examples are presented in Fig. 2. The proposed CFAEE-SCA has been compared to the basic FA and to cutting-edge metaheuristics such as AR-GSA, GSA and RGA. The presented convergence graphs show that the proposed method in most cases converges faster than the other metaheuristics included in the experiments. Additionally, the proposed method is significantly superior to the basic FA metaheuristic, which in most cases stagnates while the CFAEE-SCA accelerates the convergence speed.
In order to provide a more objective way of determining the performance and efficiency of the proposed method against other competitors, statistical tests must be conducted. Therefore, the Friedman test introduced by Friedman (1937, 1940), together with the ranked two-way analysis of variances of the suggested approach and other implemented algorithms, was conducted.
The results obtained by the eight implemented approaches on the set of 28 challenging function instances from the CEC2013 benchmark suite, including the Friedman and the aligned Friedman test, are given in Tables 5 and 6, respectively.
According to the findings presented in Table 6, the proposed CFAEE-SCA outscored all other algorithms, including the original FA, which achieved an average rank of 133.463. The suggested CFAEE-SCA achieved an average ranking of 56.838.
Additionally, according to Sheskin (2020), a comparison based on the χ² value can be improved upon. Therefore, Iman and Davenport's test, introduced by Iman & Davenport (1980), has been applied as well. The findings of this test are presented in Table 7. The test yields a value of 2.230E+01, which is significantly larger than the F-distribution critical value (F(9, 9 × 10) = 2.058E+00), so the null hypothesis H_0 is rejected by Iman and Davenport's test. The Friedman statistic (χ²_r = 1.407E+01) is likewise larger than the critical value at the level of significance of α = 0.05.
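The two test statistics can be sketched in Python as follows. The sketch assumes the textbook forms of the Friedman statistic (without tie correction) and of the Iman and Davenport statistic, with lower objective values ranked better; the function names and the toy result matrix are hypothetical and unrelated to Tables 5-7.

```python
def average_ranks(results):
    """results[i][j]: score of algorithm j on function i (lower is better)."""
    n, k = len(results), len(results[0])
    totals = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            rank = (pos + end) / 2 + 1  # tied entries share the average rank
            for idx in order[pos:end + 1]:
                totals[idx] += rank
            pos = end + 1
    return [total / n for total in totals]


def friedman_statistic(results):
    """Chi-square form of the Friedman statistic over n functions, k algorithms."""
    n, k = len(results), len(results[0])
    ranks = average_ranks(results)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


def iman_davenport(chi2, n, k):
    """F-distributed statistic with (k - 1) and (k - 1)(n - 1) degrees of freedom."""
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


# Toy data: 4 functions (rows) and 3 algorithms (columns), illustration only.
toy = [[1, 2, 3], [1, 2, 3], [1, 3, 2], [2, 1, 3]]
chi2 = friedman_statistic(toy)
f_id = iman_davenport(chi2, n=4, k=3)
```

Each statistic is then compared with the critical value of its own reference distribution (χ² for Friedman, F for Iman and Davenport) at the chosen significance level.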
The final observation that can be drawn here is that the null hypothesis (H_0) can be rejected and that the proposed CFAEE-SCA ranks best among the algorithms in the conducted tests.
As both executed statistical tests rejected the null hypothesis, the next type of test, namely Holm's step-down procedure, has been performed. This procedure is a nonparametric post-hoc method. The results of this procedure are presented in Table 8. The p value is the main sorting reference for all approaches included in the experiment, and the p values are compared against α/(k − i), where k represents the degree of freedom and i denotes the number of the method. This paper used the α parameter at the levels of 0.05 and 0.1. It is worth mentioning that the values of the p parameter are given in scientific notation. The summary of the conducted Holm's procedure presented in Table 8 indicates that a significant enhancement has been achieved by the proposed method at both levels of significance.
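Holm's step-down procedure can be sketched in Python as follows. The sketch assumes that k is the number of compared methods and that i counts from zero over the p values sorted in ascending order; the function name and the example p values are hypothetical and are not those of Table 8.

```python
def holm_step_down(p_values, alpha=0.05):
    """Map each method name to True if its null hypothesis is rejected."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    k = len(ordered)
    decisions = {}
    rejecting = True
    for i, (name, p) in enumerate(ordered):
        threshold = alpha / (k - i)
        # Once one hypothesis is retained, all later ones are retained too.
        rejecting = rejecting and p <= threshold
        decisions[name] = rejecting
    return decisions


# Hypothetical p values for three competitors compared with a control method.
example = holm_step_down({"A": 0.001, "B": 0.02, "C": 0.04}, alpha=0.05)
```

The step-down character matters: a p value of 0.04 is rejected in the example only because both smaller p values passed their stricter thresholds first.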

THE XGBOOST CLASSIFIER TUNING WITH CFAEE-SCA
In this section, the basic information relevant to the framework for optimising the XGBoost model by using the proposed CFAEE-SCA algorithm is presented. Later on, this section presents the results of the proposed approach on two sets of network intrusion detection experiments. The first experiment was conducted by utilising the NSL-KDD benchmark dataset, while the second experiment used the more recent UNSW-NB15 network intrusion dataset.

The CFAEE-SCA-XGBoost overview
XGBoost is an extensible and configurable improved gradient boosting decision tree optimiser with fast computation and good performance. It constructs boosted regression and classification trees, which operate in parallel, and it efficiently optimises the value of the objective function. According to Chen & Guestrin (2016), it works by scoring the frequency and by measuring the coverage of the impact of a selected feature on the output of a function.
XGBoost utilises additive training optimisation, where each new iteration depends on the result of the previous one. This is evident in the calculation of the i-th iteration's objective function in Eqs. (23)-(27), where g and h are the first and second derivatives, w are the weights, R is the model's regularisation term, and γ and λ are parameters for configuring the tree structure (larger values give simpler trees). F_o^i is the i-th iteration's objective function, l is the loss term in that iteration, and C is a constant term. Finally, Eq. (28) gives the score of the loss function, which is used to evaluate the complexity of the tree structure.
The parameters of the proposed CFAEE-SCA-XGBoost model are optimised using the CFAEE-SCA algorithm. The six optimised parameters are shown in Table 9. They have been chosen based on several previously published studies, including Jiang et al. (2020), as they have the most influence on the performance of the model. The same parameters have been optimised in both conducted experiments.
Therefore, a CFAEE-SCA solution is encoded as a vector with six components, where each component represents one XGBoost hyper-parameter from Table 9 that is subject to optimisation. Some of the components are continuous (eta, gamma, subsample, colsample_bytree) and some are integers (max_depth and min_child_weight), which makes this a typical mixed-variable NP-hard challenge. During the search process, due to the search expressions of the CFAEE-SCA optimiser, integer variables are treated as continuous, and they are eventually transformed back to integers by using a simple sigmoid transfer function.
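The encoding can be illustrated with a short Python sketch that turns a six-component solution vector into an XGBoost parameter dictionary. The search ranges in `BOUNDS` are hypothetical placeholders rather than the values of Table 9, and integer components are recovered here by plain rounding with clipping, which is a simplification swapped in for the sigmoid transfer function mentioned in the text.

```python
# HYPOTHETICAL search ranges, for illustration only (not the values of Table 9).
BOUNDS = {
    "eta": (0.1, 0.9),
    "max_depth": (3, 10),
    "min_child_weight": (1, 10),
    "gamma": (0.1, 0.8),
    "subsample": (0.01, 1.0),
    "colsample_bytree": (0.01, 1.0),
}
INTEGER_PARAMS = {"max_depth", "min_child_weight"}


def decode_solution(vector, bounds=BOUNDS):
    """Clip each component to its range and round the integer-valued ones."""
    params = {}
    for value, (name, (low, high)) in zip(vector, bounds.items()):
        value = min(max(value, low), high)  # keep the component inside its range
        params[name] = int(round(value)) if name in INTEGER_PARAMS else float(value)
    return params


params = decode_solution([0.3, 6.6, 0.2, 0.5, 0.8, 2.0])
```

The resulting dictionary uses genuine XGBoost parameter names, so it could be passed on to a classifier constructor or training call when the fitness of the solution is evaluated.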
The fitness of each solution is calculated by constructing the XGBoost model based on the solution and validating its performance on the training set. For the global best solution (the one that establishes the best fitness on the training set), the constructed XGBoost model is validated against the testing set, and these metrics are reported in the results tables. The pipeline of the CFAEE-SCA-XGBoost framework is presented in Fig. 3.

Experiments with NSL-KDD dataset
The proposed model was trained and tested using the NSL-KDD dataset, which was analysed for the first time in Tavallaee et al. (2009). The NSL-KDD dataset can be retrieved from the following URL: https://unb.ca/cic/datasets/nsl.html. This dataset is prepared and used for intrusion detection system evaluation. Dataset features are described in Protić (2018). A summary describing the main features of the dataset is shown in Table 10. The proposed model was tested with a swarm size of 100 agents throughout 800 iterations, with 8,000 fitness function evaluations (FFE). This setup was proposed by Jiang et al. (2020). There are five event classes, which represent normal use, denial of service (DoS) attack, probe attack, user to root attack (U2R), and remote to local user (R2L). As documented by Protić (2018), the dataset has predefined training and testing sets, whose structure is shown in Table 11, while a visual representation is provided in Fig. 4.
For this experiment, the proposed model was tested following the instructions set up by Jiang et al. (2020), with their optimisation algorithm substituted by the proposed CFAEE-SCA algorithm. Because of the different types of data in the dataset, data-points are standardised into a continuous range by Eq. (29), in which M represents the total number of records in the dataset, d is an individual data-point for the i-th feature of the j-th record, and d′ is the corresponding data-point's standardised value. After standardisation, all data-points are normalised by Eq. (30), in which d″ is the normalised value of the corresponding d′ data-point, and d_min and d_max are the minimum and maximum values of the j-th feature.
The proposed model is evaluated using precision, recall, f-score, and the P-R curve. The P-R curve is used instead of the ROC curve because it better captures performance when the events of interest are rare, as explained by Sofaer, Hoeting & Jarnevich (2019). Specifically, such events occur in this dataset due to the limited number of U2R attack cases relative to other events. P-R curve-based values, including the average precision (AP), mean average precision (mAP) and macro-averaging calculations, further help evaluate the model's performance.
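The two preprocessing steps can be sketched in Python for a single feature column. The min-max step follows the description of d_min and d_max in the text. The standardisation step uses the common z-score as a stand-in, which is an assumption and may differ from the exact form of Eq. (29); both function names are hypothetical.

```python
import statistics


def standardise(column):
    """ASSUMED z-score standardisation of one feature column."""
    mean = statistics.mean(column)
    spread = statistics.pstdev(column)
    if spread == 0:
        return [0.0 for _ in column]  # constant feature carries no information
    return [(value - mean) / spread for value in column]


def min_max_normalise(column):
    """Rescale one feature column to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


# Hypothetical feature values, for illustration only.
normalised = min_max_normalise(standardise([2.0, 4.0, 6.0]))
```

Chaining the two steps keeps every feature in [0, 1] regardless of its original unit.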
Experimental results of the proposed model are presented and compared to the results of the pure XGBoost approach, the original FA-XGBoost and the PSO-XGBoost. The experimental setup is the same as the setup proposed in Jiang et al. (2020), which was used to reference the PSO-XGBoost results. It is important to state that the authors have implemented the PSO-XGBoost and tested it independently, using the same conditions as in Jiang et al. (2020). Results for the FA and CFAEE-SCA supported versions of the XGBoost framework are shown in Table 12, together with the PSO-XGBoost and basic XGBoost results. The best results are marked in bold. As the presented results show, the proposed CFAEE-SCA-XGBoost approach outperforms both other metaheuristic approaches for the observed classes. Additionally, the CFAEE-SCA-XGBoost significantly outperforms the basic XGBoost method. The basic FA-XGBoost obtained a similar level of performance to PSO-XGBoost.
Table 13 shows AP values from the P-R curves of the CFAEE-SCA-XGBoost model compared to the values of the XGBoost, FA-XGBoost and PSO-XGBoost models for all event types and classes. The proposed CFAEE-SCA-XGBoost approach performed better than the other compared approaches for all types and classes. It is important to note that the NSL-KDD is an imbalanced dataset, and the proposed CFAEE-SCA-XGBoost managed to achieve high accuracy and recall (even for minority classes) without modifying the original dataset. The PR curve of the basic XGBoost approach is shown in Fig. 5, while the PR curve of the proposed CFAEE-SCA-XGBoost method is presented in Fig. 6. To help visualise the improvements of the CFAEE-SCA-XGBoost method over the basic XGBoost, Fig. 7 depicts the precision vs recall curve comparison between the proposed CFAEE-SCA method and the basic XGBoost implementation.
Finally, Table 14 presents the values of XGBoost parameters determined by the proposed CFAEE-SCA method.

Experiments with UNSW-NB15 dataset
In the second set of experiments, the proposed model was trained and tested on the more recent UNSW-NB15 dataset, which was first proposed and analysed by Moustafa & Slay (2015) and Moustafa & Slay (2016). The UNSW-NB15 dataset can be     Moustafa & Slay (2015).
In total, the UNSW-NB15 dataset contains 42 features, of which 39 are numerical and three are categorical (non-numeric). The UNSW-NB15 consists of two main sets: UNSW-NB15-TRAIN, used for training the models, and UNSW-NB15-TEST, used for testing the trained models. The proposed model has been tested by following the instructions specified by Kasongo & Sun (2020), in order to provide common ground for comparing the proposed model against their published results. The training set was divided into two parts, namely TRAIN-1 (75% of the training set) and VAL (25% of the training set), where the first part was used for training and the second part for validation before proceeding to the test phase.
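The 75%/25% division of the training set can be sketched as follows. The text does not state whether the split was random or stratified, so the seeded random shuffle used here is an assumption, and the function name is hypothetical.

```python
import random

def split_train_val(records, train_fraction=0.75, seed=0):
    """Split records into TRAIN-1 and VAL parts.

    Assumption: a seeded random shuffle precedes the cut; the source
    does not say how the partition was drawn.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy example: 100 record indices stand in for the training set.
train_1, val = split_train_val(range(100))
```

For a strongly imbalanced dataset, a stratified split that preserves class proportions in both parts may be preferable to the plain shuffle shown here.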
The UNSW-NB15 comprises instances belonging to the following categories, which cover typical network attacks: Normal, Backdoor, Reconnaissance, Worms, Fuzzers, DoS, Generic, Analysis, Shellcode and Exploits. The research by Kasongo & Sun (2020) utilises XGBoost as the filter method for feature selection, and the features are normalised by using Min-Max scaling during data processing. The processed data were then applied to various machine learning models, such as support vector machine (SVM), logistic regression (LR), artificial neural network (ANN), decision tree (DT) and k-nearest neighbours (kNN).
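Min-Max scaling maps each feature to the [0, 1] range via (d − d_min) / (d_max − d_min). A minimal sketch is given below; fitting the minimum and maximum on the training data only, and mapping a constant feature to 0, are assumptions made for this illustration, and the function names are hypothetical.

```python
def fit_min_max(train_column):
    """Return (minimum, maximum) of one feature over the training data."""
    return min(train_column), max(train_column)

def apply_min_max(column, lo, hi):
    """Scale values with a previously fitted minimum and maximum."""
    span = hi - lo
    if span == 0:
        # Assumption: a constant feature carries no information, map to 0.
        return [0.0] * len(column)
    return [(v - lo) / span for v in column]

# Toy values for illustration only.
lo, hi = fit_min_max([2.0, 4.0, 10.0])
scaled_train = apply_min_max([2.0, 4.0, 10.0], lo, hi)
scaled_test = apply_min_max([6.0, 12.0], lo, hi)
```

Under this assumption, test values that fall outside the training range are mapped outside [0, 1], as the second test value shows.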
The first phase of the experiments used the full feature set (42 features in total) for the binary and multiclass configurations. The second phase applied feature selection with XGBoost as the filter method, resulting in a reduced set of 19 features, which was subsequently used for the binary and multiclass configurations (details about the reduced feature vector can be found in Kasongo & Sun (2020)). The parameters used for ANN, LR, kNN, SVM and DT are summarised in Table 15. It is important to state that the authors have implemented and recreated all experiments under the same conditions as in Kasongo & Sun (2020) and tested them independently, with the maximum number of FFE as the termination condition.
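The filter step can be pictured as ranking features by an importance score and keeping the highest-ranked ones. The sketch below assumes that per-feature importance scores are already available (for example, from a fitted XGBoost model) and that the k best features are retained; the exact selection rule of Kasongo & Sun (2020) is not restated in the text, and the feature names, scores and function names are hypothetical.

```python
def select_top_k(importances, k=19):
    """Return the names of the k highest-scoring features, best first.

    importances: {feature_name: score}; ties are broken by name so the
    result is deterministic.
    """
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]

def project(record, selected):
    """Keep only the selected features of one record (a dict)."""
    return {name: record[name] for name in selected}

# Toy scores for illustration only, not values from the experiments.
importances = {"f0": 0.10, "f1": 0.40, "f2": 0.05, "f3": 0.45}
selected = select_top_k(importances, k=2)
reduced = project({"f0": 1, "f1": 2, "f2": 3, "f3": 4}, selected)
```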
The simulation results are shown in Tables 16-19. As mentioned before, the results for the ANN, LR, kNN, SVM and DT were obtained through independent testing by the authors, and those values are reported and compared with the values obtained by the basic XGBoost (with default parameter values), PSO-XGBoost, FA-XGBoost and the proposed CFAEE-SCA-XGBoost. The best result in each category is marked in bold text. Table 16 reports the findings of the experiments with different ML approaches, basic XGBoost and three metaheuristic XGBoost models for the binary classification that utilises the complete feature set of the UNSW-NB15 dataset. On the other hand, Table 17 reports the results of the binary classification over the reduced feature set of the UNSW-NB15 dataset. Tables 18 and 19 present the results obtained by different ML models, basic XGBoost and three metaheuristic XGBoost models for the multiclass classification that uses the complete and reduced feature vectors, respectively. In every table, Acc training represents the accuracy obtained over the training data, Acc val stands for the accuracy obtained over the validation data partition, and Acc test denotes the accuracy obtained over the test data.
The experimental findings over the UNSW-NB15 IDS dataset clearly indicate the superiority of the hybrid swarm intelligence and XGBoost methods over the standard machine learning approaches. All three XGBoost variants that use metaheuristics significantly outperformed all other models, both for binary classification and for multiclass classification. Similarly, the swarm-based approaches outperformed the traditional methods for both the complete feature set and the reduced feature set. Among the three XGBoost variants that use metaheuristics for optimisation, the PSO-XGBoost achieved third place, the basic FA-XGBoost finished second, while the proposed CFAEE-SCA-XGBoost obtained the best scores in all four test scenarios by a significant margin. This conclusion further establishes the proposed CFAEE-SCA-XGBoost method as a very promising option for the intrusion detection problem.

CONCLUSIONS
This article has presented an improved FA optimisation algorithm, CFAEE-SCA, which was devised with the goal of overcoming the deficiencies of the basic FA metaheuristic. Several modifications have been made to the basic algorithm, including an explicit exploration mechanism, the gBest CLS strategy, and hybridisation with SCA to further enhance the search process. The proposed improved metaheuristic was then used to optimise the XGBoost classifier for the intrusion detection problem. The CFAEE-SCA-XGBoost framework has been proposed, based on the XGBoost classifier, with its hyperparameters optimised and tuned using the newly proposed CFAEE-SCA algorithm. The proposed model was trained and tested for network intrusion detection using two well-known datasets, NSL-KDD and UNSW-NB15.