Systematic literature review on intrusion detection systems: Research trends, algorithms, methods, datasets, and limitations

Abstract: Machine learning (ML) and deep learning (DL) techniques have demonstrated significant potential in the development of effective intrusion detection systems. This study presents a systematic review of the utilization of ML, DL, optimization algorithms, and datasets in intrusion detection research from 2018 to 2023. We devised a comprehensive search strategy to identify relevant studies from scientific databases. After screening 393 papers meeting the inclusion criteria, we extracted and analyzed key information using bibliometric analysis techniques. The findings reveal increasing publication trends in this research domain and identify frequently used algorithms, with convolutional neural networks, support vector machines, decision trees, and genetic algorithms emerging as the top methods. The review also discusses the challenges and limitations of current techniques, providing a structured synthesis of the state-of-the-art to guide future intrusion detection research.


Introduction
Machine learning (ML) and deep learning (DL) techniques are transforming intrusion detection systems (IDS) [1][2][3], enabling enhanced security, adaptability, and scalability. Network intrusions through malicious attacks can disrupt services and operations, necessitating intelligent detection systems capable of identifying known and unknown threats. This has motivated extensive research on applying advanced ML and DL algorithms for IDS over the past decade. However, the rapid growth of studies presents challenges in synthesizing state-of-the-art advancements in a structured manner. This study provides a systematic literature review of ML- and DL-based intrusion detection techniques published between 2018 and 2023. A comprehensive search strategy is devised to survey recent studies from major scientific databases. After screening and analyzing 393 qualified papers, we extracted key information to understand publication trends, frequently adopted algorithms, datasets, limitations, and future challenges. Bibliometric analysis techniques help visualize research themes, prominent authors, and frequently studied algorithms like convolutional neural networks (CNNs), support vector machines (SVMs), XGBoost, and genetic algorithms (GAs) [1][2][3][4][5]. The structured taxonomy of the review categorizes techniques into four broad categories: ML, DL, optimization algorithms, and datasets. Detailed sub-categorizations are presented to summarize advancements within each domain. The findings reveal SVM, CNN, decision trees (DTs), and GAs as leading techniques for attaining high classification performance [6][7][8][9][10].
Comparative analysis provides insights into the relative effectiveness and limitations of different algorithms and datasets. This systematic review consolidates scattered developments into an organized synthesis to benefit researchers and practitioners working on ML/DL-driven IDS. By clarifying the state-of-the-art, it can guide selection of appropriate algorithms and datasets while also highlighting open challenges for advancing intelligent anomaly detection. The taxonomy of techniques provides a starting point for new researchers to swiftly comprehend key concepts, methods, and terminology in this rapidly progressing field.
The systematic literature review of IDS adds significant value to the field by addressing gaps, challenges, and establishing its significance in advancing the domain of cybersecurity. Here is how the research achieves these objectives:
(1) Addressing gaps: The review consolidates scattered developments into an organized synthesis, providing a structured taxonomy of techniques and categorizations within the domain of IDS. By identifying key concepts, methods, and terminology, the review serves as a starting point for new researchers to swiftly comprehend the rapidly progressing field of trustworthy ML. The study offers insights into the relative effectiveness and limitations of different algorithms and datasets, addressing the need for comparative analysis in the field of IDS.
(2) Adding value: The comprehensive science mapping analysis helps organize the outcomes of previous investigations, summarizes key issues, and identifies potential research gaps, contributing to the existing body of knowledge. The review provides valuable insights into the conceptual framework of trustworthy ML, benefiting practitioners, policymakers, and academics, thus adding value to the understanding of the current state of research in this area. By visualizing research themes, prominent authors, and frequently studied algorithms, the review offers a comprehensive overview of the research landscape in IDS, thereby adding value to researchers and practitioners working in this domain.
(3) Establishing significance: The study establishes significance by offering insights into the annual scientific production and advancements in reliable ML in intrusion detection over the past 10 years, highlighting the continuous evolution and significance of the field. Through the utilization of bibliometric analysis techniques and the extraction of key information from a large number of qualified papers, the review establishes the significance of its findings in understanding publication trends, frequently adopted algorithms, datasets, limitations, and future challenges in the field of IDS. The identification of the most popular and crucial keywords from previous research using a word cloud adds significance by highlighting the broad and diverse nature of the field of trustworthy ML, covering a wide range of subjects and applications.
In summary, the systematic literature review of IDS adds value by addressing gaps, providing valuable insights, and establishing its significance in advancing the field of cybersecurity through comprehensive analysis and synthesis of research findings.
The reviewed literature has identified several challenges and limitations of current intrusion detection techniques. These include:
(1) Limited availability of labeled datasets: The availability of labeled datasets is a significant challenge in intrusion detection research, as it limits the ability to train and evaluate ML models effectively.
(2) Imbalanced datasets: Imbalanced datasets, where the number of samples in one class is significantly higher than the other, can lead to biased models and reduced performance.
(3) Adversarial attacks: Adversarial attacks, where attackers intentionally manipulate data to evade detection, pose a significant challenge to IDS.
(4) Interpretability and explainability: The interpretability and explainability of ML models are crucial in intrusion detection, as it is essential to understand how the models make decisions and identify potential vulnerabilities.
(5) Scalability: The scalability of IDS is a significant challenge, particularly in large-scale networks, where the volume of data can be overwhelming.
These challenges and limitations can significantly influence the overall effectiveness and reliability of IDS. For instance, limited availability of labeled datasets can lead to poorly trained models, while imbalanced datasets can result in biased models that perform poorly on underrepresented classes. Adversarial attacks can evade detection and compromise the security of the system, while the lack of interpretability and explainability can limit the ability to identify and address potential vulnerabilities. Finally, scalability challenges can limit the ability to deploy IDS in large-scale networks, reducing their overall effectiveness and reliability. Overall, addressing these challenges and limitations is crucial for enhancing the effectiveness and reliability of IDS, and future research should focus on developing solutions to overcome these obstacles.

Methodology
In this analytical section (Figure 1), we followed the recommended reporting guidelines for systematic reviews and meta-analyses. The procedure entailed the utilization of several bibliographic citation databases encompassing a broad spectrum of medical, scientific, and social science periodicals across various domains. Specifically, we considered three prominent digital databases when searching for the target papers: Scopus, IEEE Xplore, and Web of Science (WoS).
Scopus is renowned for its reliable resources across a wide array of disciplines, encompassing engineering, technology, science, medicine, and health. IEEE Xplore is a comprehensive repository of technical and scientific literature, offering full-text articles and abstracts across various publications in the fields of computer science, electronics, and electrical engineering. The WoS database, on the other hand, is a cross-disciplinary resource that incorporates research papers from diverse fields, including science, technology, art, and social science. Collectively, these databases offer extensive coverage of research across scientific and technological domains, delivering valuable insights to researchers.

Search strategy
The three databases under consideration (Scopus, IEEE, and WoS) each underwent a thorough bibliographic search for academic papers written in English. All scientific articles published from 2018 to 2023 were included in this search.
The search used a Boolean query that links the keywords with the two operators AND and OR, as follows: ("Intrusion detection system" OR "IDS") AND ("machine learning" OR "deep learning" OR "classification algorithms") AND ("optimisation algorithm" OR "optimization algorithm").
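The structure of this Boolean query can be sketched in a few lines of Python. This is only an illustration of how the keyword groups combine: the helper names `or_group` and `build_query` are hypothetical, and each database applies its own field syntax, which is not reproduced here.

```python
def or_group(terms):
    """Join alternative terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(groups):
    """Combine the OR-groups with AND, so every group must match."""
    return " AND ".join(or_group(g) for g in groups)


# Keyword groups taken from the search strategy described in the text.
KEYWORD_GROUPS = [
    ["Intrusion detection system", "IDS"],
    ["machine learning", "deep learning", "classification algorithms"],
    ["optimisation algorithm", "optimization algorithm"],
]

query = build_query(KEYWORD_GROUPS)
print(query)
```

Keeping both spellings ("optimisation" and "optimization") inside one OR-group means a paper matches regardless of the spelling convention it uses.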

Inclusion and exclusion criteria
The most crucial aspect of this systematic review of the literature is the set of criteria considered for the inclusion/selection of studies (Figure 1). The following inclusion criteria were taken into account:
- The papers had to be written in English, published in a journal or conference proceedings, and they had to take into account one or more reliable elements.
- Each of the components was required to be considerably connected to reliable ML or DL in order to be integrated into various ML techniques/methods for the intrusion detection domain.
- Highly relevant articles that were published from 2018 to 2023.
On the other hand, papers beyond the purview of this investigation were omitted based on the following exclusion criteria:
- Papers that were not from peer-reviewed publication forums.
- Papers that were not written in English.
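The mechanical part of these criteria (language, venue, peer review, publication year) can be expressed as a simple filter. The sketch below is an assumption about how such screening might be automated; the `Record` fields and the function name `meets_criteria` are hypothetical, and topical relevance to ML/DL-based intrusion detection still requires the manual review described in this study.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are illustrative only."""
    title: str
    year: int
    language: str
    venue_type: str      # e.g. "journal", "conference", "preprint"
    peer_reviewed: bool


def meets_criteria(record: Record) -> bool:
    """Apply the mechanical inclusion/exclusion checks from the text."""
    return (
        record.language.lower() == "english"
        and record.venue_type in {"journal", "conference"}
        and record.peer_reviewed
        and 2018 <= record.year <= 2023
    )


# Made-up example records, used only to exercise the filter.
records = [
    Record("IDS with CNN", 2021, "English", "journal", True),
    Record("Old IDS survey", 2015, "English", "journal", True),
    Record("IDS preprint", 2022, "English", "preprint", False),
]
kept = [r.title for r in records if meets_criteria(r)]
print(kept)
```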

Study selection
This study adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement for conducting a systematic literature review. The methodology involved several phases, with the initial phase focusing on the removal of duplicate papers. To achieve this, the titles and abstracts of the contributions were screened using the Mendeley program. All authors participated in this process, leading to the exclusion of numerous unrelated works. Disagreements and discrepancies among the authors were resolved by the corresponding author. The third step entailed a detailed examination of the full texts, during which articles failing to meet the previously established inclusion criteria (as outlined in Section 2.2) were eliminated. Only studies meeting the stipulated requirements were incorporated into this research. The initial search yielded 696 entries, of which 261 originated from Scopus, 175 from IEEE, and 260 from WoS. After removing approximately 217 duplicates and carefully scrutinizing the remaining entries, 393 studies remained. These studies were deemed relevant according to the inclusion criteria and were included in the final collection of publications. The gathered articles were analyzed using various bibliometric techniques, which are discussed in Section 3.
The bibliometric analysis conducted in the systematic literature review involved the extraction and analysis of key information from the 393 papers meeting the inclusion criteria. The analysis aimed to understand publication trends, frequently adopted algorithms, datasets, limitations, and future challenges in the field of IDS. Key information extracted from the papers included: (1) Publication trends in the field of IDS from 2018 to 2023, indicating the number of published papers each year. (2) Frequently adopted ML, DL, and optimization algorithms for intrusion detection. (3) Utilized benchmark datasets for evaluating IDS. (4) Limitations and future challenges identified in the reviewed studies. To assess the significance and impact of the identified studies, bibliometric analysis techniques were used to visualize research themes, prominent authors, and frequently studied algorithms like CNN, SVM, XGBoost, and GA. Additionally, in the structured taxonomy of the review, techniques were categorized into four broad categories: ML, DL, optimization algorithms, and datasets, with detailed sub-categorizations to summarize advancements within each domain. The bibliometric analysis provided insights into the relative effectiveness and limitations of different algorithms and datasets, consolidating scattered developments into an organized synthesis to benefit researchers and practitioners working on ML- and DL-driven IDS.
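The selection counts reported above can be checked with a short calculation. The figures are taken from the text; the number of records excluded during screening is not stated explicitly and is derived here as an assumption from the other values (with the duplicate count described as approximate).

```python
# Records identified per database, as reported in the text.
identified = {"Scopus": 261, "IEEE": 175, "WoS": 260}
total_identified = sum(identified.values())

# "Approximately 217" duplicates were removed according to the text.
duplicates_removed = 217
after_deduplication = total_identified - duplicates_removed

# Final number of included studies, as reported.
included = 393

# Derived value (not stated in the text): records dropped during
# title/abstract and full-text screening.
excluded_in_screening = after_deduplication - included

print(total_identified, after_deduplication, excluded_in_screening)
```

The per-database counts sum to the reported 696 entries, and the remaining figures imply that roughly 86 records were excluded after deduplication.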

Comprehensive science mapping analysis
Finding crucial information in previous studies has become increasingly challenging due to the growing volume of contributions and applied research. Keeping pace with this vast stream of theoretical and practical input can be daunting. Managing the vast literature has posed a significant challenge. To help organize the outcomes of previous investigations, summarize key issues, and identify potential research gaps, some academics recommend employing the PRISMA approach. Systematic reviews, in comparison, bolster the study's framework, contribute to the existing body of knowledge, and amalgamate the literature's findings. Nonetheless, systematic reviews still contend with issues of impartiality and reliability, as they depend on the authors' perspective to restructure the results of previous investigations. Various studies have proposed the following measures to enhance transparency when summarizing the findings of earlier studies.

Annual scientific production
Reliable ML in intrusion detection has advanced over the previous 10 years. The annual scientific output for intrusion detection is depicted in Figure 2, which provides an explanation for the emergence of earlier theoretical and practical investigations on reliable ML. The number of papers published in 2018 and 2019 reached approximately 23, and the quantity of publications has expanded significantly in recent years. The number of articles increased in 2020 and 2021 and grew even further in 2022, reaching a notable high of 138 papers. Output remained high in 2023, with 95 papers published. The increase in research output suggests a growing interest and emphasis on the development and improvement of IDS over the years. Furthermore, specific authors have contributed significantly to this field, with some focusing on optimization and feature selection based on intrusion detection, while others have concentrated on IDS for the Internet of Things (IoT) based on DL. These authors have achieved high accuracies in their research, indicating the advancement and effectiveness of the techniques employed in IDS. Overall, the observed publication trends demonstrate a substantial growth in research output in the field of IDS from 2018 to 2023, reflecting an increasing focus on enhancing the reliability and effectiveness of intrusion detection techniques.
Figure 3 shows authors' production over time. Motwakel [4][5][6][7][8][9][10] published seven papers in 2022 and 2023, focusing on optimization and feature selection based on intrusion detection; the highest accuracy reached was 99.87% using sand paper optimization. Al-Qaness [11][12][13][14][15] published five papers from 2021 to 2023, focusing on IDS for IoT based on DL; the highest accuracy reached was 99.997% using swarm intelligence optimization. Bacanin [16][17][18][19][20] published five papers from 2020 to 2023, focusing on optimization algorithms and feature selection based on intrusion detection; the highest accuracy reached was 99.6878% using GOA and MPO. Chen [21,22] published five papers from 2020 to 2023, focusing on intrusion detection using hybrid algorithms; the highest accuracy reached was 99.44% using COBYLA optimization. Dahou [11][12][13][14][15] published five papers (one in 2021, two in 2022, and two in 2023), focusing on IoT IDS using DL and optimization; the highest accuracy reached was 99.997% using swarm intelligence optimization. Fang [23][24][25][26][27] published five papers in 2018, 2020, and 2022, focusing on optimization algorithms for feature selection in network intrusion detection; the highest accuracy reached was 97.89% using WOA and OPS optimization. Hilal [4][28][29][30][31] published five papers in 2022 and 2023, focusing on DL algorithm optimization based on intrusion detection; the highest accuracy reached was 99.77% using IFSO-FS optimization. Zivkovic [16][17][18][19][20] published five papers (three in 2022 and two in 2023), focusing on optimization algorithms and feature selection for intrusion detection; the highest accuracy reached was 99.6878% using GOA and MPO. Dahou [32][33][34][35] published four papers in 2020, 2022, and 2023, focusing on optimization algorithms for IDS; the highest accuracy reached was 100% using Artificial organism algorithm (AOA) optimization. In addition to the mentioned authors, Al-Janabi [36][37][38][39] has contributed significantly by publishing four papers, with one in 2020 and three in 2021. His research was focused on IDS, optimization algorithms, and feature selection. Remarkably, his work achieved the highest accuracy of 100% using NTLBO optimization. Table 1 presents a comprehensive overview of the most influential authors in the field. Each of these authors has demonstrated exceptional achievements by reaching the highest accuracy through the utilization of classification and optimization algorithms.

Three-field chart
A three-field chart is used to display data with three parameters. In this representation, the left field corresponds to the Research Title (RT), the middle field represents the journal or source in which the research is published (SO), and the right field contains the Researcher's Name (RN). Figure 4 is used to examine the relationships between these three parameters. According to the study, the RT on the left side is most frequently cited by Scopus, IEEE Xplore (IEEE), and WoS, as observed in the middle field (SO) of Figure 4. Furthermore, among the Research Titles (TI) that focus on the subject of reliable and understandable ML, the Scopus journal stands out as the most prominent. Additionally, as indicated in the corresponding box (TI), when considering all keywords, the journals listed in the middle field (SO) that most frequently match the most popular keywords include "IEEE Access," "Sensors," "Cluster Computing," "Neural Computing and Applications," and "Soft Computing."

Word cloud
This study has effectively identified the most popular and crucial keywords from previous research using a word cloud. In order to provide a comprehensive overview of these keywords and reorganize the information, Figure 5 presents these essential keywords extracted from the results of previous studies. In Figure 5, the keywords are displayed in different sizes, with the size indicating their frequency in the literature. Larger keywords are more prevalent, while smaller keywords occur less frequently. Based on the term frequency illustrated in Figure 5, "ID," "DL," and "ML" emerge as some of the most frequently discussed topics in the field of trustworthy ML, with "DL" having the highest frequency. The image also highlights the significance of "IDS" and "Intrusion Detection System" as critical topics in this area. Additionally, other related terms with relatively high frequencies include "classification," "optimization," and "feature selection," emphasizing the importance of considering these factors when designing and implementing ML systems. Figure 5 also showcases various specific applications of ML across a range of algorithms, such as GA, SVM, and PSO. Furthermore, it includes several methodologies related to ML, including "Internet of Things," "network security," and "artificial intelligence." The word cloud of trustworthy ML articles reveals the broad and diverse nature of this field, covering a wide range of subjects.
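The relationship between keyword frequency and displayed size can be illustrated with a small term-frequency sketch. It is not the tool used to produce Figure 5; the function names, the linear size scaling, and the sample keyword lists are all assumptions made for illustration.

```python
from collections import Counter


def keyword_frequencies(keyword_lists):
    """Count in how many papers each (normalized) keyword appears."""
    counts = Counter()
    for keywords in keyword_lists:
        # A set ensures a keyword is counted at most once per paper.
        counts.update({k.strip().lower() for k in keywords})
    return counts


def font_sizes(counts, min_size=10, max_size=48):
    """Map frequencies linearly onto a font-size range (assumed scaling)."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = max(hi - lo, 1)  # avoid division by zero when all counts are equal
    return {
        k: min_size + (max_size - min_size) * (c - lo) / span
        for k, c in counts.items()
    }


# Made-up author keywords for three papers.
papers = [
    ["Deep Learning", "IDS"],
    ["deep learning", "feature selection"],
    ["IDS", "deep learning "],
]
counts = keyword_frequencies(papers)
sizes = font_sizes(counts)
print(counts.most_common())
```

Normalizing case and whitespace before counting matters in practice, because the same keyword is often written in several variants across papers.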

Co-occurrence
Another method employed in bibliometric analysis is the co-occurrence network, which involves studying common words in earlier research. This semantic network offers valuable insights into the conceptual framework of a specialized field, benefiting practitioners, policymakers, and academics. Figure 6 presents information specifically related to a co-occurrence network based on the titles of reliable ML articles.
The network is composed of nodes, representing individual words in the titles, and edges connecting the nodes indicate how frequently these words appear together in a title. Figure 6 displays various nodes, including the clusters to which they belong and their proximity, a metric for gauging how well-connected a node is to other nodes in the network. Evidently, the nodes are divided into eight distinct clusters, with each cluster comprising words associated with a specific theme or idea related to trustworthy ML. For instance, Cluster 1 features terms like "support vector machine," "DT," "Extreme learning machine," "cyber security," "genetic algorithm," and "bat algorithm." These terms suggest a connection to the establishment of dependable ML systems in optimization. Cluster 2, on the other hand, includes words like "neural network," "artificial intelligence," "feature extraction," "random forest," "anomaly detection," and "network security," signifying a focus on IDS. Similarly, other clusters are linked to subjects such as clustering, the IoT, the NSL-KDD dataset, ML, and IDS. The closeness of a node within the network measures its centrality, and nodes with higher closeness values are more central to the topic of trustworthy ML, signifying their importance within the network. This figure offers an overview of the relationships between different concepts and words associated with trustworthy ML, as derived from the titles of papers in the field. This information is valuable for understanding the current state of research in this area and for identifying areas where further investigation is warranted.
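The construction of such a network can be sketched as follows. This is a minimal illustration, assuming a simple unweighted definition of closeness (number of reachable nodes divided by the sum of shortest-path distances); the software and exact metric behind Figure 6 are not specified in the text, and the sample term sets are made up.

```python
from collections import Counter, deque
from itertools import combinations


def cooccurrence_edges(term_sets):
    """Count how often each pair of terms appears together in one title."""
    edges = Counter()
    for terms in term_sets:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


def closeness(edges):
    """Closeness per node, computed with breadth-first search on the graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    scores = {}
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores


# Made-up title terms for four papers.
titles = [{"svm", "ga"}, {"svm", "ids"}, {"ids", "cnn"}, {"svm", "ids"}]
edges = cooccurrence_edges(titles)
scores = closeness(edges)
print(edges, scores)
```

In this toy network the terms that sit in the middle of the chain ("svm" and "ids") receive higher closeness than the terms at the ends, which mirrors the interpretation of central nodes given above.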

Findings and analysis: A taxonomy
The final set of papers met both the inclusion and exclusion criteria considered (see Section 2.2), indicating that ML in intrusion detection has been identified through the conducted procedure. The 393 included articles were grouped into four broad categories. After analyzing each category, efforts were made to identify or generate subcategories using a variety of reliable ML algorithms in the context of intrusion detection. Across the 393 papers, we found:
1. ML, comprising 249 papers
2. DL, with 144 papers
3. Optimization algorithms, covering all 393 papers
4. Dataset.
The taxonomy of these subdivisions is displayed in Figure 7, which includes ML, DL, Optimization, and Dataset.
These categories are discussed here comprehensively, aiming to offer academics and practitioners valuable insights on trustworthy ML in intrusion detection. This endeavor is reported to enhance the reliability of ML in intrusion detection. The first category, ML, accounts for 249 of the 393 articles in the context of intrusion detection.

ML
Arthur Samuel originally described ML in 1959 as a branch of research that allows computers to learn without being explicitly programmed. This section describes network intrusion detection strategies with a focus on ML algorithms used to create security tools. In recent years, ML has gained increasing importance in IDS for computer networks [40][41][42]. The foundation of this lies in the model for training and prediction, which has the capability to quickly identify both attacks and typical cases [32]. The feature selection process can be considered as data preprocessing for ML algorithms. Intrusion detection can involve two types of classification: two-class, where intrusions are detected based on class labels, and multi-class, which categorizes attacks into different classes. ML techniques like Random forest (RF), SVM, ELM, and Naive Bayes classifiers can be applied in this field, as well as methods such as Self-Organizing Maps, Fuzzy clustering, and K-Means clustering. Figure 8 shows the classification of ML algorithms.
In the reviewed literature, several ML and DL algorithms have emerged as the most frequently used in intrusion detection research. Specifically, CNN, SVM, DTs, and GA have been prominently featured in the analyzed studies. Comparatively, these algorithms have demonstrated varying levels of popularity and effectiveness in the context of intrusion detection:
(1) CNN: CNN has gained significant popularity and has been widely utilized in intrusion detection research due to its effectiveness in learning hierarchical representations of data, particularly in image-based intrusion detection scenarios.
(2) SVM: SVM has also been frequently used and is known for its effectiveness in binary classification tasks, making it a popular choice for intrusion detection applications.
(3) DTs: DTs have been commonly employed for their interpretability and ease of understanding, making them popular in certain intrusion detection contexts, especially when explainability is a priority.
(4) GA: While GAs have been utilized, they may not be as prevalent as CNN, SVM, and DT in intrusion detection research. However, they offer the advantage of optimization and search capabilities, which can be beneficial in certain scenarios.
In terms of effectiveness, the reviewed literature may provide insights into the comparative performance of these algorithms in specific intrusion detection contexts, such as their accuracy, precision, recall, and F1 scores. Additionally, the specific datasets and features used in the studies may influence the relative effectiveness of these algorithms. Overall, while CNN, SVM, and DT have emerged as popular and effective choices in intrusion detection research, the comparative effectiveness of these algorithms may vary depending on the specific context, dataset, and evaluation metrics used in the reviewed studies.
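The evaluation metrics named above (accuracy, precision, recall, and F1 score) can be computed from the counts of true and false positives and negatives. The sketch below uses the standard definitions for the two-class case; the function name and the sample labels are illustrative and do not come from any reviewed study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = attack, 0 = normal traffic.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision, recall, and F1 alongside accuracy is useful on imbalanced datasets, where a model can reach high accuracy while still missing most samples of the minority class.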

SVM
In 1998, Vapnik introduced the SVM (see Chih-Fong Tsai). The SVM begins by transforming the input vector into a higher-dimensional feature space and subsequently identifies the optimal separating hyperplane. What distinguishes the SVM is its creation of a decision boundary, or separation hyperplane, using support vectors rather than the entire training sample. This property makes it highly robust against outliers. SVM classifiers are tailored for binary classification, meaning they are designed to divide a set of training vectors into two distinct classes. It is worth noting that the support vectors represent the training samples at the decision boundary. Additionally, the SVM incorporates a user-specified parameter known as the penalty factor, allowing users to strike a balance between the number of incorrectly classified samples and the width of the decision boundary [1,4,43,44].
Table 2 highlights the top five authors globally who utilized SVM algorithms in their research, each achieving the highest accuracy using SVM classification and optimization algorithms. Alqarni [45] and Al-Janabi and Ismail [36] both achieved the highest accuracy of 100%. Lavanya and Kannan [46] reached an accuracy of 99.98%, while Dwivedi et al. [47] achieved 99.89%, and Liu et al. [48] reached an accuracy of 99.88%.
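The role of the penalty factor can be illustrated with a minimal linear SVM trained by subgradient descent on the hinge loss. This is a simplified sketch under stated assumptions: it omits the mapping to a higher-dimensional feature space (no kernel), the names `train_linear_svm` and `predict`, the learning rate, and the toy data are all hypothetical, and it is not the implementation used in any of the reviewed studies.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b))).

    Labels must be +1 or -1. C is the penalty factor: larger values
    penalize margin violations more, smaller values favor a wider margin.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: hinge loss is active
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify by the side of the hyperplane on which x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable toy data: +1 = attack, -1 = normal.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

Only samples with a margin below 1 contribute to the update, which reflects the point made above that the boundary is determined by the samples near it rather than by the whole training set.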

DT
As described by Chih-Fong Tsai, a DT utilizes a sequence of decisions to categorize a sample, with each decision influencing the subsequent one. These decisions are represented in the form of a tree structure. Classifying a sample starts at the root node and traverses the tree until a leaf node is reached, with each leaf representing a distinct classification category. At each node, the sample's characteristics are considered, and the branch whose value matches the attributes is followed. Classification and Regression Tree (CART) is a well-known tool for creating DTs. A classification tree employs discrete (symbolic) class labels, while a regression tree deals with continuous (numeric) attributes [48].
Many researchers used DT algorithms in their research. Table 3 highlights the top five authors worldwide, each achieving the highest accuracy using DT classification and optimization algorithms. Dahou [32] achieved the highest accuracy at 100%, followed by Injadat et al. [49] at 99.99%. Mousavi et al. [50] reached an accuracy of 99.92%, Maza and Touahria [51] reached an accuracy of 99.83%, and Mahmood et al. [52] reached an accuracy of 99.36%.
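The root-to-leaf traversal described for DTs can be sketched with a small hand-written tree. The tree structure, feature names, thresholds, and class labels below are entirely hypothetical and serve only to show how a sample is routed to a leaf; a real tree would be learned from data, for example with CART.

```python
# Hypothetical decision tree: internal nodes test one feature against a
# threshold, leaf nodes carry a class label.
TREE = {
    "feature": "failed_logins",
    "threshold": 3,
    "left": {"label": "normal"},
    "right": {
        "feature": "bytes_sent",
        "threshold": 10000,
        "left": {"label": "probe"},
        "right": {"label": "dos"},
    },
}


def classify(node, sample):
    """Walk from the root to a leaf, following the matching branch."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


print(classify(TREE, {"failed_logins": 1, "bytes_sent": 500}))
print(classify(TREE, {"failed_logins": 8, "bytes_sent": 500}))
print(classify(TREE, {"failed_logins": 8, "bytes_sent": 50000}))
```

Because each prediction corresponds to one explicit path of tests, the decision can be read off directly, which is the interpretability advantage attributed to DTs in this review.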

ELM
The ELM approach, introduced by Huang et al., is known for its speed and simplicity as it does not require iterative training. It consists of three layers: the input layer, a single hidden layer, and the output layer. Because it employs only one hidden layer, ELM is specifically a single hidden layer feedforward neural network (SLFN). It excels at solving complex nonlinear mapping problems, and its adaptive training sets random input weights and biases for a number of nodes in the hidden layer. Researchers have utilized ELM algorithms in their research [36]. In Table 4, we highlight the top five authors globally, each achieving the highest accuracy using ELM classification and optimization algorithms. Al-Janabi and Ismail [36] and ElDahshan et al. [53] both achieved the highest accuracy at 100%, followed by Vaiyapuri et al. [54] at 99.63%, while Ghasemi et al. [55] [...]
[...] Each learner learns from the errors of the preceding one, and the combination of many weak learners (often hundreds) forms a powerful final model [57]. The authors employed boosting algorithms in their research. Table 5 presents the top five authors globally, each achieving the highest accuracy using Boosting classification and optimization algorithms. Dahou [32] reached the highest accuracy at 100%, Kilincer et al. [58] attained an accuracy of 99.98%, followed by Xu and Fan [59], who achieved an accuracy of 99.92%. Bacanin et al. [16] reached an accuracy of 99.65% and Zivkovic et al. [17] reached an accuracy of 99.68%.
[...] KNN is simple to use and effective for large datasets.

RF
RF is an ensemble method that combines multiple DTs to enhance model effectiveness. Bagging is employed to divide data into subsets, and DTs are built from these subgroups. RF is known for its low classification errors and its resistance to overfitting. Individual trees in the forest are constructed using bootstrap samples from the dataset. The Gini impurity measurement is used to determine the optimal node for splitting, and the model includes a maximum of 25 trees [67].
The authors utilized KNN and RF algorithms in their research. Table 6 presents the top five authors globally, each achieving the highest accuracy using KNN and RF classification and optimization algorithms. Gaber et al. [60] achieved the highest accuracy at 99.99%, followed by Samawi et al. [61] at 99.98%. Mohi-ud-din et al. [27] reached an accuracy of 99.95%, Bangui and Buhnova [62] reached an accuracy of 95.6%, and Mahmood et al. [52] reached an accuracy of 99.36%.
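The Gini impurity criterion mentioned above for choosing splits can be written out directly. The sketch below uses the standard definition (one minus the sum of squared class proportions) and searches thresholds on a single numeric feature; the function names and the sample values are assumptions for illustration, not taken from the cited work.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split, weighting each side by its share of samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try each candidate threshold and keep the one with lowest impurity."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up feature values with their class labels.
values = [1, 2, 8, 9]
labels = ["normal", "normal", "attack", "attack"]
print(gini(labels), best_threshold(values, labels))
```

An impurity of 0 means every sample on a side of the split belongs to the same class, so the split at threshold 2 separates this toy data perfectly.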

Naïve Bayes (NB)
NB employs a probabilistic approach based on Bayes' theorem and conditional probability calculations. It is referred to as "naïve" because of its simplifying assumption that the predictor variables are independent, meaning that all attributes are treated as unrelated to each other. This class of methods includes those offering categorization functions without explicitly producing a tree or set of rules [68]. Many researchers used the NB algorithm in their research. Table 7 presents the top five authors worldwide, each achieving the highest accuracy using NB classification and optimization algorithms. Shitharth et al. [63] achieved the highest accuracy at 99.99%, followed by Devi and Singh [64] at 99.91%. Samriya et al. [65] achieved an accuracy of 99.5%, Kunhare et al. [57] reached 99.32%, and Iwendi et al. [66] achieved 98.81%.
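To make the independence assumption concrete, the sketch below implements a small Gaussian NB classifier in pure Python. The Gaussian likelihood is one common choice and is an assumption here, as are the function names and the six toy records; the per-feature log-likelihoods are simply added, which is exactly what treating the attributes as independent amounts to.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a class prior and a per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9  # avoid zero variance
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model

def predict(model, row):
    """Pick the class with the largest log prior plus summed per-feature log-likelihoods."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for v, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical records: (connection duration, packets per second).
X = [(1.0, 20.0), (1.2, 22.0), (0.8, 19.0), (5.0, 90.0), (5.5, 95.0), (4.8, 88.0)]
y = ["normal"] * 3 + ["attack"] * 3
model = fit_gaussian_nb(X, y)

print(predict(model, (1.1, 21.0)))
print(predict(model, (5.2, 91.0)))
```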

DL
DL, a subcategory of ML, relies on neural networks with multiple hidden layers and finds applications in various domains, including image processing and natural language processing. It excels at extracting meaning from vast multidimensional data, performing feature selection and classification, and uncovering data correlations, particularly in speech recognition and language processing [69] (Figure 9).
The utilization of ML and DL techniques in the development of IDS evolved significantly from 2018 to 2023. A systematic review of 393 studies revealed that these techniques have significant potential for enhancing the reliability and effectiveness of IDS. The review identified frequently used algorithms, with CNN, SVM, DTs, and GA emerging as the top methods. Tables 2-4 indicate that SVM, DT, and ELM algorithms exhibit superior performance, particularly with the KDD Cup 1999 and NF-Bot datasets, for both multi-class and binary classification, as assessed by accuracy. Among DL algorithms, Table 8 shows better outcomes with the CNN algorithm and the NSL-KDD dataset than Table 9, which shows lower results with the RNN algorithm and the CICIDS2017 dataset, again gauged by accuracy.

CNN
The supervised learning algorithm CNN [47] is built upon the foundation of conventional artificial neural networks. CNN provides strong feature extraction and efficiently analyzes high-dimensional data using shared convolutional kernels. While multilayer FNN has drawbacks such as slow learning rates and susceptibility to overfitting, CNN features like local field perception, weight sharing, and pooling can enhance learning, expression, and neural network performance. Local field perception significantly reduces the number of weight parameters required for training, while weight sharing reduces the training parameters further. Additionally, pooling layers produce smaller feature maps of lower dimension. Many researchers employed the CNN algorithm. Table 8 presents the top five authors globally, each achieving the highest accuracy using CNN classification and optimization algorithms. Vijayalakshmi et al. [70] and Fatani et al. [11] both achieved an accuracy of 99.99%, followed by Prabhakaran and Kulandasamy [67] at 99.98% and Om Kumar et al. [71] at 99.95%. Chen et al. [22] achieved an accuracy of 99.84%.
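The effect of local field perception, weight sharing, and pooling on model size can be checked with simple arithmetic. The sketch below counts trainable parameters for a fully connected layer and a one-dimensional convolutional layer; the layer sizes (64 units or filters, kernel size 3, pooling size 2) and the 41-feature input are assumptions chosen only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv1d_params(kernel_size, in_channels, out_channels):
    """1-D convolution: each filter reuses the same small kernel at every position."""
    return kernel_size * in_channels * out_channels + out_channels

def conv1d_output_length(n_inputs, kernel_size, stride=1):
    """Output length of an unpadded 1-D convolution."""
    return (n_inputs - kernel_size) // stride + 1

def pooled_length(length, pool_size):
    """Output length after non-overlapping pooling."""
    return length // pool_size

n_features = 41  # assumed input width for this illustration

print(dense_params(n_features, 64))                             # 2688
print(conv1d_params(3, 1, 64))                                  # 256
print(conv1d_output_length(n_features, 3))                      # 39
print(pooled_length(conv1d_output_length(n_features, 3), 2))    # 19
```

Under these assumptions the convolutional layer needs roughly a tenth of the parameters of the dense layer, and pooling halves the length of each feature map.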

RNN
RNNs [75], which extend feedforward neural networks, possess internal memory to handle input sequences of varying length. An RNN typically comprises an input layer, an output layer, and multiple hidden layers, often referred to as memory units. Each hidden state depends on the preceding hidden state and the current input, which allows the network to identify patterns in data. RNNs have found applications in various domains, including speech recognition, handwriting identification, sentiment analysis, and human activity recognition. They excel in handling sequential data because they make effective use of contextual information. RNNs have also been employed in intrusion detection and classification; however, they often face vanishing gradient problems. Two widely used variants are:
1. LSTM, a variant of RNN, was introduced to address the vanishing gradient issue. It consists of an input gate, an output gate, and a forget gate, allowing it to manage both single inputs and input sequences. LSTMs find applications in areas such as speech recognition, handwriting recognition, and intrusion detection [69].
2. GRU, another RNN variation, utilizes a gating mechanism to handle sequential data. Unlike LSTM, GRU incorporates two gates, an update gate and a reset gate. Update gates capture long-term dependencies in input sequences, while reset gates focus on short-term dependencies. GRU is suitable for handling input sequences with many time steps and is applied in domains like signal processing, music modeling, and natural language processing [69].
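The two-gate mechanism of the GRU can be written out as a single recurrent step. The sketch below assumes NumPy and uses randomly initialized, untrained weights; the parameter names and sizes are hypothetical. It follows one common convention for the final interpolation, and some formulations swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_in, n_hidden, seed=0):
    """Small random weights for the update (z), reset (r), and candidate (h) paths."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "zrh":
        p["W" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input weights
        p["U" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # recurrent weights
        p["b" + gate] = np.zeros(n_hidden)
    return p

def gru_step(x, h_prev, p):
    """One GRU time step: gates decide how much of the old state to keep."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])              # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])              # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                            # blend old and candidate

params = init_params(n_in=4, n_hidden=3)
h = np.zeros(3)
sequence = np.random.default_rng(1).normal(size=(5, 4))  # 5 time steps, 4 features each
for x in sequence:
    h = gru_step(x, h, params)
print(h)
```

Because the new state is a gated blend rather than a repeated matrix product, information and gradients can pass through many time steps with less attenuation than in a plain RNN.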

Optimization (OP) algorithms
Optimization [76] seeks the most efficient and accurate solution to a problem, although its definition can vary by context. In mathematics, it involves exploring the behavior of a problem by adjusting values within a specified range to either minimize or maximize a function. Optimization plays a significant role in DL, where various optimization functions are employed to minimize or maximize error functions. These functions have been developed in diverse environments. Figure 10 presents the most important OP algorithms used in research.
OP algorithms have been extensively integrated into intrusion detection research, with several studies focusing on OP algorithms for feature selection and on DL-based intrusion detection. The review identified GA, ACO, and GWO as significant OP algorithms, consistently delivering high results across Tables 10, 12, and 14. For instance, Fang published five papers between 2018 and 2022, focusing on OP algorithms for feature selection in network intrusion detection; the highest accuracy he reached is 97.89% using WOA and OPS optimization. Similarly, Zivkovic published five papers between 2018 and 2023, focusing on OP algorithms and feature selection for intrusion detection; the highest accuracy he reached is 99.6878% using GOA and MPO optimization. The KDD Cup 1999 dataset, which is one of the most widely utilized in the field of intrusion detection, has been used in comparisons involving the PSO algorithm, with Table 11 indicating a marginal difference of 0.1. Overall, the integration of OP algorithms has contributed significantly to the performance of IDS, enhancing their reliability and effectiveness.

GAs
GAs are among the most commonly used evolutionary metaheuristic algorithms for IDS design in the literature [22]. Hogue utilized GAs to develop an IDS capable of effectively identifying various types of network intrusions. This strategy incorporates an evolutionary mechanism for processing traffic data. The KDD Cup 99 standard dataset served as the foundation for developing and evaluating this IDS, with the results demonstrating a reasonable detection rate. To provide a comprehensive perspective, this IDS was compared with numerous other techniques. In a similar vein, Dwivedi et al. [47] presented work based on GA fuzzy-class association mining. Many of the rules essential for creating an intrusion detection model are generated using GAs. Instead of generating every possible rule that satisfies the criteria for misuse detection, an association rule mining technique is employed to identify a sufficient number of key rules aligned with the user's goals. In an experimental study using the KDD Cup 99 intrusion detection dataset, Ibrahim Hayat Hassan proposed a method that exhibited a higher detection rate than traditional data mining approaches.
Many researchers applied the GA to enhance performance and achieve higher accuracy. Table 10 showcases the top five authors in the field, each achieving the highest accuracy using classification and GA. Notably, Duo et al. [68], Aljanabi and Ismail [36], and Gorzałczany and Rudzinski [77] each reached an accuracy of 100%, and Injadat et al. [49] achieved an accuracy rate of 99.99%. Finally, Mahmood et al. [52] reached an accuracy rate of 99.36%.
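A GA used for feature selection typically evolves bit masks in which each bit marks a feature as kept or dropped. The sketch below is a minimal pure-Python version with tournament selection, one-point crossover, bit-flip mutation, and elitism. The fitness function is a deliberately artificial stand-in (agreement with a hypothetical `RELEVANT` mask); in practice it would be the validation accuracy of a classifier trained on the selected features. All names and settings are assumptions.

```python
import random

def run_ga(fitness, n_bits, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve bit masks; returns the best mask and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(fitness(ind) for ind in pop)]
    for _ in range(generations):
        elite = max(pop, key=fitness)
        next_pop = [elite[:]]                             # elitism: keep the current best
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)     # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
        history.append(max(fitness(ind) for ind in pop))
    return max(pop, key=fitness), history

# Hypothetical "truly relevant" feature mask, used only to build a toy fitness function.
RELEVANT = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

def toy_fitness(mask):
    """Number of positions where the candidate mask agrees with RELEVANT."""
    return sum(1 for m, r in zip(mask, RELEVANT) if m == r)

best, history = run_ga(toy_fitness, n_bits=len(RELEVANT))
print(best, history[0], history[-1])
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which is why `history` is non-decreasing.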

PSO
Lavanya and Kannan [46] described PSO, a technique inspired by the behavior of birds in a flock, which guides particles toward the global optimum. PSO is generally easier to implement than GA because it has no evolutionary operators.
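The particle update behind PSO can be sketched in a few lines. The version below is a minimal pure-Python global-best PSO that minimizes a simple sphere function; the inertia weight, acceleration coefficients, bounds, and function names are assumed values for illustration and are not drawn from the reviewed studies.

```python
import random

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective`; returns (best position, best value, initial best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm
    initial_best = gbest_val
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay inside the bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val, initial_best

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_val, initial_best = pso(sphere, dim=3)
print(initial_best, best_val)
```

There is no crossover or mutation step: each particle only needs its velocity, its own best position, and the swarm's best position, which is the sense in which PSO is simpler to implement than a GA.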
The authors employed a PSO algorithm. Table 11 highlights the top five authors globally, each achieving the highest accuracy using PSO for classification and to enhance performance. Notably, Injadat et al. [49] stands out with an accuracy of 99.99%, followed closely by Gaber et al. [60].
As noted for Mahmood et al. [52], reducing the number of features to a minimum, even if they are carefully chosen and relevant, does not always lead to higher accuracy. Instead, it is essential to select the right quantity of important and relevant features, which may even be a large number, to enhance the performance of ML models.

ACO
ACO is inspired by the real-world behavior of ants [51], which seek the shortest route between their colony and food sources. ACO emulates the way ants communicate through pheromones within their population to discover the best solution in the search space. It has been effectively employed to tackle discrete optimization challenges. ACO also offers an intriguing approach to feature selection for IDS, although its current application is somewhat limited. Many researchers utilized the ACO algorithm to enhance performance and achieve high accuracy in their work. Table 12 presents the top five authors globally, each achieving the highest accuracy using ACO for classification and OP algorithms. Notably, Alqarni et al. [45] attained an accuracy of 100%, followed closely by Mousavi et al. [50] at 99.92%. Samriya et al. [21] achieved an accuracy of 99.5%, while Bangui and Buhnova [62] reached 95.6% and Thakkar and Lohiya [69] reached 90.6%.

ABC
The inspiration for the ABC algorithm stems from the foraging behavior of bees [78]. Among the available solutions, ABC aims to locate the optimal one. The beehive consists of three types of bees: scout bees, employed bees, and observer bees. These bees collaborate in various tasks, such as work distribution, food source selection, reproduction, scouting for the best food sources, and performing waggle dances to communicate the location of the optimal food sources. Initially, food sources are selected from the available options within the population. Employed bees then undertake random searches to discover food sources superior to those initially assigned to them.
Many researchers utilized the ABC algorithm to enhance performance and achieve high accuracy. Table 13 shows the top five authors globally, each achieving the highest accuracy using ABC for classification and OP algorithms. Notably, Bacanin et al. [18] achieved an accuracy of 99.65%, while Soni et al. [78] reached 97.42%. Mahboob et al. [79] achieved an accuracy of 97.23%, followed by Kalaivani et al. [80] with 97%, and Thakkar and Lohiya [69] reached 90.6%.

GWO
ML models often utilize meta-heuristic algorithms inspired by nature. One such algorithm, GWO, was introduced by Mirjalili et al. in 2014. GWO draws inspiration from the social structure and clever hunting tactics of grey wolves. In the natural world, grey wolves typically travel in packs consisting of 5-12 individuals. The GWO algorithm emulates the hunting behavior and leadership structure of these wolves [80].
Many researchers, such as Swarna Priya et al. [82], employed the GWO algorithm to enhance performance and achieve high accuracy. Table 14 showcases the top five authors globally, each achieving the highest accuracy using GWO for classification and OP algorithms. Notably, ElDahshan et al. [53] attained an accuracy of 100%, while Swarna Priya et al. [82] reached 99.9%. Alzubi et al. [33] reached an accuracy of 99.22% and Davahli et al. [81] achieved 99.10%. Kunhare et al. [83] achieved an accuracy of 97.894%.
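The leadership structure that GWO emulates can be sketched as follows: the three best wolves (commonly called alpha, beta, and delta) guide the rest of the pack, and a coefficient `a` shrinks from 2 toward 0 so that the search shifts from exploration to exploitation. This is a minimal pure-Python illustration on a toy sphere function; the pack size of 12, the bounds, and all names are assumptions.

```python
import random

def gwo(objective, dim, n_wolves=12, iters=100, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective`; returns (best position, best value, best value per iteration)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_val, history = None, float("inf"), []
    for t in range(iters):
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = (w[:] for w in ranked[:3])   # copies of the three leaders
        if objective(alpha) < best_val:
            best, best_val = alpha[:], objective(alpha)
        history.append(best_val)
        a = 2.0 * (1 - t / iters)                         # decreases linearly from 2 toward 0
        for w in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a          # step size and direction
                    C = 2 * rng.random()                  # random emphasis on the leader
                    D = abs(C * leader[d] - w[d])         # distance to the leader
                    estimate += leader[d] - A * D
                w[d] = min(hi, max(lo, estimate / 3))     # average of the three estimates
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best, best_val = final[:], objective(final)
    return best, best_val, history

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_val, history = gwo(sphere, dim=3)
print(history[0], best_val)
```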
Many other researchers employed a range of OP algorithms in their research, including Firefly, Harris Hawk, Multi-verse optimizer, Whale OP algorithm, Cuckoo search, Bat algorithm, AOA, and more [84][85][86][87][88][89][90][91]. The systematic review provides a structured synthesis of the state-of-the-art in intrusion detection research by consolidating and analyzing a comprehensive set of 393 papers meeting the inclusion criteria. Through bibliometric analysis and categorization, the review offers key insights and overarching themes derived from the analysis of the reviewed literature:
1. Publication trends: The review identifies increasing publication volumes in the field of intrusion detection, indicating a growing interest and research activity in this domain.
2. Frequently adopted algorithms: The review highlights the dominance of specific ML and DL algorithms, such as SVM, CNN, DTs, and GA, as leading techniques for intrusion detection.
3. Utilized datasets: The review emphasizes the significance of benchmark datasets, including KDD Cup 1999, NSL-KDD, UNSW-NB15, and CICIDS2017, as commonly used resources for evaluating intrusion detection models.
4. Challenges and limitations: The review identifies challenges and limitations, such as limited availability of labeled datasets, imbalanced datasets, adversarial attacks, interpretability, explainability, and scalability, which influence the overall effectiveness and reliability of IDS.
5. Future research directions: The review suggests future research directions, including the exploration of DL methods, addressing computational complexity, enhancing model interpretability, and evaluating diverse new datasets.
By synthesizing these key insights and overarching themes, the review provides a comprehensive overview of the current state-of-the-art in intrusion detection research. It offers valuable guidance for researchers and practitioners, enabling them to understand the prominent trends, challenges, and potential areas for further investigation in the field of IDS.

Dataset
The datasets encompass fields containing both unprocessed and processed data extracted from underlying network traffic [90]. These data are typically generated through studies aimed at identifying network intrusions. An intentional effort is made to manipulate the data, creating adversarial examples capable of deceiving classifiers and detection systems. When creating adversarial instances that alter the source data in network security applications, caution is essential, as highlighted in [90]. The most prominent datasets utilized in research include KDD Cup 99, NSL-KDD, CICIDS 2017, UNSW-NB15, AWID, Kaggle, and TON-IOT. Table 15 provides an overview of the most crucial datasets commonly used in the field of intrusion detection [37][38][39][91][92][93][94][95][96][97].
The most commonly used datasets in IDS are as follows:
1. KDD Cup 1999: This dataset stands as one of the most widely utilized in the field of intrusion detection. It comprises a substantial and diverse collection of network traffic data, encompassing potential attacks.
2. NSL-KDD: An enhanced iteration of the KDD Cup 1999 dataset, it offers a more demanding and realistic environment for testing intrusion detection models.
3. UNSW-NB15: This dataset consists of samples of network traffic extracted from real-world internet environments, providing a formidable challenge for detecting advanced attacks.
4. CICIDS2017: This dataset encompasses a diverse range of data reflecting different attack scenarios, serving as an invaluable resource for evaluating the performance of intrusion detection models.
The reviewed studies have frequently utilized the benchmark datasets listed above (KDD Cup 1999, NSL-KDD, UNSW-NB15, and CICIDS2017) for evaluating IDS. The utilization of these benchmark datasets has influenced the comparability of research outcomes by providing a standardized basis for evaluating the performance of IDS. Researchers can compare the effectiveness of different algorithms and techniques using these commonly accepted benchmark datasets, thereby facilitating the assessment of the reliability and generalizability of intrusion detection models.
Dataset table fragment: Large volume of data. Yousef [75]: The dataset exhibits class imbalance, with an uneven distribution between the dominant and minority classes in the database; 154 features.

Discussion
This section discusses the findings derived from comparing various ML algorithms. Notably, Tables 2-4 indicate that ML models exhibit superior performance when using SVM, DT, and ELM algorithms, particularly with the KDD Cup 1999 and NF-Bot datasets, for both multi-class and binary classification, as assessed by accuracy. In contrast, Table 5 presents slightly lower results when the XGBoost algorithm is employed and the CICIDS2017 dataset is utilized for multi-class classification. Tables 6 and 7 reveal results nearly identical to Table 5 with the RF and NB algorithms. Therefore, SVM, DT, and ELM algorithms outperform RF, NB, and XGBoost, though the margin is relatively small, typically within the range of 0.1-0.2 in terms of accuracy. Among DL algorithms, Table 8 shows better outcomes with the CNN algorithm and the NSL-KDD dataset than Table 9, which shows lower results with the RNN algorithm and the CICIDS2017 dataset, again gauged by accuracy. Notably, GA, ACO, and GWO stand out as significant OP algorithms, consistently delivering high results across Tables 10, 12, and 14. For instance, the KDD Cup 99 and CICIDS2017 datasets are used in comparisons involving the PSO algorithm. While Table 11 indicates a marginal difference of 0.1, Table 13 reveals a discrepancy of approximately 0.34 when the ABC algorithm is utilized with the UNSW-NB15 dataset. The KDD Cup 99 dataset emerges as the most frequently employed in conjunction with ML and DL algorithms, signifying that these algorithms exhibit great potential for enhancing intrusion detection in both binary and multi-class classification scenarios.

Conclusion
This systematic review presents a structured synthesis of research on ML and DL techniques for intrusion detection published between 2018 and 2023. An analysis of 393 studies reveals a noticeable increase in publication volumes, indicating a growing interest in this field. The mapping of frequently used algorithms highlights SVM, CNN, DTs, and GA as dominant techniques. The most commonly used public datasets include KDD Cup 1999, NSL-KDD, CICIDS2017, and UNSW-NB15. The review methodology integrates findings from multiple studies to provide a holistic overview of the current state-of-the-art. The results can inform future research by identifying promising techniques and gaps for further investigation. For instance, DL methods show potential but require ongoing exploration. Aspects such as computational complexity, model interpretability, and evaluation on diverse new datasets require further attention. Overall, this review provides a valuable reference that captures the current landscape of intelligent intrusion detection techniques and datasets, helping researchers position their work in this evolving research domain and select appropriate methodologies for comparative evaluation. The key results, insights, and implications are as follows:
(1) Key results:
- Increasing publication trends: The review reveals increasing publication trends in the research domain of IDS, indicating the growing interest and significance of the field.
- Frequently used algorithms: The study identifies CNN, SVM, DTs, and GA as the top methods frequently used in intrusion detection research, providing insights into the prevalent algorithms in the field.
- Commonly used datasets: The review identifies widely utilized datasets such as KDD Cup 1999, NSL-KDD, UNSW-NB15, and CICIDS 2017, emphasizing the importance of diverse and realistic datasets for evaluating intrusion detection models.
(2) Insights:
- ML and DL techniques: The review underscores the significant potential of ML and DL

KNN and RF

KNN
KNN is a supervised classifier that assigns a data point to the majority class among its K nearest neighbors, where nearness is measured by the Euclidean distance between data points. The data points with the smallest distance are treated as similar because of their shared properties.
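A minimal sketch of KNN classification with Euclidean distance is given below, in pure Python. The function name, the value K = 3, and the six toy records are hypothetical and serve only to show the distance ranking and majority vote.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank every training point by its Euclidean distance to the query.
    ranked = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature records.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]

print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

KNN has no training phase beyond storing the data, so every prediction scans the stored records; this is simple to implement, but the cost of each query grows with the size of the training set.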
and multi-class classification. It took a long time to reach the highest accuracy.
4. Injadat et al. [49]; 99.99%; DT; PSO and Genetic; CICIDS 2017; Optimal parameter; Multi-class classification; The model faces challenges in detecting attack patterns and behaviors.
5. Mahmood et al. [52]; 99.36%; DT, SVM, KNN; PSO and Genetic; NSL-KDD; Feature selection; Multi-class classification; The experimental results suggest that reducing the number of features to a minimum does not always lead to higher accuracy.

Table 1: Highest accuracy studies

Table 2: Highest accuracy of SVM algorithm studies
Boosting (light gradient boosting machine, LGBM; XGBoost; gradient boosting decision tree, GBDT)
Boosting is a potent ensemble learning technique widely applied in IDS to enhance the performance of individual weak classifiers. It combines multiple weak classifiers to construct a strong classifier capable of effectively identifying intrusions. Notable boosting algorithms include LGBM, GBDT, and XGBoost. XGBoost, initially proposed by Tianqi Chen, has gained widespread acceptance among researchers and developers. This technique applies boosting to numerous weak learners such as shallow DTs (typically of depth 1 or 2).

Table 4: Highest accuracy of ELM algorithm studies

Table 5: Highest accuracy of Boosting algorithm studies

Table 6: Highest accuracy of KNN and RF algorithm studies

Table 7: Highest accuracy of NB algorithm studies

Table 8: Highest accuracy of CNN algorithm studies

Table 9: Highest accuracy of RNN algorithm studies

Table 10: Highest accuracy of GA algorithm studies

Table 11: Highest accuracy of PSO algorithm studies

Table 12: Highest accuracy of ACO algorithm studies

Table 13: Highest accuracy of ABC algorithm studies

Table 14: Highest accuracy of GWO algorithm studies

Table 15: Most commonly used datasets
Denial of Service, Remote to Local, User to Root, and probing. The dataset comprises 41 features and one label, which provides information about the type of attack. Public. Due to the limited availability of publicly accessible datasets for network-based IDS, this updated version of the KDD dataset still has some issues and may not fully represent the current real networks.
IDS-2017 dataset, created by the Faculty of Computer Science, encompasses both regular and various attack data within network traffic. 2,500,000 records. https://www.unb.ca/cic/datasets/ids-
Private. The AWID intrusion dataset encompasses various data types, including discrete, continuous, and symbolic (nominal) data with a wide range of values. These data variations pose a challenge for classifiers, even for high-performing classifiers like SVM, to effectively train on normal and abnormal patterns.