An efficient intrusion detection system for IoT security using CNN decision forest

The adoption and integration of the Internet of Things (IoT) have become essential for the advancement of many industries, unlocking purposeful connections between objects. However, the surge in IoT adoption and integration has also made it a prime target for malicious attacks. Consequently, ensuring the security of IoT systems and ecosystems has emerged as a crucial research area. Notably, advancements in addressing these security threats include the implementation of intrusion detection systems (IDS), which have garnered considerable attention within the research community. In this study, with the aim of enhancing network anomaly detection, we present a novel intrusion detection approach: the Deep Neural Decision Forest-based IDS (DNDF-IDS). The DNDF-IDS incorporates an improved decision forest model coupled with neural networks to achieve heightened accuracy (ACC). By employing four distinct feature selection methods separately, namely principal component analysis (PCA), LASSO regression (LR), SelectKBest, and Random Forest Feature Importance (RFFI), we aim to streamline training and prediction processes, enhance overall performance, and identify the most correlated features. Evaluation of our model on three diverse datasets (NSL-KDD, CICIDS2017, and UNSW-NB15) yields ACC values ranging from 94.09% to 98.84%, depending on the dataset and the feature selection method. Notably, our model achieves a prediction time of 0.1 ms per record. Comparative analyses with other recent random forest and Convolutional Neural Network (CNN) based models indicate that our DNDF-IDS performs similarly to, or even outperforms, them in certain instances, particularly when utilizing the top 10 features. One key advantage of our model lies in its ability to make accurate predictions with only a few features, which makes efficient use of computational resources.


INTRODUCTION
The accelerated advancement of technology and its increasing integration into various aspects of our lives have led to a surge in the proliferation of smart interconnected objects.
In this landscape, IoT has emerged as a propelling force behind numerous industries and large-scale enterprises (Xu, He & Li, 2014). It incorporates an array of sensors, actuators, and interconnected objects, reducing or even eliminating the need for human intervention (Delsing et al., 2016). However, its ubiquity has also rendered IoT a magnet for malicious attacks for various reasons (Hassija et al., 2019), in addition to facing inherent threats and vulnerabilities. The increasing reliance on and growth of IoT highlight the urgency of ensuring security and reliability within this interconnected network, leading to a surge in research focused on IDS and intrusion detection techniques (Albulayhi et al., 2021; Sohn, 2020; Shamshirband et al., 2020; Lakshminarayana, Philips & Tabrizi, 2019).
The security of IoT is of utmost importance in our interconnected world today. Sophisticated techniques are employed by IoT systems to identify and thwart potential threats, with IDS playing a crucial role in protecting IoT ecosystems (Mohy-eddine et al., 2023; Hazman et al., 2023; Thakkar & Lohiya, 2021; Sharma, Sharma & Lal, 2019). Three primary types of IDS are prevalent (Liao et al., 2013). Anomaly-based detection, which uses machine learning and data analytics, establishes normative patterns for typical IoT network behavior (Alsoufi et al., 2021). Any deviations from this norm trigger alerts, enabling quick responses to suspicious activities. Signature-based detection involves scanning network traffic against a database of known attack patterns (Masdari & Khezri, 2020), facilitating the identification of threats encountered previously. Behavior-based detection monitors the actions of IoT devices and applications, highlighting any unusual or malicious activities. Hybrid IDS amalgamates multiple techniques to offer a robust defense against the evolving threat landscape (Bhati & Khari, 2021). For maintaining the integrity and security of connected devices and data in a dynamic IoT environment, the effective implementation of intrusion detection systems is vital.
This article contributes to the field of Network IDS (NIDS) by presenting a novel IDS based on the Deep Neural Decision Forest (DNDF) (Kontschieder et al., 2015). By employing the Interquartile Range (IQR) method for outlier identification, we aim to enhance data quality. Subsequently, feature selection techniques are utilized to improve performance, expedite prediction times, and optimize computational resources. Four feature selection methods, namely principal component analysis (PCA), SelectKBest, Lasso regression (LR), and Random Forest Feature Importance (RFFI), are separately applied to retain only the top 10 performing features. The encoded data is then used to train the DNDF model, an ensemble of neural decision trees suitable for classification scenarios, even in datasets with high dimensionality.
Our research aims to address a critical question: How can we enhance NIDS to better identify and manage network intrusions while effectively handling issues related to data quality, complexity, and resource efficiency? To contribute to this field, we are developing an innovative framework. This system leverages DNDF, a robust model for recognizing intrusion patterns, and integrates data quality improvement techniques such as preprocessing, outlier removal, and feature selection. With this approach, we aim to advance NIDS and to offer a comprehensive answer to some notable challenges of network intrusion detection, particularly in terms of detection accuracy and resource constraints. We aim to make NIDS more effective and more reliable at protecting network systems from cyber threats, helping to build stronger and more resilient network security. This article addresses an issue frequently encountered with highly accurate models and solutions: their extensive use of features from the evaluation dataset. While this contributes to their precision, it often leads to increased resource consumption and longer prediction times. Such delays can be problematic, especially in scenarios requiring quick results. This article therefore focuses on reducing the resource consumption and prediction times commonly associated with these accurate models.
The DNDF-IDS model enhances network anomaly detection in IoT ecosystems using a Deep Neural Decision Forest model. It employs four feature selection methods, namely PCA, LR, SelectKBest, and RFFI, improving performance and identifying key features. With accuracy values between 94.09% and 98.84% and a prediction time of 0.1 ms per record, it competes well with other models, especially when using the top 10 features. Its efficient predictions with fewer features make good use of resources, making it a potential solution for IoT network security against malicious attacks.
The subsequent sections of this article are organized as follows: "Background And Related Works" presents a background on IoT, IoT security, and intrusion detection, along with related works on machine learning-based intrusion detection systems. "Proposed DNDF-IDS Model" details our proposed IDS model based on the DNDF framework. "Experimental Study" provides an experimental study, and "Discussion of Results" discusses the results. The article concludes with a summary and outlines potential future work.

BACKGROUND AND RELATED WORKS
The first part of this section presents a background on machine learning-powered IoT security, IDS, and the DNDF model. The second part presents related works on IDS approaches that use machine learning and deep learning techniques.

Background
IoT architecture serves as a fundamental framework that supports the design and operation of IoT systems. It encompasses physical objects that are integrated with software, forming a networked environment (Gupta & Quamara, 2020). This integration facilitates a seamless transfer of data and commands amongst devices, the internet, and potentially, cloud services (Dang et al., 2019; Sharma & Obaidat, 2020). Although a standardized IoT architecture remains elusive, the research community has proposed similar frameworks (Ray, 2018), such as the three-layer architecture. This model consists of three crucial layers: the perception layer, which employs sensors and actuators to collect data from the physical world; the network layer, which ensures secure and efficient data transmission to and from devices; and the application layer, which manages data processing, analysis, and execution of actions (Zhong, Zhu & Huang, 2017). The IoT architecture can also incorporate cloud services, analytics, and user interfaces, thereby paving the way for data-driven decisions, improved efficiency, and autonomy (Nazari Jahantigh et al., 2020).
The IoT, being integral to numerous industries and businesses (Hussein, 2019), underscores the criticality of security. Establishing a secure IoT environment, particularly with the rapid proliferation of connected devices, poses a significant challenge (Mohamad Noor & Hassan, 2018). It necessitates the implementation of robust authentication measures, data encryption, and secure communication protocols to protect against unauthorized access and data breaches. Regular device updates, anomaly monitoring, and effective access control are also pivotal to maintaining IoT security.
In this context, intrusions signify unauthorized or malicious activities that threaten the security of connected devices and networks. IDS play an essential role in detecting and mitigating these intrusions (Santos, Rabadao & Gonçalves, 2018). These systems constantly monitor network traffic and device behavior, searching for suspicious patterns or deviations from typical activities (Zarpelão et al., 2017). Upon detecting anomalies or potential threats, the IDS issues alerts or initiates predefined actions to minimize the risk, thereby safeguarding IoT ecosystems from cyber-attacks, data breaches, and unauthorized access.
The importance of effective intrusion detection systems for preserving the integrity and security of IoT networks and the data they manage cannot be overstated, especially as the IoT landscape continues to evolve and expand. The ongoing pursuit of novel prediction and classification methods continually produces new models and techniques. The research discussed in this article focuses on a recently introduced model, the Deep Neural Decision Forest (DNDF), as described by Kontschieder et al. (2015). DNDFs combine the concepts of classification trees and convolutional neural networks (CNN), resulting in a stochastic, differentiable model that supports backpropagation. This signifies a promising advancement in decision tree-based machine learning.
In addition to the DNDF model, we incorporated feature selection using four methods: PCA, LR, SelectKBest, and RFFI. PCA is a technique used for feature selection that focuses on reducing the dimensionality of data by extracting meaningful features based on the variance of the data. PCA evaluates the importance of features by analyzing their weights or coefficients on eigenvectors (Song, Guo & Mei, 2010). It aims to capture the most significant information in the data while reducing redundancy.
SelectKBest is a feature selection method that selects the top k features based on statistical tests like ANOVA or chi-squared. It ranks features according to their scores and selects the best ones for the model, discarding less relevant features (Khalid et al., 2023).
LASSO regression, also known as L1 regularization, is a method that adds a penalty term to the regression equation, forcing some coefficients to be exactly zero. This results in feature selection by shrinking less important features to zero, effectively removing them from the model and promoting sparsity (Muthukrishnan & Rohini, 2016).
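For reference, the penalty described above is commonly written as the following objective. The notation is an assumption for illustration and does not come from the article: $X$ is the feature matrix with $n$ records, $y$ the target, $\beta$ the coefficient vector, and $\lambda \ge 0$ the regularization strength.

```latex
\hat{\beta} \;=\; \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 \;+\; \lambda \lVert \beta \rVert_1
```

Larger values of $\lambda$ drive more coefficients exactly to zero, which is what makes the method usable as a feature selector.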
Random Forest Feature Importance is a technique that measures the importance of each feature by looking at how much each feature improves the purity of the nodes in the decision trees within the random forest model. Features that lead to greater node purity are considered more important for prediction (Hasan et al., 2016).
[...] (2023) proposed a practical machine learning-based IDS using K-NN and employed several feature selection methods to select ten valuable features. This approach resulted in significant performance improvement, reduced prediction time, and demonstrated that feature selection enhances IDS performance. Their work was evaluated using the Bot-IoT dataset.
Attou et al. (2023) combined graphic visualization and random forest (RF) for cloud security to detect intrusions using a reduced set of two features. RF outperformed DNN, decision trees (DT), and SVM in predicting and classifying attack types. However, recall rates using NSL-KDD data were still suboptimal.

Related works
Mohy-eddine et al. (2023) developed a machine learning-based IDS for Industrial IoT (IIoT) edge computing security. They used Pearson's correlation coefficient (PCC) and isolation forest (IF) for computational efficiency and training. Feature engineering improved model accuracy and detection rates, achieving a 100% detection rate and 99.99% accuracy on the Bot-IoT dataset. This approach demonstrated advantages over other models.
Roy, Li & Bai (2022) presented a two-layer hierarchical IDS model for IoT networks, utilizing the Fog-Cloud Infrastructure. The fog layer uses a straightforward Feedforward Neural Network (FNN) with added functionality from a stacked autoencoder to perform binary classification. In contrast, the cloud layer employs a more intricate FNN to handle multi-class classification. This model effectively detects various intrusions and outperforms existing IDS systems in terms of detection accuracy.
Attou et al. (2023) developed a new approach for intrusion detection in a cloud environment by integrating machine and deep learning algorithms. They used RF for feature selection and the Radial Basis Function Neural Network (RBFNN) for intrusion detection. Their approach achieved accuracy exceeding 94% and false negative rates below 0.0831%, demonstrating the model's capability to accurately identify and classify intrusions. They also effectively addressed imbalanced datasets and highlighted the role of feature selection in enhancing the performance of the intrusion detection system.
Mohy-eddine et al. (2023) introduced an IDS for IIoT networks using RF and PCC for classification and feature selection respectively. They also used IF to detect outliers. They used PCC and IF interchangeably. Their model effectively addressed the imbalance in the Bot-IoT dataset and performed well on the NF-UNSW-NB15-v2 dataset.
Dushimimana et al. (2020) proposed a bidirectional recurrent neural network (BRNN) that outperforms other algorithms like a normal RNN and GRNN, achieving an accuracy of 99.04%. BRNN addresses the limitations of RNN and GRU by adding more cells and hidden neurons, leveraging information from both past and future states, making it a more effective choice for IoT security.
[...] (2019) developed a deep network intrusion detection model, evaluated it using the AWID dataset, and achieved a detection accuracy of 99.8% for various types of cyberattacks. They compared their approach to recent methods from the literature and demonstrated a significant improvement in terms of accuracy and latency, using their proposed autoencoded DNN.
Hsu et al. (2019) developed an anomaly network intrusion detection system using a stacked ensemble model integrating an autoencoder, SVM, and random forest. Evaluated on NSL-KDD, UNSW-NB15, and Palo Alto logs, the system achieved around 92% accuracy, outperforming traditional models. Limitations include high resource usage, extended prediction times, and the need for further optimization and real-world testing.
Sarkar, Sharma & Singh (2023) presented an efficient ML ensemble technique for intrusion detection, focusing on parameter tuning, pre-processing, and dataset correction. The technique offers two classification methods on the KDD Cup99 and NSL-KDD datasets, enhanced with data augmentation. Utilizing a cascaded MLP structure and a meta-classifier architecture, the approach achieves 89.32% accuracy with a 1.95% FPR, and 87.63% accuracy with a 1.68% FPR on the NSL-KDD dataset.
Mebawondu et al. (2021) developed an optimized IDS using an ensemble of decision trees (DT) on the UNSW-NB15 dataset. Comparing bagging and AdaBoost techniques, the study finds that AdaBoost with the C4.5 DT classifier achieves the best performance, with 98% accuracy and precision using a 90/10 train-test split.
Gao et al. (2019) proposed an adaptive ensemble learning model to improve intrusion detection accuracy by combining different algorithms. The model achieved 85.2% accuracy, 86.5% precision, 85.2% recall, and an F1 score of 84.9%, surpassing other methods. Although DNNs excel in detection, they are slow, affecting real-time application. The MultiTree algorithm outperformed DNNs, especially in imbalanced scenarios. The study highlights the need for better training data, feature extraction, and preprocessing, particularly for high-level threats like U2R attacks. The ensemble approach shows promise but requires further optimization for practical use.
Lian et al. (2020) addressed the challenge of detecting and categorizing network attacks by proposing an intrusion detection method based on Decision Tree-Recursive Feature Elimination (DT-RFE) within an ensemble learning framework. The DT-RFE method selects relevant features and reduces feature dimensions, enhancing resource utilization and reducing time complexity. By using a Stacking ensemble learning algorithm that combines decision tree and recursive feature elimination (RFE), the study demonstrates improved performance of the IDS. Cross-validation on the KDD CUP 99 and NSL-KDD datasets shows that the proposed approach significantly enhances accuracy. However, the method's effectiveness may be limited by the quality of the training data and the need for further optimization in feature extraction and noise reduction.
Beyond traditional CNN and random forest methods, the emerging field of hybrid metaheuristics and machine learning offers a promising solution for complex security issues. This area merges machine learning with swarm intelligence. Studies such as Salb et al. (2023) and Dobrojevic et al. (2023) report the success of this integration in enhancing security. Hybrid methods utilize both fields' strengths to boost the performance and robustness of security systems, marking progress in this crucial research area. A comparison of the various methods outlined above is presented in Table 1.
Despite advances in neural networks and decision tree-based models, integrating these approaches for structured data classification is underexplored. Traditional CNNs are effective for image data but not optimized for structured data, and while random forests handle structured data well, they lack the end-to-end learning capabilities of neural networks. Recent research on neural decision forests shows promise, but there is a lack of empirical evaluations and optimization studies. Although current models and solutions provide accurate results, they rely heavily on nearly all features of the evaluation dataset, leading to high resource usage and extended prediction times. This research aims to fill these gaps by evaluating neural decision forests on structured data, optimizing their parameters, and comparing them with CNNs and random forests to provide a robust, end-to-end model for structured data classification. Additionally, this article tackles resource and efficiency issues by utilizing a significantly reduced number of features selected through various feature selection techniques, conducting a comparative analysis of the results, and presenting the prediction times for each method within the same environment.

PROPOSED DNDF-IDS MODEL
In this section, we present our solution, which is based on a deep neural decision forest. To conserve resources and reduce execution time, while also improving data quality, we used feature selection and encoding, retaining only the features that performed best. Subsequently, we trained the DNDF model using the ten best selected and encoded features to develop the final intrusion detection system.

Proposed approach
Our model is based on five core components as described in Fig. 1: a data source, a preprocessing module, a feature reduction module, a decision module, and finally, a response module. Moreover, IDS often involve a network-level continuous analysis of traffic, which can cause a delay in operations. In this regard, training time and resource consumption are important factors. Therefore, feature selection is used in the development of our solution.
Because IoT environments usually cannot afford the delay caused by real-time intrusion detection, we aim to increase speed, and we want the main strength of our model to be the minimal use of features in the prediction process. Therefore, only 10 features are considered. We use four different feature selection methods to maximize our model's efficiency: PCA, LR, SelectKBest, and RFFI, applied separately in order to conduct a comparative review of all four. In the classification process, we used a deep neural decision forest model. This model combines the strengths of random forests and CNNs: the partitioning principle of decision trees paired with the representation learning of deep architectures. This facilitates effective feature extraction and therefore makes minimal feature use possible.

Description of solutions
As shown in Fig. 1, our solution consists of five core modules. The data source module forms the foundation of our model's learning datasets, which include labeled network traffic records identified as safe or threats. This data is prepared, cleaned, and encoded in the second module, where duplicates, missing data, and outliers are removed, the latter using the Interquartile Range (IQR) method.
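The cleaning step described above can be sketched with pandas as follows. This is a minimal illustration, not the authors' implementation: the fence multiplier `k=1.5` (the conventional Tukey value), the helper names, and the toy columns `duration` and `src_bytes` are assumptions made for the example.

```python
import pandas as pd


def remove_iqr_outliers(df, columns, k=1.5):
    """Keep rows whose listed columns all lie in [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence; the article does not state the value used.
    """
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    within = ((df[columns] >= lower) & (df[columns] <= upper)).all(axis=1)
    return df[within]


def clean(df, numeric_columns):
    """Drop duplicates and missing values, then remove IQR outliers."""
    df = df.drop_duplicates().dropna()
    return remove_iqr_outliers(df, numeric_columns).reset_index(drop=True)


# Toy records (hypothetical values): one duplicate, one missing value,
# and one extreme duration.
records = pd.DataFrame({
    "duration": [1.0, 2.0, 2.0, 3.0, 2.5, 500.0, None],
    "src_bytes": [100.0, 120.0, 120.0, 110.0, 105.0, 115.0, 90.0],
})
cleaned = clean(records, ["duration", "src_bytes"])
print(cleaned)
```

In this sketch a row is dropped as soon as any one of its numeric columns falls outside the fences; a per-column or per-class variant would be an equally plausible reading of the text.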
The third module, feature selection, processes the encoded data through the PCA, LR, SelectKBest, and RFFI methods. This yields four distinct feature sets, each with the top 10 performing features. This enhances model quality while reducing training and prediction time.
In the fourth module, four variations of the DNDF model are trained using four distinct sets of reduced data. Each variation is trained with one of these reduced datasets. These models are then compared to identify the best performing one. Although this approach increases training times due to the necessity of training with four different subsets of the same dataset, it ultimately reduces prediction times by utilizing a smaller number of features.
Finally, the decision module utilizes the best performing model from the previous step to analyze network traffic in real time and determine whether it is safe or a potential threat. Due to the time-sensitive nature of IoT operations, the prediction model must be lightweight and have a very low prediction time to enable real-time capability. Our approach is to use only the 10 best performing features for training and prediction. We used four different feature selection methods separately (PCA, SelectKBest, LR, and RFFI), each with its own benefits and drawbacks.
PCA reduces dimensionality by transforming original features into a new set of orthogonal components. These components, or principal components, are sorted by how much variance they explain in the data. PCA effectively minimizes the dimensionality of the dataset by selecting a subset of these components that capture the most variance.
SelectKBest evaluates each feature's significance individually using statistical tests. It ranks features based on their scores and selects the top K features with the highest scores, where K is a user-selected number. This method ensures the most informative features are selected based on their statistical relevance.
LR is a linear regression method that applies a penalty to the absolute size of the coefficients, shrinking some to zero. This characteristic of Lasso encourages feature selection by automatically setting irrelevant features' coefficients to zero, effectively removing them from the model. LR helps select the most relevant features while maintaining model interpretability.
RFFI measures each feature's importance based on how much it contributes to a random forest model's overall performance. It computes feature importance based on the decrease in impurity at each decision tree split caused by the feature. Features leading to larger decreases in impurity are considered more important. RFFI is robust to non-linear relationships and can capture complex feature interactions, making it well suited to identifying the most influential features in a dataset.
Applying these methods side by side yields a more balanced feature selection strategy. This mitigates individual limitations and enhances overall model performance and interpretability. The outcomes of these methods are the 10 best-performing features for each selection, described in Tables 2-4.
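The four selectors above can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the data is synthetic (`make_classification` with 41 columns, mirroring the NSL-KDD feature count), the scoring test `f_classif`, the Lasso strength `alpha=0.01`, and the forest size are arbitrary choices, and ranking original features by variance-weighted absolute PCA loadings is one plausible way to turn PCA into a feature ranking, which the article does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

K = 10  # number of features kept by every method, as in the text

# Synthetic stand-in for an encoded intrusion dataset (assumption).
X, y = make_classification(n_samples=500, n_features=41,
                           n_informative=12, random_state=0)
X = StandardScaler().fit_transform(X)


def top_k(scores, k=K):
    """Return the sorted column indices of the k highest scores."""
    return np.sort(np.argsort(scores)[::-1][:k])


# PCA: score each original feature by its absolute loadings, weighted by
# the explained-variance ratio of each component (assumed ranking rule).
pca = PCA(n_components=K, random_state=0).fit(X)
pca_scores = np.abs(pca.components_).T @ pca.explained_variance_ratio_

# SelectKBest: univariate ANOVA F-scores.
kbest_scores = SelectKBest(f_classif, k=K).fit(X, y).scores_

# LR: magnitude of the Lasso coefficients (zeroed features rank last).
lasso_scores = np.abs(Lasso(alpha=0.01).fit(X, y).coef_)

# RFFI: impurity-based importances of a random forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rffi_scores = forest.feature_importances_

selected = {
    "PCA": top_k(pca_scores),
    "SelectKBest": top_k(kbest_scores),
    "LR": top_k(lasso_scores),
    "RFFI": top_k(rffi_scores),
}
for method, columns in selected.items():
    print(f"{method:12s} {columns.tolist()}")
```

Each entry of `selected` would then define one reduced dataset, for example `X[:, selected["RFFI"]]`, on which a separate model variant is trained.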
DNDF is a CNN variant that replaces the softmax layer with decision forests, which comprise decision trees (Sun et al., 2022). Decision trees have decision and prediction nodes, indexed as N and L, respectively. Prediction nodes have a probability distribution over the output space, while decision nodes have decision functions. Parameters from the CNN update feature representations. The CNN's embedding function determines the action of decision functions in the decision trees. The architecture, depicted in Fig. 2, illustrates the implementation of decision nodes using the final CNN layer output. DNDF uses the same fully-connected and convolutional layers as a typical CNN, with feature representations learned by the fully-connected layer serving as tree nodes in the decision forests.
We selected the DNDF model over the traditional CNN for three main reasons. First, the incorporation of decision forests: unlike typical CNN activation functions, decision forests in DNDF can efficiently capture complex decision boundaries. This is particularly beneficial in high-dimensional or noisy data scenarios, as it enhances performance. Second, the benefits of ensemble learning: decision forests improve model robustness and generalization. This is advantageous in situations where CNNs may overfit or have limited generalization due to data complexity or dataset size. Finally, improved interpretability: decision forests provide better interpretability than CNNs, which is vital in classification problems.
The final layer of the CNN provides the embedding functions $f_n$, one for each decision node $n \in N$, and its resulting output dictates the decision function $d_n$ of the decision tree nodes. Prediction nodes, denoted as $\pi_1, \pi_2, \ldots$ within the decision tree, contain probability distributions over the classes; these nodes are responsible for determining the likelihood of an observation belonging to a specific class. In Fig. 2, the red path demonstrates an example of the routing of a sample $x$ to reach the leaf $\pi_4$.
Broadly, DNDF and CNN share the same fully connected and convolutional layers. However, DNDF diverges by replacing the softmax layer of the CNN with decision trees, wherein the tree nodes utilize feature representations acquired from the fully connected layer. The decision function of the decision node $d_n(\cdot; \theta)$ is defined as follows:
$$d_n(x; \theta) = \sigma\big(f_n(x; \theta)\big)$$
where $x \in \mathcal{X}$ represents the sample input, $\theta$ is the parameter used to update the feature representation from the CNN, $f_n(x; \theta)$ is a real-valued function contingent on both the sample and the parameters $\theta$, and $\sigma$ is the sigmoid function (Kontschieder et al., 2015). The output of $d_n(\cdot; \theta)$ decides the routing of $x \in \mathcal{X}$ (the red path example in Fig. 2). When a sample is routed from the tree nodes to a leaf node, the routing function is expressed as follows:
$$\mu_l(x \mid \theta) = \prod_{n \in N} d_n(x; \theta)^{\mathbb{1}_{l \swarrow n}} \, \bar{d}_n(x; \theta)^{\mathbb{1}_{n \searrow l}}$$
where $l$ is the leaf node, $\mathbb{1}_{l \swarrow n}$ equals 1 when $l$ lies in the left subtree of node $n$ and 0 otherwise, $\mathbb{1}_{n \searrow l}$ equals 1 when $l$ lies in the right subtree of node $n$ and 0 otherwise, and $\bar{d}_n(x; \theta) = 1 - d_n(x; \theta)$. For the red path example in Fig. 2, the routing probability is the product of the $d_n$ or $\bar{d}_n$ terms of the decision nodes along that path.
The stochastic routing of this architecture results in a final classification of a sample $x$ into class $y$ by averaging the class probabilities of the leaf nodes, weighted by the probability of reaching each leaf. The function for the final prediction is:
$$P_T[y \mid x, \theta, \pi] = \sum_{l \in L} \pi_{ly} \, \mu_l(x \mid \theta)$$
where $\pi_l$ is the class label distribution of leaf $l$, and $\pi_{ly}$ represents the probability that a sample reaching leaf node $l$ is assigned class $y$. Finally, the decision forest comprises multiple decision trees $\mathcal{F} = \{T_1, T_2, \ldots, T_k\}$, and the average of the outputs of the trees gives the final decision forest prediction, defined as follows:
$$P_{\mathcal{F}}[y \mid x] = \frac{1}{k} \sum_{h=1}^{k} P_{T_h}[y \mid x]$$
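The routing and averaging described above can be traced by hand on a small case. The sketch below assumes a single tree of depth 2 (three decision nodes, four leaves), takes each decision output as the probability of branching left (the convention of Kontschieder et al., 2015), and uses made-up numbers for the decision outputs and leaf distributions; none of these values come from the article.

```python
import numpy as np

# Hypothetical decision-node outputs for one sample (probability of going left).
d1, d2, d3 = 0.8, 0.3, 0.6   # d1: root, d2: left child, d3: right child

# Probability of reaching each leaf: product of the decisions along its path.
mu = np.array([
    d1 * d2,              # left, left
    d1 * (1 - d2),        # left, right
    (1 - d1) * d3,        # right, left
    (1 - d1) * (1 - d3),  # right, right
])

# Hypothetical class distributions held by the four leaves (rows sum to 1).
pi = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
    [0.1, 0.9],
])

# Tree prediction: leaf distributions weighted by the routing probabilities.
p_tree = mu @ pi

# Forest prediction: plain average over trees; the second tree's output is
# another made-up value standing in for an independently routed tree.
p_other_tree = np.array([0.25, 0.75])
p_forest = np.mean([p_tree, p_other_tree], axis=0)

print("mu      =", mu)        # [0.24 0.56 0.12 0.08]
print("p_tree  =", p_tree)    # [0.396 0.604]
print("p_forest=", p_forest)  # [0.323 0.677]
```

Because every sample's routing probabilities sum to one and every leaf holds a proper distribution, the tree and forest outputs are themselves valid class distributions.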

DNDF model architecture
The proposed model, whose components and process are shown in Fig. 3, is designed specifically for structured data and follows a systematic process from input handling to final classification. The architecture begins with an input layer, where each of the X input features is fed into the model as an individual input. These features are diverse and may include both numerical and categorical data.
To facilitate uniform processing, each input feature undergoes a dimensional expansion using the tf.expand_dims operation. This step is crucial as it adds an additional dimension to each feature, converting them into tensors suitable for subsequent operations within the model. The expanded features are then concatenated into a single, unified tensor, effectively combining all the individual feature representations into a cohesive input.
The concatenated tensor is passed through a batch normalization layer. Batch normalization is applied to stabilize and accelerate the training process by normalizing the inputs to each layer, ensuring that the model trains efficiently and effectively. This normalization helps in maintaining the mean and variance of the inputs, thereby mitigating issues related to internal covariate shift.
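The input handling described in the last two paragraphs can be mimicked in NumPy, as a rough analogue of the TensorFlow operations named in the text. The sketch makes several assumptions: the three feature names and their values are invented, `np.expand_dims` and `np.concatenate` stand in for their TensorFlow counterparts, and the normalization uses batch statistics only, omitting the learned scale and shift that a trainable batch normalization layer would add.

```python
import numpy as np


def batch_normalize(x, eps=1e-3):
    """Normalize each column with the batch mean and variance.

    Training-mode statistics only; the learned scale and shift of a real
    batch normalization layer are left out of this sketch.
    """
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# Hypothetical per-feature inputs, one 1-D array per feature (batch of 4).
features = {
    "duration": np.array([0.0, 2.0, 4.0, 6.0]),
    "src_bytes": np.array([100.0, 300.0, 200.0, 400.0]),
    "protocol_code": np.array([0.0, 1.0, 1.0, 2.0]),  # already-encoded category
}

# Shape (4,) -> (4, 1) for every feature, then join along the last axis.
columns = [np.expand_dims(values, axis=-1) for values in features.values()]
inputs = np.concatenate(columns, axis=-1)   # shape (4, 3)
normalized = batch_normalize(inputs)

print(inputs.shape)
print(normalized.round(3))
```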
Following batch normalization, the processed input tensor is fed into the core component of the model: the neural decision forest. The neural decision forest consists of an ensemble of N neural decision trees. Each tree within this ensemble is characterized by a hierarchical structure, defined by its depth, which dictates the complexity and the number of decision boundaries it can learn.
Within each neural decision tree, the input features are selectively masked and processed through a series of nodes, where decisions are made based on the learned feature interactions. The structure of these trees allows for learning complex, non-linear relationships within the data. The output of each tree is a probabilistic prediction for each class, computed through a series of sigmoid activations and weighted decisions at each node.
The outputs from all N trees are aggregated to form the ensemble output. This aggregation involves summing the predictions from each tree and then averaging them, which effectively combines the learned decision boundaries from all trees. This ensemble approach enhances the model's robustness and predictive performance, as it leverages the collective decision-making capabilities of multiple trees.
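The forward pass of such an ensemble can be sketched in NumPy as follows. This is a forward-only illustration with random, untrained parameters, not the article's trained model: the class name `NeuralDecisionTree`, the `used_features_rate` parameter, the single linear layer feeding the decision nodes, and the values of depth and tree count are all assumptions for the example.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class NeuralDecisionTree:
    """Soft decision tree over a random subset (mask) of the input features."""

    def __init__(self, n_features, n_classes, depth, used_features_rate, rng):
        self.depth = depth
        n_used = max(1, int(n_features * used_features_rate))
        self.used = rng.choice(n_features, size=n_used, replace=False)
        # One linear unit per decision node, stored level by level (heap order).
        self.weights = rng.normal(scale=0.5, size=(n_used, 2 ** depth - 1))
        self.bias = np.zeros(2 ** depth - 1)
        # One row of class logits per leaf.
        self.leaf_logits = rng.normal(size=(2 ** depth, n_classes))

    def predict_proba(self, x):
        logits = x[:, self.used] @ self.weights + self.bias
        d = 1.0 / (1.0 + np.exp(-logits))      # sigmoid: probability of going left
        mu = np.ones((x.shape[0], 1))          # all probability mass at the root
        first = 0                              # index of the first node of the level
        for level in range(self.depth):
            width = 2 ** level
            d_level = d[:, first:first + width]
            # Each node passes d of its mass to the left child, 1 - d to the right.
            mu = np.stack([mu * d_level, mu * (1.0 - d_level)], axis=2)
            mu = mu.reshape(x.shape[0], 2 * width)
            first += width
        return mu @ softmax(self.leaf_logits)  # weight leaf distributions by mu


def forest_predict_proba(trees, x):
    """Sum the tree outputs and divide by the number of trees."""
    return np.mean([tree.predict_proba(x) for tree in trees], axis=0)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 10))                   # 8 records, 10 selected features
trees = [NeuralDecisionTree(10, 2, depth=3, used_features_rate=0.5, rng=rng)
         for _ in range(5)]
probs = forest_predict_proba(trees, x)
print(probs.shape, probs.sum(axis=1))
```

In a trained model, the node weights and leaf logits would be learned by backpropagation rather than drawn at random; the sketch only shows how masking, routing, and averaging fit together.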

EXPERIMENTAL STUDY
In this section, we test our model on three different datasets: NSL-KDD, CICIDS2017, and UNSW-NB15. We use four different feature selection methods (PCA, SelectKBest, LR, and RFFI), each selecting only the 10 best-performing features. The same hyperparameters, obtained through grid search, are used for all settings and are described as follows:
1. Input features
a. The model uses 10 input features, both numerical and categorical, processed accordingly.
We compare our findings with other models from recent literature, focusing on random forest and CNN-based intrusion detection models. We also extend our comparison to other deep learning architectures.

Environment
For the evaluation of our proposed solution, it is important to compare it against recent models in IDS literature. For this reason, we tested our model on multiple widely used datasets, including NSL-KDD, CICIDS2017, and UNSW-NB15, to provide a broader comparison. In Table 5, we compare the results from our DNDF-IDS model for each feature selection method with the other models listed in this section, taking into account their best-performing model in terms of ACC (ACC (Best)). We focus on the number of features used.
We performed our tests using Google Colab, a hosted Jupyter notebook service, with a single vCPU (AMD EPYC 7B12 @ 2.2 GHz) and 12.7 GB of RAM. The operating system was Linux 5.15.120+, and we implemented our solution using Python v3.10.12.

NSL-KDD dataset
The NSL-KDD dataset is a balanced and improved version of KDD Cup 99. It contains real-world-like network traffic with simulated attacks, including Denial of Service (DoS), Probe, User-to-Root (U2R), and Remote-to-Local (R2L) attacks. With over 140,000 records and 41 features (in addition to the target feature), it offers a realistic representation of network traffic, reduced redundancy, diverse attack types, and challenging anomalies. It is very widely adopted in IDS literature, which makes it a good benchmark for our model. Table 2 lists the best 10 selected features for each of the four methods.
Using this dataset, Primartha & Tama (2017) proposed an enhanced random forest classifier for detecting anomalies in IoT networks, evaluating ten classifiers with different parameters, focusing on the ensemble's tree count. Three datasets were used in the experiment (NSL-KDD, UNSW-NB15, and GPRS), and the results indicated that their model significantly outperformed other classifiers, reaching an ACC of 99.57% for NSL-KDD, but with all 41 features. Farnaaz & Jabbar (2016) also proposed a random forest classifier to detect various types of attacks, applying 10-fold cross-validation. Their model

CICIDS2017 dataset
The CICIDS2017 dataset, also known as the CSE-CIC-IDS2017 dataset, is a comprehensive and modern unbalanced benchmark dataset for evaluating IDS. This dataset is designed to represent contemporary cybersecurity threats and provides a more realistic and up-to-date simulation of network traffic compared to some older datasets. The dataset includes a diverse range of network traffic data that covers a wide variety of network attacks, such as DoS, Distributed DoS (DDoS), Probe, and User-to-Root attacks.
Some key features and reasons to use the CICIDS2017 dataset for evaluating IDS are:
• Realistic and current data
• Diverse attack scenarios
• Large and complex
• Anomalies and attack variations
• Standardized evaluation
• Useful for research and development
Overall, the CICIDS2017 dataset offers a rich and realistic environment for evaluating intrusion detection systems, making it an excellent choice for testing and benchmarking the performance of IDS against contemporary cyber threats. It contains over 2.8 million records with over 80 features. Table 3 lists the best 10 features for each of the four selection methods.
Ustebay, Turgut & Aydin (2018) presented an IDS model using deep learning and random forests, and tested it with the CICIDS2017 dataset. To reduce the dataset without sacrificing accuracy and to improve speed, recursive feature elimination is used, with random forest as the classifier. The study reduces the feature set to 10 features while retaining 89% accuracy, creating a smaller and more meaningful dataset for IDS.
The work of Yulianto, Sukarno & Suwastika (2019) aims to boost the performance of their AdaBoost-based IDS model on the CICIDS2017 dataset. It employs techniques such as SMOTE, PCA, and Ensemble Feature Selection. The proposed method reached 81.83% accuracy using only 25 features. Dong, Shui & Zhang (2021), Hasan et al. (2016) proposed an anomaly detection model that uses feature selection and random forest to improve anomaly detection performance in high-dimensional data from industrial control networks. It combines information gain (IG) and PCA for feature reduction, achieving a high accuracy of 99.80% on the CICIDS2017 dataset using 13 features. Vinayakumar et al. (2019) proposed a hybrid IDS that employs a scalable framework on commodity hardware for network analysis and host-level activities. It utilizes distributed deep learning models with deep neural networks to handle real-time data on a large scale. Their model reached an accuracy of 96.3% using all the features offered by CICIDS2017.
The target variable distribution in the CICIDS2017 dataset is shown in Fig. 5.

UNSW-NB15 dataset
The UNSW-NB15 dataset is commonly used for IDS evaluation. Developed by the University of New South Wales (UNSW), this unbalanced dataset encompasses a detailed collection of network traffic data, including both benign and malicious activities. It provides a realistic representation of network behavior, making it an invaluable tool for testing and refining IDSs. The dataset contains over 2.5 million instances of network traffic with 47 features, offering a diverse and dynamic set of scenarios for testing the effectiveness of intrusion detection algorithms. Researchers and cybersecurity experts frequently turn to the UNSW-NB15 dataset for its real-world applicability, enabling them to assess the accuracy and robustness of their systems against a wide range of network threats and attack vectors. In essence, it serves as a pivotal benchmark for enhancing the security of network environments by enabling the development and evaluation of more robust intrusion detection mechanisms. Table 4 lists the 10 best selected features of each method.
Zhang, Li & Ye (2020) introduced the BCNN-IDS model, aiming to improve detection performance by reducing unconfident results using the T-ensemble detection scheme. Their solution was evaluated using two open datasets (NSL-KDD and UNSW-NB15), showing significant improvements over alternative models. Jing & Chen (2019) discussed using an SVM with a nonlinear scaling method for intrusion detection, particularly on the UNSW-NB15 dataset. Their SVM-based model was tested for binary and multiclass classification, achieving accuracies of 85.99% and 75.77%, respectively, and false positive rates of 16.50% and 3.04%. The results indicate that the SVM-based approach is effective for classification and attack detection, outperforming RF and CNN for each class. Halbouni et al. (2022) developed an intrusion detection system by combining CNN and LSTM deep learning algorithms, utilizing CNN for spatial features and LSTM for temporal features. They applied batch normalization, dropout layers, and standardization to enhance performance. They tested their model on multiple datasets, with the CNN-LSTM hybrid model yielding the best detection rate and accuracy. In binary classification scenarios, it achieved high accuracy, although it had some limitations in detecting certain types of attacks. The study also explored K-Fold cross-validation and increasing the number of epochs, which showed performance improvements before stabilizing. Amin et al. (2022) proposed an anomaly detection model that combines feature selection and ensemble methods. They used the UNSW-NB15 dataset for evaluation. Univariate feature selection (ANOVA test) was applied, reducing the 44 independent features to 38 important ones. The ensemble classifiers used were bagging and random forest with decision trees as base classifiers. The experimental results demonstrated significantly improved accuracy against other existing models using the UNSW-NB15 dataset, with 99.28% accuracy for random forest and 99.47% for bagging algorithms.
The target variable distribution in the UNSW-NB15 dataset is shown in Fig. 6.

Evaluation metrics
For evaluation, we used accuracy, precision, true positive rate, false positive rate, and the F1 score, as described below. Accuracy is a widely used metric in various domains. Using it for IDS allows for consistent comparisons between different systems and across different datasets. This common benchmark makes it easier to understand how an IDS compares to others in the field. It is used in this article to compare our results with those from the IDS literature.
• Accuracy:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP and FN refer to true positives, true negatives, false positives and false negatives, respectively, as shown in the confusion matrix in Table 6.
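The five metrics named in this section can all be derived from the four confusion-matrix counts. The sketch below uses their standard definitions; the function name `ids_metrics` and the example counts are illustrative and are not results from our experiments.

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    tpr = tp / (tp + fn)   # true positive rate (recall, detection rate)
    fpr = fp / (fp + tn)   # false positive rate (false alarm rate)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"accuracy": accuracy, "precision": precision,
            "tpr": tpr, "fpr": fpr, "f1": f1}


# Made-up counts: 100 attack records and 100 normal records.
metrics = ids_metrics(tp=90, tn=95, fp=5, fn=10)
```

Accuracy alone can look favorable on unbalanced datasets, which is why the false positive rate and F1 score are reported alongside it.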

DISCUSSION OF RESULTS
An essential aspect of our proposed solution lies in its ability to expedite prediction times by carefully selecting features. To achieve this, it is crucial to determine the minimum number of features necessary for accurate predictions. To address this requirement, we conducted a dedicated experiment using the NSL-KDD dataset, which includes 41 features in addition to the target variable.
In this experiment, we employed PCA to rank all features in descending order of importance. Subsequently, we trained and evaluated our model on feature subsets of increasing size, from the single top-ranked feature up to all 41 features. Throughout this process, we recorded both the accuracy and the prediction time of the model. Figure 7 illustrates the progression of accuracy (shown by the blue line) alongside prediction times. Notably, we observed that beyond ten features, accuracy reached a plateau while prediction times continued to increase. Specifically, as we increased the number of features from 10 to 41, accuracy improved by only 0.6%, while prediction times increased by 33.33%.
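One way to turn such a sweep into a feature-count choice is to pick the smallest number of features whose accuracy is within a tolerance of the best observed accuracy. The sketch below assumes this rule and uses invented measurements, chosen only so that the 10-to-41 step shows a similar trade-off (+0.6 accuracy points, +33.33% time); they are not the values plotted in Figure 7.

```python
def smallest_sufficient_k(results, tolerance=0.01):
    """results: (n_features, accuracy, predict_seconds) tuples.
    Return the smallest n_features whose accuracy is within `tolerance` of the best."""
    best = max(acc for _, acc, _ in results)
    for k, acc, _ in sorted(results):
        if best - acc <= tolerance:
            return k


def relative_increase(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Invented sweep results for illustration only.
sweep = [(1, 0.800, 3.0), (5, 0.930, 3.4), (10, 0.978, 4.5),
         (20, 0.981, 5.2), (41, 0.984, 6.0)]
chosen_k = smallest_sufficient_k(sweep, tolerance=0.01)
time_growth = relative_increase(4.5, 6.0)
```

With these numbers the rule selects 10 features, since the remaining accuracy gain is small compared with the added prediction time.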
The ablation experiment, detailed in Table 7, demonstrates the efficacy of our model, DNDF, under various conditions by systematically removing specific components and evaluating performance metrics such as ACC and prediction times. Notably, DNDF with […] and slower prediction times. Thus, the DNDF model with 10 features and IQR stands out as the best-performing and most efficient choice, highlighting the importance of feature selection and robust preprocessing in achieving top-tier performance.
Our model, DNDF, integrates decision trees and CNNs to combine the strengths of both approaches. Decision trees prioritize features and provide interpretability, while CNNs excel in feature extraction and learning hierarchical representations. DNDF enhances performance by using decision trees for feature selection and CNNs to refine these features, addressing the gap in traditional models that separate feature selection and feature engineering. We replace the CNN activation function with decision trees and use the IQR method for data preprocessing, refining the feature set and improving model performance. Ablation experiments show that DNDF with 10 features and IQR preprocessing achieves a high accuracy of 98.38% with a prediction time of 6.27 s for the entire test set, demonstrating optimized performance and computational efficiency. Extensive experiments show that DNDF outperforms standalone decision tree and CNN models in accuracy, robustness, and interpretability, supporting the effectiveness of our integrated approach.
Figures 8 and 9 provide a comprehensive visual representation of the convergence speed of our proposed model. They specifically illustrate the steady progression of the loss and […] 8, the values for each dataset are mostly similar, with slight differences across the feature selection methods used.
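For readers unfamiliar with IQR-based cleaning, the following sketch drops every record that has at least one feature outside the usual fences Q1 - 1.5·IQR and Q3 + 1.5·IQR. The 1.5 multiplier, the quantile method, and the toy records are assumptions for illustration; the exact settings of our preprocessing are not restated here.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of a feature based on its interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, k=1.5):
    """Keep only the rows whose every feature lies inside its IQR fences."""
    n_cols = len(rows[0])
    bounds = [iqr_bounds([row[j] for row in rows], k) for j in range(n_cols)]
    return [row for row in rows
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, bounds))]


# Toy records with two features; the last record has an extreme first feature.
rows = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0],
        [4.0, 13.0], [5.0, 14.0], [100.0, 12.0]]
cleaned = remove_outliers(rows)
```

Only the record with the extreme value is removed; the other five pass both fences.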
Figure 13 shows the accuracy obtained by our model with each feature selection method. After selecting the 10 best-performing features using PCA, SelectKBest, LR, and RFFI, we obtained accuracy values of 98.38%, 94.26%, 97.62%, and 96.58%, respectively, for the NSL-KDD dataset. Similarly, the model reached accuracies of 98.84%, 95.22%, 97.72%, and 94.09% for the CICIDS2017 dataset. Finally, for the UNSW-NB15 dataset, it reached 98.23%, 98.10%, 97.75%, and 97.10%, respectively. Notably, based on these values, PCA achieved the highest accuracy on all three datasets, with 98.38% for NSL-KDD, 98.84% for CICIDS2017, and 98.23% for UNSW-NB15. This emphasizes the importance of dimensionality reduction in enhancing model performance.
[…] evaluation of our model for each dataset using each feature selection method. It is noticeable that PCA performed the best in general in terms of accuracy and precision. PCA retained the highest scores for NSL-KDD and CICIDS2017 and the second highest for UNSW-NB15, but only by a very small margin. Our model, trained on three diverse datasets, primarily relied on accuracy as the evaluation metric. The results showcased the model's ability to maintain high accuracy levels for both NSL-KDD and CICIDS2017 datasets, demonstrating its robustness across different network intrusion scenarios.
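The accuracy values quoted in this paragraph can be tabulated and compared programmatically. The snippet below only restates those twelve numbers, assuming the order PCA, SelectKBest, LR, RFFI for every dataset; the variable names are illustrative.

```python
# Accuracy (%) per dataset and feature selection method, as quoted in the text.
accuracy = {
    "NSL-KDD":    {"PCA": 98.38, "SelectKBest": 94.26, "LR": 97.62, "RFFI": 96.58},
    "CICIDS2017": {"PCA": 98.84, "SelectKBest": 95.22, "LR": 97.72, "RFFI": 94.09},
    "UNSW-NB15":  {"PCA": 98.23, "SelectKBest": 98.10, "LR": 97.75, "RFFI": 97.10},
}

# Best-scoring method for each dataset.
best_method = {ds: max(scores, key=scores.get) for ds, scores in accuracy.items()}

# Overall range of accuracies across all twelve settings.
all_values = [v for scores in accuracy.values() for v in scores.values()]
lowest, highest = min(all_values), max(all_values)
```

The resulting range, 94.09% to 98.84%, matches the one reported in the abstract and the conclusion.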
Our solution relies only on the 10 best features, which leads to very short prediction times. As shown in Table 10, prediction times are reported in seconds for the whole test sample (test size), which corresponds to an average close to 0.1 ms per record. In Fig. 14, we illustrate a comparative representation of prediction time per record for each dataset.
Furthermore, when comparing our model with other models documented in IDS literature, we found that our model performed exceptionally well, outperforming some existing models and achieving similar performance to others. A noteworthy aspect of our model's success is that it accomplished these results using only the 10 best features selected through feature extraction techniques, while many previous models relied on all available features. This not only highlights the effectiveness of our model but also its efficiency in reducing the computational burden by focusing on the most informative features. In summary, our feature selection methods, particularly PCA, proved to be valuable for enhancing our intrusion detection model's accuracy on the NSL-KDD, CICIDS2017, and UNSW-NB15 datasets. The model's consistent performance, when compared with existing models, reaffirms its effectiveness in network intrusion detection, all while being resource-efficient with its focus on the 10 best features.
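The conversion from a whole-test-set prediction time to a per-record time is a simple division. In the sketch below, the helper names are hypothetical, and the test-set size of 62,700 records is an assumed value chosen only so that a 6.27 s total maps to 0.1 ms per record; the actual test sizes are those given in Table 10.

```python
import time


def ms_per_record(total_seconds, n_records):
    """Convert a total prediction time in seconds to milliseconds per record."""
    return total_seconds * 1000.0 / n_records


def timed_ms_per_record(predict, records):
    """Time a single call of `predict` on all records and return ms per record."""
    start = time.perf_counter()
    predict(records)
    elapsed = time.perf_counter() - start
    return ms_per_record(elapsed, len(records))


# Assumed test-set size, for illustration only.
example = ms_per_record(6.27, 62_700)

# Timing a trivial stand-in predictor on dummy records.
measured = timed_ms_per_record(lambda rows: [0 for _ in rows], [[0.0] * 10] * 1000)
```

Measuring on the whole test set and dividing afterwards averages out per-call overhead, which is why the per-record figure is reported as an average.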

CONCLUSION AND FUTURE WORKS
In today's interconnected world, maintaining an exemplary level of security is imperative. This has prompted a surge in research efforts focused on the design and development of intrusion detection systems. This article contributes to that body of work by introducing a new IDS approach called Deep Neural Decision Forest (DNDF). To improve data quality, we use the IQR method to clean the data by detecting and removing outliers. We then select the top 10 features based on four feature selection methods: PCA, SelectKBest, LR, and RFFI, with the aim of enhancing performance and reducing reliance on computational resources. To evaluate our DNDF-IDS model, we used three distinct datasets: NSL-KDD, CICIDS2017, and UNSW-NB15. The results showed commendable performance, with accuracies ranging from 94.09% to 98.84%, depending on the specific feature selection methodology employed. Moreover, the IDS demonstrated improved prediction capabilities, achieving an average time of approximately 0.1 ms per record. In conclusion, as more things become connected, it is of paramount importance to detect and mitigate malicious intrusion attempts. Our DNDF-IDS, along with the feature selection and data cleaning techniques used, is a step towards making IoT systems safer. However, it is important to recognize both theoretical and practical limitations. Theoretical limitations might include model design assumptions or simplifications in the base algorithms. Practical limitations could include constraints related to computational resources or dataset attributes. In future work, we plan to address these limitations by refining and enhancing our solution. Our focus will not only be on improving detection capabilities but also on reducing computational costs. Furthermore, examining the scalability and adaptability of our method to various network environments and evolving threat landscapes is vital for wider applicability in real-world situations.

Figure 3
DNDF model components and process. DOI: 10.7717/peerj-cs.2290/fig-3

The model of Farnaaz & Jabbar (2016) demonstrated an increased accuracy of 99.67% on the NSL-KDD dataset, and it also used all 41 features to reach this result. While Ding & Zhai (2018) also used all 41 features, their CNN model provided an ACC of 80.23% on the same dataset. Finally, Abrar et al. (2020) evaluated various models, such as LR, NB, KNN, MLP, RF, ETC, and DT, using four different subsets of features of the NSL-KDD dataset for each model. The highest resulting ACC was 99.48%, using 31 of the 41 features. The target variable distribution in the NSL-KDD dataset is shown in Fig. 4.

Figure 7
Accuracy and prediction times evolution with the number of features for the NSL-KDD dataset. DOI: 10.7717/peerj-cs.2290/fig-7

Figure 13
Retained ACC values for each feature selection method on each dataset.

Table 1
A comparison of the models mentioned above.

Table 2
Selected features by individual methods for the NSL-KDD dataset.

Table 3
Features selected by each method for the CICIDS2017 dataset.

Table 4
Method-specific selections of features for the UNSW-NB15 dataset.

Table 5
A comparison between our model and the models mentioned above.

Table 7
Comparative analysis of model variants with selective component removal.