Article

Conditional Tabular Generative Adversarial Based Intrusion Detection System for Detecting DDoS and DoS Attacks on the Internet of Things Networks

by Basim Ahmad Alabsi 1, Mohammed Anbar 2,* and Shaza Dawood Ahmed Rihan 1
1 Applied College, Najran University, King Abdulaziz Street, Najran P.O. Box 1988, Saudi Arabia
2 National Advanced IPv6 (NAv6) Centre, Universiti Sains Malaysia, Gelugor 11800, Penang, Malaysia
* Author to whom correspondence should be addressed.
Sensors 2023, 23(12), 5644; https://doi.org/10.3390/s23125644
Submission received: 2 May 2023 / Revised: 6 June 2023 / Accepted: 13 June 2023 / Published: 16 June 2023
(This article belongs to the Section Internet of Things)

Abstract

The increasing use of Internet of Things (IoT) devices has led to a rise in Distributed Denial of Service (DDoS) and Denial of Service (DoS) attacks on these networks. These attacks can have severe consequences, including the unavailability of critical services and financial losses. In this paper, we propose an Intrusion Detection System (IDS) based on a Conditional Tabular Generative Adversarial Network (CTGAN) for detecting DDoS and DoS attacks on IoT networks. Our CTGAN-based IDS uses a generator network to produce synthetic traffic that mimics legitimate traffic patterns, while the discriminator network learns to differentiate between legitimate and malicious traffic. The synthetic tabular data generated by CTGAN is used to train multiple shallow machine-learning and deep-learning classifiers, improving the performance of their detection models. The proposed approach is evaluated on the Bot-IoT dataset in terms of detection accuracy, precision, recall, and F1 measure. Our experimental results demonstrate that the proposed approach accurately detects DDoS and DoS attacks on IoT networks. Furthermore, the results highlight the significant contribution of CTGAN to improving the performance of machine-learning and deep-learning detection models.

1. Introduction

The Internet of Things (IoT) has become more widespread in recent years, with applications ranging from smart homes and wearable technology to factory automation. However, as their use grows, IoT devices are increasingly exposed to Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) attacks and other novel cyber threats. DDoS/DoS attacks on IoT networks aim to make the services and resources of the targeted network or devices inaccessible to legitimate users. This is accomplished by inundating the network or devices with an enormous volume of malicious traffic, depleting resources such as bandwidth, processing power, or memory. Traditional intrusion detection systems (IDSs) struggle to keep pace with and identify these threats because of the volume and diversity of IoT data [1].
With more and more devices connecting to networks and inadequate protections in place, DDoS attacks on IoT infrastructure have become more common and destructive. These attacks use many infected machines to flood a network or server with traffic, rendering it unavailable to its legitimate users. IoT networks are vulnerable to DDoS attacks because of their limited processing capacity, weak security measures, and the possibility of widespread compromise, which can jeopardize critical data and disrupt services such as healthcare and transportation. Developing IDSs that can identify DDoS attacks is therefore vital for ensuring the availability and security of IoT networks [2].
According to a report by Kaspersky [3], the number of DDoS attacks targeting IoT devices increased by 9.5 times between 2017 and 2018. In 2019, IoT devices were found to be involved in 32.7% of all DDoS attacks worldwide.
Another report by NETSCOUT [4] revealed that in the first half of 2021, IoT devices were involved in 29% of all DDoS attacks globally. The report also highlighted that there was a significant increase in the number of amplification attacks that used IoT devices, which rose by 1630% compared to the same period in 2020.
These statistics indicate that IoT devices are becoming an increasingly popular target for DDoS and DoS attacks, and organizations must take proactive measures to protect their networks and devices from these attacks.
To detect DDoS/DoS attacks, traditional IDSs typically rely on techniques such as statistical anomaly detection, signature-based detection, and machine learning-based detection. However, detecting these attacks in IoT networks poses a significant challenge for such systems. The unique characteristics of IoT networks, including a vast number of interconnected devices, varied communication protocols, and heterogeneous traffic patterns, make malicious activity harder to detect. Traditional IDS methods, which were designed primarily for conventional networks, struggle to cope with the dynamic and unpredictable nature of IoT environments [5].
This study highlights the critical need to investigate new ways to improve IDS detection capabilities for IoT networks. Researchers can make strides in developing more reliable and accurate techniques for identifying and mitigating DDoS/DoS assaults by addressing the challenges posed by the enormous quantity and variety of IoT traffic. Safeguarding IoT networks and devices through enhanced detection methods is essential for maintaining the trustworthiness and security of many IoT applications and services.
To aid in the development of efficient DDoS/DoS detection approaches for IoT networks, researchers can push the limits of existing research, explore new detection algorithms, utilize advanced machine learning algorithms such as CTGAN, or adapt existing methods to suit the distinctive features of IoT traffic. Ultimately, these advancements will strengthen the security and resilience of IoT systems, making them more resistant to and better equipped to handle the increasing dangers of DDoS/DoS attacks.
The key contributions of this research work can be summarized as follows:
  • An approach that leverages CTGAN for accurate identification of DDoS and DoS attacks in IoT networks. The proposed approach utilizes the power of generative adversarial networks to synthesize realistic network traffic data, enabling more effective detection and classification of malicious activities.
  • Conducting an extensive evaluation of the classification performance of various shallow machine learning (ML) and deep learning (DL) models. By leveraging the synthetic dataset generated by CTGAN, this research provides a comprehensive assessment of different ML and DL algorithms, offering insights into their strengths and weaknesses in detecting DDoS/DoS attacks in IoT networks. This evaluation contributes to the understanding of which models are most effective for accurate attack classification. Furthermore, it serves as a resource for future researchers in the same field, helping them identify the best combination of ML or DL techniques to use in conjunction with CTGAN.
  • Addressing the issue of extreme class imbalance in the Bot-IoT dataset through synthetic data generation. The research proposes using CTGAN to generate synthetic samples for the underrepresented class; in the filtered Bot-IoT data, normal traffic is vastly outnumbered by DDoS and DoS records (see Section 5.1). By augmenting the dataset with synthetic samples, this approach alleviates the challenges associated with imbalanced training data, enhancing the performance and robustness of the detection models.
By presenting these contributions, the research contributes to the development of more effective and reliable intrusion detection systems tailored to the unique characteristics of IoT environments.
The remaining sections of the paper are organized as follows: Section 2 introduces the background of this research. Section 3 reviews related work. Section 4 outlines the proposed approach. Section 5 presents the experimental results. Finally, Section 6 presents the conclusions and future work.

2. Background

This section provides an overview of DDoS and DoS attacks on IoT networks and briefly explains the Conditional Tabular Generative Adversarial Network (CTGAN).

2.1. Distributed Denial of Service (DDoS) and Denial of Service (DoS)

IoT networks are highly susceptible to cyber-attacks, with DDoS and DoS attacks being two of the most prevalent types. These attacks can cause significant disruptions to critical services, resulting in financial loss and damage to the reputation of affected organizations. The vulnerability of IoT devices is a significant contributing factor to these attacks, as they often lack security measures and computing resources. DDoS attacks involve a coordinated effort by multiple devices to flood a network or server with traffic, rendering it inaccessible to legitimate users. This is often accomplished by utilizing compromised devices, such as those infected with malware or bots. In contrast, DoS attacks involve a single device or a small group of devices overwhelming the network with traffic, causing it to become unavailable. The key differentiator between DDoS and DoS attacks is the number of devices employed to carry out the attack [6].
There are several ways in which DDoS and DoS attacks against IoT devices are distinct from DDoS and DoS attacks on conventional networks. One key distinction between conventional computing devices and IoT devices is the latter’s generally lower processing power and memory. This makes them more susceptible to resource depletion attacks, in which the target is subjected to such a high volume of requests or traffic that it becomes incapacitated. In addition, it might be more difficult to identify and counteract attacks in real-time when dealing with IoT devices since they may be dispersed across a large geographical region and linked through a variety of network protocols and communication channels. The necessity for strong security measures to defend against DDoS and DoS attacks will only increase as IoT devices continue to spread and become more ingrained in essential infrastructure and day-to-day life [7,8].
Furthermore, the low-cost and easy-to-use nature of IoT devices, coupled with a lack of emphasis on security during their design, makes them an attractive target for cybercriminals to exploit their vulnerabilities. Many IoT devices have default login credentials that are either easily guessable or readily accessible, enabling attackers to gain unauthorized entry to these devices and exploit them for malicious activities, including the initiation of DDoS and DoS attacks.
Generally, DDoS/DoS attacks in IoT networks pose unique challenges, with certain types of attacks being more prevalent. One example is IoT Botnet-based DDoS attacks [9]. These attacks exploit the large number of interconnected IoT devices to launch massive-scale attacks. The prevalence of these attacks is due to the vulnerability of IoT devices, their widespread deployment, and the resource constraints of IoT devices. Detecting and mitigating such attacks in real-time is challenging due to limited device capabilities and the heterogeneity of IoT devices and communication protocols. Table 1 shows attacks on IoT networks other than DDoS and DoS [10].

2.2. Conditional Tabular GAN (CTGAN)

CTGAN [11] is a specialized form of Generative Adversarial Network (GAN) designed specifically for handling tabular data commonly found in databases and spreadsheets. Unlike traditional GANs, which focus on generating images or text, CTGAN is tailored to create synthetic tabular data that closely emulates the statistical properties of the original data.
CTGAN trains the generator and discriminator networks jointly in an adversarial game. The generator network takes random noise as input and generates synthetic data samples, whereas the discriminator network receives both real and synthetic data as input and learns to differentiate between the two. As training progresses, the generator aims to produce synthetic data that increasingly resembles the actual data, while the discriminator strives to improve its ability to distinguish between real and synthetic instances.
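For reference, the adversarial game described above is usually written as the minimax objective of the original GAN formulation, shown here only to make the two roles concrete. D(x) is the discriminator's estimate that a sample x is real, and G(z) is a sample produced by the generator from noise z. Note that the CTGAN paper [11] does not train with exactly this loss; it uses a Wasserstein loss with gradient penalty, and its generator also receives a conditional vector alongside the noise.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes this expression by scoring real samples near 1 and generated samples near 0, while the generator minimizes it by making D(G(z)) approach 1.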
A key feature of CTGAN is its capability to generate conditioned synthetic data, enabling the generation of data under specific circumstances, such as predefined column values or the absence of certain patterns. This conditional data generation feature proves particularly valuable when adhering to regulatory or corporate guidelines during data synthesis.
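The conditioning mechanism can be illustrated with a small sketch of how CTGAN, as described in [11], builds its conditional vector during training ("training-by-sampling"). One discrete column is chosen uniformly at random, a category of that column is drawn with probability proportional to the logarithm of its frequency, and the result is encoded as a concatenation of one-hot masks, one per discrete column. The function name `build_condition_vector`, the list-of-dicts input format, and the use of log(1 + count) to keep every weight positive are our assumptions for illustration; this is not the reference implementation.

```python
import math
import random
from collections import Counter


def build_condition_vector(rows, discrete_columns, rng=random):
    """Sketch of a CTGAN-style conditional vector using training-by-sampling.

    rows: list of dicts, one per record.
    discrete_columns: names of the categorical columns to condition on.
    Returns (cond_vector, chosen_column, chosen_category).
    """
    # Category vocabulary per discrete column, in a fixed (sorted) order.
    vocab = {c: sorted({str(r[c]) for r in rows}) for c in discrete_columns}

    # 1. Choose one discrete column uniformly at random.
    column = rng.choice(discrete_columns)

    # 2. Choose a category of that column with probability proportional to
    #    log(1 + frequency), so rare categories are drawn more often than
    #    their raw share of the data.
    counts = Counter(str(r[column]) for r in rows)
    categories = vocab[column]
    weights = [math.log(1 + counts[k]) for k in categories]
    category = rng.choices(categories, weights=weights, k=1)[0]

    # 3. Concatenate one one-hot mask per discrete column; only the chosen
    #    column/category position is set to 1.
    cond = []
    for c in discrete_columns:
        mask = [0] * len(vocab[c])
        if c == column:
            mask[vocab[c].index(category)] = 1
        cond.extend(mask)
    return cond, column, category


if __name__ == "__main__":
    sample = [
        {"proto": "tcp", "category": "DDoS"},
        {"proto": "udp", "category": "DoS"},
        {"proto": "tcp", "category": "Normal"},
    ]
    print(build_condition_vector(sample, ["proto", "category"], random.Random(0)))
```

Sampling by log frequency rather than raw frequency is what lets the generator see rare categories (such as a small normal-traffic class) often enough during training to learn them.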
CTGAN effectively generates synthetic tabular data while preserving the statistical characteristics of the original data and accommodating conditional data production. Consequently, it finds applications in various domains, including data privacy and security, data augmentation, and data sharing [12].
In this study, CTGAN was chosen over traditional GAN models [13] because it addresses their limitations in accurately capturing the complex dependencies and distributions within structured data. Whereas standard GANs are used primarily for synthetic image generation, CTGAN generates synthetic tabular data that closely resembles real-world data, which makes it more suitable for intrusion detection in IoT networks [14].

3. Literature Review

Various studies have proposed different techniques for detecting DoS and DDoS attacks in IoT networks using machine learning algorithms. For instance, Cvitić et al. developed a method that uses a boosting technique based on logistic model trees to detect DDoS traffic for various classes of IoT devices, achieving accuracy rates ranging from 99.92% to 99.9% for the four device classes considered [15]. Roopak et al. employed a convolutional neural network (CNN) with long short-term memory (LSTM) to classify DDoS attacks and achieved high performance on the CICIDS-2017 dataset, with a precision of 99.26%, recall of 99.35%, and F1-score of 99.3% [16].
Hodo et al. proposed a multilayer perceptron (MLP)-based intrusion detection system (IDS) to detect DoS attacks in IoT networks, which accurately distinguished between various DDoS and DoS attacks [17]. Mohammed et al. proposed an IDS based on multiple ML algorithms, including decision tree (DT), k-nearest neighbors (k-NN), and Naive Bayes (NB), achieving accuracy rates of 100%, 98%, and 29%, respectively, using the CICIDS-2019 dataset samples [18].
Using the CIDDS-001, UNSW-NB15, and NSL-KDD datasets, Verma et al. evaluated a number of shallow ML algorithms, including random forest (RF), AdaBoost (AB), gradient boosting machine (GBM), extremely randomized trees (ERT), classification and regression tree (CART), and a multilayer perceptron (MLP) neural network, with RF achieving the best results at an accuracy of 94% [19].
Chopra et al. compared several basic ML algorithms, including Naive Bayes, J48, RF, and ZeroR classifiers, for detecting and classifying DDoS attacks in IoT using the Bot-IoT dataset. However, the authors suggest that these models may not perform well on large-scale IoT datasets because naive ML algorithms achieve poor accuracy in such contexts [20].
In Churcher et al.’s work [21], the Bot-IoT dataset was used for binary and multiclass classification tasks. They used weight-based class balancing to produce both balanced and imbalanced representations of the data. The authors used Scikit-Learn [22] and Keras [23] with their default hyperparameters and reported performance indicators such as precision and F1 score. The initial Bot-IoT dataset contained 35 variables, such as timestamps and the Argus sequence number; after columns with missing values, text, and other unnecessary columns were removed, the final dataset contained just 19. The data were divided 80/20 for training and testing, but the proportion used for validation was not disclosed. Using weighted datasets for binary classification of DDoS and DoS attack protocols, the ANN consistently outperformed the other models, with an accuracy of 99%. For multiclass classification, the ANN achieved the highest precision (97%) across all attack types in the Bot-IoT dataset.
Alimi et al. [24] introduced a revised RLSTM deep learning model to identify DoS attacks in IoT networks and evaluated it on two standard datasets: CICIDS-2017 and NSL-KDD. The experiments demonstrated that the proposed model substantially improved detection accuracy, precision, recall, and F1 score.
Almaraz-Rivera et al. [25] studied DoS attacks on IoT networks and built an intrusion detection system based on ML and deep learning models, which they evaluated on the Bot-IoT dataset. Across a variety of performance criteria, the models achieved an average accuracy above 95%, with the decision tree and MLP models performing best at detecting DDoS and DoS attacks in IoT networks.
Susilo and Sari (2020) [26] proposed using several machine-learning and deep-learning strategies, including random forest (RF), convolutional neural network (CNN), and multi-layer perceptron (MLP), to improve the security of IoT networks. The authors developed a deep-learning algorithm for detecting denial-of-service (DoS) attacks and evaluated it on the BoT-IoT dataset, finding that the deep-learning model increased accuracy and thereby improved the mitigation of attacks on IoT networks.
In their study, Kumar et al. introduced a fog computing-based distributed IDS for detecting DDoS attacks on mining pools in blockchain-enabled IoT networks. The proposed model is evaluated using Random Forest and an optimized gradient tree boosting system (XGBoost) on distributed fog nodes, with the BoT-IoT dataset. The results show that XGBoost performs better in binary attack detection, while Random Forest performs better in multi-attack detection. Furthermore, Random Forest trains and tests faster than XGBoost on distributed fog nodes [27].
Table 2 summarizes the related works, giving an overview of the shallow ML and DL algorithms used for detecting DDoS and DoS attacks on IoT devices. The findings indicate that DL algorithms, such as CNN and MLP, outperform shallow ML classifiers in terms of accuracy. However, the choice of dataset plays a crucial role in determining the accuracy of a model.
For example, Verma et al. achieved the highest accuracy of 94% using random forest on the CIDDS-001, UNSW-NB15, and NSL-KDD datasets, while Mohammed et al. observed varying results when comparing naive Bayes, Bayes Net, and ZeroR on the UNSW-NB15 dataset. It is therefore essential to evaluate existing and future approaches on datasets that reflect the characteristics of IoT networks, such as Bot-IoT and UNSW-NB15, rather than on non-IoT datasets such as CICIDS-2019, CICIDS-2017, and NSL-KDD.
Moreover, Table 2 highlights the lack of attention given to utilizing Generative Adversarial Networks (GAN) or its variants, such as CTGAN, for enhancing the detection of DDoS and DoS attacks on IoT networks.

4. Proposed Approach

This section describes an approach to detecting TCP and UDP DDoS and DoS attacks on IoT networks that uses CTGAN to produce adversarial samples that are highly representative of actual IoT network traffic. These samples are then used to train several shallow ML and DL classifiers, with the aim of further improving the accuracy of DDoS and DoS detection. Because CTGAN is trained on real-world samples, it can closely reproduce their statistical properties. Figure 1 depicts the three stages of the proposed approach: (1) data pre-processing, (2) synthetic data creation using CTGAN, and (3) machine learning-based DoS and DDoS detection. These stages are discussed in detail below.

4.1. Data Pre-Processing

Data pre-processing generally improves the accuracy of models trained on the data. In particular, it is essential to standardize or normalize the data so that the features are on the same scale and have similar ranges; without pre-processing, model accuracy can be compromised [28,29]. The measures taken to guarantee the quality of the dataset used in this research (see Section 5.1 for details of the dataset) are therefore crucial. These measures include data cleaning, handling missing values, feature scaling, and transforming categorical variables. Skipping these steps can lead to biased and unreliable results that undermine the research. The pre-processing measures carried out in this research are as follows:
  • Data cleansing: This procedure identifies data that is missing, incorrect, or irrelevant so that it can be updated or removed. For example, if a feature has no available value in the dataset, it is assigned a value of 0.
  • Categorical data transformation: This step converts data from one format to another. For example, values of the String/Object datatype are replaced by a unique number. The categorical features in the dataset used are proto, saddr, sport, daddr, dport, category, and subcategory. Table 3 shows a sample of the categorical data, and Table 4 shows a sample of the transformed categorical data.
  • Feature scaling: This procedure maps the data onto the unit sphere or rescales it to the interval [0, 1] (or any other interval). Table 5 shows a sample of the scaled features of the dataset used. Each feature vector x is min-max normalized using Equation (1):
    $$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)} \qquad (1)$$
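The three pre-processing steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming the records are held as a list of dicts; the function name `preprocess` and the `exclude_from_scaling` parameter (useful, for example, to keep encoded label columns as integers) are hypothetical, and the sketch scales every other column, including encoded categorical ones, because the text does not say which columns were scaled.

```python
def preprocess(rows, categorical_columns, exclude_from_scaling=()):
    """Sketch of the three pre-processing steps applied to a list of record dicts.

    Returns (processed_rows, encoders), where encoders maps each categorical
    column to its {original_value: integer_code} table.
    """
    columns = list(rows[0].keys())

    # Data cleansing: a missing value (absent key, None, or "") becomes 0.
    cleaned = [
        {c: 0 if r.get(c) in (None, "") else r[c] for c in columns} for r in rows
    ]

    # Categorical transformation: replace each distinct string with an integer code.
    encoders = {}
    for c in categorical_columns:
        codes = {v: i for i, v in enumerate(sorted({str(r[c]) for r in cleaned}))}
        encoders[c] = codes
        for r in cleaned:
            r[c] = codes[str(r[c])]

    # Feature scaling: min-max normalization to [0, 1], as in Equation (1).
    for c in columns:
        if c in exclude_from_scaling:
            continue
        values = [float(r[c]) for r in cleaned]
        lo, hi = min(values), max(values)
        for r, v in zip(cleaned, values):
            # A constant column has max == min; map it to 0 to avoid dividing by zero.
            r[c] = (v - lo) / (hi - lo) if hi > lo else 0.0
    return cleaned, encoders


if __name__ == "__main__":
    records = [
        {"proto": "tcp", "pkts": 10, "dur": None, "category": "DDoS"},
        {"proto": "udp", "pkts": 20, "dur": 2.0, "category": "Normal"},
    ]
    print(preprocess(records, ["proto", "category"], exclude_from_scaling=("category",)))
```

Storing the encoder tables makes it possible to apply the same integer codes to the testing data instead of re-deriving them.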
Data filtering is an important technique in data analysis that helps to extract meaningful information from large and complex datasets. By selecting a subset of the data that meets specific criteria or conditions, filtering can help to reduce noise and improve the accuracy of statistical and ML models.
Analysis of network traffic data is crucial for detecting TCP and UDP DDoS and DoS attacks in IoT networks. Filtering the dataset to include only the TCP and UDP protocols is a vital step in this process, since these are the protocols most often used in these attacks. Filtering out irrelevant records improves both the accuracy of the analysis and the dependability of the findings, leaving a more focused, higher-quality dataset.
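A minimal sketch of the protocol filter, assuming the raw records are dicts whose protocol is stored as a string in the `proto` column (one of the categorical features listed for the dataset). Because `proto` is later replaced by an integer code, a filter like this would have to run on the raw strings before the categorical transformation; the function name `filter_tcp_udp` is hypothetical.

```python
def filter_tcp_udp(rows, proto_column="proto"):
    """Keep only records whose protocol field is TCP or UDP (case-insensitive)."""
    wanted = {"tcp", "udp"}
    return [r for r in rows if str(r.get(proto_column, "")).strip().lower() in wanted]


if __name__ == "__main__":
    records = [{"proto": "tcp"}, {"proto": "udp"}, {"proto": "icmp"}, {"proto": "arp"}]
    print(filter_tcp_udp(records))  # only the tcp and udp records remain
```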

4.2. CTGAN-Based Synthetic Data Generation

GANs have shown impressive results in generating synthetic images, but image-oriented GANs are not directly applicable to IDS data. Therefore, several GAN variants, such as Wasserstein GAN (WGAN) [30], TGAN [31], and CTGAN [11], have been proposed to synthesize tabular data suitable for evaluating the performance of IDSs in detecting attacks. In this research, CTGAN was chosen to generate a synthetic dataset because it demonstrated superior performance compared to WGAN and TGAN, as reported by Bourou et al. [32]. CTGAN has also shown promise in various applications, including fraud detection, rare event detection, and anomaly detection.
This stage is the core of the proposed approach. It generates synthetic data and perturbs it to create adversarial samples, a promising way to improve the robustness of ML and DL models for detecting DoS and DDoS attacks in IoT networks.
The CTGAN generator network can be represented as a function G(X, Z), where X is the real data and Z is a noise vector. The generator network takes X and Z as inputs and produces a batch of synthetic data samples as outputs. The generator network is trained to minimize the distance between the distribution of the synthetic data and the real data distribution.
To create an adversarial example using CTGAN, the synthetic data sample x_syn in the batch that is closest to the decision boundary between the current predicted class and the target class is chosen. A small perturbation is added to x_syn to create an adversarial example x_adv. This perturbation can be represented as a function P(x_syn), where P adds a small amount of noise or changes to x_syn. The success of the adversarial example x_adv is evaluated by computing the model’s output for x_adv and comparing it to the target class Y. If the model misclassifies x_adv, it is considered a successful adversarial example. Additionally, to address the extreme class imbalance in the Bot-IoT dataset, this approach uses synthetic data generation. The output of this stage is x_adv, which is used as input for the next stage.
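The selection-and-perturbation step can be sketched under two assumptions that the text does not specify: the model is a linear binary classifier with score w·x + b (class 1 when the score is non-negative), and P is a single sign-gradient step of size eps, in the style of the fast gradient sign method, clipped to [0, 1] because the features are min-max scaled. The names `score`, `predict`, and `make_adversarial` are hypothetical; a real detector would need its own gradient computation.

```python
def score(w, b, x):
    """Linear decision score w.x + b (hypothetical stand-in for a trained model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Class 1 if the score is non-negative, else class 0."""
    return 1 if score(w, b, x) >= 0 else 0


def make_adversarial(w, b, synthetic_batch, target, eps=0.05):
    """Pick the synthetic sample nearest the decision boundary and perturb it toward `target`.

    Returns (x_syn, x_adv, success), where success means the model now outputs `target`.
    """
    # Distance to the boundary is |w.x + b| / ||w||; ||w|| is the same for
    # every sample, so the smallest |score| identifies the closest sample.
    x_syn = min(synthetic_batch, key=lambda x: abs(score(w, b, x)))

    # P(x_syn): one sign-gradient step. For a linear score the gradient with
    # respect to x is w, so moving along sign(w) raises the score (toward class 1)
    # and moving against it lowers the score (toward class 0).
    direction = 1 if target == 1 else -1
    x_adv = []
    for wi, xi in zip(w, x_syn):
        sign = (wi > 0) - (wi < 0)
        # Clip to [0, 1] because the features were min-max scaled.
        x_adv.append(min(1.0, max(0.0, xi + direction * eps * sign)))

    success = predict(w, b, x_adv) == target
    return x_syn, x_adv, success


if __name__ == "__main__":
    batch = [[0.9, 0.1], [0.48, 0.5], [0.2, 0.8]]
    print(make_adversarial([1.0, -1.0], 0.0, batch, target=1))
```

Starting from the sample nearest the boundary keeps the required perturbation small, which is the intent of choosing x_syn that way.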
The workflow of the proposed approach can be summarized as follows:
Synthetic Traffic Generation: The CTGAN-based IDS employs a generator network to produce synthetic traffic that closely mimics legitimate traffic patterns. This synthetic traffic generation step enables the IDS to effectively distinguish between legitimate and malicious traffic, facilitating accurate detection and mitigation of DoS and DDoS attacks.
Discriminator Network: The discriminator network, a crucial component within the CTGAN framework, learns to differentiate between legitimate and malicious traffic. By analyzing the characteristics and patterns of the traffic, the discriminator enhances the IDS’s ability to detect and classify attacks. This helps in effectively identifying and mitigating both DoS and DDoS attacks on IoT networks.
Enhanced Detection Models: The synthetic tabular data generated by CTGAN is used to train multiple shallow machine-learning and deep-learning classifiers. Training on the synthetic data is intended to enhance the performance of the detection models, resulting in improved accuracy and effectiveness in detecting and mitigating DoS and DDoS attacks.
The proposed approach leverages the capabilities of CTGAN to generate synthetic traffic, train detection models, and enhance the overall performance of the IDS. By combining synthetic traffic generation, discrimination analysis, and improved detection models, the approach aims to enhance the security of IoT networks against DoS and DDoS attacks.

4.3. DoS and DDoS Attack Detection

During this phase, the focus is on training multiple shallow ML and DL models to create detection models capable of accurately detecting DDoS and DoS attacks. The adversarial examples generated in the previous stage, denoted as x_adv, are utilized as training data for these models. The main output of this phase is the resulting trained models, which can be deployed online to detect DDoS and DoS attacks in IoT networks.
It is worth noting that this approach is not limited to TCP and UDP DDoS and DoS attacks. It can be applied to various datasets, enabling the identification of different types of attacks across different domains or fields. The flexibility of this approach makes it adaptable and applicable to diverse scenarios where attack detection is required.
By training multiple models using the adversarial examples, the aim is to enhance the detection capabilities and robustness of the models against various attack scenarios. Once deployed, these trained models can effectively analyze network traffic data and accurately identify instances of DDoS and DoS attacks, contributing to the security and stability of IoT networks.

5. Experimental Results

This section describes the experimental setup, the data, the evaluation metrics, and the results of the proposed approach.

5.1. Dataset

The BoT-IoT dataset [33] is employed to assess the proposed method’s capability in detecting TCP and UDP DDoS and DoS attacks. This dataset, created by the Cyber Range Lab at The Center of UNSW Canberra Cyber, emulates a realistic network environment and encompasses both regular and botnet traffic in formats such as PCAP, argus, and CSV files. The complete dataset comprises over seventy-two million records, while a 10% subset contains approximately three million records. For our experiments, we utilized a 5% subset of the dataset, focusing on the top ten features. The BoT-IoT dataset was chosen due to its widespread use in existing research such as in [27,34]. It is a commonly utilized dataset that provides a comprehensive representation of various IoT network traffic scenarios. Researchers frequently rely on the BoT-IoT dataset for benchmarking intrusion detection systems and evaluating the performance of detection algorithms.
The number of records in the training and testing sets for each attack category in the BoT-IoT traffic is presented in Table 6. These attacks are classified into seven main categories, which are further mapped into five categories, as illustrated in Table 7. The information matrix of the training and testing datasets is depicted in Table 8 and Table 9, respectively.
The analysis of Table 6 reveals that UDP and TCP attacks are the predominant attack types within the 5% subset of the BoT-IoT dataset. Furthermore, Table 7 highlights that DDoS and DoS attacks constitute the majority of attacks in the BoT-IoT dataset. Hence, this research focuses on detecting UDP and TCP DDoS and DoS attacks in IoT networks. The distribution of DDoS and DoS attacks in the BoT-IoT dataset is visualized in Figure 2. Additionally, Table 10 presents the attack category distribution of the BoT-IoT dataset after applying the filtering process.
The pre-processed, filtered dataset presented in Table 10 serves as the input for CTGAN to generate the synthetic dataset x_adv. To enable binary classification with the shallow ML and DL classifiers, the DDoS and DoS categories are merged into a single category labeled 1, while the “normal” category is retained and labeled 0. Consequently, the generated dataset consists of two primary classes: attack (1) and normal (0). The distribution of attack categories in the synthetic dataset is shown in Table 11.
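The binary relabeling can be sketched as a small mapping from category strings to labels, with a class count that makes the imbalance visible before and after augmentation. The category spellings "DDoS", "DoS", and "Normal" are assumed to match the dataset's `category` column, and the sketch assumes, as the merging described above implies, that the filtered data contain only these three categories; the function name `binarize` is hypothetical.

```python
from collections import Counter

ATTACK_CATEGORIES = {"DDoS", "DoS"}  # merged into the attack class (label 1)
NORMAL_CATEGORY = "Normal"           # kept as the normal class (label 0)


def binarize(categories):
    """Map per-record category strings to binary labels and count each class."""
    labels = []
    for cat in categories:
        if cat in ATTACK_CATEGORIES:
            labels.append(1)
        elif cat == NORMAL_CATEGORY:
            labels.append(0)
        else:
            # This sketch assumes the filtered data hold only DDoS, DoS, and normal records.
            raise ValueError(f"unexpected category after filtering: {cat!r}")
    return labels, Counter(labels)


if __name__ == "__main__":
    labels, counts = binarize(["DDoS", "DoS", "Normal", "DDoS"])
    print(labels, dict(counts))
```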
It is worth mentioning that during the transformation steps, the categorical data are converted to numerical values, as shown in Table 11. Moreover, the number of normal instances increased from 118 to 441,101. This increase addresses the severe class imbalance in the BoT-IoT dataset. The generated dataset is used to train several shallow ML and DL classifiers.

5.2. Evaluation Metrics

The metrics below are used to measure the efficacy of the proposed approach. Table 12 defines each metric in terms of the entries of the confusion matrix.
These metrics are widely used to evaluate the efficacy of IDSs, for example in [35,36], and all of them are computed to evaluate the proposed approach.
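The formulas in Table 12 translate directly into code. The following minimal Python sketch computes them from binary confusion-matrix counts, with attacks as the positive class. The counts in the example are hypothetical and chosen only for illustration, and returning 0.0 for an empty denominator is a convention assumed here rather than one stated in the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'attack' as the positive class."""
    tp: int  # attacks correctly flagged as attacks
    fn: int  # attacks missed (labelled normal)
    fp: int  # normal records wrongly flagged as attacks
    tn: int  # normal records correctly labelled normal


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def evaluation_metrics(c: ConfusionCounts) -> dict:
    """Compute the Table 12 metrics from confusion-matrix counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)  # also called detection rate
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm_rate": safe_div(c.fp, c.tn + c.fp),
        "true_negative_rate": safe_div(c.tn, c.tn + c.fp),
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for illustration only (not taken from the paper).
print(evaluation_metrics(ConfusionCounts(tp=90, fn=10, fp=5, tn=95)))
```

With these counts, precision is 90/95, recall is 0.9, the false alarm rate is 0.05, accuracy is 0.925, and F1 equals 2TP/(2TP + FP + FN) = 180/195.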

5.3. Results and Discussion

The objective of this section is to evaluate how effectively the synthetic tabular dataset $x_{adv}$, generated using CTGAN, improves the performance of detection models. To this end, we trained several shallow ML classifiers, namely Logistic Regression (LR) [37], Naive Bayes (NB) [38], Random Forest (RF) [39], Decision Tree (DT) [40], and Support Vector Machine (SVM) [41]. Additionally, we trained several deep learning classifiers, namely Long Short-Term Memory (LSTM) [42], Recurrent Neural Network (RNN) [43], and Gated Recurrent Units (GRUs) [44]. These classifiers were trained on $x_{adv}$ and evaluated on an unseen testing dataset (the 5% testing dataset). Default parameters were used for the shallow ML classifiers, while the parameters of the DL classifiers follow [45].
Table 13 presents the evaluation results of these models when trained on the BoT-IoT dataset, while Table 14 presents the results when trained on the synthetic dataset generated by CTGAN. Comparing the two tables shows how effectively $x_{adv}$ enhances the performance of the detection models.
Table 13 presents the performance metrics of the models trained on the original data. The LR, NB, and SVM models perform similarly, each with a detection accuracy of 0.699, a recall score of 0.699, and an F1 measure of 0.823, although their precision ranges from 0.351 (NB) to 0.849 (SVM). The RF and DT classifiers share identical metrics: a detection accuracy of 0.648, precision of 0.342, recall score of 0.683, and F1 measure of 0.786. The LSTM model performs best of all models, with a detection accuracy of 0.978, precision of 0.966, recall score of 1.0, and F1 measure of 0.984. By contrast, the RNN and GRU models perform considerably worse, with detection accuracies of 0.693 and 0.695, precision of 0.356 and 0.359, a recall score of 0.698 for both, and F1 measures of 0.819 and 0.820, respectively.
Table 14 displays the performance metrics of the models trained on the synthetic data. The LSTM, RNN, and GRU models achieve the best results across detection accuracy, precision, recall score, and F1 measure. The LSTM model achieves the highest detection accuracy (0.994) and F1 measure (0.996). The RNN and GRU models also perform strongly, with detection accuracies of 0.986 and 0.981, precision of 0.978 and 0.971, a recall score of 1.0 for both, and F1 measures of 0.990 and 0.986, respectively. Among the shallow classifiers, NB performs best, with an F1 measure of 0.9754, followed by LR (0.9170); DT and SVM reach F1 measures of 0.8629 and 0.8086, respectively. The RF classifier records the lowest detection accuracy (0.744) and F1 measure (0.7765) of all models, and both RF and DT remain well behind the DL models, indicating their limited effectiveness in detecting DoS and DDoS attacks in IoT networks. Table 15 lists the enhancement that CTGAN yields for each shallow ML and DL classifier, computed as the difference between the corresponding entries of Table 14 and Table 13.
Table 15 shows that the models benefit to varying degrees relative to the results in Table 13. Among the shallow ML classifiers, NB shows the largest gain in detection accuracy (0.267), followed by LR (0.193) and DT (0.183), while SVM shows the smallest (0.076). Among the DL classifiers, RNN shows the largest gain in detection accuracy (0.293), closely followed by GRU (0.286), while LSTM, which already performed well on the original data, shows the smallest (0.016). For precision, NB has the largest gain among the shallow classifiers (0.598), RNN (0.622) and GRU (0.612) have the largest gains overall, and SVM's precision drops by 0.063. For recall, RF and DT gain the most (0.317), followed by RNN and GRU (0.302), while LSTM's recall decreases negligibly (−0.001). For the F1 measure, NB has the largest gain among the shallow classifiers (0.1524), RNN (0.171) and GRU (0.166) have the largest gains overall, and RF (−0.0095) and SVM (−0.0144) decrease slightly.
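Because each entry of Table 15 equals the corresponding entry of Table 14 minus that of Table 13, the comparisons above can be reproduced mechanically. The sketch below copies the published values, computes the differences, and reports every model that attains the largest gain for each metric, so that ties (such as RF and DT in recall) are not hidden. The variable and function names are illustrative.

```python
# Published (accuracy, precision, recall, F1) values per model.
BASELINE = {  # Table 13: models trained on the original BoT-IoT data
    "LR": (0.699, 0.367, 0.699, 0.823),
    "NB": (0.699, 0.351, 0.699, 0.823),
    "RF": (0.648, 0.342, 0.683, 0.786),
    "DT": (0.648, 0.342, 0.683, 0.786),
    "SVM": (0.699, 0.849, 0.699, 0.823),
    "LSTM": (0.978, 0.966, 1.000, 0.984),
    "RNN": (0.693, 0.356, 0.698, 0.819),
    "GRU": (0.695, 0.359, 0.698, 0.820),
}
AUGMENTED = {  # Table 14: models trained on the CTGAN synthetic data
    "LR": (0.892, 0.868, 1.0, 0.9170),
    "NB": (0.966, 0.949, 1.0, 0.9754),
    "RF": (0.744, 0.770, 1.0, 0.7765),
    "DT": (0.831, 0.820, 1.0, 0.8629),
    "SVM": (0.775, 0.786, 1.0, 0.8086),
    "LSTM": (0.994, 0.991, 0.999, 0.996),
    "RNN": (0.986, 0.978, 1.0, 0.990),
    "GRU": (0.981, 0.971, 1.0, 0.986),
}
METRICS = ("accuracy", "precision", "recall", "f1")
SHALLOW = ("LR", "NB", "RF", "DT", "SVM")
ALL_MODELS = tuple(BASELINE)


def enhancements(baseline, augmented):
    """Per-model gain (augmented minus baseline) for each metric, rounded to 4 d.p."""
    return {
        model: tuple(round(new - old, 4) for new, old in zip(augmented[model], baseline[model]))
        for model in baseline
    }


def best_models(deltas, metric, models):
    """Return every model in `models` that attains the largest gain for `metric`."""
    idx = METRICS.index(metric)
    top = max(deltas[m][idx] for m in models)
    return [m for m in models if deltas[m][idx] == top]


deltas = enhancements(BASELINE, AUGMENTED)
for metric in METRICS:
    print(f"{metric}: overall {best_models(deltas, metric, ALL_MODELS)}, "
          f"shallow {best_models(deltas, metric, SHALLOW)}")
```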
In summary, the results indicate that CTGAN augmentation improves the performance of both shallow ML and DL classifiers in most cases. In particular, the NB, RNN, and GRU models show notable improvements in detection accuracy, precision, recall, and F1 measure when trained on the synthetic tabular dataset generated by CTGAN.

5.4. Discussion

Overall, the findings of this study show that leveraging CTGAN (Conditional Tabular GAN) has a significant impact on the performance of intrusion detection models in IoT networks. When CTGAN is used to generate synthetic traffic instances that augment the training dataset, the LSTM, RNN, and GRU models detect DoS and DDoS attacks with high accuracy: the RNN and GRU models improve markedly, while LSTM, already strong on the original data, improves slightly. This highlights the value of advanced data generation techniques for enhancing the effectiveness of intrusion detection systems.
In contrast, the random forest and decision tree classifiers performed comparatively weakly in this study. They recorded the lowest scores on the original dataset and still trailed the DL models after CTGAN augmentation, which suggests that they may not capture the intricacies of modern IoT-based attacks as effectively. Caution is therefore advised when considering these models for intrusion detection in IoT networks.
To ensure robust and reliable intrusion detection, we recommend prioritizing the LSTM, RNN, and GRU models, which showed superior performance in identifying DoS and DDoS attacks. Combined with data generation techniques such as CTGAN, these models have the potential to significantly enhance the security of IoT networks against evolving attack vectors.
In addition, we compared our proposed approach with the work in [26]. For a fair comparison, the CNN and MLP models were implemented with the parameter settings specified in [26] and evaluated with the same batch sizes and epoch values as in [26].
Table 16 compares the mean accuracy of the CNN and MLP models reported in [26] with that obtained using CTGAN-generated synthetic data for a batch size of 32 and 10, 30, and 50 epochs. Table 17 and Table 18 show the corresponding comparisons for batch sizes of 64 and 128, respectively.
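The results in Table 16, Table 17 and Table 18 indicate that the proposed models outperform those reported in [26] in terms of mean accuracy in 12 of the 18 configurations. The proposed MLP model is more accurate in every configuration, by about 18 to 45 percentage points, whereas the proposed CNN model is more accurate at 10 epochs but less accurate at 30 and 50 epochs for all three batch sizes. This suggests that the proposed approach, which uses CTGAN-generated synthetic data, substantially improves the detection accuracy of the MLP model, and of the CNN model when it is trained for fewer epochs.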
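The win/loss pattern behind this comparison can be tallied directly from Tables 16 to 18. The following sketch copies the published mean accuracies and counts, for each model, the configurations in which the proposed work is more accurate than [26]; the data layout and function name are illustrative.

```python
# Mean accuracy (%) from Tables 16-18, keyed by (batch size, epochs, model):
# value = (accuracy reported in [26], accuracy of the proposed work).
RESULTS = {
    (32, 10, "CNN"): (90.85, 97.48), (32, 10, "MLP"): (53.07, 97.63),
    (32, 30, "CNN"): (89.82, 83.65), (32, 30, "MLP"): (62.95, 97.37),
    (32, 50, "CNN"): (88.30, 79.09), (32, 50, "MLP"): (62.00, 97.23),
    (64, 10, "CNN"): (91.15, 96.86), (64, 10, "MLP"): (76.92, 97.25),
    (64, 30, "CNN"): (91.02, 80.20), (64, 30, "MLP"): (54.04, 97.49),
    (64, 50, "CNN"): (90.64, 80.11), (64, 50, "MLP"): (53.89, 97.28),
    (128, 10, "CNN"): (90.87, 95.17), (128, 10, "MLP"): (54.10, 97.20),
    (128, 30, "CNN"): (90.76, 79.97), (128, 30, "MLP"): (54.43, 97.16),
    (128, 50, "CNN"): (91.27, 80.96), (128, 50, "MLP"): (79.01, 97.18),
}


def tally(results):
    """Per model: (configurations where the proposed work is more accurate, total configurations)."""
    counts = {}
    for (_batch, _epochs, model), (reported, proposed) in results.items():
        wins, total = counts.get(model, (0, 0))
        counts[model] = (wins + int(proposed > reported), total + 1)
    return counts


# Configurations in which the CNN from [26] is more accurate than the proposed CNN.
cnn_losses = sorted(
    (batch, epochs)
    for (batch, epochs, model), (reported, proposed) in RESULTS.items()
    if model == "CNN" and proposed < reported
)
print(tally(RESULTS))
print("CNN configurations where [26] is more accurate:", cnn_losses)
```

With the published values, the tally gives 9 of 9 configurations for MLP and 3 of 9 for CNN (all at 10 epochs), that is, 12 of 18 overall.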
These findings highlight the effectiveness of the proposed approach in enhancing the performance of intrusion detection models. With CTGAN-generated synthetic data, the MLP model in particular achieves higher mean accuracy, indicating an improved capability to detect intrusion attempts in IoT networks. This improvement may be attributed to the ability of CTGAN to generate realistic synthetic data that captures the complexities of modern attack patterns, enabling the models to learn more effectively and make better predictions.
Overall, these results provide strong evidence supporting the efficacy of the proposed approach in improving the performance of intrusion detection systems. The use of CTGAN-generated synthetic data offers a promising avenue for enhancing the accuracy and reliability of detection models, ultimately contributing to the security and resilience of IoT networks against evolving cyber threats.

6. Conclusions and Future Works

Our study proposes a CTGAN-based IDS for detecting DDoS and DoS attacks on IoT networks. The proposed IDS addresses limitations of existing IDSs by employing a generator network to create synthetic traffic that imitates legitimate traffic patterns and a discriminator network to detect anomalies. We evaluated the proposed approach on the BoT-IoT dataset in two scenarios.
In the first scenario, we evaluated multiple machine learning and deep learning classifiers on the original BoT-IoT dataset, which established a baseline performance for the detection models. In the second scenario, we evaluated the same classifiers on the synthetic tabular dataset generated by CTGAN, which allowed us to assess the impact of synthetic data on the performance of the detection models.
Our experimental results indicate that the synthetic tabular dataset significantly enhanced the performance of multiple machine learning and deep learning classifiers. The synthetic data generated by CTGAN improved the models' ability to accurately detect DDoS and DoS attacks on IoT networks. These findings demonstrate the effectiveness of the proposed CTGAN-based IDS in improving the performance of intrusion detection systems.
In future work, we plan to investigate the effectiveness of the proposed CTGAN-based IDS for detecting other types of attacks on IoT networks, with the aim of developing a comprehensive IDS that can detect various types of intrusions in IoT environments. We also aim to explore reinforcement learning techniques to further enhance the performance of the proposed IDS, which we anticipate could improve its accuracy and adaptability in detecting and mitigating attacks targeting IoT devices. Furthermore, we plan to evaluate the proposed approach on other benchmark datasets to assess its effectiveness and robustness across different scenarios.

Author Contributions

Writing – original draft preparation, B.A.A., S.D.A.R. and M.A.; writing – review and editing, M.A., B.A.A. and S.D.A.R.; methodology, B.A.A. and M.A.; project administration, B.A.A.; resources, M.A. and B.A.A.; funding acquisition, B.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Distinguished Research Funding program grant code (NU/DRP/SERC/12/56).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Al-Sarawi, S.; Anbar, M.; Alieyan, K.; Alzubaidi, M. Internet of Things (IoT) communication protocols. In Proceedings of the 2017 8th International Conference on Information Technology (ICIT), Amman, Jordan, 17–18 May 2017.
2. Amairah, A.; Al-Tamimi, B.N.; Anbar, M.; Aloufi, K. Cloud computing and internet of things integration systems: A review. Adv. Intell. Syst. Comput. 2019, 843, 406–414.
3. Kaspersky. DDoS Attacks in Q1 2020. Securelist. 2020. Available online: https://securelist.com/ddos-attacks-in-q1-2022/106358/ (accessed on 13 May 2023).
4. NETSCOUT. Threat Intelligence Report: H1 2021. 2021. Available online: https://www.netscout.com/threat-intelligence-report-h1-2021 (accessed on 15 May 2023).
5. Alzubi, Q.M.; Anbar, M.; Sanjalawe, Y.; Al-Betar, M.A.; Abdullah, R. Intrusion detection system based on hybridizing a modified binary grey wolf optimization and particle swarm optimization. Expert Syst. Appl. 2022, 204, 117597.
6. Alabsi, B.A.; Anbar, M.; Anickam, S. A comprehensive review on security attacks in dynamic wireless sensor networks based on RPL protocol. Int. J. Pure Appl. Math. 2018, 119, 12481–12495.
7. Al-Amiedy, T.A.; Anbar, M.; Belaton, B.; Bahashwan, A.A.; Hasbullah, I.H.; Aladaileh, M.A.; Mukhaini, G.A. A systematic literature review on attacks defense mechanisms in RPL-based 6LoWPAN of Internet of Things. Internet Things 2023, 22, 100741.
8. Al-Amiedy, T.A.; Anbar, M.; Belaton, B.; Kabla, A.H.H.; Hasbullah, I.H.; Alashhab, Z.R. A Systematic Literature Review on Machine and Deep Learning Approaches for Detecting Attacks in RPL-Based 6LoWPAN of Internet of Things. Sensors 2022, 22, 3400.
9. Hoque, N.; Bhattacharyya, D.K.; Kalita, J.K. Botnet in DDoS attacks: Trends and challenges. IEEE Commun. Surv. Tutorials 2015, 17, 2242–2270.
10. Inayat, U.; Zia, M.F.; Mahmood, S.; Khalid, H.M.; Benbouzid, M. Learning-based methods for cyber attacks detection in IoT systems: A survey on methods, analysis, and future prospects. Electronics 2022, 11, 1502.
11. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling Tabular data using Conditional GAN. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019.
12. Han, G.; Liu, S.; Chen, K.; Yu, N.; Feng, Z.; Song, M. Imbalanced sample generation and evaluation for power system transient stability using ctgan. In Proceedings of the Intelligent Computing & Optimization: Proceedings of the 4th International Conference on Intelligent Computing and Optimization 2021 (ICO2021) 3; Springer: Berlin/Heidelberg, Germany, 2022; pp. 555–565.
13. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
14. Habibi, O.; Chemmakha, M.; Lazaar, M. Imbalanced tabular data modelization using CTGAN and machine learning to improve IoT Botnet attacks detection. Eng. Appl. Artif. Intell. 2023, 118, 105669.
15. Cvitic, I.; Perakovic, D.; Gupta, B.B.; Choo, K.K.R. Boosting-Based DDoS Detection in Internet of Things Systems. IEEE Internet Things J. 2022, 9, 2109–2123.
16. Roopak, M.; Tian, G.Y.; Chambers, J. An Intrusion Detection System Against DDoS Attacks in IoT Networks. In Proceedings of the 2020 10th Annual Computing and Communication Workshop and Conference, CCWC 2020, Las Vegas, NV, USA, 6–8 January 2020; pp. 562–567.
17. Hodo, E.; Bellekens, X.; Hamilton, A.; Dubouilh, P.L.; Iorkyase, E.; Tachtatzis, C.; Atkinson, R. Threat analysis of IoT networks using artificial neural network intrusion detection system. In Proceedings of the 2016 International Symposium on Networks, Computers and Communications (ISNCC), Dubai, United Arab Emirates, 31 October–2 November 2016; pp. 1–6.
18. Mohammed, S. A Machine Learning-Based Intrusion Detection of DDoS Attack on IoT Devices. Int. J. Adv. Trends Comput. Sci. Eng. 2021, 10, 2792–2797.
19. Verma, A.; Ranga, V. Machine Learning Based Intrusion Detection Systems for IoT Applications. Wirel. Pers. Commun. 2020, 111, 2287–2310.
20. Chopra, A.; Behal, S.; Sharma, V. Evaluating machine learning algorithms to detect and classify DDoS attacks in IoT. In Proceedings of the 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 17–19 March 2021; pp. 517–521.
21. Churcher, A.; Ullah, R.; Ahmad, J.; Ur Rehman, S.; Masood, F.; Gogate, M.; Alqahtani, F.; Nour, B.; Buchanan, W.J. An experimental analysis of attack classification using machine learning in IoT networks. Sensors 2021, 21, 446.
22. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
23. Ketkar, N. Introduction to Keras. In Deep Learning with Python: A Hands-on Introduction; Apress: Berkeley, CA, USA, 2015.
24. Alimi, K.O.A.; Ouahada, K.; Abu-Mahfouz, A.M.; Rimer, S.; Alimi, O.A. Refined LSTM Based Intrusion Detection for Denial-of-Service Attack in Internet of Things. J. Sens. Actuator Netw. 2022, 11, 32.
25. Almaraz-Rivera, J.G.; Perez-Diaz, J.A.; Cantoral-Ceballos, J.A. Transport and Application Layer DDoS Attacks Detection to IoT Devices by Using Machine Learning and Deep Learning Models. Sensors 2022, 22, 3367.
26. Susilo, B.; Sari, R.F. Intrusion Detection in IoT Networks Using Deep Learning Algorithm. Information 2020, 11, 279.
27. Kumar, R.; Kumar, P.; Tripathi, R.; Gupta, G.P.; Garg, S.; Hassan, M.M. A distributed intrusion detection system to detect DDoS attacks in blockchain-enabled IoT network. J. Parallel Distrib. Comput. 2022, 164, 55–68.
28. Rinnan, Å.; Nørgaard, L.; van den Berg, F.; Thygesen, J.; Bro, R.; Engelsen, S.B. Data pre-processing. In Infrared Spectroscopy for Food Quality Analysis and Control; Academic Press: Cambridge, MA, USA, 2009; pp. 29–50.
29. Kuhn, M.; Johnson, K. Data Pre-Processing; Springer: New York, NY, USA, 2013.
30. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 6–11 August 2017; Volume 1, pp. 298–321.
31. Xu, L.; Veeramachaneni, K. Synthesizing Tabular Data using Generative Adversarial Networks. arXiv 2018, arXiv:1811.11264.
32. Bourou, S.; El Saer, A.; Velivassaki, T.H.; Voulkidis, A.; Zahariadis, T. A review of tabular data synthesis using gans on an ids dataset. Information 2021, 12, 375.
33. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
34. Le, T.T.H.; Kim, H.; Kang, H.; Kim, H. Classification and Explanation for Intrusion Detection System Based on Ensemble Trees and SHAP Method. Sensors 2022, 22, 1154.
35. Comparative performance analysis of classification algorithms for intrusion detection system. In Proceedings of the 2016 14th Annual Conference on Privacy, Security and Trust, PST 2016, Auckland, New Zealand, 12–14 December 2016; pp. 282–288.
36. Aladaileh, M.A.; Anbar, M.; Hintaw, A.J.; Hasbullah, I.H.; Bahashwan, A.A.; Al-Sarawi, S. Renyi Joint Entropy-Based Dynamic Threshold Approach to Detect DDoS Attacks against SDN Controller with Various Traffic Rates. Appl. Sci. 2022, 12, 6127.
37. Kleinbaum, D.G.; Dietz, K.; Gail, M.; Klein, M. Logistic Regression; Springer: Berlin/Heidelberg, Germany, 2002.
38. Wickramasinghe, I.; Kalutarage, H. Naive Bayes: Applications, variations and vulnerabilities: A review of literature with code snippets for implementation. Soft Comput. 2021, 25, 2277–2293.
39. Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227.
40. Charbuty, B.; Abdulazeez, A. Classification based on decision tree algorithm for machine learning. J. Appl. Sci. Technol. Trends 2021, 2, 20–28.
41. Noble, W.S. What is a support vector machine? Nature Biotechnol. 2006, 24, 1565–1567.
42. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
43. Nanduri, A.; Sherry, L. Anomaly detection in aircraft data using Recurrent Neural Networks (RNN). In Proceedings of the 2016 Integrated Communications Navigation and Surveillance (ICNS), Herndon, VA, USA, 19–21 April 2016; p. 5C2-1.
44. Sattari, M.T.; Apaydin, H.; Shamshirband, S. Performance evaluation of deep learning-based gated recurrent units (GRUs) and tree-based models for estimating ETo by using limited meteorological variables. Mathematics 2020, 8, 972.
45. Elejla, O.E.; Anbar, M.; Hamouda, S.; Faisal, S.; Bahashwan, A.A.; Hasbullah, I.H. Deep-Learning-Based Approach to Detect ICMPv6 Flooding DDoS Attacks on IPv6 Networks. Appl. Sci. 2022, 12, 6150.
Figure 1. Proposed approach.
Figure 2. Proportion of DDoS and DoS attacks in the BoT-IoT dataset.
Table 1. Attacks on IoT networks other than DDoS and DoS.
Attack Type | Explanation
Malware and Ransomware | Malicious programs that are downloaded and installed on IoT devices and then cause damage, steal information, or turn the devices into part of a botnet. Ransomware encrypts the data on a device and locks it until a ransom is paid.
Man-in-the-Middle (MitM) | Attackers may intercept the connection between IoT devices and the network, allowing them to eavesdrop, alter data, or insert harmful instructions. This leads to compromised security, altered data, or outright device control.
Physical Attacks | Physically accessing or tampering with an IoT device with the intent of stealing data, changing its behavior, or gaining control over it. Strong physical security measures are required to identify and prevent such attacks.
Privilege Escalation | Gaining administrative access by exploiting flaws in the software or configuration of an IoT device. Malicious actors may then access private information, change the way the device normally operates, or push it beyond its intended limits.
Information Leakage | The unauthorized disclosure of private information, such as user passwords, configuration settings, or personal data, by IoT devices. Identity thieves and other malicious actors can exploit this vulnerability to gain illegal access.
Replay Attacks | Recording and later replaying authorized interactions between IoT devices. This enables malicious actions, entry into protected areas, and authentication bypass.
DNS Attacks | DNS hijacking diverts traffic from legitimate websites to malicious ones. As a result, unauthorized parties may gain access to, or modify, information sent from an IoT device to its intended recipient.
Firmware Attacks | Exploiting security holes in the firmware of embedded systems used in IoT devices. Compromised firmware may be used to take over a device, modify its behavior, or install malicious software, severely compromising the device's security and functioning.
Table 2. Summary of related works.
Reference | Algorithm | Dataset | Accuracy
[15] | Logistic model trees | IoT device classes | 99.92% to 99.99%
[16] | Convolutional neural network (CNN) with LSTM | CICIDS-2017 | 99.03%
[17] | Multi-layer perceptron (MLP) | Various types of DDoS and DoS attacks | High accuracy
[18] | DT, k-NN, and NB | CICIDS-2019 | 100%, 98%, 29%
[19] | RF, AB, GBM, ERT, CART, and MLP | CIDDS-001, UNSW-NB15, NSL-KDD | 94% (RF)
[20] | Naive Bayes, Bayes Net, ZeroR | UNSW-NB15 | Varying results
[21] | Artificial neural networks (ANN) | BoT-IoT | 99% (binary) and 97% (multiclass)
[24] | Refined long short-term memory (RLSTM) deep learning model | CICIDS-2017 and NSL-KDD | Outperforms other methods
[25] | ML and DL models (decision tree and multi-layer perceptron) | BoT-IoT | Average accuracy over 99%
[26] | CNN, multi-layer perceptron, RF | BoT-IoT | Average accuracy 92.85%
[27] | Random forest, XGBoost | BoT-IoT | Average accuracy 99%
Table 3. Sample of categorical data.
Proto | Saddr | Sport | Daddr | Dport | Category | Subcategory
udp | 192.168.100.150 | 6551 | 192.168.100.3 | 80 | DDoS | UDP
tcp | 192.168.100.150 | 5532 | 192.168.100.3 | 80 | DDoS | TCP
tcp | 192.168.100.147 | 27,165 | 192.168.100.3 | 80 | DDoS | TCP
udp | 192.168.100.150 | 48,719 | 192.168.100.3 | 80 | DoS | UDP
udp | 192.168.100.147 | 22,461 | 192.168.100.3 | 80 | DDoS | UDP
Table 4. Sample of categorical data transformation.
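Proto | Saddr | Sport | Daddr | Dport | Category | Subcategory
4 | 4 | 61,685 | 13 | 4191 | 0 | 7
3 | 4 | 50,363 | 13 | 4191 | 0 | 6
3 | 1 | 19,080 | 13 | 4191 | 0 | 6
4 | 4 | 43,028 | 13 | 4191 | 1 | 7
4 | 1 | 13,854 | 13 | 4191 | 0 | 7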
Table 5. Sample of feature scaling of used dataset.
pkSeqID | Proto | Saddr | Sport | Daddr | Dport | Seq
0.856684 | 1.00 | 0.266667 | 0.941181 | 0.265306 | 0.887924 | 0.961012
0.663009 | 0.75 | 0.266667 | 0.768431 | 0.265306 | 0.887924 | 0.979089
0.538722 | 0.75 | 0.066667 | 0.291120 | 0.265306 | 0.887924 | 0.239964
0.338217 | 1.00 | 0.266667 | 0.656515 | 0.265306 | 0.887924 | 0.378203
0.888094 | 1.00 | 0.066667 | 0.211382 | 0.265306 | 0.887924 | 0.400685
Table 6. Attack distribution in the training and testing datasets in BoT-IoT traffic.
Attack Type | Training Dataset | Testing Dataset
UDP | 566,132 | 396,580
TCP | 455,737 | 318,337
Service_Scan | 20,788 | 14,542
OS_Fingerprint | 5058 | 3621
HTTP | 721 | 504
Normal | 118 | 107
Keylogging | 20 | 14
Data_Exfiltration | 1 | 0
Total | 1,048,575 | 733,705
Table 7. Attack Category Distribution in BoT-IoT Traffic.
Category | Training Dataset | Testing Dataset
DDoS | 550,955 | 385,309
DoS | 471,635 | 330,112
Reconnaissance | 25,846 | 18,163
Normal | 118 | 107
Theft | 21 | 14
Total | 1,048,575 | 733,705
Table 8. Information matrix of training dataset.
Column Name | Count | Mean | Std | Min | Max
pkSeqID | 1,048,575 | 1,833,736 | 1,058,796 | 5.0 | 3,668,519
seq | 1,048,575 | 121,283.3 | 75,795.08 | 1.0 | 262,207
stddev | 1,048,575 | 0.886813 | 0.803454 | 0.0 | 2.496763
N_IN_Conn_P_SrcIP | 1,048,575 | 82.58135 | 24.36642 | 1.0 | 100.0
min | 1,048,575 | 1.019018 | 1.484272 | 0.0 | 4.980471
state_number | 1,048,575 | 3.134601 | 1.186406 | 1.0 | 11.0
mean | 1,048,575 | 2.231664 | 1.517782 | 0.0 | 4.981882
N_IN_Conn_P_DstIP | 1,048,575 | 92.48208 | 18.13428 | 1.0 | 100.0
drate | 1,048,575 | 0.457156 | 67.19496 | 0.0 | 58,823.53
srate | 1,048,575 | 3.497612 | 1058.112 | 0.0 | 1,000,000.0
max | 1,048,575 | 3.020940 | 1.860618 | 0.0 | 4.999999
attack | 1,048,575 | 0.9998875 | 0.0106076 | 0.0 | 1.0
Table 9. Information matrix of testing dataset.
Column Name | Count | Mean | Std | Min | Max
pkSeqID | 733,705 | 1,834,472 | 1,058,826 | 2.0 | 3,668,507
seq | 733,705 | 121,412.819892 | 75,823.39884 | 1.0 | 262,212
stddev | 733,705 | 0.887894 | 0.804013 | 0.0 | 2.496758
N_IN_Conn_P_SrcIP | 733,705 | 82.492551 | 24.426145 | 1.0 | 100.0
min | 733,705 | 1.018868 | 1.484235 | 0.0 | 4.980470
state_number | 733,705 | 3.135073 | 1.186427 | 1.0 | 11.0
mean | 733,705 | 2.233429 | 1.517572 | 0.0 | 4.981785
N_IN_Conn_P_DstIP | 733,705 | 92.427763 | 18.216076 | 1.0 | 100.0
drate | 733,705 | 0.506298 | 74.330175 | 0.0 | 58,823.53
srate | 733,705 | 2.262398 | 403.408092 | 0.0 | 333,333.3125
max | 733,705 | 3.023000 | 1.860725 | 0.0 | 4.999999
attack | 733,705 | 0.999854 | 0.012075 | 0.0 | 1
Table 10. Attack Category Distribution of BoT-IoT Dataset After Applying the Filtering.
Category | Protocol | Number of Records
DDoS | TCP | 279,601
DDoS | UDP | 271,056
Total of DDoS records | | 550,657
DoS | TCP | 295,063
DoS | UDP | 176,123
Total of DoS records | | 471,186
Normal | TCP | 92
Normal | UDP | 13
Normal | ARP | 10
Normal | IPV6-ICMP | 3
Total of Normal records | | 118
Total of records | | 1,021,961
Table 11. Attack category distribution of the synthetic dataset.
Category | Traffic Type | Number of Packets
0 (normal) | 4 (TCP) | 347,715
0 (normal) | 3 (UDP) | 94,386
1 (attack) | 4 (TCP) | 313,836
1 (attack) | 3 (UDP) | 244,063
Total number of records | | 1,000,000
Table 12. Evaluation metrics.
Evaluation Metric | Definition
True positive (TP) | Cases in which the classifier correctly identifies an attack as an attack.
False negative (FN) | Cases in which the classifier incorrectly labels an attack as normal.
False positive (FP) | Cases in which the classifier incorrectly identifies a normal instance as an attack.
True negative (TN) | Cases in which the classifier correctly identifies a normal instance as normal.
Precision | The ratio of correctly predicted attacks to all samples predicted as attacks: Precision = TP / (TP + FP)
Recall / Detection Rate | The proportion of attack samples correctly classified as attacks out of all attack samples: Recall = TP / (TP + FN)
False Alarm Rate / False Positive Rate | The ratio of normal samples incorrectly predicted as attacks to all normal samples: False Alarm Rate = FP / (TN + FP)
True Negative Rate | The proportion of normal samples correctly classified as normal out of all normal samples: True Negative Rate = TN / (TN + FP)
Accuracy | The proportion of correctly classified instances out of all instances: Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-measure | The harmonic mean of precision and recall: F1 Measure = 2 × (Precision × Recall) / (Precision + Recall)
Table 13. Evaluation results using the BoT-IoT dataset.
Model | Detection Accuracy | Precision | Recall Score | F1 Measure
LR | 0.699 | 0.367 | 0.699 | 0.823
NB | 0.699 | 0.351 | 0.699 | 0.823
RF | 0.648 | 0.342 | 0.683 | 0.786
DT | 0.648 | 0.342 | 0.683 | 0.786
SVM | 0.699 | 0.849 | 0.699 | 0.823
LSTM | 0.978 | 0.966 | 1.000 | 0.984
RNN | 0.693 | 0.356 | 0.698 | 0.819
GRU | 0.695 | 0.359 | 0.698 | 0.820
Table 14. Evaluation results using a synthetic dataset generated by CTGAN.
Model | Detection Accuracy | Precision | Recall Score | F1 Measure
LR | 0.892 | 0.868 | 1.0 | 0.9170
NB | 0.966 | 0.949 | 1.0 | 0.9754
RF | 0.744 | 0.770 | 1.0 | 0.7765
DT | 0.831 | 0.820 | 1.0 | 0.8629
SVM | 0.775 | 0.786 | 1.0 | 0.8086
LSTM | 0.994 | 0.991 | 0.999 | 0.996
RNN | 0.986 | 0.978 | 1.0 | 0.990
GRU | 0.981 | 0.971 | 1.0 | 0.986
Table 15. Enhancements made by CTGAN for each shallow ML and DL classifier.
Model | Detection Accuracy | Precision | Recall Score | F1 Measure
LR | 0.193 | 0.501 | 0.301 | 0.094
NB | 0.267 | 0.598 | 0.301 | 0.1524
RF | 0.096 | 0.428 | 0.317 | −0.0095
DT | 0.183 | 0.478 | 0.317 | 0.0769
SVM | 0.076 | −0.063 | 0.301 | −0.0144
LSTM | 0.016 | 0.025 | −0.001 | 0.012
RNN | 0.293 | 0.622 | 0.302 | 0.171
GRU | 0.286 | 0.612 | 0.302 | 0.166
Table 16. The result of batch size 32.
Epoch | Work in [26] | Mean Accuracy | Proposed Work | Mean Accuracy
10 | CNN | 90.85% | CNN | 97.48%
10 | MLP | 53.07% | MLP | 97.63%
30 | CNN | 89.82% | CNN | 83.65%
30 | MLP | 62.95% | MLP | 97.37%
50 | CNN | 88.30% | CNN | 79.09%
50 | MLP | 62.00% | MLP | 97.23%
Table 17. The result of batch size 64.
Epoch | Work in [26] | Mean Accuracy | Proposed Work | Mean Accuracy
10 | CNN | 91.15% | CNN | 96.86%
10 | MLP | 76.92% | MLP | 97.25%
30 | CNN | 91.02% | CNN | 80.20%
30 | MLP | 54.04% | MLP | 97.49%
50 | CNN | 90.64% | CNN | 80.11%
50 | MLP | 53.89% | MLP | 97.28%
Table 18. The result of batch size 128.
Epoch | Work in [26] | Mean Accuracy | Proposed Work | Mean Accuracy
10 | CNN | 90.87% | CNN | 95.17%
10 | MLP | 54.10% | MLP | 97.20%
30 | CNN | 90.76% | CNN | 79.97%
30 | MLP | 54.43% | MLP | 97.16%
50 | CNN | 91.27% | CNN | 80.96%
50 | MLP | 79.01% | MLP | 97.18%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


MDPI and ACS Style

Alabsi, B.A.; Anbar, M.; Rihan, S.D.A. Conditional Tabular Generative Adversarial Based Intrusion Detection System for Detecting Ddos and Dos Attacks on the Internet of Things Networks. Sensors 2023, 23, 5644. https://doi.org/10.3390/s23125644

