Robust DDoS attack detection with adaptive transfer learning

In the evolving cybersecurity landscape, the rising frequency of Distributed Denial of Service (DDoS) attacks requires robust defense mechanisms to safeguard the availability and integrity of network infrastructure. Deep Learning (DL) models have emerged as a promising approach for DDoS attack detection and mitigation due to their capability of automatically learning feature representations and distinguishing complex patterns within network traffic data. However, the effectiveness of DL models in protecting against evolving attacks also depends on the design of adaptive architectures, through the combination of appropriate models, quality data, and thorough hyperparameter optimization, which is rarely performed in the literature. Moreover, no existing adaptive architecture for DDoS detection has addressed how to transfer knowledge between different datasets to improve classification accuracy. In this paper, we propose an innovative approach for DDoS detection by leveraging Convolutional Neural Networks (CNN), adaptive architectures, and transfer learning techniques. Experimental results on publicly available datasets show that the proposed adaptive transfer learning method effectively identifies benign and malicious activities and specific attack categories.


Introduction
Distributed Denial of Service (DDoS) attacks are a significant threat to organizations worldwide (Chadd, 2018). These attacks can paralyze networks, making them inaccessible to legitimate users and causing severe disruptions in service availability and integrity. The ability to detect and mitigate DDoS attacks has therefore become vital to ensuring the resilience and security of critical infrastructure. Given the rising frequency and complexity of DDoS attacks, it is imperative to develop effective intrusion detection systems (IDS) that preserve the integrity and availability of network infrastructure. Deep learning (DL) models have emerged as a promising approach for detecting and mitigating such attacks by automatically learning complex patterns from network traffic data (Diro and Chilamkurti, 2018; Gümüşbaş et al., 2020), and various DL models are being developed to enhance DDoS detection. However, mainly due to the dynamic nature of attackers' behavior and evolving cyber threats, keeping models up to date can be challenging (Kolias et al., 2017). Furthermore, developing DL models for intrusion detection faces another significant challenge: the limited availability of training data. The scarcity of adequately sized, high-quality training datasets hinders the widespread adoption of DL in IDSs. To mitigate this issue, transfer learning approaches have been used to train DL models on data originating from different sources, increasing detection accuracy (Das et al., 2022). However, no method in the literature has yet considered DL models trained with transfer learning that can adapt to evolving attacks through adaptive architectures.
This paper proposes a novel DL-based methodology for DDoS detection that leverages adaptive architectures in a transfer learning modality to accurately classify benign vs. malicious network traffic in evolving scenarios. Our approach employs customized CNN models with diverse layer configurations, in addition to several publicly available pre-trained models such as VGG16, VGG19, and ResNet50. We train the models for both binary and multi-class classification, adopting transfer learning techniques while adaptively optimizing hyperparameters, which yields a dynamic and flexible approach that enhances the robustness and efficiency of DDoS attack detection.
The remainder of the paper is structured as follows. Section 2 provides an overview of related works in the field of DL and transfer learning-based DDoS detection and hyperparameter tuning. Section 3 presents the methodology and framework employed in our proposed approach. Section 4 discusses the results and performance analysis of our proposed methodology. Finally, Section 5 concludes the paper.
In the context of DDoS attack detection, various studies have employed DL techniques with significant success. The papers by (Sabeel et al., 2019) and (Cil et al., 2021) evaluate the effectiveness of DL techniques in improving the detection accuracy of DDoS attacks. The work presented in (Shaaban et al., 2019) applies CNN models to large-scale DDoS attack detection within software-defined networks (SDN). Furthermore, (Chen et al., 2019) and (Nugraha and Murthy, 2020) introduced multi-channel CNN and hybrid CNN-LSTM (Long Short-Term Memory) models. Another study (Yeom et al., 2022) introduced a collaborative LSTM-based DDoS detection framework to address the challenges of irregular traffic patterns. These studies showed the promising potential of CNN- and LSTM-based DL models for the efficient detection of DDoS attacks.
Unlike the approaches presented above, which consider only a single DL model, the method described in (Elsaeidy et al., 2021) combines the strengths of various models to enhance both the accuracy and the robustness of detection systems. Furthermore, (Wei et al., 2021) demonstrated the effectiveness of integrating a Multilayer Perceptron (MLP) with an Autoencoder (AE) for DDoS detection and classification. Complementing these advancements, (Hnamte and Hussain, 2023) proposes a hybrid model combining CNNs and Bidirectional Long Short-Term Memory (BiLSTM) networks. This approach leverages the strength of CNNs in feature extraction and pattern recognition, alongside the ability of BiLSTMs to capture sequential and temporal dependencies in data streams.
In exploring the complex landscape of DDoS attacks, it is essential to recognize the heterogeneity of these threats and to understand the advanced defensive mechanisms required to protect cloud-based infrastructures. To this end, the work in (Agrawal and Tapaswi, 2019) reviewed various DDoS attacks and the corresponding defensive approaches for cloud infrastructures. Moreover, the work in (Venkatesan et al., 2016) presented a moving target defense technique that shifts proxy servers and remaps client connections, disrupting attackers' efforts to map out and exploit network vulnerabilities. Similarly, (Kansal and Dave, 2017) introduced a method that uses load-balancing algorithms alongside attack proxies to differentiate between malicious insiders and genuine clients, adding an extra layer of security. Furthermore, (Jia et al., 2014) developed a cloud-enabled defense mechanism that employs selective server replication and intelligent client reassignment, effectively turning victim servers into dynamic targets to isolate attacks.
In response to the prevalent scarcity of labeled data for developing DL models for DDoS detection, current research emphasizes the integration of transfer learning techniques. This kind of approach leverages knowledge from models pre-trained on extensive datasets to enhance learning efficiency and accuracy in tasks constrained by limited labeled data (Masum and Shahriar, 2021). For instance, the method described in (Wu et al., 2019) demonstrates the effectiveness of transfer learning in IDS by leveraging knowledge from pre-trained models. Transfer learning has also been applied to DDoS attack detection in IoT environments: the works by (Okey et al., 2023), (Zhang et al., 2021), (Rodríguez et al., 2022), (Xue et al., 2022), and (Vu et al., 2020) have demonstrated the adaptation of pre-trained DL models for IDS in IoT. Furthermore, the work presented in (Yang and Shami, 2022) proposed a CNN-based transfer learning approach specifically tailored to IDS in the Internet of Vehicles (IoV).
Although DL models have demonstrated proficiency in identifying known cyber threats, they often face challenges in detecting new or evolving DDoS attack patterns. To address this challenge, adaptive DL techniques have been proposed for DDoS attack detection. As an example, the work described in (Cheng et al., 2018) introduced a method based on multiple-kernel learning, while (Kushwah and Ranga, 2021) employed an improved self-adaptive evolutionary extreme learning approach. Furthermore, the method introduced in (Agostinello et al., 2023) consists of a DL approach for DDoS attack detection using adaptive architectures with an optimized number of neurons.
While DL-based approaches for DDoS detection using transfer learning or adaptive architectures have been proposed in the literature, to the best of our knowledge, no approach has yet considered adaptive architectures in a transfer learning modality.To address these gaps, our paper proposes an adaptive DL approach for DDoS detection within a transfer learning framework.

Methodology
This section explains our proposed framework for DDoS detection using DL models trained with the adaptive transfer learning procedure. The methodology comprises five steps: i) data preprocessing, ii) CNN models, iii) transfer learning, iv) hyperparameter optimization, and v) model evaluation and selection. Figure 1 outlines the proposed methodological framework.

Data Preprocessing
Data preprocessing consists of i) data cleaning, ii) data transformation, iii) data dimensionality reduction, and iv) data conversion.
Data cleaning. We initially focused on validating and correcting inconsistencies and errors within the dataset to ensure its integrity for model training. First, we removed columns lacking useful values, including socket-related features and columns filled solely with zeros. We then eliminated duplicate rows and rows containing NaN values. Finally, we replaced all remaining infinite and null values with -1.
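The cleaning steps above can be sketched with NumPy on a purely numeric feature matrix. This is a minimal illustration under our own assumptions: `clean_features` is a hypothetical helper, socket-related columns are assumed to have been dropped by name beforehand, and NaN rows are removed before de-duplication so that NaN comparisons do not interfere with duplicate detection.

```python
import numpy as np


def clean_features(X):
    """Hypothetical sketch of the data-cleaning step for a numeric matrix (rows = flows)."""
    X = np.array(X, dtype=float)  # copy, so the caller's array is not modified
    # Drop columns filled solely with zeros.
    X = X[:, ~np.all(X == 0, axis=0)]
    # Drop rows that contain NaN values.
    X = X[~np.isnan(X).any(axis=1)]
    # Drop duplicate rows, keeping the first occurrence in the original order.
    _, first_idx = np.unique(X, axis=0, return_index=True)
    X = X[np.sort(first_idx)]
    # Replace any remaining infinite values with -1.
    X[np.isinf(X)] = -1
    return X


raw = np.array([[1, 0, np.inf], [1, 0, np.inf], [2, 0, np.nan], [3, 0, 4]])
print(clean_features(raw))
```

On this toy matrix, the all-zero middle column, the NaN row, and the duplicated first row are removed, and the remaining infinity becomes -1.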
Data transformation. This task transforms the datasets to ensure consistent numerical values across diverse datasets. First, we normalize numerical values to the [0, 1] range using the min-max method. Additionally, categorical features undergo label encoding, which converts categorical values into numerical counterparts. This process uses two methods: a label encoder, which converts each label into a unique integer, and a one-hot encoder (OHE), which transforms labels into n-dimensional binary vectors, where n is the number of labels.
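A minimal NumPy sketch of min-max scaling, label encoding, and one-hot encoding follows. The helper names are hypothetical; in practice scikit-learn's `MinMaxScaler`, `LabelEncoder`, and `OneHotEncoder` provide the same transformations, although the text does not state which implementation was used. Mapping constant columns to 0 is our own choice to avoid division by zero.

```python
import numpy as np


def min_max_scale(X):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span


def label_encode(labels):
    """Map each distinct label to an integer (classes in sorted order)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return np.array([index[c] for c in labels]), classes


def one_hot(codes, n_classes):
    """Turn integer codes into n-dimensional binary vectors."""
    return np.eye(n_classes, dtype=int)[np.asarray(codes)]


codes, classes = label_encode(["UDP", "BENIGN", "SYN", "UDP"])
print(classes, codes.tolist())
print(one_hot(codes, len(classes)))
```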
Data dimensionality reduction.This step aims to reduce the number of features to decrease noise, accelerate training, and achieve a consistent number of features across diverse datasets.
To perform this reduction, we apply principal component analysis (PCA), selecting the number of principal components with the maximum likelihood estimation (MLE) method, a statistical approach for estimating the parameters of a probability distribution that best describes a set of observed data (Ogbuanya, 2021).
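One way to realize MLE-based component selection is scikit-learn's `PCA(n_components="mle")`, which estimates the dimensionality with Minka's MLE; we assume, without confirmation from the text, that this corresponds to the procedure described above. The synthetic matrix is a hypothetical stand-in for a preprocessed dataset.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for a preprocessed feature matrix: 5 latent directions
# mixed into 20 observed features, plus a small amount of isotropic noise.
latent = rng.normal(size=(2000, 5))
mixing = rng.normal(size=(5, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(2000, 20))

# n_components="mle" lets scikit-learn estimate the number of components
# (Minka's MLE); it requires n_samples >= n_features and the full SVD solver.
pca = PCA(n_components="mle", svd_solver="full")
X_reduced = pca.fit_transform(X)
print("components kept:", pca.n_components_, "reduced shape:", X_reduced.shape)
```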

Data conversion. The pre-trained CNN models that we consider in this paper, VGG16, VGG19, and ResNet50, have been trained on image datasets. However, network traffic datasets are typically captured in non-image formats, such as .csv or .pcap files. To enhance the effectiveness of DDoS attack detection through transfer learning, it is important to transform this non-image network traffic data into an image-compatible format suitable for CNNs. We first scale the numeric features of each dataset to the [0, 1] range to normalize the data. Following this initial normalization, we apply the quantile transform technique to each feature. This method discretizes the normalized values into quantiles, which are then mapped onto a new scale ranging from 0 to 255. This adjustment aligns the data values with the standard range of pixel intensities used in image processing, so that they can be interpreted as image pixels. Using this quantile-scaled data, we generate images for each category within the datasets, including the various types of network attacks and benign traffic.
Initially, these images are created with dimensions of 9 × 9 pixels and are encoded in three color channels (RGB), which allows us to capture and distinguish a broad spectrum of feature variations through color differentiation. If the number of features is lower than the number of available pixels, we add padding to maintain a consistent size. To make these images compatible with commonly used pre-trained models such as VGG16, VGG19, and ResNet50, we resize the initial 9 × 9 images to 224 × 224 pixels, maintaining the three-channel (RGB) format.
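The conversion can be sketched in NumPy under assumptions that the text does not state: quantiles are taken from each value's position in its sorted column (a simplified stand-in for a quantile transform such as scikit-learn's `QuantileTransformer`), each feature fills one pixel of the 9 × 9 grid in row-major order with zero padding, the grey level is replicated across the three RGB channels, and upsampling to 224 × 224 uses nearest-neighbour interpolation. Both helper names are hypothetical.

```python
import numpy as np


def quantile_to_pixels(X):
    """Map every feature to 0..255 according to its empirical quantile."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sorted_cols = np.sort(X, axis=0)
    q = np.empty_like(X)
    for j in range(X.shape[1]):
        # Position of each value in the sorted column, scaled to [0, 1];
        # tied values share the same quantile.
        q[:, j] = np.searchsorted(sorted_cols[:, j], X[:, j], side="left") / max(n - 1, 1)
    return np.round(q * 255).astype(np.uint8)


def row_to_image(pixels, side=9, out=224):
    """Pad one record to side x side, upsample to out x out, and stack as RGB."""
    flat = np.zeros(side * side, dtype=np.uint8)  # zero padding for missing features
    k = min(len(pixels), side * side)
    flat[:k] = pixels[:k]
    grid = flat.reshape(side, side)
    idx = np.arange(out) * side // out  # nearest-neighbour source index per output pixel
    big = grid[np.ix_(idx, idx)]
    return np.stack([big, big, big], axis=-1)


X = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 7.0]])
P = quantile_to_pixels(X)
img = row_to_image(P[2])
print(P.tolist(), img.shape)
```

Nearest-neighbour upsampling keeps each original feature as a uniform block of pixels; a bilinear resize (for example with PIL) would instead blend neighbouring features.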

CNN Models
In our work, we consider three customized CNN architectures of varying depth for one-dimensional input vectors, namely i) Conv4, ii) Conv8, and iii) Conv18, and three pre-trained models, specifically VGG16, VGG19, and ResNet50. For each architecture, we explore two classification variants: one performs binary classification, distinguishing benign traffic from DDoS attacks, and the other performs multi-class classification, identifying each specific type of attack. Below, we describe the configurations of the customized CNN architectures.
Conv4. The customized four-layer CNN applies convolutional processing to the input data, enhances the model's non-linearity with ReLU activation functions after the first and third convolutional layers, and uses max-pooling operations to downsample the data for improved feature extraction.
In this paper, we design a four-layer 1D CNN architecture for our task. As illustrated in Figure 2, the model begins with an input layer of shape (N, 1), followed by Conv1D operations (Conv_i) using F filters, kernels of size K, and ReLU activation. After the convolution operations, global average pooling reduces the spatial dimensions. To prevent overfitting and improve generalization, dropout and L1/L2 regularization are incorporated into the architecture: dropout layers with rates between 0.1 and 0.5 are inserted after each Conv1D layer, and regularization is applied to the convolutional layers. A dense layer with output dimension H and ReLU activation follows. The final layer is a dense layer with O output classes and softmax activation.

J o u r n a l P r e -p r o o f
Conv8. Building upon Conv4, we extend our model with 8 convolution layers. This expansion enables us to capture more complex and abstract patterns within the data. The architecture depicted in Figure 2 includes added layers that facilitate a deeper feature extraction process, empowering the model to excel in tasks that demand a higher level of complexity and feature representation.
Conv18.We extend Conv4 and Conv8 by incorporating 18 convolution layers.This model, with its increased depth, captures an even larger hierarchy of features in the dataset representations (see Figure 2).

Transfer Learning and Fine-Tuning
In this paper, we use transfer learning, accompanied by fine-tuning, to improve model adaptability and convergence, enabling efficient knowledge transfer from a source dataset to a target dataset. Fine-tuning is applied to models pre-trained on large datasets so that they adapt and perform well even when tuned on comparatively smaller datasets. In this way, we apply the features learned from the large dataset to a smaller, possibly more specific dataset, enhancing learning efficiency and performance.
In our methodology, the optimization process begins with training the source model, which is formalized as follows:
$$\Theta_s^{*} = \arg\min_{\Theta_s} L\big(M_s(S;\, \Theta_s)\big) \qquad (1)$$
Equation 1 describes the process of iteratively updating the parameters $\Theta_s$ of the source model $M_s$ to minimize the loss function $L$ over the source dataset $S$. The best parameters, $\Theta_s^{*}$, are obtained at the end of this training phase and serve as the initial settings for the subsequent fine-tuning phase applied to the target model. This sequential approach ensures that the source model's insights are not discarded but rather refined to suit the new data context represented by dataset $T$. Thus, the transition from the source model to the target model involves an initial parameter transfer followed by fine-tuning, as outlined in Equation 2, where $M_t$ denotes the target model with parameters $\Theta_t$:
$$\Theta_t^{*} = \arg\min_{\Theta_t} L\big(M_t(T;\, \Theta_t)\big), \quad \text{with } \Theta_t \text{ initialized to } \Theta_s^{*} \qquad (2)$$
Here, $T$ represents the target dataset, and $L$ is the loss function adapted to the target task. In Equation 2, fine-tuning starts from the parameter set $\Theta_s^{*}$, leveraging the pre-trained state to accelerate and refine learning on $T$. This method is particularly effective when the source and target datasets are related but distinct enough to require fine-tuning, as in domain adaptation tasks.
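To make Equations 1 and 2 concrete, the following toy sketch replaces the CNN with logistic regression trained by gradient descent in NumPy, purely for brevity (an assumption, not the paper's model). Parameters are first fit on a large synthetic source set, then copied as the starting point and fine-tuned for a few epochs on a small, slightly shifted target set. The datasets, learning rate, and epoch counts are hypothetical.

```python
import numpy as np


def bce_loss(w, X, y):
    """Mean binary cross-entropy of a logistic model with weights w."""
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ w))), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def train(X, y, w_init, lr=0.5, epochs=200):
    """Plain gradient descent on the mean binary cross-entropy."""
    w = w_init.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w


rng = np.random.default_rng(0)
w_true_src = np.array([2.0, -1.0, 0.5])
w_true_tgt = w_true_src + np.array([0.3, 0.2, -0.1])  # related but shifted target task

X_src = rng.normal(size=(5000, 3))
y_src = (X_src @ w_true_src + 0.1 * rng.normal(size=5000) > 0).astype(float)
X_tgt = rng.normal(size=(200, 3))
y_tgt = (X_tgt @ w_true_tgt + 0.1 * rng.normal(size=200) > 0).astype(float)

theta_s = train(X_src, y_src, np.zeros(3))         # Eq. 1: fit the source model
theta_t = train(X_tgt, y_tgt, theta_s, epochs=20)  # Eq. 2: fine-tune starting from theta_s*
print(bce_loss(theta_s, X_tgt, y_tgt), bce_loss(theta_t, X_tgt, y_tgt))
```

Because the target phase starts from the source optimum rather than from zeros, only a few low-cost epochs on the small target set are needed.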
Specifically, for binary classification, we employ the binary cross-entropy loss:
$$L(y, y') = -\left[\, y \log y' + (1 - y) \log (1 - y') \,\right]$$
where $y$ is the ground truth label (0 for benign, 1 for DDoS) and $y'$ is the predicted probability of DDoS. For multi-class classification, we employ the categorical cross-entropy loss:
$$L(y, y') = -\sum_{c=1}^{C} y_c \log y'_c$$
where $y$ is a one-hot encoded vector representing the true class label, $y'$ is the vector of predicted class probabilities produced by the model, and $C$ is the number of classes.
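Both losses can be written directly in NumPy as a sketch. Averaging over the samples in a batch and clipping predictions away from 0 and 1 are our assumptions, made to mirror common framework behaviour and to keep the logarithms finite.

```python
import numpy as np

EPS = 1e-12  # keeps log() finite when a predicted probability is exactly 0 or 1


def binary_cross_entropy(y, y_pred):
    """Mean of -[y log y' + (1 - y) log(1 - y')] over the batch."""
    y = np.asarray(y, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), EPS, 1 - EPS)
    return float(-np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred)))


def categorical_cross_entropy(y_onehot, y_pred):
    """Mean of -sum_c y_c log y'_c over the batch (one row per sample)."""
    y_onehot = np.asarray(y_onehot, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), EPS, 1.0)
    return float(-np.mean(np.sum(y_onehot * np.log(y_pred), axis=-1)))


print(binary_cross_entropy([1, 0], [0.9, 0.2]))
print(categorical_cross_entropy([[0, 1, 0]], [[0.2, 0.5, 0.3]]))
```

With a one-hot target, only the predicted probability of the true class contributes, so predicting 0.5 for the correct class costs log 2 in both formulations.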

Hyperparameter Optimization
Hyperparameters play a critical role in determining the model's performance and effectiveness. The following hyperparameters were selected and tuned for optimal results: learning rate, batch size, dropout rate, regularization parameters (L1 and L2), and number of layers. These hyperparameters were chosen because of their significant impact on the model's performance and generalization ability. By tuning them, we aim to achieve the best trade-off between accuracy, computational efficiency, and model robustness. Additionally, when determining the optimal hyperparameter values, we consider the specific requirements of DDoS detection in cybersecurity, including the diverse range of attack scenarios and the distinct characteristics of network traffic.
Several techniques can be employed for hyperparameter optimization, including random search, grid search, Bayesian optimization, and Hyperband. Hyperband improves on random search by treating tuning as an exploration-exploitation problem: it evaluates many randomly sampled configurations with small resource budgets (e.g., few training epochs) and, through successive halving, progressively allocates larger budgets to the most promising ones. In this paper, we used the Hyperband tuner of the Keras Tuner library for hyperparameter tuning. We opted for this approach due to its well-balanced trade-off between time, resource utilization, and performance.
In this paper, we employed a four-step approach for fine-tuning and hyperparameter optimization. i) Model definition. We select and define the specific DL architecture tailored to our dataset, establishing the foundation for the optimization process. ii) Hyperparameter selection. We identify the hyperparameters to tune for the chosen DL architecture. iii) Search space definition. We establish the search space for each hyperparameter by specifying its possible range or values. iv) Search algorithm specification. We apply the Hyperband search algorithm to efficiently navigate the hyperparameter space.
We executed Algorithm 1 over the defined search space. In this context, units refer to the number of neurons in a given layer of our neural network model.
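The search strategy can be illustrated with a from-scratch sketch of the original Hyperband scheme (brackets of successive halving). It is not the Keras Tuner implementation used in the paper: the two-parameter search space in `toy_sample` and the synthetic loss in `toy_evaluate` are hypothetical stand-ins for training a model for `budget` epochs and returning its validation loss.

```python
import math
import random


def hyperband(sample_config, evaluate, max_resource=27, eta=3, seed=0):
    """Minimal Hyperband sketch: returns (best_loss, best_config)."""
    rng = random.Random(seed)
    s_max = int(math.log(max_resource, eta) + 1e-9)  # number of halving levels
    best_loss, best_config = float("inf"), None
    for s in range(s_max, -1, -1):  # one bracket per exploration/exploitation trade-off
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # configurations in this bracket
        r = max_resource / eta ** s  # initial budget (e.g., epochs) per configuration
        configs = [sample_config(rng) for _ in range(n)]
        for i in range(s + 1):  # successive halving
            budget = r * eta ** i
            losses = [evaluate(c, budget) for c in configs]
            for loss, c in zip(losses, configs):
                if loss < best_loss:
                    best_loss, best_config = loss, c
            keep = max(len(configs) // eta, 1)
            order = sorted(range(len(configs)), key=lambda j: losses[j])
            configs = [configs[j] for j in order[:keep]]
    return best_loss, best_config


def toy_sample(rng):
    """Hypothetical search space: learning rate and dropout only."""
    return {"lr": rng.choice([1e-2, 1e-3, 1e-4, 1e-5]), "dropout": rng.uniform(0.1, 0.5)}


def toy_evaluate(config, budget):
    """Hypothetical validation loss: best at lr = 1e-3 and shrinking with budget."""
    return abs(math.log10(config["lr"]) + 3) + 1.0 / budget


loss, cfg = hyperband(toy_sample, toy_evaluate)
print(round(loss, 4), cfg)
```

The most aggressive bracket starts 27 configurations at a budget of 1 and keeps a third of them at each step, while the most conservative bracket trains only 4 configurations at the full budget of 27.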

Databases used and Preprocessing
To evaluate the performance of our proposed adaptive transfer learning models, we selected four well-known cybersecurity datasets: KDDCup'99 (Bay et al., 2000), UNSW-NB15 (Moustafa and Slay, 2015), CSE-CIC-IDS2018 (Sharafaldin et al., 2018), and CIC-DDoS2019 (Sharafaldin et al., 2019). These datasets are widely recognized as benchmarks in the cybersecurity domain (Gümüşbaş et al., 2020; Sharafaldin et al., 2017). They encompass a wide spectrum of attack scenarios, allowing us to train DL models to detect a variety of attack types. Specifically, we chose UNSW-NB15 for its realistic network traffic patterns, KDDCup'99 for its comprehensive set of network intrusions, CSE-CIC-IDS2018 for its modern attack and traffic types, and CIC-DDoS2019 for its detailed DDoS attack scenarios. This diversity allows us to evaluate the robustness and efficacy of the models across different network environments and attack vectors. In the following, we describe these datasets and the corresponding preprocessing in detail.
KDDCup'99 dataset (Bay et al., 2000). The KDDCup'99 dataset was created for the KDD Cup 1999 competition, which aimed to develop effective methods for detecting unauthorized access and malicious activities in computer networks. It includes an extensive collection of approximately 5 million network connection records, covering both normal connections and 22 types of cyber-attacks classified into four major categories: DoS (back, LAND, ping of death, teardrop, Neptune, and smurf attacks), U2R (buffer overflow, load module, perl, and rootkit attacks), R2L (ftp-write, guess-password, imap, multihop, PHF, spy, warezclient, and warezmaster attacks), and probe (port sweep, IP sweep, NMAP, and Satan attacks). Each network connection record is characterized by 42 features (Aggarwal and Sharma, 2015). We preprocessed this dataset following the procedures outlined in Section 3. First, we converted the categorical data into numeric values. Next, we normalized the entire dataset with min-max normalization to scale the data within the range [0, 1]. To enhance data quality, we identified and removed duplicate rows, NaN values, missing values, and columns containing only zero values. After normalization and data quality enhancement, the dataset consists of 494,020 rows and 42 features.

UNSW-NB15 dataset (Moustafa and Slay, 2015). This dataset contains 9 unique attack types and 49 features. The attack categories consist of Analysis, Fuzzers, Backdoors, DoS, Exploits, Reconnaissance, Generic, Shellcode, and Worms. These attack types cover a wide range of cyber threats, enabling a thorough assessment of IDS. After preprocessing, we retained 642,566 rows and 45 features for further analysis.

CSE-CIC-IDS2018 dataset (Sharafaldin et al., 2018).
The dataset records network traffic in a controlled lab environment, capturing both benign traffic and seven distinct cyberattack scenarios. The attacking infrastructure involves 50 machines, while the victim organization consists of 5 departments comprising 420 machines and 50 servers. The dataset consists of captured network traffic and system logs from each machine (Sharafaldin et al., 2018) and encompasses diverse attack scenarios, including DoS, DDoS, port scanning, and malicious code activities. To support ML algorithms, the dataset creators also provide a processed version, distributed as a set of CSV files containing 80 features extracted from the captured traffic with CICFlowMeter-V3. This paper focuses specifically on the segments of the dataset related to DDoS and benign traffic, which contain seven types of DDoS attacks (GoldenEye, Slowloris, Hulk, SlowHTTPTest, LOIC-HTTP, HOIC, and LOIC-UDP) together with benign network traffic.
During preprocessing, we discovered and removed 3,708,162 duplicate rows. Additionally, we removed 17 columns comprising socket-related features and features with only zero values. After normalization and data quality enhancement, the dataset consists of 7,384,563 rows and 66 features. Figure 3 presents samples of the converted images for each class, from C0 to C7: class C0 represents benign traffic, while classes C1 to C7 represent the different types of attack traffic.

CIC-DDoS2019 dataset (Sharafaldin et al., 2019). The dataset offers comprehensive data on various … BIOS, LDAP, MSSQL, UDP, and UDPLag. To streamline the dataset for multi-class classification, we merged similar attacks based on their attack techniques, network behaviors, and naming conventions. For instance, different types of UDP-based attacks (DrDoS-UDP, UDP, UDP-lag, and UDPLag) were grouped because they share the characteristic of overwhelming the target with excessive requests. This merging process, which aligns with existing practices in the literature (Akgun et al., 2022), simplifies the dataset without compromising the integrity of the attack patterns, thereby improving the manageability and training efficiency of the models. Consequently, the dataset now profiles 12 distinct attack types plus benign traffic: TFTP, UDP, NTP, SSDP, SYN, MSSQL, SNMP, DNS, BENIGN, LDAP, NetBIOS, Portmap, and WebDDoS. Table 1 details the dataset's cardinality, while Figure 4 illustrates samples of the converted images for each class in these datasets.
During preprocessing, we removed 59,936,580 rows and 20 columns that lacked variability (predominantly zero-valued and socket-related features), reducing the dataset to 10 million rows and 66 columns.
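The attack-merging step amounts to a label mapping applied before encoding. In this sketch, only the UDP group is taken from the text; the mapping name and the exact label spellings are hypothetical, and the raw labels in the released CSV files may use different spellings that would need to be checked.

```python
from collections import Counter

# Hypothetical mapping: only the UDP group below is stated in the text.
MERGE_MAP = {
    "DrDoS-UDP": "UDP",
    "UDP": "UDP",
    "UDP-lag": "UDP",
    "UDPLag": "UDP",
}


def merge_labels(labels, mapping=MERGE_MAP):
    """Map each raw attack label to its merged class; unmapped labels are kept."""
    return [mapping.get(label, label) for label in labels]


raw = ["DrDoS-UDP", "UDPLag", "SYN", "BENIGN", "UDP-lag"]
merged = merge_labels(raw)
print(merged, Counter(merged))
```

Keeping unmapped labels unchanged means that attack families without a merge rule pass through as their own classes.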

Model Evaluation and Selection
The following evaluation metrics were applied in this study, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

• Error (ERR). The proportion of incorrect classifications among all observations:
$$ERR = \frac{FP + FN}{TP + TN + FP + FN}$$
• Accuracy (ACC). The proportion of correct predictions among all instances:
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$
• Precision (PR). Also known as positive predictive value (PPV), it is the ratio of correct positive predictions (TP) to all positive predictions made by the model:
$$PR = \frac{TP}{TP + FP}$$
• Recall (REC). Also known as detection rate (DR) or true positive rate (TPR), it is the proportion of actual positive instances that are correctly predicted as positive:
$$REC = \frac{TP}{TP + FN}$$
• F-Score (FS). Also known as F1-score, it is the harmonic mean of precision and recall. It is especially useful when the class distribution is imbalanced:
$$FS = 2 \cdot \frac{PR \cdot REC}{PR + REC}$$
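The five metrics follow directly from the confusion-matrix counts, as in the sketch below. Returning 0 when a denominator is zero is our own convention; the averaging scheme used for multi-class results (e.g., per-class one-vs-rest, then macro or weighted average) is not specified in the text and is not covered here.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute ERR, ACC, PR, REC and FS from confusion-matrix counts."""
    total = tp + tn + fp + fn
    pr = tp / (tp + fp) if tp + fp else 0.0   # precision
    rec = tp / (tp + fn) if tp + fn else 0.0  # recall
    return {
        "ERR": (fp + fn) / total,
        "ACC": (tp + tn) / total,
        "PR": pr,
        "REC": rec,
        "FS": 2 * pr * rec / (pr + rec) if pr + rec else 0.0,
    }


print(binary_metrics(tp=90, tn=80, fp=10, fn=20))
```

For these counts, precision is 0.9 and recall is 90/110, so the F-score is their harmonic mean, 6/7 ≈ 0.857.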

Results and Discussion
In this section, we evaluate the performance of our adaptive transfer learning approach across different datasets, using both customized CNN architectures and fine-tuned pre-trained models for DDoS attack detection. We examine the DDoS detection capabilities of both the DL models trained from scratch and the transfer learning models. The experiments were performed using Google Colab Pro, with GPU enabled and RAM set to "high". For data preprocessing and experimentation, we used Python with the PIL, Dask, Pandas, Keras, and scikit-learn libraries. We partitioned each dataset into three segments: 40% for training, 20% for validation, and the remaining 40% for testing.
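A 40/20/40 partition can be sketched as a shuffled index split. The text does not state whether the split was random or stratified by class; this sketch assumes a plain random shuffle with a fixed seed, and for heavily imbalanced classes a stratified split (e.g., scikit-learn's `train_test_split` with `stratify`) may be preferable. The helper name is hypothetical.

```python
import numpy as np


def split_40_20_40(n_samples, seed=0):
    """Shuffle row indices and cut them into 40% train, 20% validation, 40% test."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = n_samples * 40 // 100
    n_val = n_samples * 20 // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_40_20_40(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```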
As shown in Table 2, we define the search space for the learning rate as [1e-3, 1e-4, 1e-5] to find a value that ensures efficient convergence without overshooting or slow convergence. For batch size, common values range from 16 to 512; the batch size affects both training speed and the frequency of weight updates. We tune the dropout rate between 0.1 and 0.5 to prevent overfitting while preserving useful information. Moreover, we test different activation functions (ReLU, Sigmoid, and Tanh) to identify the one that best allows the model to capture non-linear relationships. We adjust the number of hidden layers and the number of neurons in each layer to find the optimal balance between model complexity and generalization ability, considering the layer configurations [32, 64, 128] and [64, 128, 256, 512] and evaluating their impact on performance. We also consider L1 and L2 regularization to find the best trade-off between reducing overfitting and preserving model performance.
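The search space described above can be encoded as a plain dictionary with a random sampler, for instance to drive a random or Hyperband-style search. The exact batch-size grid (powers of two between 16 and 512), the rounding of the dropout rate, and the regularizer options are assumptions, since Table 2's full contents are not reproduced here; `SEARCH_SPACE` and `sample_config` are hypothetical names.

```python
import random

# Search space from the text; the batch-size grid and regularizer options are assumptions.
SEARCH_SPACE = {
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "batch_size": [16, 32, 64, 128, 256, 512],
    "dropout": (0.1, 0.5),  # continuous range
    "activation": ["relu", "sigmoid", "tanh"],
    "units": [[32, 64, 128], [64, 128, 256, 512]],  # neurons per hidden layer
    "regularizer": ["l1", "l2"],
}


def sample_config(rng):
    """Draw one random configuration from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["dropout"]
    return {
        "learning_rate": rng.choice(SEARCH_SPACE["learning_rate"]),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "dropout": round(rng.uniform(lo, hi), 2),
        "activation": rng.choice(SEARCH_SPACE["activation"]),
        "units": rng.choice(SEARCH_SPACE["units"]),
        "regularizer": rng.choice(SEARCH_SPACE["regularizer"]),
    }


print(sample_config(random.Random(42)))
```

Encoding the layer configuration as a list of widths ties the number of hidden layers to the chosen list, so depth and width are searched jointly.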

Customized CNN and Pre-trained Models Transfer Learning Results
CNN customized model. We trained the custom CNN models using the Adam optimizer. The loss functions were categorical cross-entropy for multi-class classification and binary cross-entropy for binary classification.
Conv4 achieved an accuracy of 99.90%, Conv8 recorded 99.94%, and Conv18 reached 99.88% in identifying benign versus DDoS attack traffic within the CIC-DDoS2019 dataset. For multi-class classification of specific attack types, Conv4 and Conv8 demonstrated accuracies of 99.84% and 99.82%, respectively, on the CSE-CIC-IDS2018 dataset, while Conv18 achieved 97.61% on the UNSW-NB15 dataset.
To explore the transferability and adaptability of models trained on specific networks or datasets to new and diverse environments, we assess their performance on various target datasets. The target datasets used in this evaluation (CSE-CIC-IDS2018, CIC-DDoS2019, KDDCup'99, and UNSW-NB15) enable a comprehensive assessment of the models' adaptability across diverse network environments.
Initially trained on the CIC-DDoS2019 dataset, the source model demonstrated robust adaptability across various target datasets. In binary classification, the Conv18 model transferred from CIC-DDoS2019 to CSE-CIC-IDS2018 achieved 99.99% accuracy in distinguishing benign from DDoS network traffic. Refer to Table 4 for detailed results.
The proposed model adapts consistently across source-to-target dataset transfers, with minimal differences in binary classification performance, which underscores its robust adaptability across datasets. Moreover, the transferred models achieve higher accuracy than models trained on a single domain.
In multiclass classification, the transfer of the Conv18 model from CIC-DDoS2019 to CSE-CIC-IDS2018 yielded a performance of 99.92%, while the reverse transfer achieved 93.62%, as detailed in Table 5. Comparing the present results to prior findings reveals a consistently high accuracy level of the models when transferred from CIC-DDoS2019 to other datasets, in both binary and multiclass tasks.These results suggest the model's effective adaptation to the target dataset's characteristics, particularly as dataset features increase.
In transferring a model from a dataset with fewer features and instances to a larger and more complex target dataset, we observed decreased accuracy values in specific attack type identification.For instance, Conv4 achieved a score of 83.42% when transferred from UNSW-NB15 to the CIC-DDoS2019 dataset.This can be attributed to significant dissimilarities in dataset characteristics, such as size and complexity, leading to challenges in the model's adaptation to diverse patterns.Conversely, when transferring a model trained on a larger and more complex dataset to a smaller and less complex target dataset, we observed improved accuracy.For instance, Conv18, when transferred from CIC-DDoS2019 to the KDDCup'99 dataset, demonstrated enhanced performance metrics.
The model's effectiveness largely stems from its ability to learn and exploit feature patterns from the extensive source dataset, which enables it to adapt efficiently to the structurally simpler target dataset. This flexibility demonstrates the model's capability to transfer knowledge effectively, especially from a larger, well-labeled dataset to a smaller one. This property is valuable for reducing the need for extensive data labeling while maintaining high prediction accuracy on the target dataset.
Pre-trained models. In this experiment, we employed a transfer learning approach to leverage the capabilities of pre-trained ImageNet CNN architectures, specifically VGG16, VGG19, and ResNet50. The approach involved the transformation of network traffic data into image representations, a process visually illustrated in Figure 3.
For the CSE-CIC-IDS2018 dataset, a subset of 41,883 images depicting either benign or malicious traffic was selected. We then extended our analysis to distinguish between multiple types of DDoS attacks in addition to benign traffic. This required a more comprehensive set of images to adequately represent each class, resulting in the use … 4. In addition, we used 12,154 images from the KDDCup'99 dataset and 5,629 images from the UNSW-NB15 dataset. In our experiment, we tailored the pre-trained models for binary and multi-class DDoS attack detection. As part of model optimization, we also tuned the main hyperparameters across all models, including the range of frozen layers in our framework.
The results presented in Table 6 demonstrate the binary classification efficacy of the VGG16, VGG19, and ResNet50 models in differentiating between benign and DDoS attack traffic. Notably, the VGG19 model achieves 100% accuracy, recall, precision, and F1-score on the CSE-CIC-IDS2018 dataset. On the CIC-DDoS2019 dataset, VGG16, VGG19, and ResNet50 all achieve high accuracy, with scores of 99.99%, 99.99%, and 99.94%, respectively. On the KDDCup'99 and UNSW-NB15 datasets, VGG19 outperforms the other models, with accuracies of 99.90% and 98.64%, respectively. These findings highlight VGG19's strong binary classification capability, particularly in separating benign from DDoS traffic across various network scenarios.
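For reference, the four metrics in Table 6 follow the standard definitions computed from true/false positives and negatives. The sketch below assumes the attack class is the positive class (the text does not state this explicitly) and uses plain Python rather than any particular library.

```python
from typing import Hashable, Sequence


def binary_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1) -> dict[str, float]:
    """Accuracy, precision, recall, and F1-score, treating `positive` as the attack class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # attacks flagged as attacks
    fp = sum(t != positive and p == positive for t, p in pairs)  # benign flagged as attacks
    fn = sum(t == positive and p != positive for t, p in pairs)  # attacks missed
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}
```

A score of 100% on all four metrics, as reported for VGG19 on CSE-CIC-IDS2018, means both `fp` and `fn` were zero on the evaluation split.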
The performance metrics in Table 7 provide a comprehensive evaluation of the adaptive pre-trained models VGG16, VGG19, and ResNet50 on multi-class classification tasks across diverse datasets. Notably, VGG19 outperforms the other models in multi-class classification. On the CSE-CIC-IDS2018 dataset, VGG19 achieves an accuracy of 99.97%. On the CIC-DDoS2019 dataset, it leads with an accuracy of 92.65%. On the KDDCup'99 dataset, VGG19 achieves 98.99% accuracy, slightly ahead of ResNet50 at 98.94%. Similarly, on the UNSW-NB15 dataset, VGG19 maintains strong performance, with an accuracy of 97.59%. These outcomes underscore the adaptability and effectiveness of VGG19 in multi-class classification.
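In the multi-class setting, accuracy is the fraction of samples on the diagonal of the confusion matrix, while precision, recall, and F1-score are computed per class and then averaged. The text does not state which averaging was used; the sketch below assumes macro averaging (an unweighted mean over classes), and the class labels in the test are illustrative only.

```python
from typing import Hashable, Sequence


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], labels: Sequence[Hashable]) -> list[list[int]]:
    """Rows are true classes, columns are predicted classes, both ordered as in `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def multiclass_accuracy(matrix: list[list[int]]) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def macro_f1(matrix: list[list[int]]) -> float:
    """Unweighted mean of per-class F1-scores."""
    k = len(matrix)
    scores = []
    for c in range(k):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(k)) - tp  # other classes predicted as c
        fn = sum(matrix[c]) - tp                       # class c predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / k
```

Macro averaging gives rare attack types the same weight as frequent ones, so it can be noticeably lower than accuracy when a few classes are confused, which is one reason the two numbers are reported separately.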
The VGG19 model consistently outperforms the others across datasets, demonstrating its ability to capture complex patterns effectively. We found VGG19's relatively simple and shallow architecture (compared with ResNet50) to be particularly effective at capturing essential textural features from the image-formatted data. Its uniformly small filter sizes might allow it to efficiently identify crucial, surface-level discriminative features. Although ResNet50 also performs well, especially on the KDDCup'99 and UNSW-NB15 datasets, it requires more extensive training data to achieve optimal performance.
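The point about uniformly small filters can be made concrete. VGG networks stack 3×3 convolutions with stride 1, and such a stack covers the same input region as one larger filter while using fewer weights: two 3×3 layers see a 5×5 region and three see a 7×7 region. The sketch below computes the receptive field of a stack of (kernel, stride) layers and the weight count of a k×k convolution with C input and C output channels, ignoring biases; the function names are illustrative.

```python
def receptive_field(layers: list[tuple[int, int]]) -> int:
    """Receptive field (input pixels per axis) of one output unit after a stack of layers.

    Each layer is given as (kernel_size, stride); pooling layers can be listed the same way.
    """
    rf, jump = 1, 1  # jump: input-pixel distance between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def conv_weights(kernel: int, channels: int) -> int:
    """Weights of a kernel x kernel 2D convolution with `channels` inputs and outputs, no bias."""
    return kernel * kernel * channels * channels
```

With 64 channels, three stacked 3×3 layers use 3 × 9 × 64² = 110,592 weights, versus 200,704 for a single 7×7 layer with the same coverage, and they insert two additional ReLU non-linearities along the way.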
In binary classification, the transferred pre-trained VGG19 model achieved higher accuracy on the CSE-CIC-IDS2018 dataset than traditional DL models. In multiclass classification, however, transferred custom CNN models such as Conv18 show a distinct advantage. Overall, the impact of transfer learning on model performance is particularly notable in IDS and DDoS attack detection.
To evaluate the efficacy of our proposed model, we conducted a thorough comparison with state-of-the-art DL and transfer learning models on similar datasets. Our proposed Conv18 model achieved 99.92% accuracy in network attack identification, compared with the 98.8% accuracy of a VGG-16-based IDS on the CSE-CIC-IDS2018 dataset, as reported by (Okey et al., 2023). Additionally, the pre-trained VGG19 model achieved 100% accuracy in distinguishing benign from DDoS network traffic on the CSE-CIC-IDS2018 dataset. The models presented in (Agostinello et al., 2023) and (Chartuni and Márquez, 2021) achieved accuracies of 77.29% and 81.77%, respectively, for attack type classification on the CIC-DDoS2019 dataset, whereas our model achieved 93.62%. In (Wu et al., 2019), the TL-ConvNet model achieved an accuracy of 93.86% on the KDDCup'99 dataset, while our adaptive pre-trained VGG19 model achieved a higher accuracy of 98.99%. Furthermore, the Deep Belief Network (DBN) model by (Almogren, 2020) achieved 96.34% accuracy on the UNSW-NB15 dataset, whereas our Conv18 model achieved 99.84% accuracy in detecting specific attack types. As detailed in Table 8, these findings indicate that our models perform well compared with existing approaches, particularly in DDoS attack detection and specific attack type identification, demonstrating the effectiveness of the adaptive transfer learning techniques we employed.

Conclusions
DDoS attacks pose significant challenges to organizations worldwide because of their disruptive impact on network infrastructure availability and integrity. Attack detection systems based on DL promise high accuracy in detecting attack patterns in network traffic data. However, a major difficulty in developing DL-based IDSs is the scarcity of large, labeled datasets that accurately represent today's network environments. In this paper, we proposed an adaptive transfer learning framework with fine-tuning and hyperparameter optimization. We employed custom CNN models (Conv4, Conv8, and Conv18) and pre-trained models (VGG16, VGG19, and ResNet50), trained on the cybersecurity benchmark datasets KDDCup'99, UNSW-NB15, CSE-CIC-IDS2018, and CIC-DDoS2019.
Our experiments compared the performance of models trained with and without transfer learning in network traffic classification. The pre-trained VGG19 model excelled in binary classification, effectively separating benign from malicious network traffic. Our custom transferred Conv18 model achieved better accuracy, precision, recall, and F1-score in detecting attack types, particularly in multiclass classification scenarios. Comparing the current results with prior findings reveals consistently high accuracy when the models are transferred from the CIC-DDoS2019 dataset to the other datasets, in both binary and multiclass tasks. These results suggest that the models adapt effectively to the characteristics of the target dataset, especially as the number of dataset features increases. Overall, transfer learning proves to be a valuable approach for enhancing DDoS attack detection, even with limited labeled data.
Future work will enhance the practicality and robustness of DL and transfer learning models by prioritizing evaluation on diverse datasets, defense against adversarial attacks, real-time implementation, and scalability.

Figure 1: A comprehensive methodology framework for robust transfer learning DDoS attack detection, encompassing (A) data preprocessing, (B) CNN model, (C) transfer learning and fine-tuning, (D) hyperparameter optimization, and (E) model evaluation and selection.

Figure 2: Overview of custom 1D CNN models: (A) Conv4, (B) Conv8, and (C) Conv18. These models use ReLU activation in their internal layers to introduce non-linearity and either Softmax (for multi-class classification) or Sigmoid (for binary classification) in the output layer. Designed to handle varying complexities, the models range from 4 to 18 convolutional layers, optimized for efficient feature extraction from 1D input data. Tailored for both simple and complex DDoS attack classification, the models are defined by the kernel size (K), stride (S), and feature map size (F), and have undergone hyperparameter optimization.

Table 2: Defined hyperparameters and search space ranges for hyperparameter tuning strategies

Table 3: Accuracy results of transfer learning source models for binary and multiclass classification

Table 4: Accuracy evaluation of models in benign-versus-attack traffic detection tasks transferred to various target datasets.

Table 5: Accuracy evaluation of models in multiclass attack detection tasks transferred to various target datasets.

Table 6: Performance metrics of the VGG16, VGG19, and ResNet50 pre-trained models on various datasets for binary classification

Table 7: Performance metrics of the VGG16, VGG19, and ResNet50 pre-trained models on various datasets for multi-class classification

Table 8: Comparison of the proposed approach's metrics with state-of-the-art methods by dataset and class variant