A Modified PointNet-Based DDoS Attack Classification and Segmentation in Blockchain



Introduction
A blockchain is built on a peer-to-peer (P2P) network with a mesh topology [1]. In a blockchain network, each node can act as both a client and a server. Such a network can offer users more dependable, high-quality, and secure services, distribute network traffic more effectively, and provide a wider range of service options [2]. Because every node can send and receive data, traffic is not centralized as in client/server networks but distributed across all network nodes [3]. However, this distributed architecture exposes many connection points, which can make individual nodes more vulnerable [4]. In addition, because a blockchain system offers a public database, attackers can easily obtain system-related data with which to attack the system [5].
DDoS attacks are among the most significant threats to blockchain security [6]. In a DDoS attack, attackers try to disrupt service by overwhelming a network or server with traffic from many sources [7]. DDoS attacks in the blockchain ecosystem have different characteristics from those in a typical network [8,9]. In a blockchain, the attacker can use numerous compromised nodes to launch the attack, overwhelming the target node's inbound connections and ultimately bringing the network down [10]. Attackers may also induce network congestion to interfere with the network's normal operation and reduce the availability and dependability of nodes [11].
In the blockchain ecosystem, DDoS attacks are frequently concurrent: several different attack types may occur at the same time [12]. A blockchain is a decentralized system in which every node is equal and can take part in the network's consensus process. This also means that any node can be targeted by attackers, who may exploit its vulnerabilities to compromise the entire network [13]. It is therefore crucial to protect blockchain applications from DDoS attacks [14]. Studying DDoS attacks helps blockchain application developers and security experts design more effective defenses. Examples include distributed defense mechanisms that mitigate attacks and monitoring and identification techniques that detect and respond to attacks early [15,16]. It also helps them better understand network protocols and system vulnerabilities and take steps to mitigate them [17,18]. Current DDoS detection techniques do not reliably detect several forms of attack. These include LDAP (Lightweight Directory Access Protocol), MSSQL (Microsoft SQL Server), NetBIOS (NetBIOS Services Protocols), Portmap, Syn (Synchronize Sequence Numbers), UDP (User Datagram Protocol), and UDPLag attacks. Detection is therefore especially important in a blockchain system, where a node may face multiple data streams at the same time and must examine them in a parallel and coordinated way. Detection involves two tasks. The first is to classify whether the concurrent data flows contain a DDoS attack. The second, once an attack is detected, is to segment the individual data entries by DDoS attack type.
To address these problems, we construct a targeted dataset that simulates multiple data streams occurring at the same time, and we detect DDoS attacks with a modified PointNet (M-PointNet) network.
Many researchers have studied DDoS attacks in blockchain and proposed different solutions. Artificial intelligence algorithms have become a workable option for identifying DDoS attacks [19,20]. The framework proposed in [21] achieves a high accuracy rate in detecting emerging DDoS attacks with a lightweight algorithm. In [22], the authors combined machine learning algorithms with Bloom filters. Kasim detected DDoS attacks with an autoencoder (AE) and an SVM. The approach was 99.41% accurate on CICIDS (Canadian Institute for Cybersecurity Intrusion Detection Systems) and 99.5% accurate on NSL-KDD [23]. Gopal and Virender introduced a voting extreme learning machine (V-ELM) to detect DDoS attacks in cloud computing [24]. It achieved 99.18% accuracy on the NSL-KDD dataset and 92.11% on the ISCX dataset. In [25], the authors proposed examining the cloud provider's incoming packets to detect and avoid TCP flood DDoS attacks. Regarding datasets, [26] was the first to use the CICDDoS-2019 dataset, which contains 12 attack types. Its authors combined multiple denoising methods, tensor decomposition, and classifiers to detect attacks, and reported binary classification accuracy above 99% for various denoising algorithms. These studies do not address multi-class classification, which security professionals need in order to identify DDoS attack types. Aamir et al. developed a clustering-based machine learning method that uses network flow traffic data as feature vectors. Their technique was 96.66% accurate on their own dataset and 82% accurate on CICIDS-2017 [27]. Kachavimath et al. used correlation-based feature selection to extract 8 of the 41 features in the NSL-KDD dataset. KNN reached 98.51% accuracy and Naive Bayes 91.31% [28]. In [29], captured traffic is processed to extract various features, and machine learning classifiers are applied to distinguish attack traffic from regular traffic.
Table 1 summarizes the results of the studies mentioned above. In this paper, we use deep learning to study a range of normal and abnormal traffic patterns in the CIC-DDoS2019 dataset. First, we process and screen the data with statistical techniques and select features with a traditional decision tree. Then, using a modified PointNet network, we determine whether a DDoS attack is present and which attack type it is. Our experiments show that this approach provides parallel, accurate, and effective detection for the blockchain environment and can contribute to the security of contemporary blockchain systems.

Blockchain
Blockchain technology is a distributed ledger that allows transactions to be validated and recorded without a single administrator. It consists of blocks, which are collections of transaction records, and these blocks are securely shared among many computers. Although it was originally created for Bitcoin, blockchain technology is now used in a wide range of industries. These include financial services, logistics and supply chain management, public administration, and identity and access management [30].
A blockchain system typically consists of the following parts. (1) Block: a collection of transaction records that are hashed together so that tampering can be detected. (2) Chain: an ordered list of blocks in which each block stores the hash of the block before it. (3) Node: a computer in the blockchain network that accepts, verifies, and stores new transaction records. (4) Consensus mechanism: the process that decides whether a new block is accepted into the blockchain. In a proof-of-work mechanism, for example, a node must solve a computationally hard puzzle before it can submit a new block. This stops malicious nodes from flooding the network with useless data or tampering with existing transaction records. Because it guarantees the security and dependability of the blockchain network, the consensus mechanism is a crucial part of blockchain technology. Blockchain networks use several distinct consensus mechanisms, including proof-of-work, proof-of-stake, and consensus rotation.
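To make the block, chain, and proof-of-work ideas concrete, here is a minimal toy sketch in Python. It is not the data structure of any particular blockchain. The block fields, the JSON serialization, SHA-256, and the "leading hex zeros" difficulty target are illustrative assumptions. The sketch shows why editing a recorded transaction is detectable: the stored hash no longer matches the block contents.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    index: int
    transactions: list
    prev_hash: str
    nonce: int = 0
    hash: str = field(default="", init=False)

    def compute_hash(self) -> str:
        # Hash all fields except the stored hash, serialized as canonical JSON.
        payload = json.dumps(
            {"index": self.index, "transactions": self.transactions,
             "prev_hash": self.prev_hash, "nonce": self.nonce},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine(block: Block, difficulty: int) -> Block:
    """Toy proof-of-work: increment the nonce until the hash has `difficulty` leading hex zeros."""
    target = "0" * difficulty
    block.nonce = 0
    block.hash = block.compute_hash()
    while not block.hash.startswith(target):
        block.nonce += 1
        block.hash = block.compute_hash()
    return block


def build_chain(batches, difficulty=2):
    """Mine a genesis block, then one block per batch of transactions."""
    chain = [mine(Block(0, [], "0" * 64), difficulty)]
    for txs in batches:
        prev = chain[-1]
        chain.append(mine(Block(prev.index + 1, list(txs), prev.hash), difficulty))
    return chain


def is_valid(chain, difficulty=2) -> bool:
    """Check stored hashes, proof-of-work targets, and the back-links between blocks."""
    for i, block in enumerate(chain):
        if block.hash != block.compute_hash():
            return False  # contents changed after the block was mined
        if not block.hash.startswith("0" * difficulty):
            return False  # proof-of-work not satisfied
        if i > 0 and block.prev_hash != chain[i - 1].hash:
            return False  # link to the previous block is broken
    return True


chain = build_chain([["alice->bob:5"], ["bob->carol:2"]])
print(is_valid(chain))  # True
chain[1].transactions[0] = "alice->bob:500"  # tamper with a recorded transaction
print(is_valid(chain))  # False
```

Real systems add digital signatures, Merkle trees, and adaptive difficulty. The sketch keeps only the hash links and the proof-of-work check.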
Because it is decentralized and highly secure, blockchain technology is well suited to applications such as financial services, logistics and supply chain management, government and public services, identity verification, and access control. The technology is still evolving, but it has already been widely adopted and is expected to have a larger impact in the future.
As shown in Fig. 1, we assume that the blockchain network consists of several nodes of two kinds: normal nodes and nodes under DDoS attack. Normal nodes run properly. They receive and process transaction requests from other nodes and can update the state of the blockchain. Nodes under DDoS attack may not work properly: they may be unable to receive and process transaction requests from other nodes, or may not process them on time. Both kinds of nodes can exist in the same network and communicate with other nodes. Attacked nodes can therefore reduce the efficiency of the entire network and affect the stability of the blockchain.

CIC-DDoS2019 Dataset
CIC-DDoS2019 is a dataset for DDoS attack detection developed by the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick. It contains both legitimate and malicious traffic that closely resembles real-world data, so it can be used to train and test deep learning models for detecting DDoS attacks. First, the dataset is very large, with 150 million packets in total. Deep learning models need large amounts of data to learn complex patterns, which makes the dataset well suited to training them. Second, in addition to regular traffic over numerous protocols, the dataset includes many different kinds of DDoS attacks. This diversity lets it represent real situations more accurately, so a model trained on CIC-DDoS2019 may be more versatile and better able to identify various DDoS attacks. The dataset also contains additional information, such as the attack's duration and target. Researchers can use this information to characterize attacks, evaluate their effects, and train models that identify malicious traffic. Finally, because the traffic is realistic, a trained model may be more applicable to a wider range of real-world scenarios.

DDoS Attack Categories
In this study, we use the CIC-DDoS2019 dataset to conduct deep learning-based research on a variety of normal and abnormal traffic patterns, namely BENIGN, LDAP, MSSQL, NetBIOS, Portmap, Syn, UDP, and UDPLag [31]. After initial screening and processing of the features, we select the 14 most important features with a traditional decision tree. We then use the PointNet network to determine whether traffic is a DDoS attack and, if so, which type of attack it is.
This paper focuses on the following seven attack types.
(1) LDAP: LDAP is a widely used network protocol for querying and modifying directory information within a computer network. In an LDAP-type DDoS attack, the target server is flooded with a large volume of reflected LDAP traffic. This prevents it from responding to legitimate requests and achieves the denial-of-service goal.
(2) MSSQL: MSSQL is Microsoft's database management system, which manages databases using the SQL language. In an MSSQL-type attack, the attacker floods the target server with a large number of forged MSSQL requests, preventing it from responding to legitimate requests and achieving the denial-of-service goal.
(3) NetBIOS: NetBIOS is a local area network protocol that allows machines on the network to share files and printers. In a NetBIOS-type attack, the attacker floods the target server with a large number of forged NetBIOS requests, preventing it from responding to legitimate requests and achieving the denial-of-service goal.
(4) Portmap: Portmap is a service on Unix-like systems, including Linux, that maps RPC (Remote Procedure Call) services to the ports on which they listen. In a Portmap-type attack, the attacker floods the target server with a large number of forged Portmap requests, preventing it from responding to regular requests and achieving the denial-of-service goal.
(5) Syn: A SYN packet is used to open a TCP connection. In normal TCP communication, the client sends a SYN packet to request a connection and the server replies with a SYN-ACK packet. The client then completes the three-way handshake with an ACK packet. In a SYN attack, the attacker sends a large number of forged SYN packets and never completes the handshake. The server is forced to hold an excessive number of half-open connection requests, which achieves the denial-of-service goal.
(6) UDP: UDP is a connectionless transport layer protocol that transmits data over a network but does not guarantee reliable delivery. A UDP-type attack forges a large number of UDP packets, forcing the server to process an excessive number of packets and thereby denying service.
(7) UDPLag: UDPLag-type attacks aim to disrupt the connection between a client and a server and can place a heavier demand on the server than UDP-type attacks. In a UDPLag-type attack, the attacker creates a large number of forged UDP packets, each with randomly generated source and destination port numbers. This forces the server to process traffic spread across many different port pairs, which serves the attacker's denial-of-service goal.
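To illustrate the Syn item above, the following toy Python model shows why unanswered SYNs deny service. The listener is assumed to keep a fixed-size queue of half-open connections with no timeouts or SYN cookies. The class name and backlog size are hypothetical, chosen only for illustration. Once forged SYNs fill the queue, a legitimate client's SYN is dropped.

```python
from collections import deque


class ToyTcpListener:
    """Hypothetical model of a listening socket's half-open (SYN-received) queue.

    Assumptions: fixed backlog, no timeouts, no SYN cookies.
    """

    def __init__(self, backlog: int):
        self.backlog = backlog
        self.half_open = deque()
        self.established = set()
        self.dropped = 0

    def on_syn(self, client: str) -> bool:
        # A SYN creates a half-open entry; the server would answer with SYN-ACK.
        if len(self.half_open) >= self.backlog:
            self.dropped += 1  # queue full: this connection attempt is dropped
            return False
        self.half_open.append(client)
        return True

    def on_ack(self, client: str) -> None:
        # The client's final ACK completes the three-way handshake.
        if client in self.half_open:
            self.half_open.remove(client)
            self.established.add(client)


listener = ToyTcpListener(backlog=128)
# SYN flood: forged sources send SYNs but never send the final ACK.
for i in range(1000):
    listener.on_syn(f"spoofed-{i}")
# A legitimate client now tries to connect.
accepted = listener.on_syn("legit-client")
if accepted:
    listener.on_ack("legit-client")
print(accepted, listener.dropped)  # False 873
```

Production TCP stacks time out half-open entries and can use SYN cookies, so the effect in practice depends on rate and configuration.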

Method Structure
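To classify and segment DDoS attacks in blockchain systems as described above, we propose the following processing flow. (1) Data Pre-Process: preliminary processing of the original CIC-DDoS2019 dataset, including random sampling and preliminary screening of features. (2) Feature Screening: selecting the 14 most important features with a decision tree. (3) Dataset Generating: building samples of concurrent entries for the classification and segmentation tasks. (4) M-PointNet-Based Modeling: designing the modified PointNet network. (5) Simulation and Analysis: applying the trained models to the test of DDoS attack detection.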

Data Pre-Process
To compare the attack types fairly, we first create a balanced dataset by randomly sampling entries from the original CIC-DDoS2019 dataset. In this dataset, entries with and without DDoS attacks each make up 50%, and the seven attack types are equally represented.
As shown in Fig. 2, the dataset contains 10,500 entries labeled BENIGN and 1,500 entries for each of the labels LDAP, MSSQL, NetBIOS, Portmap, Syn, UDP, and UDPLag.
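The balanced sampling above can be sketched as follows. Only the per-class counts come from the text. The function name, the seed, and the assumption that the label column holds exactly these strings are hypothetical. Rows are drawn uniformly at random without replacement within each class.

```python
import numpy as np

# Per-class quotas from Fig. 2; label strings are assumed to match the dataset's label column.
ATTACK_TYPES = ["LDAP", "MSSQL", "NetBIOS", "Portmap", "Syn", "UDP", "UDPLag"]
QUOTAS = {"BENIGN": 10_500, **{name: 1_500 for name in ATTACK_TYPES}}


def balanced_sample(labels, quotas, seed=0):
    """Return row indices with `quotas[c]` rows drawn at random (no replacement) from class c."""
    rng = np.random.default_rng(seed)  # hypothetical seed, for reproducibility
    labels = np.asarray(labels)
    picked = []
    for cls, n in quotas.items():
        candidates = np.flatnonzero(labels == cls)
        if len(candidates) < n:
            raise ValueError(f"class {cls} has only {len(candidates)} rows, need {n}")
        picked.append(rng.choice(candidates, size=n, replace=False))
    return np.concatenate(picked)
```

The returned indices can then be used to select rows from the full feature table, giving 21,000 entries in total.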
Excluding the label column, the CIC-DDoS2019 data we read contain 87 features. We first found that 12 features, such as Bwd PSH Flags, Fwd URG Flags, and PSH Flag Count, are constant throughout. Because constant features are useless for DDoS detection, we remove them, leaving 75 features. We then exclude six features that have no meaningful statistical interpretation, namely Flow ID, Source IP, Destination IP, Timestamp, SimilarHTTP, and Unnamed: 0, leaving 69 features. Next, we compute the correlations among the remaining features, as shown in Fig. 3. Several features are strongly correlated, with correlation coefficients greater than 0.9 (for example, Total Length of Bwd Packets and Fwd IAT). After deleting 30 such redundant features, including Bwd IAT Total, 39 features remain. Finally, we remove features dominated by zero values, including Fwd PSH Flags, Syn Flag Count, CWE Flag Count, Active Mean, Active Std, Active Max, Active Min, and Idle Std, which leaves 29 features.
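The four pruning passes above (constant, identifier, highly correlated, and mostly-zero features) can be sketched in NumPy as shown below. Only the 0.9 correlation threshold and the identifier column names come from the text. The greedy "keep the first of each correlated group" rule and the 0.99 zero-fraction cutoff are assumptions. The sketch also assumes every column is already numeric and free of NaN/inf values.

```python
import numpy as np

# Identifier-like columns named in the text; assumed to be numerically encoded here.
ID_COLUMNS = {"Flow ID", "Source IP", "Destination IP", "Timestamp", "SimilarHTTP", "Unnamed: 0"}


def prune_features(X, names, corr_threshold=0.9, zero_fraction=0.99):
    """Apply the four pruning passes in order and return (X_kept, kept_names).

    X: numeric (n_rows, n_features) array without NaN/inf; `names` labels its columns.
    corr_threshold follows the text; zero_fraction is a hypothetical cutoff.
    """
    keep = list(range(X.shape[1]))
    # 1) Constant columns carry no information for detection.
    keep = [j for j in keep if X[:, j].max() > X[:, j].min()]
    # 2) Identifier and timestamp columns have no useful statistical meaning.
    keep = [j for j in keep if names[j] not in ID_COLUMNS]
    # 3) Greedy redundancy filter: keep a column only if its |Pearson r|
    #    with every column already kept is at most the threshold.
    corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
    selected = []
    for a in range(len(keep)):
        if all(corr[a, b] <= corr_threshold for b in selected):
            selected.append(a)
    keep = [keep[a] for a in selected]
    # 4) Drop columns that are (almost) always zero.
    keep = [j for j in keep if np.mean(X[:, j] == 0) < zero_fraction]
    return X[:, keep], [names[j] for j in keep]
```

With pandas, the same passes map naturally onto `DataFrame.drop` and `DataFrame.corr`. Plain NumPy keeps the sketch dependency-light.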

Feature Screening
The features are then further screened with a simple decision tree. A decision tree is a tree structure used to classify and predict data [32]. It is learned from training data by recursively partitioning the dataset into subsets according to feature values. Each internal node tests a feature, each branch corresponds to an outcome of that test, and each leaf node represents a category.
The basic steps of decision tree classification are as follows [33]. (1) Data preparation: collect a training dataset containing input features and the corresponding output categories. (2) Choose the best feature: select the feature used for splitting according to information gain or the Gini index. In this paper, we use the Gini index. Consider a distribution over $K$ categories in which a sample belongs to category $k$ with probability $p_k$. Its Gini index is $\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$. The larger the Gini index, the greater the uncertainty of the sample.
(3) Create a decision tree: using the chosen feature, split the dataset into two subsets, one whose samples satisfy the split condition and one whose samples do not. Then recursively construct a subtree for each subset. (4) Decision tree pruning: once the tree has been built, prune it to avoid overfitting. (5) Decision tree classification: to classify an input sample, start at the root node and follow the branches selected by the sample's feature values down to a leaf node. The category of that leaf node is the classification result.
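Steps (2) and (3) can be illustrated with a CART-style split search driven by the Gini index. This is a sketch, not the implementation the paper used, since the paper does not name one. It computes the empirical Gini index of a label set and scans one numeric feature for the threshold `t` in a binary split `x <= t` that minimizes the size-weighted Gini index of the two children.

```python
import numpy as np


def gini(labels) -> float:
    """Gini(p) = 1 - sum_k p_k^2 for the empirical class distribution of `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def best_split(x, y):
    """Return (threshold, weighted_gini) of the best binary split `x <= t` on one feature.

    threshold is None if no split improves on the unsplit node.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    best_t, best_score = None, gini(y)  # a split must beat the unsplit node
    for t in np.unique(x)[:-1]:  # splitting at the maximum would leave the right side empty
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = float(t), score
    return best_t, best_score


print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (3.0, 0.0)
```

A full tree repeats this search over all features at each node and recurses on the two children. Libraries such as scikit-learn's `DecisionTreeClassifier(criterion="gini")` implement an optimized version of this procedure.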
We train a basic decision tree on the standardized dataset described above; the result is shown in Fig. 4. The features used at the tree's internal nodes are its most important elements. Grouping them by importance, we select the following 14 features: Source Port, Destination Port, Protocol, Total Forward Packets, Total Backward Packets, Forward Packet Length, Backward Packet Length Max, Flow Bytes/s, Flow IAT Mean, Flow IAT Std, Max Packet Length, ACK Flag Count, URG Flag Count, and Inbound.

Dataset Generating
For subsequent training, we filter the values of the important features Source Port and Destination Port. We keep only 10 commonly used port numbers and set all other port values to −1. We then standardize each feature by subtracting its mean from each value and dividing the result by its standard deviation.
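A minimal sketch of these two preprocessing steps follows. The paper keeps 10 common ports but does not list them, so `COMMON_PORTS` below is a hypothetical choice. The small `eps` guarding against zero standard deviation is also an assumption.

```python
import numpy as np

# Hypothetical list: the text keeps 10 common ports but does not name them.
COMMON_PORTS = np.array([20, 21, 22, 23, 25, 53, 80, 123, 443, 3389])


def map_ports(ports):
    """Keep port numbers found in COMMON_PORTS and replace every other port with -1."""
    ports = np.asarray(ports)
    return np.where(np.isin(ports, COMMON_PORTS), ports, -1)


def zscore(X, eps=1e-12):
    """Column-wise standardization (x - mean) / std; also returns mean and std for reuse."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / np.maximum(std, eps), mean, std


print(map_ports([80, 5555, 443, 1]).tolist())  # [80, -1, 443, -1]
```

Returning `mean` and `std` makes it easy to standardize the validation and test sets with statistics computed on the training set only. This is common practice, though the text does not say which split the statistics come from.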
We simulate the DDoS detection procedure as follows. A node can collect many feature entries (each entry is one row of data) at the same time, so a deep learning-based detector must solve two problems for a group of concurrent entries. The first is to determine whether the group contains a DDoS attack, which is a classification problem. The second, when an attack is present, is to assign a DDoS attack type to each individual entry, which is a segmentation problem. Using random extraction over the 14 important features selected above, we construct the dataset shown in Table 2, which can be used for real-time DDoS detection. Samples without a DDoS attack are labeled '0', and samples containing a DDoS attack are labeled '1'. Fig. 5 shows the data values and label values of random samples. Each entry in a sample has its own label. The labels 1 through 8 stand for the BENIGN, LDAP, MSSQL, NetBIOS, Portmap, Syn, UDP, and UDPLag types, respectively. Note that the number and types of DDoS attack entries that occur simultaneously can differ from one attack sample to another.
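The sample construction above can be sketched as a small generator. Several details come from the text: the 14 features, the per-entry labels 1 to 8, the 4 to 20 entries per sample, and the capacity of 30 entries from the simulation settings. Other details are assumptions made for illustration. Each entry in an attack sample is an attack with probability 0.5, and at least one attack entry is forced. Unused slots are zero-padded with entry label 0. The function name and the `pools` argument are hypothetical.

```python
import numpy as np

MAX_ENTRIES = 30  # input capacity used in the simulations
N_FEATURES = 14   # features kept after screening
CLASS_NAMES = ["BENIGN", "LDAP", "MSSQL", "NetBIOS", "Portmap", "Syn", "UDP", "UDPLag"]  # labels 1..8


def make_sample(pools, has_attack, rng, min_n=4, max_n=20, attack_fraction=0.5):
    """Build one zero-padded sample (hypothetical helper).

    pools[k]: (rows, N_FEATURES) array of standardized entries with per-entry label k (1..8).
    Returns (x, entry_labels, sample_label); padded rows are zeros with entry label 0.
    """
    n = int(rng.integers(min_n, max_n + 1))        # number of concurrent entries
    if has_attack:
        is_attack = rng.random(n) < attack_fraction
        is_attack[rng.integers(n)] = True           # guarantee at least one attack entry
        labels = np.where(is_attack, rng.integers(2, 9, size=n), 1)  # 2..8 = attack types
    else:
        labels = np.ones(n, dtype=int)              # all BENIGN
    x = np.zeros((MAX_ENTRIES, N_FEATURES))
    entry_labels = np.zeros(MAX_ENTRIES, dtype=int)
    for i, k in enumerate(labels):
        pool = pools[int(k)]
        x[i] = pool[rng.integers(len(pool))]        # draw a random entry of that type
    entry_labels[:n] = labels
    return x, entry_labels, int(has_attack)
```

Calling it 10,000 times with `has_attack=False` and 10,000 times with `has_attack=True` would give a dataset of the size described in the simulation section.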

M-PointNet-Based Modeling
PointNet is a deep learning architecture for processing point cloud data. It learns point cloud features with a multilayer perceptron (MLP) shared across points and uses a symmetric architecture, which lets it process point clouds quickly. These properties also make PointNet applicable to DDoS detection [34]. By analyzing network traffic, the network can learn traffic features efficiently and discover probable attack patterns. It can be trained to distinguish legitimate from malicious traffic and to classify new incoming traffic. PointNet processes point cloud data more efficiently than many other methods because its shared feature extraction can handle a variety of point clouds and is insensitive to point order. By contrast, the number of points frequently constrains other learning methods [35].
In this section, we first modify the PointNet network. The standard PointNet processes 3D point clouds, so each input point has three channels. To address DDoS detection, we propose a 14-channel PointNet in which each point is a traffic entry described by the 14 selected features. Fig. 6 shows the layout of the designed network.

Figure 6: A layout of M-PointNet proposed in this work
The modified PointNet network is structured as follows. The input layer accepts a data sample containing several entries, each treated as a point with 14 features. The global branch and the local branch then lift these low-dimensional features into a high-dimensional space. This makes it easier for the network to learn complex feature representations and improves classification and segmentation accuracy. In the global branch, a fully connected layer maps the low-dimensional features into a high-dimensional space. In the local branch, a point convolutional layer maps the low-dimensional features into a high-dimensional space. The global feature obtained by max pooling has 2048 elements and is used for the classification problem of whether a DDoS attack is present. In the local branch, we concatenate the 64-dimensional features obtained by feature transformation with the 2048-dimensional global feature to obtain a 2062-dimensional feature tensor. Fully connected layers then process this tensor to produce the segmentation result, that is, the DDoS attack type of each entry.
The point convolution is computed as follows. Let the input point cloud be $X \in \mathbb{R}^{N \times C}$, where $N$ is the number of points in the point cloud and $C$ is the number of features of each point. The output of the point convolution layer is $Y \in \mathbb{R}^{N \times D}$, where $D$ is the number of output features, and the layer's parameters are $W \in \mathbb{R}^{C \times D}$. The point convolution layer computes $Y = XW$.
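The two-branch idea can be made concrete with an untrained, shape-level NumPy sketch. Two properties are worth seeing in code: the point convolution $Y = XW$ applies the same weights to every entry, and max pooling makes the global feature independent of entry order. The layer widths here are placeholders, not the paper's configuration (for example, `global_dim=256` instead of 2048). A bias term is added to $Y = XW$, as is common; setting it to zero recovers the formula exactly. For simplicity, both lifts are shared per-point layers. The class and function names are hypothetical.

```python
import numpy as np


def point_conv(X, W, b):
    """Point convolution Y = XW + b: the same weights are applied to every point (row) of X."""
    return X @ W + b


def relu(Z):
    return np.maximum(Z, 0.0)


def layer(fan_in, fan_out, rng):
    """He-style random weights and zero bias for one shared per-point layer."""
    return rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)), np.zeros(fan_out)


class TinyMPointNet:
    """Hypothetical, untrained two-head PointNet sketch; widths are placeholders."""

    def __init__(self, c_in=14, local_dim=64, global_dim=256, n_types=8, seed=0):
        rng = np.random.default_rng(seed)
        self.lift_local = layer(c_in, local_dim, rng)                 # C -> local_dim, per point
        self.lift_global = layer(local_dim, global_dim, rng)          # local_dim -> global_dim
        self.cls_head = layer(global_dim, 2, rng)                     # pooled -> {no attack, attack}
        self.seg_head = layer(local_dim + global_dim, n_types, rng)   # per point -> type scores

    def forward(self, X):
        local = relu(point_conv(X, *self.lift_local))         # (N, local_dim)
        feats = relu(point_conv(local, *self.lift_global))    # (N, global_dim)
        g = feats.max(axis=0)                                 # symmetric max-pool: (global_dim,)
        cls_logits = g @ self.cls_head[0] + self.cls_head[1]  # (2,) sample-level scores
        g_tiled = np.broadcast_to(g, (X.shape[0], g.size))    # copy global vector to every point
        fused = np.concatenate([local, g_tiled], axis=1)      # (N, local_dim + global_dim)
        seg_logits = point_conv(fused, *self.seg_head)        # (N, n_types) per-entry scores
        return cls_logits, seg_logits
```

Permuting the input entries leaves the classification scores unchanged and permutes the per-entry segmentation scores in the same way. This is the order-insensitivity that makes a PointNet-style model suitable for unordered groups of concurrent traffic entries. In this sketch, the concatenated width is simply `local_dim + global_dim`.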
The dimensionality-reduction mapping converts high-dimensional point cloud data into a low-dimensional space. As a result, the classification and segmentation networks need fewer parameters to extract the point cloud's global features, which improves their ability to generalize. At the output layer, the network concatenates the outputs of the global branch and the local branch and uses a fully connected layer to map them to the output space. The output layer typically uses a softmax activation function for classification tasks and a sigmoid activation function for segmentation tasks. Let the input to the activation be $z = (z_1, \dots, z_K)$, where $K$ is the number of classification categories.
The softmax function is computed as follows [36]: $\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, $i = 1, \dots, K$. The sigmoid function is computed as $\sigma(z_i) = \frac{1}{1 + e^{-z_i}}$. The sigmoid function produces a value between 0 and 1, which can be interpreted as the probability that a sample belongs to a particular class.
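The two activation functions above can be written in a numerically stable form as follows. This is a generic sketch, not code from the paper. Subtracting the maximum logit leaves the softmax unchanged but avoids overflow in `exp`. The sigmoid is evaluated in two branches so that neither branch exponentiates a large positive number.

```python
import numpy as np


def softmax(z, axis=-1):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j); subtracting the max avoids overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    """sigmoid(z) = 1 / (1 + exp(-z)), evaluated stably for large |z|."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])  # exp of a negative number cannot overflow
    out[~pos] = ez / (1.0 + ez)
    return out
```

Softmax outputs sum to 1 across the $K$ classes, which suits a single class decision. Each sigmoid output is an independent probability in $(0, 1)$.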
DDoS detection involves two tasks. The first is to identify whether a DDoS attack is present. Samples without a DDoS attack are labeled 0 and samples with a DDoS attack are labeled 1, and we use the cross-entropy loss function $L_{\mathrm{CE}} = -\frac{1}{M}\sum_{m=1}^{M}\sum_{k=1}^{K} y_{m,k}\log \hat{y}_{m,k}$. Here $M$ is the number of samples, $y_{m,k}$ is the one-hot label, and $\hat{y}_{m,k}$ is the predicted probability of class $k$. The second task is attack-type segmentation under DDoS, for which we use the mean squared error (MSE) loss $L_{\mathrm{MSE}} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $n$ is the number of predicted values. During training, Adam is used as the optimizer to iteratively update the network parameters with gradients computed by backpropagation [37].
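The two losses can be sketched in NumPy as follows, assuming integer class targets for the cross-entropy and element-wise targets for the MSE. How the paper encodes the segmentation targets (for example, as one-hot type vectors per entry) is not stated, so that encoding is an assumption left to the caller. Probabilities are clipped away from zero so that `log` stays finite.

```python
import numpy as np


def cross_entropy(probs, targets, eps=1e-12):
    """Mean categorical cross-entropy: -1/M * sum_m log p_m[target_m].

    probs: (M, K) predicted class probabilities; targets: (M,) integer class labels.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    targets = np.asarray(targets)
    return float(-np.mean(np.log(probs[np.arange(len(targets)), targets])))


def mse(pred, target):
    """Mean squared error: 1/n * sum_i (pred_i - target_i)^2 over all n elements."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


print(mse([1.0, 2.0], [1.0, 4.0]))  # 2.0
```

With one-hot labels, picking the probability of the true class is equivalent to the double sum in $L_{\mathrm{CE}}$, because all other terms are multiplied by zero.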

Simulation and Analysis
In this section, we simulate the proposed DDoS attack classification and segmentation method using the modified PointNet network. We train and deploy the model as follows: (1) Build the dataset of DDoS attacks in the blockchain system shown in Fig. 5 and preprocess the data.
(2) Define the PointNet network shown in Fig. 6, using the cross-entropy loss function as the loss function for the classification and segmentation tasks. (3) Set the optimizer and learning rate, train the model iteratively, and save the trained classification and segmentation models once they converge. (4) Deploy the models for testing and extract test samples from the constructed test dataset. (5) Load the trained classification and segmentation models. (6) Feed the test data into the models to obtain their outputs. (7) For the classification task, decide from the output probability whether the blockchain node is under DDoS attack. For the segmentation task, determine from the output probabilities which type of DDoS attack the node is suffering.
The constructed dataset contains 10,000 samples without DDoS attacks and 10,000 samples with DDoS attacks. Among the entries in samples with DDoS attacks, LDAP, MSSQL, NetBIOS, Portmap, Syn, UDP, and UDPLag entries together account for 50%. The number of entries in each sample is uniformly distributed between 4 and 20. For the simulation analysis, we divide the dataset into training, validation, and test sets in proportions of 70%, 20%, and 10%, respectively. The simulations in this section were run on an Intel Core i9-13900K @ 3.00 GHz with 64 GB of RAM and an Nvidia GeForce RTX 4090 (24 GB). During training, we use the Adam optimizer with an initial learning rate of 0.001, multiplied by 0.5 every 10 steps. The maximum number of entries detected in parallel is 30, training runs for 20 epochs, and the batch size is 1024.
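The learning-rate schedule and data split above reduce to simple arithmetic, sketched below. One assumption: "every 10 steps" is read as every 10 epochs, since training runs for 20 epochs. In PyTorch, this schedule roughly corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)` stepped once per epoch. Giving the rounding remainder of the split to the training set is an arbitrary choice.

```python
def step_lr(epoch, base_lr=1e-3, gamma=0.5, step_size=10):
    """Step decay: multiply the learning rate by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


def split_sizes(n, fractions=(0.7, 0.2, 0.1)):
    """Integer train/val/test sizes; any rounding remainder goes to the training set."""
    val = round(n * fractions[1])
    test = round(n * fractions[2])
    return n - val - test, val, test


print([step_lr(e) for e in (0, 9, 10, 19)])  # [0.001, 0.001, 0.0005, 0.0005]
print(split_sizes(20_000))                   # (14000, 4000, 2000)
```

With 20 epochs, the learning rate is halved only once, so the second half of training runs at 0.0005.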
First, we simulate the system's classification performance, that is, its ability to correctly determine whether a DDoS attack is occurring. We use the modified PointNet network constructed in the previous section, with a maximum of 30 entries per sample and a batch size of 1024. Fig. 7 shows the loss curves on the training and validation sets during training. The proposed classification network converges after 11 training epochs. Its final loss values are 0.007744 on the training set and 0.003497 on the validation set. Fig. 8 shows the accuracy on the training, validation, and test sets. As training proceeds, the accuracy of the DDoS detection and classification task rises. The final accuracy reaches 99.71% on the training set, 99.90% on the validation set, and 99.65% on the test set, which indicates that the proposed detector based on the modified PointNet performs the classification task well.
Next, we simulate the segmentation task, which identifies the DDoS attack type of each entry, by training the modified PointNet segmentation network proposed in the previous section. For training on the DDoS-affected samples, we again set the batch size to 1024 and the maximum number of input entries to 30. Fig. 9 shows the convergence curves over the 20 training epochs. The final loss values on the training and validation sets are 0.358323 and 0.366697, respectively, showing that the M-PointNet network has converged. Fig. 10 shows the corresponding accuracy curves for the DDoS-type segmentation task.
The accuracy on the training and validation sets increases steadily as the network is trained. The final segmentation accuracies on the training, validation, and test sets are 87.38%, 85.44%, and 85.47%, respectively, showing that the network performs DDoS-type segmentation with acceptable accuracy. As shown in Table 3, the proposed M-PointNet algorithm achieves higher classification accuracy for DDoS attack detection than LightGBM [38], SVM [39], and NB [40]. In addition, when a sample contains varying numbers of concurrent DDoS attack entries, M-PointNet can segment each entry with acceptable results. Table 3 lists the characteristics of these detection methods. Compared with the other listed methods, M-PointNet provides parallel and accurate detection.
Table 3: Characteristics of the compared detection methods
Method | Classification accuracy | Segmentation | Characteristics
LightGBM [38] | 99.56% | Not supported | It has a low computational cost for higher accuracy.
SVM [39] | 99.41% | Not supported | It can perform tasks such as anomaly-based intrusion detection in real time.
Naïve Bayes (NB) [40] | 95.14% | Not supported | It predicts the class of test data simply and quickly.

Conclusion
In this study, we design and implement the M-PointNet network for the classification and segmentation of DDoS attacks. Based on the CIC-DDoS2019 dataset, we construct a dataset with several DDoS attack types for training, validating, and testing the network. Our model achieves 99.65% accuracy on the test set in classifying DDoS attacks, showing that it can accurately separate attack traffic from regular traffic. In the segmentation task, it achieves 85.47% accuracy on the test set, indicating that it can recognize various DDoS attack types. Compared with LightGBM, SVM, and Naïve Bayes, our method slightly improves DDoS detection performance. In addition, the proposed method supports parallel detection, that is, segmentation of multiple concurrent DDoS attack entries, which the compared traditional methods do not support. In conclusion, this study shows that M-PointNet can classify and segment DDoS attacks efficiently, providing an approach for network security. Future research will focus on improving the model's segmentation accuracy and applying it to distributed network environments.