Few-Shot Learning for Discovering Anomalous Behaviors in Edge Networks

Intrusion Detection Systems (IDSs) are attracting great interest for discovering complex attack events and protecting the critical infrastructures of Internet of Things (IoT) networks. Existing IDSs based on shallow and deep network architectures demand high computational resources and large volumes of data to establish an adaptive detection engine that discovers new families of attacks from the edge of IoT networks. However, attackers exploit network gateways at the edge using new attack scenarios (i.e., zero-day attacks), such as ransomware and Distributed Denial of Service (DDoS) attacks. This paper proposes a new IDS based on few-shot deep learning, named CNN-IDS, which can automatically identify zero-day attacks from the edge of a network and protect its IoT systems. The proposed system comprises two methodological stages: 1) a filter-based Information Gain method selects the most useful features from network data, and 2) a one-dimensional Convolutional Neural Network (CNN) algorithm recognizes new attack types from a network's edge. The proposed model is trained and validated using two datasets, UNSW-NB15 and Bot-IoT. The experimental results showed improvements of about 3% in detection rate and around 3%–4% in false-positive rate with the UNSW-NB15 dataset, and of about 8% in detection rate using the BoT-IoT dataset.

There are two popular forms of IDS deployment: Host-based IDS (HIDS) and Network-based IDS (NIDS) [18]. On the one hand, a HIDS monitors the system activities of hosts, for example, system configuration, application activity, system logs, application processes, and file access [19]. On the other hand, a NIDS monitors network activity and analyzes the collected information to identify suspicious events from network traffic [20]. A NIDS consumes fewer computational resources than a HIDS and responds more quickly because it does not require maintaining sensor software at the host level [1]. There are three detection methods in IDSs: 1) anomaly-based detection, 2) misuse-based detection, and 3) a hybrid of both. An anomaly-based detection method builds a profile of normal behavior and flags outliers as anomalies [21]. A misuse-based detection method depends on well-known signatures and matches observed events against a blacklist of suspicious events. A misuse-based IDS cannot discover new attack types, while an anomaly-based IDS can detect them, although it raises false alarms when the variations between normal and abnormal patterns are small [22,23].
IDSs have been designed based on machine and deep learning algorithms to recognize cyber threats [1,5]. Deep learning algorithms have proven their capability in different applications, such as computer vision and malware detection [5]. Deep learning can be categorized by architecture into generative and discriminative. The generative architectures include the Recurrent Neural Network (RNN), Deep Auto-Encoder, Deep Boltzmann Machine (DBM), and Deep Belief Network (DBN) [1]. An auto-encoder consists of two symmetrical components, an encoder and a decoder. The encoder extracts features from the raw data, and the decoder reconstructs the data from the features extracted by the encoder. A DBM consists of stochastic units across the whole network that take or produce binary values. A DBN has multiple layers with connections between the layers but not between the units within a layer. The discriminative architectures are of two types: recurrent neural networks and convolutional neural networks. An RNN is used for sequential data and, in most cases, for natural language processing [16]. This work focuses on CNN as it includes multiple layers that can classify small variations in the data features of various class labels, such as legitimate and anomalous behaviors.

Internet of Things (IoT)
The Internet of Things (IoT) connects devices and applications to the Internet to sense and monitor systems [4,16]. IoT is defined as the seamless connection of the information network and physical objects, named 'smart objects', with these objects being active participants in business processes and accessed through network services, with security and privacy taken into account [24]. At the end of the 20th century, the Internet started to spread through web services. It became imaginable that objects like a pen or a book could work by themselves and write directly. The development of IoT has spread worldwide through mobile devices, laptops, and workstations [3]. The creation of new IoT products is expected to shrink computing devices and introduce new approaches linked with wireless networks [25]. Nowadays, IoT sensors, such as devices that carry IP cameras, connect to the Internet. IoT devices are usually inexpensive and easy to deploy in IoT networks, as with temperature and light-bulb sensors [26].
Research studies have emphasized that security in the IoT concentrates on attack detection, authorization, authentication, and access control [26,27]. Many factors change the traffic pattern when abnormal behaviors are being recognized in IoT networks. It is vital to consider various aspects while developing IDS techniques for IoT networks at the edge, such as inspecting network protocols [28], determining application services [29], and identifying abnormal patterns at the edge [30]. Existing IDSs have driven the evolution and improvement of deep learning, statistical learning, and machine learning systems that classify massive data by analyzing the threats to IoT networks [11,31].

Related Work
Several IDSs have been proposed in the literature to identify cyber-attacks on network systems. For instance, Sadek et al. [32] proposed a new hybrid IDS approach using an indicator variable-enabled rough set technique for feature reduction and neural networks for classification. The empirical results revealed that the hybrid approach could achieve a 96.7% accuracy and a 3% false alarm rate using the NSL-KDD dataset, with lower computational resources than other competing IDSs. The authors in [33] suggested a hybrid IDS based on triangle-area-based nearest neighbors (TANN). The k-means algorithm was used to find the cluster centers of attack classes, and k-nearest neighbors (KNN) was used for classifying attack events. This experiment showed high accuracy and a low false alarm rate on the KDD-Cup 99 dataset.
Moustafa et al. [5] proposed a new approach, called ODM-ADS, in which a profile is designed to model normal events and attacks are detected as deviations from it based on an outlier function. This approach can be deployed in IoT, cloud, and fog computing, and it achieved high performance compared with other techniques on the NSL-KDD and UNSW-NB15 datasets. Essam et al. [34] proposed a hybrid algorithm based on correlation feature selection and information gain to reduce the number of features. This research was applied to the NSL-KDD dataset; the reduced dataset was validated by a naive Bayes classifier using the adaptive boosting technique. A study by Alom et al. [35] used a DBN to build an intrusion detection system for detecting unknown attacks. Karimi et al. [36] developed a feature selection technique using information gain and a symmetric uncertainty model to select the relevant features, and naive Bayes for classifying attacks. The results showed that the proposed techniques outperformed other machine learning-based IDSs.
Tang et al. [37] developed an intrusion detection model using a deep feed-forward network that contains three hidden layers. The model used the best six features selected from the NSL-KDD dataset. Ling et al. [38] applied a convolutional neural network technique to an IDS that detects attacks. Niyaz et al. [39] used an auto-encoder to obtain a feature representation and then classified the data using soft-max regression on the NSL-KDD dataset. Hodo et al. [40] proposed a new artificial neural network approach that detects DoS and DDoS attacks in IoT systems with good accuracy. Chen et al. [41] also addressed DDoS detection for IoT networks. Haddadi et al. [42] used a neural network with two hidden layers on the DARPA1999 data to overcome the problem of overfitting and detect suspicious events. Amma et al. [43] proposed a new in-depth radial approach to optimize the depth of the neural network parameters, applied to different datasets to detect DoS attacks.
Recently, Moustafa et al. [1] reviewed existing IDSs and their methods and problems in network and edge systems. The authors demonstrated that the main challenge is that existing IDS approaches cannot discover new attack families from the large-scale and heterogeneous data sources collected from IoT networks. It was recommended that deep learning techniques be used to improve the reliability of intrusion detection systems and obtain high detection accuracy and low false alarm rates [1,16]. Therefore, this study's primary goal is to discover new attack families from heterogeneous data sources collected from the edge of a network. Deep learning is used in this work because of its ability to extract features, analyze them in depth, and detect suspicious vectors.

Proposed CNN-Enabled Intrusion Detection System
This section discusses the proposed Intrusion Detection System (IDS) that discovers cyberattacks from the edge of a network. The proposed system is designed to work with the essential features of network flows. It includes three main components: data preprocessing, feature selection, and a decision engine, as depicted in Fig. 1. In data preprocessing, network data are filtered and processed by removing redundant values, converting data into a numeric format, and normalizing data to improve the feature selection and decision engine stages. In feature selection, the information gain method is applied to select the essential features and enhance the detection accuracy of the decision engine. In the decision engine, a few-shot deep learning technique based on a Convolutional Neural Network (CNN) is employed to classify anomalous behaviors. The three components of the proposed IDS are explained below.

Data Preprocessing Phase
In the data preprocessing phase, network data are filtered by converting non-numerical features to numerical values because the convolutional neural network handles only numbers. Categorical values in the datasets are mapped to numeric ones; for example, protocol values are converted as TCP = 1, UDP = 2, ICMP = 3. Redundant values in the datasets are also excluded to enhance the detection accuracy of deep learning. To address the imbalance in the datasets, the data are divided into 80% training data and 20% testing data. The feature values of the UNSW-NB15 and BoT-IoT datasets differ widely in scale and type because the data contain nominal, float, and timestamp values. Therefore, data features are normalized into a fixed range, such as [0,1], to improve the decision engine's performance.
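The preprocessing steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the record layout (protocol, duration, bytes, label), the helper names, and the toy values are all hypothetical; only the protocol mapping, the [0,1] range, and the 80/20 split come from the text.

```python
import random

# Protocol mapping taken from the text; everything else here is illustrative.
PROTO_CODES = {"TCP": 1, "UDP": 2, "ICMP": 3}


def encode(record):
    """Replace the categorical protocol value with its numeric code."""
    proto, duration, n_bytes, label = record
    return (float(PROTO_CODES[proto]), float(duration), float(n_bytes), label)


def deduplicate(rows):
    """Drop exact duplicate rows while keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def min_max_scale(rows, n_features):
    """Scale each feature column into [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(cols[j]) for j in range(n_features)]
    highs = [max(cols[j]) for j in range(n_features)]
    scaled = []
    for row in rows:
        feats = [
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ]
        scaled.append((feats, row[n_features]))
    return scaled


def split(rows, test_fraction=0.2, seed=0):
    """Shuffle, then split into (training, testing) partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


# Hypothetical flow records: (protocol, duration, bytes, label), 1 = attack.
raw = [
    ("TCP", 0.5, 1200, 0),
    ("UDP", 0.1, 300, 0),
    ("TCP", 0.5, 1200, 0),  # exact duplicate, removed below
    ("ICMP", 2.0, 60, 1),
    ("TCP", 4.0, 9000, 1),
    ("UDP", 0.2, 450, 0),
]
data = min_max_scale(deduplicate([encode(r) for r in raw]), n_features=3)
train, test = split(data)
```

For brevity the sketch scales the whole dataset before splitting; in practice the minimum and maximum are often computed on the training partition only and then reused for the test partition.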

Few Shot Learning Method for Intrusion Detection
Few-Shot Learning (FSL) can tackle new tasks for which only a few samples with supervised information are available. In other words, FSL is a machine learning paradigm that learns from a limited number of examples with supervised information [44,45]. FSL can help in the robotics field [46], which builds robots or machines that act like humans. Many fields can benefit from FSL; a prominent one is drug discovery, which determines the properties of new molecules in order to develop new drugs [9] for treating diseases. FSL is now considered a hot topic because it relies on a small number of samples, and many machine learning approaches have been proposed for it, such as embedding learning [47,48], meta-learning [49], and generative modeling [44,50].

Feature Selection-Based Information Gain (IG)
Information Gain (IG), also known as mutual information, indicates which features in a training set of feature vectors are most useful for discriminating between the classes to be learned, and it is used to find a subset of the original variables; it is calculated as in Eq. (1). Feature selection strategies fall into three groups: filter, wrapper, and embedded approaches [16]. Feature selection is used to improve the accuracy of the system or to reduce the mining time. Researchers have applied various data preprocessing techniques, such as data cleaning, data integration, and dimensionality reduction based on feature reduction and feature selection. The entropy quantifies the information content of, and the relation between, the features, and is estimated as in Eq. (2). Feature selection helps make a network more secure by reducing the false alarm rate and the time cost of IDSs while they monitor malicious activities on a network.
The objective of feature selection is to minimize the number of attributes while keeping the resulting probability distribution as close as possible to the original distribution obtained with all attributes. No additional selection technique is employed to select the relevant and informative features, that is, the features that are useful for building a good predictor. Information gain is based on Shannon's mathematical theory of communication and depends on entropy, which is a measure of the unpredictability of information. It ranks the features that affect the data classification, where p_i is the probability of the i-th class in the given set, as shown in Eq. (3), with p_i = (number of entities in class i) / (entity population). According to Maher and Ulrich (2012), IG handles only discrete values; therefore, continuous values must first be converted into discrete values. Given two random variables X and Y, I(X, Y) is the information gain of X with respect to the class attribute Y, where Y and X are discrete variables that take values in {y_1, ..., y_t} and {x_1, ..., x_t}, respectively. Given the probability distribution function P(x), the entropy of X is given by Eq. (4); equivalently, the average information is the expected value of I(x), the information conveyed by a message x, over the instances of X, as in Eq. (5). Hence, the IG for feature F on the dataset D is given by Eq. (6), where Values(F) is the set of all possible values of F, D_attr is the subset of D for which F has the value attr, and H(D) is the entropy of the class attribute.
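The numbered equations referenced in this subsection are not reproduced in the text. As a hedged reconstruction, assuming the standard Shannon definitions, the quantities can be written as below. The symbols p_i, Values(F), D_attr, and H(D) follow the surrounding wording, while D_i (the records of D in class i) is introduced here for illustration, and the exact form and numbering of the original Eqs. (1)-(6) remain an assumption.

```latex
\begin{align}
  I(x)  &= -\log_2 P(x)
        && \text{information carried by a message } x \\
  H(X)  &= \mathbb{E}\,[I(X)] = -\sum_{i} P(x_i)\,\log_2 P(x_i)
        && \text{entropy as average information} \\
  H(D)  &= -\sum_{i} p_i \log_2 p_i, \qquad p_i = \frac{|D_i|}{|D|}
        && \text{entropy of the class attribute} \\
  IG(D, F) &= H(D) - \sum_{attr \,\in\, \mathrm{Values}(F)}
              \frac{|D_{attr}|}{|D|}\, H(D_{attr})
        && \text{information gain of feature } F
\end{align}
```

For example, a dataset with equal numbers of normal and attack records has H(D) = 1 bit, and a feature that splits it into two pure subsets attains the maximum gain IG(D, F) = 1 bit.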
Based on the information gain method, we select the ten most important features from the network datasets to improve the performance of the decision engine that discovers cyber-attacks.
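A minimal sketch of ranking features by information gain and keeping the top k is given below. It assumes the features have already been discretized, as the text requires, and uses the standard entropy-based definition; the feature names, toy values, and function names are hypothetical and are not taken from the datasets.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of one discrete feature: H(D) minus the weighted entropy of each D_attr."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(columns, labels, k=10):
    """Rank feature names by information gain and keep the best k."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, already-discretized flows (hypothetical feature names and values).
labels = ["normal", "normal", "attack", "attack"]
columns = {
    "proto": ["tcp", "tcp", "udp", "udp"],      # separates the classes perfectly
    "state": ["con", "fin", "con", "fin"],      # carries no class information
    "rate_bin": ["low", "low", "low", "high"],  # partially informative
}
ranking = top_k_features(columns, labels, k=2)
```

On this toy data the perfectly separating feature scores 1 bit, the uninformative one scores 0, and the partially informative one falls in between, so the ranking keeps `proto` and `rate_bin`.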

Convolution Neural Network (CNN) as Decision Engine
CNN is used as the decision engine of the IDS that classifies legitimate and anomalous activities at the network's edge. A CNN is a recent type of neural network that learns to extract appropriate features for representing the input data. It differs from multilayer perceptrons (MLPs) in two ways: weight sharing and pooling. A CNN has multiple layers, and each layer comprises several convolution kernels that are used to produce different feature maps. Each neuron of a feature map is connected to a local region of the adjacent layer, and all spatial locations of the input share the same kernel for generating the feature map. After several convolution and pooling layers, one or more fully connected layers are used for the classification [13]. Because of the shared weights in a CNN, the model learns the same pattern occurring at different positions of the input without needing to learn separate detectors for each position. As a result, the architecture is robust to translations of the inputs [51].
The pooling layers reduce the computational burden since they decrease the number of connections between convolutional layers. Pooling layers also strengthen translation invariance and enlarge the receptive field of the convolution layers. The activation function introduces non-linearity into the convolutional neural network, which helps the multi-layer network detect nonlinear features. Three common types of activation function are sigmoid, tanh, and ReLU. One or more fully connected layers can be added at the end of the network. To measure the errors during training, a loss function is used [52]. The CNN is configured using the parameters listed in Tab. 1 to establish a decision engine that can classify legitimate and attack events in datasets collected from the edge of networks.
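The building blocks described above (a shared convolution kernel, ReLU activation, max pooling, and a fully connected output) can be illustrated with a forward pass in plain Python. This is only a sketch: the kernel, weights, and input values are hand-picked for illustration and are not the learned parameters or the configuration of Tab. 1, and all function names are hypothetical.

```python
import math


def conv1d(signal, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation): one shared kernel slides over the input."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


def max_pool1d(xs, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def dense_sigmoid(xs, weights, bias=0.0):
    """Single fully connected output unit with a sigmoid activation."""
    z = sum(x * w for x, w in zip(xs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Ten normalized feature values for one flow (illustrative numbers only).
flow = [0.0, 0.1, 0.9, 0.8, 0.1, 0.0, 0.0, 0.7, 0.9, 0.2]
kernel = [-1.0, 1.0]  # hand-picked "rising edge" detector, not a learned weight

feature_map = relu(conv1d(flow, kernel))  # 9 activations from 10 inputs
pooled = max_pool1d(feature_map, size=3)  # 3 values summarize the 9 activations
score = dense_sigmoid(pooled, weights=[1.0, 1.0, 1.0], bias=-1.0)
```

Because the same kernel is applied at every position, shifting a pattern in the input shifts its response in the feature map rather than requiring a separate detector, which is the weight-sharing property discussed above.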

Experimental Design
We implemented the proposed IDS with Google's open-source dataflow engine TensorFlow, through the Python Keras package, on Google Colab [53]. Keras was used as the front-end API as it is among the most important libraries for deep convolutional network research. It supports building models quickly and effectively, and it runs on both CPU and GPU.

Datasets Used
To validate the proposed system against different types of attacks and different network infrastructures and characteristics, testing and evaluation were carried out on two network datasets, UNSW-NB15 [14] and BoT-IoT [15,54]. First, UNSW-NB15 [14] is a dataset published in 2015 by UNSW Canberra Cyber for evaluating intrusion detection systems. It is divided into a training set containing 175,341 records and a testing set containing 82,332 records. The IXIA PerfectStorm tool was used to generate a mix of normal traffic and modern attack traffic. The UNSW-NB15 dataset includes nine attack families, as demonstrated in Tab. 2.
Second, the Bot-IoT dataset was designed from a real network environment built in the Cyber Range Lab of UNSW Canberra. The environment combines normal and malicious traffic. The source files of the dataset are provided in different formats, including CSV files, PCAP files, and Argus files. The files were separated by attack category and subcategory to better support the labeling process. The PCAP files are 69.3 GB in size, with more than 72,000,000 records, and the extracted traffic is 16.7 GB. MySQL queries were used to extract 5% of the original dataset to ease its usage. The extracted 5% consists of four files totaling 1.07 GB and about 3 million records. The attack types of the Bot-IoT dataset are described in Tab. 3.

Attack types and their descriptions:
Analysis: An attack related to web applications, in which an attacker penetrates the web application through port scans, web scripts, and spam emails.
Backdoor: A technique for stealthily bypassing standard authentication to gain unauthorized remote access to an end device, or access to plain text, while trying to remain unobserved.
DoS: An intrusion that exhausts computer resources, such as memory, keeping them excessively busy so that authorized requests cannot access a device.
Exploit: A sequence of instructions that takes advantage of a vulnerability or unintended behavior on a network or host.
Generic: An attack against block ciphers that uses a hash function to cause a collision, regardless of the configuration of the block cipher.
Reconnaissance: Also known as a probe; the attacker gathers information about a computer network in order to evade its security controls.
Shellcode: Malware in which the attacker injects a small piece of code, starting from a shell, to take control of the machine.
Worm: An attack in which the malicious code replicates itself and spreads to other computers over the network, relying on security failures of the computers it reaches.

Feature Selection Using Information Gain
The ten most important features are selected using the Information Gain technique from the UNSW-NB15 and BoT-IoT datasets, as listed in Tabs. 4 and 5. These features are used as the input to the CNN decision engine that classifies normal and attack activities. They significantly affect the performance of the decision engine by improving its detection accuracy and processing time.

Results of CNN Compared with Other IDSs
The proposed CNN-IDS model was trained using the two datasets, UNSW-NB15 and Bot-IoT. The training phase ensures that the learned parameters are reliable for use in the testing phase. The CNN intrusion detection system was evaluated on the ten selected features of the datasets listed in Tabs. 4 and 5. Using the UNSW-NB15 dataset, the overall Detection Rate (DR) and False Positive Rate (FPR) of the CNN-IDS are presented in Fig. 2. This figure depicts the Receiver Operating Characteristic (ROC) curves, which show the relation between the detection rates and false-positive rates. The outcomes demonstrated that the proposed system could detect different attack types with an average detection rate of 91% on the UNSW-NB15 dataset. The results of the CNN-IDS system are compared with four existing intrusion detection techniques: Triangle Area Nearest Neighbors (TANN) [33], Euclidean Distance Map (EDM) [55], Multivariate Correlation Analysis (MCA) [56], and Outlier Dirichlet Mixture (ODM) [5]. As shown in the figure, the system outperforms these techniques by about 2% in detection rate and roughly 1%-2% in false-positive rate. The proposed CNN-IDS system can also correctly classify and discover various attack types using the BoT-IoT dataset, as presented in Fig. 3. The proposed system can detect all the attack types with around a 99.9% detection rate and a 0.01% false-positive rate on the BoT-IoT dataset. On this dataset, the CNN-IDS system is also compared with the four techniques used for the UNSW-NB15 dataset. The outputs illustrated that the proposed system detects attack types better than the other models by about 3% in detection rate and around 3%-4% in false-positive rate. When comparing the results on both datasets, the proposed CNN-IDS achieves a detection rate on the BoT-IoT dataset that is about 8% higher than on the UNSW-NB15 dataset.
This is because the BoT-IoT dataset has new attack types with high variations between the normal and attack classes, enabling the CNN-IDS system to learn the normal and attack data better than on the UNSW-NB15 dataset. To sum up, the proposed CNN-IDS system achieves higher detection accuracy than the other four IDS mechanisms because of its design, which combines the Information Gain and CNN models. The Information Gain method assisted in selecting the most important features in both datasets, while the CNN architecture [57] was designed with multiple dense layers that can identify small variations between the normal and abnormal events in the datasets. Therefore, the proposed system can be used as a suitable IDS solution that identifies attack activities at the edge of networks and raises alerts about them.
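The two evaluation measures used above can be computed from binary predictions as in the following sketch. It assumes the common definitions DR = TP / (TP + FN) and FPR = FP / (FP + TN), which the text does not spell out; the labels, scores, and function names are hypothetical toy values and do not reproduce the reported results.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels where 1 = attack and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def detection_and_false_positive_rate(y_true, y_pred):
    """Return (DR, FPR); a rate with an empty denominator is reported as 0."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, fpr


def roc_points(y_true, scores, thresholds):
    """One (FPR, DR) point per threshold; a score >= threshold is flagged as attack."""
    points = []
    for th in thresholds:
        y_pred = [1 if s >= th else 0 for s in scores]
        dr, fpr = detection_and_false_positive_rate(y_true, y_pred)
        points.append((fpr, dr))
    return points


# Toy ground truth and classifier scores (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7]

dr, fpr = detection_and_false_positive_rate(
    y_true, [1 if s >= 0.5 else 0 for s in scores]
)
curve = roc_points(y_true, scores, thresholds=[0.0, 0.5, 1.01])
```

Sweeping the threshold from low to high moves the operating point from (FPR, DR) = (1, 1) towards (0, 0), which is how an ROC curve such as those in Figs. 2 and 3 is traced.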

Conclusion
This paper has presented a new IDS, called CNN-IDS, based on few-shot learning. The proposed CNN-IDS has been developed to discover new attack events from the edge of a network. The proposed system includes two components: feature selection and a decision engine. The feature selection component uses the Information Gain method to select essential features from network data, while the decision engine uses a one-dimensional Convolutional Neural Network (CNN) algorithm to discover attack events. The proposed system was trained and tested using two datasets, UNSW-NB15 and Bot-IoT. The results showed that the proposed system outperforms several peer intrusion detection systems. This demonstrates that the proposed system can be applied to real IoT networks to safeguard them against new cyber threats. This work will be extended by developing a new federated IDS that can concurrently discover attacks from IoT services and their network traffic.
Funding Statement: This work has been supported by the Australian Research Data Commons (ARDC), project code-RG192500.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.