1 Introduction

The Internet of Things (IoT) is a technology paradigm envisioned as a global network of machines and devices capable of interacting with each other [1]. Devices across almost every application field are now connected through the internet. According to recent analytic reports, more than 13.8 billion IoT devices are in use worldwide, and the number is expected to increase to 30.9 billion by 2025 [2]. This massive network has increased the ability to remotely control smart objects and to gather and share information, which opens new ways of using them. An IoT network consists of physical smart objects that contain computing and communication components, software, and sensors to gather and share data with other connected devices around the world. Its large scale, the heterogeneity of its objects [3], and its long-range connection and control make it susceptible to many attacks that can not only steal or corrupt data [4] but also damage sensitive devices on which lives may depend, such as medical devices and smart cars. This threat has made security a top priority in the field of IoT. Although IoT attacks are similar to those launched against other networks, the same defenses cannot be used without modification because of the limited processing power and storage of most IoT devices [5]. Therefore, new techniques should be investigated to set up a security layer in IoT networks and prevent attackers from paralyzing the whole system.

An Intrusion Detection System (IDS) is an effective security measure for detecting cyber attacks or malicious actions on computer systems [6]. It differentiates between normal and abnormal network traffic or system usage behavior, so that abnormal activity can be stopped before it damages the computer system. IDSs can be categorized by the monitored data source as either host-based or network-based. A host-based IDS inspects data originating from host system logs, such as operating system, application, and database logs. It can monitor the behavior of high-importance objects such as sensitive files or programs and accurately detect intrusions or abnormal actions. This type depends on the host’s reliability and resources, and thus it cannot detect network attacks. It also needs a large number of log files to detect attacks efficiently. On the other hand, a network-based IDS is independent of the hosts, which makes it applicable in different environments to detect network attacks. Its disadvantage is that it can only detect attacks that occur in a specific network segment.

The detection methods in IDSs are usually categorized as signature-based, anomaly-based, or hybrid [6]. A signature-based IDS uses pattern-matching techniques to search for predefined signatures of known attacks. This approach is effective against known attacks, but it cannot detect zero-day or unknown attacks. An anomaly-based IDS analyses usage or network traffic behavior to identify abnormalities using statistical models, time series, or machine learning techniques. Anomaly-based detection methods are significantly better than signature-based methods when dealing with new attacks, but they suffer from a high false positive rate and cannot explain the abnormality in the network traffic [7]. A hybrid of the two techniques combines the advantages of both, which creates a more effective IDS [6]. In this paper, a network-based, anomaly-based IDS is considered.

An anomaly-based IDS usually uses one of three intrusion detection approaches: knowledge-based, statistics-based, or machine learning-based [8]. The knowledge-based approach requires building a knowledge base that reflects normal actions and traffic behavior according to existing system data such as protocol specifications and network traffic instances. Actions or behaviors that differ from the constructed knowledge base are considered intrusions. The statistics-based approach involves collecting traffic data and building a distribution model for the normal behavior profile. Low-probability events or actions can then be considered potential intrusions. The disadvantage of the statistics-based approach is the need to formulate mathematical equations over the assumed variables that reflect normal user behavior. Machine Learning (ML) is a type of artificial intelligence that extracts useful information from given data in order to predict outcomes without being explicitly programmed to do so. A machine learning-based IDS applies ML techniques to intrusion datasets to train a model that can later be used to judge the legitimacy of unseen traffic. The core advantage of an ML-based IDS is the ability to improve intrusion detection accuracy by training on the data itself, without explicitly rebuilding the model. This advantage keeps the model up to date in differentiating normal from abnormal behavior and thus reduces false positives.
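As a minimal sketch of the statistics-based approach, the following fits a simple Gaussian profile to one traffic feature and flags observations that are improbable under it. The feature (packets per second), the threshold of three standard deviations, and the function names are illustrative assumptions and are not taken from any of the surveyed systems.

```python
import statistics

def fit_profile(normal_values):
    """Estimate the mean and standard deviation of a feature under normal traffic."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, profile, z_threshold=3.0):
    """Flag a value whose z-score exceeds the (assumed) threshold."""
    mean, std = profile
    if std == 0:
        return value != mean
    return abs(value - mean) / std > z_threshold

# Hypothetical packets-per-second readings observed during normal operation.
normal_pps = [98, 102, 101, 99, 100, 103, 97, 100]
profile = fit_profile(normal_pps)

burst_flagged = is_anomalous(500, profile)    # far from the normal profile
typical_flagged = is_anomalous(101, profile)  # well within the normal profile
```

A real profile would cover many features jointly, which is where the difficulty of formulating the underlying equations comes from.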

Several machine learning techniques have been used in anomaly-based IDSs, such as clustering, association rules, decision trees, nearest neighbor, and deep learning methods. Deep Learning (DL) is a sub-field of ML whose models consist of multiple hidden layers, which makes it more suitable for problems with massive data [7]. DL is categorized into supervised and unsupervised learning [9]. Supervised learning depends on training data annotated with the correct output “label” to train the model, which can then predict the labels of test data. Its disadvantages are that labeling the training data is expensive and consumes a lot of time and effort, and that it needs sufficient data to achieve high prediction accuracy. Unsupervised learning depends on unlabeled data, which means the model does not know the result in advance. Instead, it clusters the input data into classes based on their statistical properties. Common deep learning models are summarized in Table 1.

Table 1 Existing deep learning models [10]

1.1 AutoEncoder

It consists of two components: an encoder responsible for extracting features from raw data, and a decoder that reconstructs the data from the extracted features. When the decoder succeeds in reconstructing the data from the extracted features, those features can be taken to represent the essence of the data.

1.2 Restricted Boltzmann machine (RBM)

It consists of a visible layer and a hidden layer. The nodes within a layer are not connected, but the nodes in different layers are fully connected. RBMs are usually used for feature extraction or denoising.

1.3 Generative adversarial network (GAN)

It consists of a generator that generates synthetic data similar to the real data, and a discriminator that distinguishes between the generated and the real data.

1.4 Deep belief network (DBN)

A DBN is formed by connecting multiple RBM layers trained by greedy layer-wise pre-training and a SoftMax layer trained on labeled data. DBNs are used for feature extraction or classification.

1.5 Deep neural network (DNN)

It consists of multiple connected layers. The weights are first learned using unlabeled data and then tuned using labeled data. The unsupervised part of the DNN is mostly responsible for its high prediction accuracy.

1.6 Convolutional neural network (CNN)

CNNs are mainly used in computer vision, as they mimic the human visual system. They consist of multiple convolutional layers that work on 2D matrices and are used for feature extraction.

1.7 Recurrent neural network (RNN)

RNNs work on sequential data, so each layer receives not only the current state but also the previous states. Many RNN models have been proposed, such as long short-term memory (LSTM) and bi-RNN.

Deep learning has proven its efficiency and accuracy in securing computer networks. However, IoT devices do not provide the flexibility and capability needed to run such complex deep learning algorithms because of their limited power, storage, and connection bandwidth [4]. To overcome this issue, a central device with large capabilities, such as a cloud server, can be used to perform complex calculations, analyze the traffic, and train the classification model for the IDS. A cloud server provides computing, storage, and services over the internet, which makes it a suitable solution for reaching and controlling resource-constrained IoT devices [11].

The main contributions of this paper can be summarized as follows:

  • Four state-of-the-art deep learning models proposed in [12] are customized. They were initially designed as single classifiers and are adapted to detect six classes of network traffic: Normal, Distributed Denial of Service (DDoS), Slowloris, Slowhttptest, Hulk, and GoldenEye.

  • We propose an Enhanced Intrusion Detection deep learning Model (EIDM) which is able to classify 15 traffic behaviors including 14 attack types contained in the CICIDS2017 dataset. To the best of our knowledge, EIDM is the first model that can classify all 15 classes without grouping close classes of similar features into one class.

  • Extensive experiments were conducted to evaluate and compare the accuracy and efficiency of the proposed models against the state-of-the-art models.

The rest of the paper is organized as follows. Section 2 presents related work on IDS using machine and deep learning techniques. Section 3 shows the assumed IoT architecture. Section 4 provides the description and architecture of the customized and proposed deep learning models. In Sect. 5, the environment setup and dataset are described, and the comparative results are presented. Finally, Sect. 6 concludes the paper and presents future work.

2 Related work

In recent literature, many studies have designed flow-based intrusion detection systems using deep learning models.

Shahriar et al. [13] proposed a generative adversarial network (GAN) based intrusion detection system (G-IDS), in which a GAN generates synthetic samples to augment the training examples and the IDS is trained on them along with the original ones. The proposed system was compared against a standalone IDS (S-IDS) to estimate its efficiency. The results showed that G-IDS outperformed S-IDS because S-IDS could not accurately predict some classes for which the number of training samples was insufficient. The proposed method was evaluated using the KDD’99 dataset.

Alkasassbeh et al. [14] proposed a detection approach for the IoT-BotNet attack by using Fuzzy Rule Interpolation (FRI). The proposed approach was applied to an open-source BoT-IoT dataset from the Cyber Range Lab of the center of UNSW Canberra Cyber. The proposed approach was evaluated and obtained a detection rate of 95.4%.

Keserwani et al. [15] proposed an IDS in which a combination of Gray Wolf Optimization (GWO) and Particle Swarm Optimization (PSO) is first used to extract relevant IoT network features, which are then fed to a random forest classifier. The experiments were conducted on the KDDCup99, NSL-KDD, and CICIDS-2017 datasets and achieved a 99.66% detection rate for multiclass classification.

Zhang et al. [16] proposed an anomaly detection model based on a neural network. The experiments were conducted using the CICIDS2017 and CTU datasets for both binary and multiclassification. They have implemented CNN, LSTM, and CNN + LSTM models, and good classification results are achieved in both binary classification and multi-classification experiments. The accuracy achieved was about 99%. They also analyzed the dataset flows which were important for the classification and efficient abnormal behavior detection.

Alhowaide et al. [17] proposed an ensemble model consisting of three different models whose decisions are combined using soft voting. Two versions of the model were proposed: the first, called Edge-ENCLF, works at a fog layer, and the second, called Cloud ENCLF, is a cloud model. The authors considered four datasets: NSL-KDD, UNSW-NB15, BoTNetIoT, and BoTIoT. The accuracy achieved was 98% using NSL-KDD, 95% using UNSW-NB15 on the fog layer, and 93% using UNSW-NB15 on the cloud layer.

Sahu et al. [18] proposed a mechanism for IDS using CNN to extract an accurate feature representation of data and used the LSTM model to do the classification. The authors used the IoT-23 dataset which contains traffic captured from three benign IoT devices and twenty infected IoT devices. The proposed model has achieved an accuracy of 96%.

Abdel-Basset et al. [19] proposed a semi-supervised deep learning approach for intrusion detection (SS-Deep-ID), which can be integrated into a fog-enabled IoT network. The authors used the CICIDS2017 and CICIDS2018 datasets. The model was tested in two scenarios: binary and multi-class classification. For binary classification, the model classifies the traffic behavior as benign or attack, and for multi-class classification, it classifies the behavior into 7 classes. The accuracy achieved in the first scenario is 99%, while in the second scenario the performance varied from one class to another, with the F1 measure ranging from 75 to 99.85%.

Shukla et al. [20] proposed three IDSs for IoT: (1) a K-means clustering-based unsupervised IDS; (2) a decision tree-based supervised IDS; and (3) a hybrid two-stage IDS that combines the K-means and decision tree learning approaches. The K-means approach achieves a 70–93% detection rate for varying sizes of random IoT networks. The decision tree-based IDS achieves a 71–80% detection rate, and the hybrid approach attains a 71–75% detection rate for the same network sizes.

Manimurugan et al. [21] proposed a Deep Belief Network (DBN) model for the IDS. The proposed method was tested on the CICIDS2017 dataset and achieved 99.37% accuracy for the normal class, 97.93% for the Botnet class, 97.71% for the Brute Force class, 96.67% for the DoS/DDoS class, 96.37% for the Infiltration class, 97.71% for the PortScan class, and 98.37% for the Web attack class.

Roopak et al. [12] comparatively studied multiple supervised machine learning models to determine whether the state of a network flow is normal or suspicious. They used the CICIDS2017 dataset, which contains benign, DoS, and DDoS-labeled traffic along with 70+ features. The proposed deep learning models are MLP, CNN, LSTM, and CNN + LSTM. The accuracy, precision, and recall were measured for the four models; the CNN + LSTM model achieved the maximum accuracy of 97.16%, with a precision of 97.41% and a recall of 99.1%.

Toupas et al. [6] proposed a deep learning model which consists of one input layer with 44 features passed as input to the neural network. The input layer is followed by 8 hidden layers with 140, 120, 100, 80, 60, 40, 20, and 120 nodes, respectively. The final layer is the Softmax layer, which produces the probabilities for the 13 classes where the classification takes place. The proposed model is used to make a multi-class classification to detect 13 behavior types defined in the CICIDS2017 dataset. The accuracy of the model was measured by testing it using the same dataset and an accuracy of 99.95% was achieved.

Eskandari et al. [22] proposed an anomaly-based machine learning model using the Isolation Forest technique. The proposed model is used to make a single-class classification to detect Port Scanning, HTTP Brute Force, SSH Brute Force, and SYN Flood. The data was gathered by generating their own network traffic. The model was tested and evaluated. The model achieved F-score of 0.99, 0.96, 0.96, and 0.90 for Port Scanning, HTTP Brute Force, SSH Brute Force, and SYN Flood, respectively.

3 IoT architecture

The assumed IoT architecture for the proposed deep learning models is shown in Fig. 1. IoT devices are categorized into sensors and actuators. The sensors gather data and send it through gateways to a cloud server to perform complex analysis. The gateway relays data between the IoT devices and the cloud. The gateway also runs a deep learning-based IDS to scan the traffic passing through it. The deep learning model is trained by a cloud server and then deployed to the gateway to detect or even block the attacks from a compromised node or an external malicious attacker.

Fig. 1 Proposed IoT network architecture

4 Proposed deep learning models

In this paper, an anomaly-based IDS that uses deep learning techniques has been developed to classify network traffic and identify the cyber-attack type. Initially, we customize the four recent deep learning models in [12], which were originally designed as single classifiers, and evaluate them alongside a newly proposed Enhanced Intrusion Detection Model (EIDM) on six classes of network traffic: Normal, Distributed Denial of Service (DDoS), Slowloris, Slowhttptest, Hulk, and GoldenEye. These customized models of varying complexity are investigated to suit the needs of various IoT architectures with a central gateway node. The customized models are a Multi-layer Perceptron (MLP), a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a combination of CNN and LSTM.

Motivated by the high accuracy obtained by EIDM on six classes, we also trained and tested it on the 15 classes of network traffic behavior, including 14 attack types, provided in the CICIDS2017 dataset. To the best of our knowledge, EIDM is the first model that can classify all 15 classes without grouping close classes of similar features into one class. The network behavior classes are DDoS, Slowloris, Slowhttptest, DoS Hulk, DoS GoldenEye, Heartbleed, Bot, PortScan, Infiltration, Brute Force, XSS, SQL Injection, FTP-Patator, SSH-Patator, and Normal.

The details of the five proposed deep learning models are described next and shown in Fig. 2. The hyperparameters used in each model are selected after several model training trials and selecting the combination of parameter values that results in the best accuracy.

Fig. 2 Architecture of the five implemented models

4.1 MLP deep learning model

The MLP model consists of four layers. The input layer takes the 78 features of the dataset as input values and is followed by one hidden layer that takes 16 input values. A dropout layer is used to avoid overfitting by randomly dropping out nodes with a given probability (20%) in each weight update cycle. Finally, the output layer takes 16 input values and uses the sigmoid activation function to produce the classifications.
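The dropout step used in the MLP and the other models can be illustrated with a short sketch. It implements inverted dropout, the variant used by the Keras `Dropout` layer, in plain Python. The activation values and the fixed random seed are placeholders chosen for illustration only.

```python
import random

def dropout(activations, rate=0.2, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and rescale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through unchanged
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

hidden = [0.5] * 16  # placeholder activations of 16 hidden nodes
dropped = dropout(hidden, rate=0.2, rng=random.Random(0))
unchanged = dropout(hidden, training=False)
```

Because a different random subset of nodes is silenced in every update, no single node can dominate the prediction, which is what limits overfitting.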

4.2 CNN deep learning model

The CNN model consists of 6 layers. A 1D convolutional layer takes the 78 input values, reshaped to 3D, and uses ReLU as its activation function. It is followed by another 1D convolutional layer with a ReLU activation function. The kernel size of the convolutional layer is 30 and its depth is 16. A max-pooling layer is used after the convolutional layer to extract the sharpest features and discard the features with low weight. A dropout layer is then used to avoid overfitting by randomly dropping out nodes with a given probability (20%) in each weight update cycle. Finally, there are two fully connected dense layers with the sigmoid activation function.
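To make the convolution and pooling steps concrete, the sketch below applies a single 1D filter of size 30 to a 78-value record and then max-pools the result. Stride 1, no padding, a pool size of 2, and the averaging kernel are assumptions for illustration, since the text does not state them. A real layer would learn 16 such filters.

```python
def conv1d_relu(signal, kernel, bias=0.0):
    """Stride-1 1D convolution without padding, followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        out.append(max(0.0, s))
    return out

def max_pool1d(signal, pool_size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + pool_size])
            for i in range(0, len(signal) - pool_size + 1, pool_size)]

features = [float(i % 5) for i in range(78)]  # placeholder for one 78-feature record
kernel = [1.0 / 30] * 30                      # placeholder filter of size 30
feature_map = conv1d_relu(features, kernel)   # 78 - 30 + 1 positions
pooled = max_pool1d(feature_map)              # keeps the strongest response per window
```

Under these assumptions the feature map has 49 positions and pooling halves it to 24, which shows how a large kernel already compresses the input considerably.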

4.3 LSTM deep learning model

The LSTM model consists of 5 layers. A dense layer takes 78 input values and uses ReLU as its activation function. It is followed by 3 hidden layers: an LSTM layer with 64 input values, a dropout layer that avoids overfitting by randomly dropping out nodes with a given probability (20%) in each weight update cycle, and a dense layer with 64 input values. All hidden layers use ReLU as their activation function. Finally, the output layer takes 32 input values and uses the sigmoid activation function to produce the output classification.

4.4 CNN + LSTM deep learning model

As in the CNN model, the data is reshaped to 3D so that it is accepted as input by the 1D convolutional layer. The CNN + LSTM model consists of 7 layers. A dense layer takes the 78 features of the dataset as input values and uses ReLU as its activation function. It is followed by 5 hidden layers: a convolutional layer with 64 input values, a kernel size of 10, and 64 neurons; a max-pooling layer that extracts the sharpest features and discards the features with low weight; an LSTM layer with 64 input values; a dropout layer that avoids overfitting by randomly dropping out nodes with a given probability (20%) in each weight update cycle; and a dense layer. All hidden layers use the ReLU activation function. Finally, the output layer uses the sigmoid activation function.

4.5 EIDM

This model consists of a combination of convolutional and dense layers. A convolutional layer is preferred because of the relatively large number of classes and features and the highly unbalanced data samples. A convolutional layer is not densely connected (i.e., not all inputs can affect all outputs), which gives more flexibility in learning. Besides the convolutional layer, more layers are needed to increase the accuracy of the model. However, to limit the time cost and the complexity of the model, only one convolutional layer is used, and the rest are dense layers. The model is composed of 11 layers: an input dense layer with 120 nodes, 9 hidden layers, and an output layer. The first hidden layer is a convolutional layer with 80 neurons, a kernel size of 20, and ReLU as the activation function. Smaller kernel sizes were tested, but they resulted in lower accuracy because the feature values of some classes, such as the web attack and DoS attack classes, are close to each other. A max-pooling layer is used to extract the sharpest features and discard the features with low weight. A dropout layer is used to avoid overfitting by randomly dropping out nodes with a given probability (20%) in each weight update cycle. It is followed by 6 dense layers of 120, 100, 80, 60, 60, and 40 nodes, respectively. The final layer is a dense layer with a sigmoid activation function. The model needed this number of dense layers to efficiently differentiate between 15 classes of behaviors, including classes that have close feature values. All hidden layers use the ReLU activation function and the LeCun uniform initializer. Stochastic Gradient Descent (SGD) is used as the optimization method for the four customized models and the Adam method for EIDM. Table 2 provides the parameters of the implemented models. The parameters used in each model were selected after several training trials by choosing the combination of parameter values that resulted in the best accuracy.
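As a rough indication of the complexity added by the dense part of EIDM, the sketch below counts the trainable weights and biases between consecutive dense layers. It assumes a 15-way output layer and covers only the transitions from the first 120-node hidden dense layer onward. The connections feeding that layer depend on the pooled feature-map size, which the text does not give, so they are left out.

```python
def dense_params(widths):
    """Weights plus biases of consecutive fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

# Six hidden dense layers from the description, then an assumed 15-way output layer.
eidm_dense_stack = [120, 100, 80, 60, 60, 40, 15]
total = dense_params(eidm_dense_stack)
per_layer = [n_in * n_out + n_out
             for n_in, n_out in zip(eidm_dense_stack, eidm_dense_stack[1:])]
```

Under these assumptions the dense stack contributes 31,755 parameters, most of them in the first two transitions, so narrowing the later layers adds depth at little cost.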

Table 2 Model parameters of each implemented model

5 Experimental results

5.1 Dataset

The CICIDS2017 dataset [23], one of the most commonly used datasets for intrusion detection systems, is used to evaluate and compare the different models. The dataset was generated by capturing network flows for five days, where each day targets specific types of attacks or normal behavior. Table 3 shows the data distribution of the CICIDS2017 dataset. Although the dataset is large, some of its records are unlabeled, corrupted, or duplicated, and the data is unevenly distributed between the classes [24]. To overcome the issue of unlabeled and duplicated records, a cleaned version of the dataset was used [25]. The cleaned version contains 78 features. For the unbalanced classes, the Synthetic Minority Oversampling Technique (SMOTE) was used to balance the dataset by oversampling [26].
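The core idea of SMOTE can be sketched in a few lines: each synthetic record is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. The two-feature toy records, the neighbour count `k`, and the function name are assumptions for illustration. This is not the implementation used in the experiments.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical records of a rare attack class with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
new_points = smote(minority, n_new=6)
```

Because the new records lie between existing ones, the minority class is enlarged without simply duplicating samples.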

Table 3 Data distribution of the CICIDS2017 dataset

5.2 Environment setup

Experiments were conducted on a machine equipped with an Intel Core i7-8700 (LGA1151) CPU @ 3.70 GHz, 16 GB of DDR4 RAM at 2400 MHz, and an Nvidia RTX 2080 Ti GPU with 11 GB of GDDR6 memory (PCIe). The Anaconda3 development environment and the Python language were used for implementation. The deep learning models were created, trained, and tested using the Keras package.

5.3 Preprocessing

To prepare the data for model training, several preprocessing steps were performed. First, the Synthetic Minority Oversampling Technique (SMOTE) was used to address the highly unbalanced data in the dataset by oversampling [26]. The pandas package was used to remove records with infinite values or corrupted data. Then, the data was shuffled and split into training and testing samples of 80% and 20%, respectively. Both training and testing samples were normalized over all features to bring them onto a common scale. The normalization step is mandatory for some algorithms to model the data correctly [27]. Finally, the output labels were transformed to one-hot encoded labels of 15 categories.
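The cleaning, splitting, normalization, and encoding steps can be sketched on toy data as follows. Min-max scaling fitted on the training part is an assumption, because the text does not name the normalization method. The three class names and the generated records are placeholders, and plain Python stands in for the pandas calls used in the experiments.

```python
import math
import random

def drop_non_finite(rows, labels):
    """Remove records that contain NaN or infinite feature values."""
    kept = [(r, y) for r, y in zip(rows, labels)
            if all(math.isfinite(v) for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]

def shuffle_split(rows, labels, test_ratio=0.2, seed=0):
    """Shuffle the records, then split them into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_ratio)
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in te], [labels[i] for i in te])

def fit_min_max(rows):
    """Per-feature minimum and maximum (assumed scaling method)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def scale(rows, lo, hi):
    """Map each feature onto the fitted range; constant features become 0."""
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]

def one_hot(label, classes):
    """Encode a class label as a 0/1 vector with a single 1."""
    return [1 if c == label else 0 for c in classes]

classes = ["Normal", "DDoS", "PortScan"]  # placeholder subset of the 15 classes
rows = [[float(i), 10.0 * i] for i in range(10)] + [[float("inf"), 1.0]]
labels = [classes[i % 3] for i in range(11)]

rows, labels = drop_non_finite(rows, labels)
x_train, y_train, x_test, y_test = shuffle_split(rows, labels)
lo, hi = fit_min_max(x_train)
x_train, x_test = scale(x_train, lo, hi), scale(x_test, lo, hi)
y_train = [one_hot(y, classes) for y in y_train]
y_test = [one_hot(y, classes) for y in y_test]
```

With the ranges fitted on the training part, test values can fall slightly outside [0, 1], which is expected.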

5.4 Metrics

The performance of the five implemented models was measured by calculating the accuracy, precision, recall, and F-score. Accuracy is the ratio of the correct predictions made by the model to the total number of predictions. Precision is the ratio of the correctly predicted positive samples to the total number of positive predictions, whether correct or not. Recall is the ratio of the correctly predicted positive samples to the total number of positive samples. Finally, the F-score is the harmonic mean of precision and recall. All the metrics were calculated by testing the models’ detection of the behavior types provided in the CICIDS2017 dataset. These metrics are computed using the following equations, where TP, TN, FP, and FN stand for True Positive, True Negative, False Positive, and False Negative, respectively.

$$\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}\tag{1}$$
$$\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}\tag{2}$$
$$\mathrm{Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\tag{3}$$
$$\mathrm{F\text{-}score}=2\times \frac{\mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\tag{4}$$
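Equations (1) to (4) can be evaluated per class by treating that class as positive and all others as negative. The following sketch does so for a small set of labels. The label lists are made-up examples, and returning 0 when a denominator is empty is a convention chosen here, not something stated in the text.

```python
def confusion_counts(y_true, y_pred, positive):
    """One-vs-rest TP, TN, FP, FN for the class `positive`."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-score as in Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_score

# Made-up ground truth and predictions for eight flows.
y_true = ["DDoS", "DDoS", "Normal", "Normal", "DDoS", "Normal", "Normal", "Normal"]
y_pred = ["DDoS", "Normal", "Normal", "DDoS", "DDoS", "Normal", "Normal", "Normal"]

counts = confusion_counts(y_true, y_pred, positive="DDoS")
accuracy, precision, recall, f_score = metrics(*counts)
```

In this example TP = 2, TN = 4, FP = 1, and FN = 1, so the accuracy is 0.75 while precision, recall, and F-score are each 2/3.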

5.5 Results

5.5.1 Proposed models tested on six classes

The accuracy of the five models was measured. All models except EIDM used the SGD optimization method with different learning rates, while EIDM used Adam. Figure 3 shows the F1-score and the accuracy of each model. The MLP model achieved an accuracy of 88.7% using a learning rate of 0.08 and a momentum of 0.4. The CNN model achieved an accuracy of 93.7% using a learning rate of 0.08 and a momentum of 0.9. The LSTM model achieved an accuracy of 96.4% using a learning rate of 0.01 and a momentum of 0.9. The CNN + LSTM model achieved an accuracy of 97% using a learning rate of 0.01 and a momentum of 0.9. Finally, EIDM achieved the highest accuracy of the five models, 99.48%.
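The role of the learning rate and momentum values quoted above can be seen in the classical SGD-with-momentum update, sketched here for a single weight. The quadratic toy objective and the number of steps are assumptions for illustration and are unrelated to the actual loss of the models.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One update: v <- momentum * v - lr * g, then w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy objective f(w) = w^2 with gradient 2w, starting from w = 1.
w, v = [1.0], [0.0]
for _ in range(300):
    w, v = sgd_momentum_step(w, [2 * w[0]], v)
```

The velocity term accumulates past gradients, so a momentum of 0.9 lets a small learning rate such as 0.01 still make steady progress toward the minimum.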

Fig. 3 The F1-score and the accuracy of the 5 models tested on 6 classes

Figures 4, 5, and 6 show the precision, recall, and F-score of each model for each attack type, respectively. All models achieved good precision and recall values, with slight differences between them. EIDM achieves the highest precision for all attack types because it has more complex layers that can effectively differentiate between the targeted classes. Figures 7 and 8 show the training and testing time costs of the five models on six classes, respectively. EIDM not only achieved the best accuracy of all tested models, but its time cost is also lower than that of most of the other models and only slightly higher than that of the CNN and MLP models. This emphasizes the applicability of EIDM in non-time-critical IoT applications, since it takes less than 100 µs on average to identify the traffic behavior. EIDM can therefore be considered a good trade-off between accuracy and complexity. Hence, the model was extended to classify all 15 behavior types found in CICIDS2017. The next section presents the experimental evaluation results for the 15 behavior types.

Fig. 4 Precision of the 5 models tested on 6 classes

Fig. 5 Recall of the 5 models tested on 6 classes

Fig. 6 F-score of the 5 models tested on 6 classes

Fig. 7 Time cost in seconds for training the 5 proposed models (6 classes)

Fig. 8 Average time for detecting a sample type among the 6 classes in microseconds for 5 models

5.5.2 EIDM Model with 15 classes

Because of the high accuracy achieved by EIDM with six classes, it was tested on the 15 classes of traffic behaviors found in the CICIDS2017 dataset and achieved an accuracy of 95%. Figure 9 shows the precision, recall, and F1-score values for each class using the EIDM model. EIDM achieves precision, recall, and F1-score exceeding 95% for all attack types except the brute force and XSS web attacks. According to the feature analysis done for those classes, their feature values are so similar that classification accuracy is negatively affected. Training EIDM on 15 classes took about 2 h, since it needed a longer training time to differentiate among the numerous traffic classes, especially the three web attack classes found in the CICIDS2017 dataset.

Fig. 9 Precision, recall and F-score of EIDM tested on all types of attacks found in CICIDS2017

5.6 Results and discussion

According to the experimental results, the accuracy of the models cannot exceed a certain value, as it is affected by the number of classes and the number of samples per class. The more unbalanced the data and the greater the number of classes, the more complex the model must be to attain high accuracy. However, model complexity comes at the price of higher training and attack identification time costs, which are crucial for an IDS. There are several options for keeping the model’s complexity at a manageable level. First, the model can be customized to work on a limited number of network traffic behaviors by excluding the weak, unbalanced classes that can negatively affect its accuracy. Second, related classes can be grouped into a single class, which may also improve the efficiency of the model. Third, traffic behavior identification can be hierarchical: a binary classification model is first used to allow fast detection, especially for benign traffic. Then, an attack identification model is trained for highly discriminative classes, with similar classes grouped together. Finally, a separate DL model is used to identify the attack type among the grouped similar classes.
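The hierarchical option can be sketched as a three-stage decision function. The rule-based `detector`, `group_model`, and `web_attack_model` below are hypothetical stand-ins for trained models, and the flow fields (`pkt_rate`, `http_errors`, `script_tags`) are invented for illustration. Only the control flow reflects the idea described above.

```python
def hierarchical_classify(flow, detector, group_model, fine_models):
    """Stage 1: benign vs attack. Stage 2: attack group.
    Stage 3: attack type within a group of similar classes, if one exists."""
    if not detector(flow):
        return "Normal"
    group = group_model(flow)
    fine_model = fine_models.get(group)
    return fine_model(flow) if fine_model else group

# Hypothetical stand-ins; real models would be learned from data.
def detector(flow):
    return flow["pkt_rate"] > 1000 or flow["http_errors"] > 5

def group_model(flow):
    return "Web Attack" if flow["http_errors"] > 5 else "DDoS"

def web_attack_model(flow):
    return "XSS" if flow["script_tags"] > 0 else "Brute Force"

fine_models = {"Web Attack": web_attack_model}

benign = {"pkt_rate": 40, "http_errors": 0, "script_tags": 0}
flood = {"pkt_rate": 5000, "http_errors": 0, "script_tags": 0}
xss = {"pkt_rate": 60, "http_errors": 9, "script_tags": 2}
results = [hierarchical_classify(f, detector, group_model, fine_models)
           for f in (benign, flood, xss)]
```

Benign traffic leaves after the first, cheapest stage, so the more expensive models run only on the small share of flows that look suspicious.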

5.7 Comparative evaluation

EIDM has been compared with the deep learning model presented in [6], which classifies 13 traffic behaviors found in the CICIDS2017 dataset, where the three web attacks are merged into one class. The model in [6] consists of one input layer with 44 features passed as input to the neural network. The input layer is followed by 8 hidden layers with 140, 120, 100, 80, 60, 40, 20, and 120 nodes, respectively. The final layer is a Softmax layer, which produces the class probabilities for the 13 classes. We customized this model to work on the 15 classes and tested it in the same environment. The model of [6] achieved an accuracy of 94%, while EIDM achieved an accuracy of 95%, which supports the accuracy of EIDM in detecting and classifying attacks.

Furthermore, a comparison is presented among deep learning-based IDS models showing the algorithm used in the intrusion detection, employed dataset, number of features, the number of detection classes: binary (benign or abnormal) or multi-class, and the achieved accuracy. The comparison is summarized in Table 4.

Table 4 Comparative study between the proposed model and other models

In binary classification, a variety of learning algorithms is used, based on either ML or DL. ML techniques can achieve high accuracy with lower resource consumption in this setting because binary classification models are less complex than multiclass ones. On the other hand, IDSs based on multiclass classification are more complex, and a large number of features is required to increase the accuracy of the classification model. Therefore, DL techniques are a proper solution in this case and can achieve accurate detection results. Some models use relatively few features and still achieve high accuracy, since not all features have the same effect on the classification decision. Moreover, most models achieved high accuracy rates ranging from 95 to 99% by using a small number of features and focusing on binary classification. This decreased the model complexity and, hence, the time of traffic behavior detection. Some multi-class models such as [15, 16, 18, 21] attained high accuracy but at the cost of model complexity and long detection time. The proposed EIDM model balances classification accuracy and efficiency.

6 Conclusion

In this paper, five deep learning models are implemented to detect and classify suspicious behaviors in the network flow. The models are trained on the cloud server and then deployed to a gateway node where the classification of the network traffic occurs. All implemented models were trained and tested using the CICIDS2017 dataset. The accuracy of the five models was measured, and the proposed model, EIDM, outperformed the other four models with an accuracy of 99.48% while also keeping a competitive time cost. Moreover, EIDM was able to classify the 15 traffic behaviors, including 14 attack types, contained in the CICIDS2017 dataset with 95% accuracy. To the best of our knowledge, EIDM is the first model that can classify all 15 classes found in the dataset without grouping close classes of similar features into one class. The plan is to enhance the performance and security of the intrusion detection system in the following directions. First, investigating the distribution of the learning process of deep learning models to different machines, to enable high-performance computing via.