Abstract

In the era of the Internet of Things (IoT), connected objects produce an enormous amount of data traffic that feeds big data analytics, which can be used to discover unseen patterns and identify anomalous traffic. In this paper, we identify five key design principles that should be considered when developing a deep learning-based intrusion detection system (IDS) for the IoT. Based on these principles, we design and implement Temporal Convolution Neural Network (TCNN), a deep learning framework for intrusion detection systems in IoT, which combines Convolution Neural Network (CNN) with causal convolution. TCNN is combined with Synthetic Minority Oversampling Technique-Nominal Continuous (SMOTE-NC) to handle the imbalanced dataset. It is also combined with efficient feature engineering techniques, consisting of feature space reduction and feature transformation. TCNN is evaluated on the Bot-IoT dataset and compared with two common machine learning algorithms, i.e., Logistic Regression (LR) and Random Forest (RF), and two deep learning techniques, i.e., LSTM and CNN. Experimental results show that TCNN achieves a good trade-off between effectiveness and efficiency. It outperforms the state-of-the-art deep learning IDSs tested on the Bot-IoT dataset, records an accuracy of 99.9986% for multiclass traffic detection, and shows performance very close to that of CNN with respect to training time.

1. Introduction

The Internet of Things (IoT) network is a set of smart devices, such as sensors, home appliances, phones, vehicles, and computers, that are interconnected through the global Internet. This type of network is increasingly becoming an essential part of our everyday life and enables a variety of applications such as smart homes, smart grids, smart agriculture, smart cities, and intelligent transportation.

Although the IoT can make human life more comfortable, this benefit comes at the expense of security [1]. Nowadays, IoT networks are becoming an attractive target for cybercriminals and are exposed to major risks. A report from Unit 42 of Palo Alto Networks revealed that 98% of all IoT device traffic is unencrypted and that 41% of attacks exploit IoT device vulnerabilities [2]. Vulnerable devices can later be recruited by adversaries into an IoT botnet and participate in sophisticated, large-scale attacks. For example, Mirai [3], one of the first large-scale IoT botnets, launched in October 2016, compromised vulnerable CCTV cameras that were using default usernames and passwords to mount a DDoS attack on DNS servers. This attack disrupted Internet access in parts of the USA. In April 2020, an IoT botnet named Mozi was discovered and found capable of launching various DDoS attacks [4, 5].

To deal with this kind of threat, intrusion detection systems (IDSs) have been widely used to detect malicious network traffic [6, 7], especially when preventive techniques fail at the level of endpoint IoT devices. As cyberattacks targeting IoT become more sophisticated and stealthy, the IDS should continuously evolve to handle emerging security threats. Due to their heterogeneous nature, IoT networks generate high-dimensional, multimodal, and temporal data. By applying big data analytics to such data, it is possible to discover unseen patterns, reveal hidden correlations, and gain new insights [8]. Artificial intelligence is increasingly used in the big data analysis process. In particular, deep learning techniques have proven their success in dealing with heterogeneous data [8-11]. They are also capable of analyzing complex and large-scale data to get insights, spot dependencies within data, and learn from previous attack patterns to recognize new and unseen attack patterns [12-14]. As IoT devices are resource-constrained and have limited storage and computation capabilities, heavyweight tasks like big data analysis and the building of learning models need to be offloaded to fog and cloud servers [15-21]. Hence, computation offloading [22] can help reduce the execution delay of a task and save the energy of battery-powered and mobile IoT devices, but it also poses some security concerns [23].

Many deep learning approaches have been proposed for IDSs, and some of them specifically focus on IoT [24-32]. Each approach adopts its own design choices, which might limit its ability to achieve good performance in terms of effectiveness and efficiency.

In this paper, we propose five design principles to be considered when developing an effective and efficient deep learning IDS for IoT, and we use these principles to propose TCNN, a variant of CNN that uses causal convolutions. TCNN is combined with data balancing and efficient feature engineering. More specifically, the main contributions of the paper are the following:

(i) We identify five key design principles for the development of deep learning-based IDSs for IoT, including handling overfitting, balancing the dataset, feature engineering, model optimization, and testing on an IoT dataset.
(ii) Based on the identified key design principles, we compare the state-of-the-art methods, identify their gaps, and analyze the main differences with respect to our work.
(iii) We design and implement Temporal Convolution Neural Network (TCNN), a deep learning framework for intrusion detection systems in IoT. TCNN combines Convolution Neural Network (CNN) with causal convolution.
(iv) To handle the issue of the imbalanced dataset, we integrate TCNN with Synthetic Minority Oversampling Technique-Nominal Continuous (SMOTE-NC).
(v) We employ efficient feature engineering, which consists of the following:
(1) Feature space reduction: it helps in reducing memory consumption.
(2) Feature transformation: it is applied on continuous numerical features using log transformation and a standard scaler, which transform skewed data into a Gaussian-like distribution. It is also applied on categorical features using label encoding, which replaces a categorical column with unique integer values.
(vi) We evaluate the effectiveness and efficiency of the proposed TCNN on the Bot-IoT dataset and compare it with CNN, LSTM, logistic regression, random forest, and other state-of-the-art methods. The results show the superiority of TCNN, which scores an accuracy of 99.9986% for multiclass traffic detection.

The rest of the paper is organized as follows. Section 2 presents the key design principles for deep learning IDSs in IoT. Section 3 overviews related work. Section 4 and Section 5 describe the design and implementation of TCNN, respectively. Section 6 presents the evaluation results and the comparison with state-of-the-art methods. Finally, Section 7 concludes the paper and outlines future research directions.

2. Key Design Principles for Deep Learning IDS in IoT

The objective of deep learning-based IDS solutions for IoT is to generate models that perform well in terms of effectiveness and efficiency. However, each model adopts design choices that might limit its ability to achieve this objective. For example, some deep learning IDSs in IoT do not consider the overfitting problem, apply their model to an imbalanced dataset, or neglect feature engineering, which negatively affects their performance in terms of accuracy, memory consumption, and computational time. Also, some IDSs do not try to optimize their learning model, and some are evaluated on outdated or irrelevant datasets that do not reflect real-world IoT network traffic.

Motivated by the above observations, a deep learning-based IDS solution for IoT should follow these key design principles:

(i) Handling overfitting: overfitting happens when the model achieves a good fit on the training data but does not generalize well to unseen data. In deep learning, overfitting can be mitigated by the following methods (a minimal sketch is given after this list):
(1) Applying regularization, which adds a cost to the loss function of the model for large weights.
(2) Using dropout layers, which randomly remove certain features by setting them to 0.
(ii) Balancing the dataset: data imbalance refers to a disproportionate distribution of classes within a dataset. If a model is trained on an imbalanced dataset, it becomes biased, i.e., it favors the majority classes and fails to detect the minority classes. Balancing the dataset improves the effectiveness of the model.
(iii) Feature engineering: it reduces the cost of the deep learning workflow in terms of memory consumption and time. It also improves the accuracy of the model by discarding irrelevant features and applying feature transformation.
(iv) Model optimization: the objective of model optimization is to minimize a loss function, which computes the difference between the predicted output and the actual output. This is achieved by iteratively adjusting the weights of the model with an optimization algorithm such as SGD or Adam [33].
(v) Testing on an IoT dataset: a deep learning-based IDS for IoT should be tested on an IoT dataset to get results that reflect real-world IoT traffic.
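As an illustration of principle (i), the following minimal Keras sketch combines the two countermeasures; the layer sizes, regularization factor, and dropout rate are illustrative values, not the hyperparameters used in this paper.

```python
# A minimal sketch of the two overfitting countermeasures named above;
# all sizes and rates here are arbitrary examples.
from tensorflow.keras import Sequential, regularizers
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    # L2 regularization adds a penalty on large weights to the loss function
    Dense(64, activation='relu', input_shape=(25,),
          kernel_regularizer=regularizers.l2(0.01)),
    Dropout(0.3),  # randomly sets 30% of the units to 0 during training
    Dense(5, activation='softmax'),
])
```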

3. Related Work

Deep learning has been applied in many fields of cybersecurity, including malware detection [34-39] and intrusion detection systems [14, 40-46]. In this section, we give an overview of deep learning-based IDSs for IoT networks.

Lopez et al. [26] proposed RNN-CNN, a combination of a recurrent neural network (RNN) and a convolutional neural network (CNN). To deal with overfitting, they added layers such as max pooling, batch normalization, and dropout. They considered only a subset of features to improve the effectiveness of the model.

Putchala [28] applied the Gated Recurrent Unit (GRU) algorithm on the KDD Cup 99 dataset, using a random forest classifier as a feature selection technique. The best performance results are obtained by minimizing the loss function.

Roy and Cheung [31] presented Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM RNN). They applied feature normalization and converted categorical features to numeric values.

Diro et al. [24] applied a deep neural network (DNN) on the NSL-KDD dataset. The loss function of the DNN is minimized using stochastic gradient descent (SGD). Fog nodes are responsible for training the deep learning model, and the local parameters are sent to a fog coordinator node for update. This allows sharing the best parameters and helps avoid local overfitting.

Roopak et al. [29] applied four different deep learning classification models, MLP, 1d-CNN, LSTM, and CNN+LSTM, on the CICIDS2017 dataset. They also balanced the dataset by duplicating records; however, it is not explained which balancing method is used. The overfitting issue is handled by adding layers such as max pooling and dropout to the model.

In [32], a Deep Belief Network (DBN) is used to develop a feed-forward deep neural network (DNN), which is applied on an IoT simulation dataset. The DNN is optimized by assigning a cost function to each layer of the model.

Otoum et al. [27] proposed the Stacked Deep Polynomial Network (SDPN) on the NSL-KDD dataset. For the optimal selection of features, they employed the Spider Monkey Optimization (SMO) algorithm [47]. To avoid overfitting, the L2 regularization technique is integrated into the loss function.

Ferrag and Maglaras [25] applied recurrent neural network (RNN) with the truncated backpropagation through time (BPTT) algorithm on two non-IoT datasets and BoT-IoT dataset. They normalized the features before feeding them to RNN-BPTT.

Roopak et al. [30] proposed a sequential architecture combining CNN and LSTM and applied it on the CICIDS2017 dataset. For the optimal selection of features, they employed a multiobjective optimization algorithm named the nondominated sorting genetic algorithm (NSGA) [48]. To avoid overfitting, they placed a max-pooling layer between the CNN and LSTM layers.

Koroniotis et al. [49] were the first to develop the BoT-IoT dataset, which they used to test RNN and LSTM models. For feature selection, they computed the correlation coefficients among the features of the dataset and applied feature normalization to scale the data within the range [0, 1].

Aldhaheri et al. [50] proposed DeepDCA, an IDS that combines the Dendritic Cell Algorithm (DCA) and a Self-Normalizing Neural Network (SNN). They adopted Information Gain as a feature selection technique to decide on the set of features taken from the BoT-IoT dataset. Although the authors presented results on a balanced dataset, no information about the balancing method is provided. As for model optimization, they used a loss function to update the weights of the deep learning layers.

Soe et al. [51] proposed an Artificial Neural Network (ANN) to detect DDoS attacks in the Bot-IoT dataset. To balance the dataset, they used the SMOTE technique. They also applied feature normalization before feeding the input data to the ANN.

Ge et al. [52] applied feed-forward neural networks (FNN) on the BoT-IoT dataset. The dataset is balanced not through oversampling but algorithmically, i.e., by assigning class weights to the training data. To optimize the model, they used the Adam optimizer and a sparse categorical cross-entropy loss function to update the weights. To deal with overfitting, they employed different regularization techniques such as L1, L2, and dropout. They also encoded categorical features as numerical values using one-hot encoding.

Muna et al. [53] proposed a combination of a deep autoencoder (DAE) and a deep feed-forward neural network (DFFNN) to detect malicious activities in industrial IoT. The optimal parameters are obtained by computing a loss function, which allows updating the weights and minimizing the difference between the actual and the predicted output.

3.1. Key Finding

Table 1 summarizes and compares the IDS solutions with respect to the abovementioned five design principles. We can notice that only 6 out of 14 solutions are tested under an IoT dataset [25, 27, 49-51]. The majority of solutions do not consider dataset balancing: only 4 solutions are designed with data balancing [29, 50-52], two of them do not explain how the balancing approach is implemented [29, 50], one solution considers algorithmic-level data balancing [52], and only one solution considers data-level balancing by applying the SMOTE algorithm [51]. Handling overfitting is not considered in the design of 7 solutions [25, 28, 31, 32, 49, 50, 53]. On the other hand, model optimization is only considered by 7 solutions [24, 27, 28, 32, 50, 52, 53]. Most of the solutions employ feature engineering in their design, except for two [29, 32].

3.2. Comparison with Related Work

To the best of our knowledge, our work and [52] are the only ones that consider all five design principles. Unlike [52], which adopts algorithmic-level data balancing, our work applies the SMOTE-NC algorithm on the Bot-IoT dataset, which can handle both continuous and categorical features. We use overfitting-handling and optimization techniques to achieve an effective IDS, and we use feature space reduction and feature transformation to achieve an efficient and lightweight IDS in terms of memory usage and training time.

4. Proposed Framework

4.1. Basic Principles

Deep learning is a concatenation of different layers. The first layer is called the input layer, and the last one is called the output layer. Hidden layers are inserted between the input and output layers. Each layer is composed of a set of units called neurons. The size of the input layer depends on the dimension of the input data, whereas the output layer is composed of C units, which correspond to the C classes of a classification task.

Convolutional neural network (CNN), as shown in Figure 1, is a deep neural network composed of multiple layers. The three main types of layers are the following:

(i) Convolutional layer: it applies a set of filters, also known as convolutional kernels, to the input data. Each filter slides over the input data to produce a feature map. By stacking all the produced feature maps together, we get the final output of the convolutional layer.
(ii) Pooling layer: it operates on the feature maps to perform subsampling, which reduces their dimensionality. Average pooling and max pooling are the most common pooling methods.
(iii) Fully connected layer: it takes the output of the previous layers and turns it into a single vector that can be an input for the next layer.

The TCNN deep learning architecture [54] is a combination of the CNN architecture and causal padding, which results in causal convolutions. Figure 2 shows a 1D causal convolution with a kernel size of 3, applied on time-series input data. By causal convolutions, we mean that an output at time step t is convolved only with elements from time step t and earlier in the previous layer. Therefore, it does not violate the temporal order of the data, and there is no leakage of information from the future to the past. Zero padding of length (kernel size - 1) is added to the layers so that they have the same length as the input layer.
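As a minimal illustration, Keras implements this behavior directly through causal padding in its Conv1D layer; the dimensions below are toy values.

```python
# Illustrative 1D causal convolution in Keras: padding='causal' left-pads the
# input with (kernel_size - 1) zeros, so the output at time step t depends
# only on inputs at steps <= t.
import numpy as np
from tensorflow.keras.layers import Conv1D

x = np.random.rand(1, 10, 1).astype('float32')  # (batch, time steps, channels)
causal = Conv1D(filters=1, kernel_size=3, padding='causal')
y = causal(x)  # output keeps the input length: shape (1, 10, 1)
```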

4.2. Overall Architecture

Figure 3 shows the overall architecture of the proposed TCNN framework, whose implementation is detailed in Section 5. The proposed architecture is composed of the following phases:

(i) Dataset balancing: as mentioned above, an imbalanced dataset can produce misleading results. To handle this problem, we use the SMOTE-NC method, which creates synthetic samples of the minority classes and is capable of handling mixed datasets of categorical and continuous features.
(ii) First feature engineering (feature space reduction): in this phase, we clean the dataset, i.e., reduce the feature space by removing unnecessary features and converting memory-consuming features into lower-size datatypes.
(iii) Dataset splitting: in this phase, the dataset is split into training, validation, and testing subsets in order to counter overfitting.
(iv) Second feature engineering (feature transformation): in this phase, we apply feature transformation on the training subset. Log transformation and a standard scaler are applied on the continuous numerical features. In addition, label encoding is applied to categorical features, which simply replaces each categorical column with a specific number. This transformation process is later applied on the validation and testing subsets.
(v) Training and optimization: in this phase, the TCNN model is built, as described in Section 4.3. It is trained using the training subset, and its parameters are optimized using the Adam optimizer and the validation subset.
(vi) Classification: the generated TCNN model is applied on the testing subset to attribute each testing record to its actual class: normal or a specific category of attack.

4.3. Training and Optimization of TCNN Framework

The training and optimization phase of the proposed TCNN is composed of two 1D causal convolution layers, two dense layers, and a softmax layer, which applies the softmax function for the multiclass classification task. To overcome overfitting, we use global max pooling, batch normalization, and dropout layers. We choose the Adam optimizer to update the weights and optimize the cross-entropy loss function. The Adam optimizer combines the advantages of two stochastic gradient descent algorithms, namely the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp).

Specifically, the training and optimization phase of the proposed TCNN architecture, as shown in Figure 4, is composed of the following layers (a code sketch is given after this list):

(i) First 1D causal convolution layer: it convolves across the input vectors with 64 filters and a filter size of 3.
(ii) Second 1D causal convolution layer: it uses 128 filters and a filter size of 3. Placing this second layer before pooling allows the model to learn more complex features.
(iii) 1D global max pooling layer: it replaces the data covered by the filter with its maximum value, which helps prevent overfitting of the learned features.
(iv) Batch normalization layer: it normalizes the data coming from the previous layer before passing it to the next layer.
(v) Fully connected dense layer: it employs 128 hidden units and a dropout ratio of 30%.
(vi) Fully connected dense layer with softmax activation function: it produces five units that correspond to the five categories of traffic for multiclass classification.
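The following minimal Keras sketch assembles the layers listed above; the input shape and the exact compile settings are assumptions where the text and Table 6 are not explicit.

```python
# A minimal sketch of the described TCNN layer stack; input shape is assumed
# to be one channel per feature vector.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv1D, GlobalMaxPooling1D,
                                     BatchNormalization, Dense, Dropout)

def build_tcnn(n_features, n_classes=5):
    model = Sequential([
        # two 1D causal convolution layers (64 and 128 filters, kernel size 3)
        Conv1D(64, kernel_size=3, padding='causal', activation='relu',
               input_shape=(n_features, 1)),
        Conv1D(128, kernel_size=3, padding='causal', activation='relu'),
        GlobalMaxPooling1D(),           # 1D global max pooling against overfitting
        BatchNormalization(),           # normalizes data between layers
        Dense(128, activation='relu'),  # fully connected layer, 128 hidden units
        Dropout(0.3),                   # 30% dropout ratio
        Dense(n_classes, activation='softmax'),  # five traffic classes
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```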

5. Implementation

To implement the detection learning models, we use an Intel Quad-core i7-8550U processor with 8 GB RAM and a 256 GB hard drive. As for software, we use the Python 3.6 programming language and TensorFlow to build the deep learning models. Moreover, different libraries are used, including Scikit-learn, the Keras API, Pandas, and imblearn. We implement the framework in Figure 3 on the Bot-IoT dataset [49].

5.1. Bot-IoT Dataset

We use Bot-IoT [49], an IoT dataset released in 2018 by the Cyber Range Lab of UNSW Canberra (the University of New South Wales). Legitimate and malicious traffic is generated by virtualizing various smart home appliances, including weather stations, smart fridges, motion-activated lights, remotely activated garage doors, and smart thermostats. The dataset consists of more than 73,000,000 records, which are represented by 42 features, as shown in Table 2. Each record is labeled either as normal or attack. In addition, the attack records are divided into four categories: DoS, DDoS, reconnaissance, and information theft, and each category is further divided into subcategories, as shown in Table 3.

In this work, we use a subset of Bot-IoT dataset, consisting of approximately 3,700,000 records, which is the same as the one used in [49].

5.2. Dataset Balancing

In the full dataset, there are 9,543 normal samples and 73,360,900 attack samples. The subset we use is composed of 477 normal samples and 3,668,045 attack samples. More than 97% of the samples belong to the DoS and DDoS categories, as shown in Table 3. Trained on such data, a learning model would predict the majority classes and fail to spot the minority classes, i.e., the model would be biased.

To deal with this problem, different resampling methods have been proposed [55], such as (1) random oversampling, which randomly replicates exact samples of the minority classes, and (2) oversampling by creating synthetic samples of the minority classes using techniques such as the synthetic minority oversampling technique (SMOTE), the synthetic minority oversampling technique for nominal and continuous features (SMOTE-NC), and adaptive synthetic sampling (ADASYN). In this work, we use the SMOTE-NC technique, as it is capable of handling mixed datasets of categorical and continuous features [56]. The minority classes, such as normal and theft, are increased to 1,000,000 samples in the training subset, as shown in Table 4.
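A sketch of SMOTE-NC with the imbalanced-learn library is shown below on a toy mixed dataset; the categorical column positions and class sizes are stand-ins, not the actual Bot-IoT layout.

```python
# Sketch of SMOTE-NC oversampling with imbalanced-learn; SMOTENC must be told
# which column indices hold categorical features.
import numpy as np
from imblearn.over_sampling import SMOTENC

rng = np.random.default_rng(42)
X_train = np.column_stack([
    rng.integers(0, 3, 1000),   # categorical feature (index 0)
    rng.integers(0, 5, 1000),   # categorical feature (index 1)
    rng.random((1000, 3)),      # three continuous features
])
y_train = np.array([0] * 950 + [1] * 50)  # 19:1 class imbalance

smote_nc = SMOTENC(categorical_features=[0, 1], random_state=42)
X_bal, y_bal = smote_nc.fit_resample(X_train, y_train)  # classes now balanced
```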

5.3. Feature Space Reduction

One of the main objectives of this work is to develop a lightweight IDS for the IoT environment. Therefore, it is important to improve the efficiency of the detection models by reducing the feature space and noise in the dataset, as well as the memory usage and computation complexity. With the full set of features, 2.9 GB of memory is used. Feature space reduction decreases the processing complexity and speeds up the training and detection processes. The following steps, sketched in the code example after this list, are applied to the dataset; they decrease the memory consumption to 668 MB, i.e., a 77% reduction:

(i) Conversion of the object data type into the categorical data type: Table 5 shows the data types and the number of features encoded for each type. As shown in the table, there are 9 memory-consuming features that are encoded as objects, namely "flgs," "proto," "saddr," "sport," "daddr," "dport," "state," "category," and "subcategory." As the category datatype is more efficient, object features are converted into the category datatype [57].
(ii) Conversion of the Int64 data type into the Int32 data type: by default, the 22 integer features in the dataset, as shown in Table 2, are stored as the Int64 (8-byte) type. After checking these features, we find that they do not exceed the capacity of the Int32 (4-byte) type. Therefore, all values of the Int64 type are encoded as the Int32 type, which incurs half of the memory consumption of the Int64 type.
(iii) Removing unnecessary features: we exclude the following useless features from the dataset:
(1) "pkSeqID": it has the same role as the automatically generated index.
(2) "stime" and "ltime": they are captured by the "dur" feature, which computes the duration between "stime" and "ltime".
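The following pandas sketch reproduces the three steps above; the CSV file name is hypothetical, and the column lists follow the text and Table 5.

```python
# A pandas sketch of the feature space reduction steps.
import pandas as pd

df = pd.read_csv('bot_iot_subset.csv')  # hypothetical file name

# (i) object columns -> the more compact 'category' dtype
object_cols = ['flgs', 'proto', 'saddr', 'sport', 'daddr',
               'dport', 'state', 'category', 'subcategory']
df[object_cols] = df[object_cols].astype('category')

# (ii) Int64 -> Int32 (the values fit in 4 bytes, halving their memory)
int_cols = df.select_dtypes('int64').columns
df[int_cols] = df[int_cols].astype('int32')

# (iii) drop redundant features
df = df.drop(columns=['pkSeqID', 'stime', 'ltime'])

print(df.memory_usage(deep=True).sum() / 1024 ** 2, 'MB')
```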

5.4. Feature Transformation

We describe how the numerical and categorical features are transformed. After the dataset is split into training, validation, and testing subsets, the transformation is fit only on the training subset. Then, the same transformation is reapplied on the validation and testing subsets. (1) Numerical feature transformation: the dataset contains 31 numerical features, including both discrete and continuous values. There are two discrete features, i.e., "spkts" and "dpkts," which are represented by a finite number of values, so they do not require any feature engineering.

There are 29 continuous features in the dataset: "pkts," "bytes," "seq," "dur," "mean," "stddev," "sum," "min," "max," "spkts," "dpkts," "sbytes," "dbytes," "rate," "srate," "drate," "TnBPSrcIP," "TnBPDstIP," "TnP_PSrcIP," "TnP_PDstIP," "TnP_PerProto," "TnP_Per_Dport," "AR_P_Proto_P_SrcIP," "AR_P_Proto_P_DstIP," "N_IN_Conn_P_DstIP," "N_IN_Conn_P_SrcIP," "AR_P_Proto_P_Sport," "AR_P_Proto_P_Dport," "Pkts_P_State_P_Protocol_P_DestIP," and "Pkts_P_State_P_Protocol_P_SrcIP." Figure 5 shows the histograms of 4 of these features. As shown in the figure, the continuous features are not normally distributed, which usually affects the performance of linear models. To this end, log transformation and a standard scaler are applied to give the continuous features a Gaussian-like distribution:

(i) Log transformation: the new value of a feature is computed as x_new = log(x + 1), where x is the original value of the feature.
(ii) Standard scaler: it computes the mean μ and standard deviation σ of each feature on the training set. Then, each value x is normalized to the Gaussian-distributed value z = (x - μ) / σ.
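The sketch below applies these transformations on toy skewed data; the log(x + 1) form of the log transformation is an assumption, and the scaler is fit on the training subset only and then reused, as described above.

```python
# Sketch of the continuous-feature transformations and label encoding.
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)
X_train_cont = rng.exponential(scale=100.0, size=(1000, 4))  # skewed features
X_test_cont = rng.exponential(scale=100.0, size=(200, 4))

X_train_log = np.log1p(X_train_cont)  # log transformation: x_new = log(x + 1)
X_test_log = np.log1p(X_test_cont)

scaler = StandardScaler().fit(X_train_log)  # z = (x - mean) / std
X_train_std = scaler.transform(X_train_log)
X_test_std = scaler.transform(X_test_log)

# Label encoding of a toy categorical column into unique integers
proto = ['tcp', 'udp', 'icmp', 'tcp']
proto_encoded = LabelEncoder().fit_transform(proto)  # [1, 2, 0, 1]
```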

5.5. Dataset Splitting

Conventional splitting and cross-validation are the main approaches used to split datasets. Cross-validation is mainly used in legacy machine learning to overcome the overfitting problem; when a large dataset is used with deep learning, cross-validation increases the training cost. In this work, the dataset is split using the conventional three-way split into training, validation, and testing subsets, and regularization is applied to deal with overfitting if it appears [58]. A stratified split is used to ensure that each class is proportionally represented in each subset [59].
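A sketch of the stratified three-way split with scikit-learn follows; the 70/15/15 proportions are an assumption, as the paper does not state the exact ratios.

```python
# Stratified three-way split on a toy imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           weights=[0.8, 0.15, 0.05], random_state=0)

X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
# each subset now preserves the original class proportions
```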

5.6. Deep Learning Models

All deep learning models are built using the Keras API on top of TensorFlow. Different Keras packages are used, including preprocessing, models, layers, optimizers, and callbacks. The same activation functions are used in all models. To model nonlinear relationships between input and output in each layer, the ReLU activation function is used. The output layer uses the softmax activation function, a generalization of logistic regression whose number of output units equals the number of attack categories plus the normal class [60]. The deep learning architectures of TCNN, LSTM, and CNN are shown in Figure 6, and their hyperparameters are shown in Table 6.

To deal with overfitting, techniques such as global max pooling, batch normalization, and dropout are used. To adjust the weights, the Adam optimizer is selected since it outperforms other optimizers such as SGD and AdaGrad.

6. Evaluation

We evaluate the performance of TCNN and compare it with two legacy machine learning algorithms, i.e., logistic regression (LR) and random forest (RF), and two deep learning models, i.e., LSTM and CNN.

6.1. Performance Metrics

The multiclass detection models are evaluated with respect to the following metrics:

(i) Effectiveness metrics: we measure how effective the detection model is in distinguishing between the different classes of network traffic. To this end, we use the following metrics:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = (2 × Precision × Recall) / (Precision + Recall)

where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives, respectively.
(ii) Log loss (cross-entropy loss): it measures the performance of a classification model whose output is a probability value. A perfect model has a log loss of 0, and the value increases as the predicted probability diverges from the actual label. Formally,

LogLoss = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{M} y_{ij} log(p_{ij})

where N is the number of samples, M is the number of classes, y_{ij} is 1 if sample i belongs to class j and 0 otherwise, and p_{ij} is the predicted probability that sample i belongs to class j.
(iii) Training time: it measures the time required to build the classification model.
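These metrics can be computed with scikit-learn as sketched below on toy predictions; the weighted averaging for the multiclass precision, recall, and F1-score is an assumption.

```python
# Computing the effectiveness metrics and log loss with scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, log_loss)

y_true = np.array([0, 1, 2, 2, 1])
y_proba = np.array([[0.9, 0.05, 0.05],  # predicted class probabilities
                    [0.1, 0.8, 0.1],
                    [0.2, 0.2, 0.6],
                    [0.1, 0.1, 0.8],
                    [0.3, 0.6, 0.1]])
y_pred = y_proba.argmax(axis=1)  # predicted class labels

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred, average='weighted'))
print(recall_score(y_true, y_pred, average='weighted'))
print(f1_score(y_true, y_pred, average='weighted'))
print(log_loss(y_true, y_proba))  # cross-entropy over predicted probabilities
```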

6.2. Evaluation of Legacy Learning Models

Logistic regression and random forest are evaluated under the original and rebalanced datasets, and their results are shown in Table 7. Training and testing scores are almost identical in all experiments, which confirms the absence of overfitting. As for logistic regression, SMOTE-NC oversampling leads to an improvement in precision, recall, and F1-score, which means an improvement in detecting minority classes. On the other hand, oversampling does not improve the effectiveness of random forest.

6.3. Evaluation of Deep Learning Models

We conduct a series of experiments with different hyperparameter values (e.g., learning rate, batch size, number of layers, and number of units in each layer) in order to get the best performance. Different learning rates of the optimizer are tested, and the best performance is achieved with a learning rate of 0.001. Also, different numbers of epochs (10, 15, 20, 50, and 100) and different batch sizes (100, 256, 512, and 1024) are tested. We notice that increasing the number of epochs slows down the learning process, and using a smaller batch size does not improve the performance. The number of epochs and the batch size for TCNN are therefore set to 15 and 1024, respectively.
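The selected configuration corresponds to the following Keras training call; here `model` stands for the TCNN sketched in Section 4.3, and X_train, y_train, X_val, and y_val denote the prepared subsets from Section 5.

```python
# Sketch of the selected training configuration (learning rate 0.001,
# 15 epochs, batch size 1024); `model` and the data subsets are assumed
# to come from the earlier preparation steps.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.001),  # best-performing rate
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    epochs=15, batch_size=1024,  # selected epochs and batch size
                    validation_data=(X_val, y_val))
```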

Figure 7 shows the accuracy and log loss of TCNN for multiclass classification during the training and validation phases. TCNN reaches high performance within the first few epochs, which confirms that 15 epochs are sufficient. Additionally, the training and validation results show the absence of overfitting. The log loss results of LSTM and CNN are shown in Figure 8. We can observe that TCNN outperforms LSTM and CNN in terms of log loss.

Tables 8-10 show the performance of TCNN, LSTM, and CNN, respectively. We can observe that the deep learning models perform better than LR and RF, as some accuracy results exceed 99.99%. The accuracy results are very close, but TCNN slightly outperforms LSTM and CNN in terms of effectiveness metrics. We can also observe that the deep learning models show good results even without dataset balancing. By applying SMOTE-NC oversampling, we record a very slight decrease in the effectiveness of TCNN and LSTM, whereas the effectiveness of CNN slightly increases. CNN also incurs a lower training time compared to TCNN and LSTM. TCNN offers a good trade-off between effectiveness and efficiency, as it is the closest competitor to CNN with respect to training time while recording the best accuracy results.

6.4. Comparison with Related Work Tested under Bot-IoT Dataset

In Table 11, we compare the performance of our work with other state-of-the-art methods that are tested under the Bot-IoT dataset. The comparison is conducted with respect to accuracy, precision, recall, F1-score, training time, and classification task. According to the table, we can identify the following classification tasks:

(i) Binary classification task: it aims to distinguish between normal and attack records.
(ii) Normal/one-attack classification task: it aims to distinguish between normal records and one type of attack.
(iii) Multiclass classification task: it aims to attribute a record to its correct class among the five classes, i.e., one normal class and four attack classes.

Multiclass classification is the most challenging task, whereas normal/one-attack classification is the easiest one: the dataset only contains one type of attack, which means less diversity within the dataset and easier learning for the detection model. From Table 11, we can observe that [51] achieves 100% effectiveness. However, this result can be explained by the fact that [51] only distinguishes between normal traffic and one type of attack, i.e., DDoS. The three deep learning models, TCNN, LSTM, and CNN, outperform the rest of the related work even though they are evaluated under the multiclass classification task. We can also observe that TCNN, LSTM, and CNN achieve the best results in terms of training time. This is due to the adopted feature engineering, which reduces the computation complexity, and to the use of simple deep learning architectures with a larger batch size and fewer layers.

7. Conclusion and Future Work

In this paper, we have identified five design principles for the development of an effective and efficient deep learning-based intrusion detection system for the Internet of Things (IoT). By adopting these principles, we have designed and implemented Temporal Convolution Neural Network (TCNN), which combines Convolution Neural Network (CNN) and causal convolution. TCNN is integrated with SMOTE-NC data balancing and efficient feature engineering, which consists of feature space reduction and feature transformation.

TCNN has been evaluated on Bot-IoT dataset and compared with logistic regression, random forest, LSTM, and CNN. Evaluation results show that TCNN achieves a good trade-off between effectiveness and efficiency. It outperforms the state-of-the-art deep learning IDS methods, which were tested under Bot-IoT dataset, by recording an accuracy of 99.9986% for multiclass traffic detection. Also, it shows a very close performance to CNN with respect to training time. As part of future work, it would be interesting to consider another design principle, i.e., testing the resiliency of IDS against adversarial attacks, which can confuse the deep learning model to produce wrong predictions.

Data Availability

We used Bot-IoT, a publicly accessible dataset (https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/bot_iot.php), for the evaluation of the proposed IDS.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group no. RG-1439-021.