Attack Detection in IoT using Machine Learning

Many researchers have examined the risks that Internet of Things (IoT) devices pose to large companies and smart cities. Because of the high adoption of IoT devices, their nature, inherent mobility, and standardization limitations, smart mechanisms capable of automatically detecting suspicious activity on IoT devices connected to local networks are needed. As the number of IoT devices connected through the internet has grown, the volume of network traffic has increased. As a result, attack detection through conventional methods and older data processing techniques has become obsolete. Detecting attacks in IoT and identifying malicious traffic at an early stage is very challenging because of the growing size of network traffic. In this paper, a framework is proposed for the detection of malicious network traffic. The framework uses three popular classification-based malicious network traffic detection methods, namely Support Vector Machine (SVM), Gradient Boosted Decision Trees (GBDT), and Random Forest (RF), with the RF supervised machine learning algorithm achieving the highest accuracy (85.34%). The NSL-KDD dataset was used in the proposed framework, and performance was compared in terms of training time, prediction time, specificity, and accuracy.


I. INTRODUCTION
The Internet of Things (IoT) is arguably one of the greatest modern advancements, considering its effect on our daily life, and its areas of application are expanding quickly. In 2018, the number of IoT devices was roughly 28 billion. This number is expected to reach 49.1 billion by 2022, and the market size of IoT is estimated to reach around $10 trillion by 2022. IoT is regarded as a system of suitable mechanisms that interconnect through servers, sensors, and various software. A smart city structure is shown in Figure 1; it comprises three main layers: the fog, cloud, and terminal layers.
The data obtained from IoT devices are stored in the Cloud Computing (CC) ecosystem, which has high-end processors and sufficient memory. The cloud layer has grown fast with the recent developments in IoT. Fog-to-things computing was introduced as a feasible solution to the resulting difficulties. In the fog layer, devices can process a large share of the data that would otherwise be sent to the cloud layer, which reduces power consumption, bandwidth use, and network traffic and eases the data storage and communication challenges. In addition, it moves computation closer to the endpoint, enabling fast responses for IoT-based urban applications. There are two advantages of attack detection in the fog-to-things layer. First, the internet service provider or the network administrator can take measures that prevent extensive damage if network attacks are recognized in the fog layer. Second, this strategy does not disrupt people's regular daily activities.
Fig. 1. Framework of a smart IoT-based city.
The model monitors the network traffic that passes through every fog-to-things node. Because fog-to-things nodes are close to the IoT devices, it is more efficient to recognize network attacks at the fog-to-things nodes than at the cloud layer. Immediate attack detection can inform the network controllers of the IoT devices about those attacks, which then helps them evaluate and improve their systems. Artificial intelligence technology such as Machine Learning (ML) can perform the whole evaluation and send video images to people who can react quickly to solve problems and maintain residents' safety. There are two types of attack detection: signature-based and anomaly-based. The former matches incoming traffic against known attack types stored in a database, whereas the latter checks for deviations from the behavior of normal traffic.
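The difference between the two detection types can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the signature strings, the packets-per-second baseline, and the z-score threshold of 3 are hypothetical and are not taken from the framework evaluated in this paper.

```python
from statistics import mean, stdev

# Hypothetical signature database of known attack patterns.
KNOWN_SIGNATURES = {"attack-pattern-a", "attack-pattern-b"}

# Hypothetical baseline of packets per second observed during normal operation.
NORMAL_PACKETS_PER_SECOND = [98, 102, 100, 97, 103, 101, 99, 100]


def signature_based(payload, signatures=KNOWN_SIGNATURES):
    """Flag traffic whose payload contains any known attack signature."""
    return any(signature in payload for signature in signatures)


def anomaly_based(value, baseline=NORMAL_PACKETS_PER_SECOND, z_threshold=3.0):
    """Flag a measurement that deviates strongly from normal behavior."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold
```

The signature check can only recognize patterns already stored in the database, while the anomaly check can flag a previously unseen burst of traffic at the cost of possible false alarms.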
II. RELATED WORK
Many research studies on the application of ML have been presented in domains such as object identification/recognition, pattern recognition, text processing, and image processing. In addition, much security research has been done using the Deep Learning (DL) approach [1]. Authors in [2] describe the expansion of big data and the evolution of IoT in a smart city. The author in [3] explains the evolution of CC and how big data have been engaged in the advancement of smart cities, and proposes a framework for managing big data for smart city purposes. The framework concentrates on the difficulties of real-time decision making in smart cities. Many aspects and components of a smart city that improve residents' standard of living are described in [4]. Authors in [5] suggest a platform design to secure a smart city against cyber attackers. The platform provides a DL-based warning model that identifies attackers based on users' data behavior. In [6], resource management methods for fog computing are analyzed, a systematic taxonomy is presented, and various aspects of resource management, i.e. load balancing, resource scheduling and allocation, task allocation, resource provisioning, and task offloading, are highlighted. The resource management approaches are analyzed with respect to factors such as QoS metrics, the studies covered, and the applied methods, and the benefits and drawbacks of these approaches are compared.
Authors in [7] proposed an anonymous and secure aggregation scheme (ASAS) for fog-based public cloud computing. In ASAS, fog nodes help devices exchange data with public cloud servers (PCS). Authors in [8] reported the advancements of wireless sensor networks (WSN), communication technology, and IoT technology. Authors in [9] examined ML techniques that can be applied in intrusion detection systems (IDS), such as KNN, SVM, DT, Naïve Bayes, neural networks, and RF. They compared ML models for multi-class and binary classification on the Bot-IoT dataset and reported the F1 score, recall, precision, and accuracy of each model. The detection of attacks in a fog architecture was examined in [10], in which ML is compared with deep-learning neural networks on a publicly available dataset.
Authors in [11] examined TCP SYN network attacks, and authors in [12] introduced deep neural networks for attack detection in IoT systems. A self-adaptive identification method for the network security index was studied, a risk assessment was conducted, and the system was mapped. Authors in [13] developed a network intrusion detection system (NIDS) based on DL and implemented it on a fog node for attack detection. Authors in [14] used a novel method that combines isolation forest and One Class Support Vector Machine (OCSVM) with an active learning method to detect attacks with no prior information. Authors in [15] used a two-stage approach combining a fast preprocessing or filtering method with a variational autoencoder using reconstruction probability. Authors in [16] performed a Distributed Denial of Service (DDoS) attack using the ping of death technique and detected it with the RF algorithm in the WEKA tool, with a classification accuracy of 99.76%. Authors in [17] proposed the detection of network dictionary attacks using a dataset collected as flows based on a clustered graph. The results of these methods on the CAIDA 2007 dataset show high accuracy.

III. GAP ANALYSIS
The following open problems were identified in earlier research:
• Poor performance of attack detection on the fog layer.
• Feature selection that decreases accuracy.
• Low accuracy for the DoS, R2L, and U2R attack types.
• Execution of multiple classifier algorithms on reduced datasets.
• False positive and false negative rates that are still uncertain.

IV. A FRAMEWORK TO SOLVE ATTACK DETECTION IN IOT USING MACHINE LEARNING
The setting considered in this work is a typical large organization or a smart city facing an increasing variety of IoT-related cyber threats, such as heavy DDoS attacks carried out with a large botnet, e.g. Mirai, which exploits default or weak passwords. The current research focuses on advanced attacks that are based on violations of organizational security policies. Once such a violation occurs, an attacker can take advantage of individuals who connect unauthorized types of IoT devices to the smart city. Signature-based approaches have been used widely because of their high detection accuracy and low false alarm rate. However, they lack the capability of catching novel attacks. Anomaly detection, on the other hand, detects new attacks, although it lacks accuracy. In both approaches, classical ML analysis has been used prominently. Popular machine learning algorithms are incapable of detecting complex data breaches [18]. In this research, we examined different algorithms for the different sub-processes of the framework shown in Figure 2 [19].

A. Approaches to Solve Attack Detection using ML
There are six main approaches in ML:
• Supervised learning: The data must be labeled, for example by feeding a model multiple examples of files together with the decision of whether each is malware or not. Based on this labeled data [20], the model can make decisions about new data. It is also called the task-driven approach.
• Ensemble learning: It is an extension of supervised learning that uses labeled data while combining multiple models to solve the task.
• Unsupervised learning: Unlabeled data are used, and the model marks them by itself based on the properties of the data. It is considered more powerful and is usually used to find anomalies in the dataset [21]. It is also called the data-driven approach.
• Semi-supervised learning: It combines the supervised and unsupervised approaches when only part of the dataset is labeled [22].
• Reinforcement learning: It is suited to a changing environment, in which the model adapts its behavior. It is also called the environment-driven approach [23].
• Active learning: It works like a teacher who helps correct errors and behavior as the environment changes [24]. It is a subclass of reinforcement learning.

B. Attack Detection using ML Methods
In this part, attack detection is treated as a statistical classification problem solved with ML. A familiar example in cyber security is the spam filter, which separates spam from other messages in various communication services. Spam filtering is arguably the leading application of ML in information security. Supervised learning on labeled data is usually used for classification. In our research, we used the GBDT, SVM, and RF classifiers and compared the results.

1) Support Vector Machine
SVM is one of the most popular and widely recognized techniques. It can be used for regression, but it is mostly used for classification. In SVM, each data item is plotted as a point in an n-dimensional space, where n is the number of features considered [25]. The algorithm then constructs a hyperplane that separates the data into classes [26].
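As a minimal sketch of how a separating hyperplane assigns a class, the snippet below evaluates w·x + b for a point x and uses its sign as the predicted class. The weight vector W and offset B are hypothetical values chosen for illustration; they are not trained on NSL-KDD, and the training step that an SVM uses to find them is omitted.

```python
import math

# Hypothetical hyperplane parameters (not trained, for illustration only).
W = [1.0, -1.0]
B = 0.0


def decision_value(x, w=W, b=B):
    """Signed score w.x + b; zero means the point lies on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w=W, b=B):
    """Return +1 for one side of the hyperplane and -1 for the other."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def distance_to_hyperplane(x, w=W, b=B):
    """Geometric distance of x from the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(x, w, b)) / norm
```

With these example parameters, the point [2.0, 1.0] falls on the positive side and [1.0, 3.0] on the negative side.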

2) Gradient Boosted Decision Tree
GBDT is an ensemble of DTs. It is an ML algorithm that builds weak DTs sequentially through the boosting technique. To build the tree ensemble, the trees need to be trained on different samples; they cannot all be trained on a single set. GBDT uses the current ensemble to predict the label of every instance, and the predictions are then compared with the true labels so that the next tree can correct the remaining errors. It works on large datasets and has high predictive power [27].
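The boosting loop can be sketched in a few lines under simplifying assumptions: a single numeric feature, depth-one trees (stumps), and squared-error loss on 0/1 labels. Practical GBDT classifiers typically use a classification loss and deeper trees, and the function names, toy data, and learning rate below are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # All feature values are identical: fall back to a constant stump.
        constant = sum(residuals) / len(residuals)
        return (xs[0], constant, constant)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gbdt(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the errors of the current ensemble."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def gbdt_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Toy data: one feature, label 0 = normal and 1 = attack (made up).
XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy data, each additional round halves the remaining error, so the prediction for a normal point moves from 0.25 after one round toward 0 as more stumps are added.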

3) Random Forest
RF [28] is based on bagging and the random subspace method and uses CART DTs as its base algorithm. It can be used for both regression and classification. The trees are trained in parallel. RF injects randomness into the learning process so that each tree differs from the others. At prediction time, the outputs of all trees are combined, which reduces the variance of the prediction and hence improves performance [29].
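The three ingredients named above (bagging, random subspaces, and combining the trees) can be sketched as follows. This is an illustration rather than a full RF: a one-split stump on a single randomly chosen feature stands in for a CART tree, only two classes are assumed, and the toy rows are made up.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(sample, feature):
    """One-split stand-in for a CART tree, using a single feature (binary labels)."""
    values_by_label = {}
    for features, label in sample:
        values_by_label.setdefault(label, []).append(features[feature])
    if len(values_by_label) == 1:
        # The bootstrap sample contains one class only: predict it always.
        only_label = next(iter(values_by_label))
        return lambda x: only_label
    means = {label: sum(v) / len(v) for label, v in values_by_label.items()}
    low, high = sorted(means, key=means.get)
    threshold = (means[low] + means[high]) / 2
    return lambda x: low if x[feature] <= threshold else high


def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    trees = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)   # bagging
        feature = rng.randrange(n_features)    # random subspace of size one
        trees.append(fit_stump(sample, feature))
    return trees


def forest_predict(trees, x):
    """Combine the trees by majority vote."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Toy rows: (feature vector, label), made up for illustration.
ROWS = [
    ((1, 1, 1), "normal"), ((2, 2, 2), "normal"),
    ((1, 2, 1), "normal"), ((2, 1, 2), "normal"),
    ((8, 9, 8), "attack"), ((9, 8, 9), "attack"),
    ((8, 8, 9), "attack"), ((9, 9, 8), "attack"),
]
```

Because every tree sees a different bootstrap sample and a different feature, individual trees can disagree, and the vote averages out their individual errors.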

V. EXPERIMENTS AND RESULTS
This section describes the dataset used for the experiments, the performance metrics used for comparing results, and the evaluation of the proposed model under various feature selections and classifiers. Three ML algorithms were applied to evaluate the proposed model [30].

A. Dataset
The NSL-KDD dataset was used in this research. The dataset is available as CSV and JSON files and can be used for both the modeling and the evaluation phases. It is modifiable, extensible, and reproducible [31].

B. Proposed Method
Our approach is a novel combination of several independent ML algorithms. In our framework, the first step is dataset collection and analysis, in which the data were collected and inspected closely to determine the types of data present. In the data preprocessing step, the data were cleaned and visualized, and feature engineering and vectorization were applied, so that the data were converted into feature vectors [32]. After the analysis of the NSL-KDD dataset, the attacks can be categorized into four principal classes:
• Denial of Service (DoS)
• Probe
• Remote to Local (R2L), i.e. unauthorized access from a remote machine
• User to Root (U2R)
The details of each attack are shown in Figure 3. The dataset is then split into 80% for training and 20% for testing (Tables I-II). The training set was used by the learning algorithm, and our final model was deployed using a boosting technique. Figure 4 shows the data distribution in the testing and training subsets.
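The conversion of a record into a feature vector, together with the grouping of attack labels into classes, can be sketched as below. The sketch assumes the usual NSL-KDD grouping into DoS, Probe, R2L, and U2R; the label map is only an illustrative subset, the three columns used (duration, src_bytes, protocol_type) are a small selection of the dataset's fields, and the sample record and helper names are hypothetical.

```python
# Illustrative subset of the NSL-KDD attack labels and their usual categories.
ATTACK_CATEGORY = {
    "normal": "normal",
    "neptune": "DoS",
    "smurf": "DoS",
    "ipsweep": "Probe",
    "portsweep": "Probe",
    "guess_passwd": "R2L",
    "buffer_overflow": "U2R",
}

PROTOCOLS = ["icmp", "tcp", "udp"]


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def to_feature_vector(record, protocols=PROTOCOLS):
    """Numeric fields are kept as floats; the protocol is one-hot encoded."""
    numeric = [float(record["duration"]), float(record["src_bytes"])]
    return numeric + one_hot(record["protocol_type"], protocols)


def to_category(record):
    """Map a raw attack label to its class; unknown labels stay unmapped."""
    return ATTACK_CATEGORY.get(record["label"], "unknown")


# Hypothetical record for illustration.
EXAMPLE = {"duration": 0, "src_bytes": 491, "protocol_type": "tcp", "label": "neptune"}
```

Fixing the vocabulary order up front keeps the vector layout identical between the training and testing subsets.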

C. Algorithm
The algorithmic steps are listed below.
• Load the NSL-KDD dataset.
• Apply the preprocessing technique.
• Divide the data into training and testing datasets in an 80-20 ratio.
• Select the feature vectors.
• Feed the training dataset to the classifiers.
• Feed the test dataset to the three selected classifiers for classification.
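The steps above can be sketched as a small evaluation harness. To keep the sketch free of external dependencies, a majority-class placeholder stands in for the RF, GBDT, and SVM classifiers, and synthetic rows stand in for the preprocessed NSL-KDD feature vectors; the function names, the fit/predict interface, and the seed are assumptions made for illustration.

```python
import random
import time


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and divide them 80-20 into training and testing sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class MajorityClassifier:
    """Placeholder classifier: always predicts the most frequent training label."""

    def fit(self, rows):
        labels = [label for _, label in rows]
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, features):
        return [self.label_ for _ in features]


def evaluate(classifier, train, test):
    """Train, predict, and report accuracy with training and prediction time."""
    start = time.perf_counter()
    classifier.fit(train)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    predicted = classifier.predict([features for features, _ in test])
    prediction_time = time.perf_counter() - start

    correct = sum(p == label for p, (_, label) in zip(predicted, test))
    return {"accuracy": correct / len(test),
            "TT": training_time,
            "PT": prediction_time}


# Synthetic stand-in for preprocessed feature vectors: 25% attack, 75% normal.
ROWS = [([i], "attack" if i % 4 == 0 else "normal") for i in range(100)]
TRAIN, TEST = train_test_split(ROWS)
RESULT = evaluate(MajorityClassifier(), TRAIN, TEST)
```

Any classifier exposing the same fit and predict methods could be passed to evaluate, which is how the three algorithms would be compared on identical splits.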

D. Classifiers and Training
For model training, the RF supervised ML algorithm was selected. Algorithms that combine DTs with ensemble learning have several advantages, such as needing only a few input parameters and being resistant to overfitting. The number of trees was set to 500. As the number of trees increases, the variance decreases without an increase in bias. RF has been applied to traffic datasets, including network traffic misuse detection and flow-based Command and Control (C&C) IoT attack detection [33].
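The effect of the number of trees on variance can be made explicit with a standard result for averaged predictors. The notation is introduced here for illustration and does not appear in the original framework: B is the number of trees, T_b(x) is the prediction of tree b, σ² is the variance of a single tree, and ρ is the pairwise correlation between trees, assuming identically distributed trees.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Under these assumptions, the second term shrinks as B grows (with B = 500 it is (1 − ρ)σ²/500), while the expected value of the average equals that of a single tree, so the bias is unchanged.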

E. Performance Metrics
In the suggested framework, four performance metrics were considered: Accuracy (A), Specificity (S), Training Time (TT), which is the total time needed to train a classifier, and Prediction Time (PT), which is the total time an algorithm takes to predict all the data. TP (true positive) is the number of attacks correctly identified, FP (false positive) is the number of normal connections incorrectly identified as attacks, TN (true negative) is the number of correctly identified normal connections, and FN (false negative) is the number of attacks that were not correctly identified [34].
Accuracy shows how accurately the algorithm can detect the normal and attack connections:
$$A = \frac{TP + TN}{TP + TN + FP + FN}$$
Specificity measures the proportion of negatives (normal connections) that are correctly identified:
$$S = \frac{TN}{TN + FP}$$
The Receiver Operating Characteristic (ROC) curve gives a graphical summary of a classifier's performance over all thresholds by plotting the true positive rate (TPR) against the false positive rate (FPR). A threshold is the cut-off value applied to the predicted scores to assign a class. The ROC curve can be drawn for binary classes. The values of the TPR and FPR range from 0 to 1.
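As a minimal sketch of how these metrics follow from the confusion-matrix counts, the snippet below computes accuracy as (TP + TN) / (TP + TN + FP + FN), specificity as TN / (TN + FP), and one (FPR, TPR) point per threshold for the ROC curve, with TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The label strings, scores, and function names are hypothetical examples, not results from the experiments.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Count TP, FP, TN, and FN, treating `positive` as the attack class."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def specificity(tp, fp, tn, fn):
    return tn / (tn + fp) if (tn + fp) else 0.0


def roc_points(y_true, scores, positive="attack"):
    """Return one (FPR, TPR) point per threshold, from strictest to loosest."""
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for threshold in thresholds:
        y_pred = [positive if s >= threshold else "normal" for s in scores]
        tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))
    return points


# Made-up labels, predictions, and attack scores for illustration.
Y_TRUE = ["attack", "attack", "normal", "normal", "attack", "normal"]
Y_PRED = ["attack", "normal", "normal", "attack", "attack", "normal"]
SCORES = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
```

The ROC points always start at (0, 0), where nothing is flagged, and end at (1, 1), where everything is flagged; the curve between them shows the trade-off each threshold makes.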

F. Experimental Setup
The experiments were conducted on a Lenovo ThinkPad system running Ubuntu 20.04 with a 4500U processor, 8 GB of memory, and an integrated AMD (attached NVIDIA) graphics card, which was used for training on the dataset. The NumPy and pandas libraries were used during data preprocessing, cleaning, and feature selection.

G. Result Analysis
As mentioned above, three ML algorithms were applied to the NSL-KDD dataset, namely RF, GBDT, and SVM. In cross-validation, RF performed best in terms of testing and training accuracy. The results show that RF obtained the highest accuracy on the fog layer, 85.34%. The accuracy of SVM and GBDT was 32.38% and 78.01% respectively, as shown in Table III. In terms of specificity, the GBDT algorithm performed best with 97.02%, while SVM and RF achieved 2.02% and 95.09% respectively. Table III summarizes the performance evaluation of the three algorithms in terms of A, TT, PT, and S.

VI. CONCLUSION
The obtained results confirm that supervised ML can be used to analyze traffic data and accurately expose malicious traffic traveling over IoT devices. To identify such traffic accurately, the NSL-KDD dataset was critically evaluated using ML techniques. The dataset was used to evaluate the proposed framework through steps such as feature selection and classification. Overall, the RF algorithm provided the best accuracy, 85.34%, on the fog layer in comparison with the other two learning algorithms. In the future, it is planned to analyze different IoT devices, explore further technologies, and test with different data from IoT devices infected by malware and cyber-attacks.