A new Intrusion Detection System for Secured IoT/IIoT Networks based on LGBM

In this study, a multi-class classification method was applied to several datasets from ToN_IoT using the Light Gradient Boosting Machine (LGBM) classifier. The results show that it is an effective method for preventing cyber attacks on IoT/IIoT networks.



1.INTRODUCTION
The Internet of Things (IoT) is a developing technology used in many areas such as smart transportation, smart health services, smart homes and smart cities. The Internet of Medical Things (IoMT), which adapts IoT to the health sector, and the Industrial Internet of Things (IIoT), which adapts it to industrial settings, have brought about a major change in these fields. IoT networks, which are formed by connecting many smart devices such as sensors, actuators and smart modules, provide great convenience to users [1,2]. However, IoT networks and devices used in areas such as SCADA systems, healthcare services and transportation services are vulnerable to cyber attacks [3].
Intrusion detection in IoT networks is one of the most important problems today. New methods based on artificial intelligence are being developed both in the literature and in industry for intrusion detection on IoT networks [4]. Alsaedi et al. [5] proposed the ToN_IoT dataset for intrusion detection in IoT and IIoT networks. The dataset includes telemetry data, system logs and network traffic from IoT and IIoT networks. They also used machine learning and deep learning methods to evaluate performance on the proposed dataset. Essop et al. [6] produced a new IoT/IIoT dataset using the Cooja simulator. Zachos et al. [7] proposed an Anomaly-based Intrusion Detection System (AIDS) to detect anomalies in IoMT networks, using machine learning methods to detect the attacks.
Improving the performance of machine learning methods, which are frequently used for intrusion detection on IoT networks, has become an important issue. Weinger et al. [8] applied their proposed data augmentation method to the DS2OS and ToN_IoT datasets for intrusion detection on IoT networks. Bui et al. [9] established a toolchain called Configuration, REproduction, Multi-dataset, and Evaluation (CREME) to increase the intrusion detection capabilities of IDS, and both created a new dataset and measured its quality. Haider et al. [10] proposed the Fuzzy Gaussian Mixture-based Correntropy-Host Anomaly Detection Systems (FGMC-HADS) method, which is based on the Fuzzy Rough Attribute Reduction (FRAR) method and the Gaussian Mixture Model (GMM). They used the NGIDS-DS, KDD-98 and ToN_IoT Linux datasets to measure the intrusion detection capability of the proposed method.
In this study, a Light Gradient Boosting Machine (LGBM) based system that detects cyber attacks on IoT networks with high accuracy has been developed. The ToN_IoT dataset, which comprehensively covers attacks on today's IoT networks, was used in the study. The rest of the paper is organized as follows. Section 2 describes the materials and methods of the proposed approach. Section 3 presents the performance of the proposed method on all datasets in a table. The performance analysis is given in Section 4, and the conclusions and future studies are given in Section 5.

2.MATERIALS AND METHODS
In this study, an advanced intrusion detection system is proposed for the detection of attacks on IoT networks. In summary, the proposed method consists of data preprocessing, splitting the data into training and test sets, and classification. The flowchart of the proposed method is given in Figure 1.
According to the flowchart given in Figure 1, the proposed method consists of the following steps.
1. The IoT sensor datasets taken from ToN_IoT were first standardized by label encoding and a NaN value check. If a dataset contains NaN values, they are replaced by the column averages.
2. After standardization, each dataset was split into 70% training and 30% test data.
3. In the last stage, the datasets were classified with the proposed LGBM classifier. If the desired accuracy is achieved as a result of the classification, the process ends.
After the label encoding step, the classes under the "type" label in each dataset were encoded as in Table 1.
The amount of data received for each attack scenario is also listed in Table 1. The data amounts of the attack scenarios in each dataset are visualized in Figure 2.
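The preprocessing steps listed above can be sketched in plain Python. This is only an illustrative sketch, not the code used in the study: the toy records, the feature meanings and the helper names (`label_encode`, `impute_column_means`, `train_test_split_70_30`) are hypothetical, and `None` stands in for NaN. In practice, library routines such as scikit-learn's `LabelEncoder` and `train_test_split` would typically serve the same purpose.

```python
import random
from statistics import mean


def label_encode(values):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], mapping


def impute_column_means(rows):
    """Replace missing values (None, standing in for NaN) by the column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(mean(present))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def train_test_split_70_30(X, y, seed=0):
    """Shuffle the sample indices and split them 70% / 30%."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.7 * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


# Hypothetical toy records: two numeric sensor readings plus a "type" label.
features = [[21.5, 40.0], [None, 42.0], [35.0, None], [22.0, 41.0],
            [36.5, 80.0], [21.0, 39.0], [34.0, 78.0], [22.5, 43.0],
            [35.5, 79.0], [21.8, 40.5]]
types = ["normal", "normal", "ddos", "normal", "ddos",
         "normal", "ddos", "normal", "ddos", "normal"]

X = impute_column_means(features)   # step 1: NaN check and mean imputation
y, mapping = label_encode(types)    # step 1: label encoding of "type"
X_train, X_test, y_train, y_test = train_test_split_70_30(X, y)  # step 2

print(mapping, len(X_train), len(X_test))
```

The resulting training and test sets would then be passed to the classifier in step 3.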

Dataset
In the study, the ToN_IoT dataset was used to reflect attacks in a real IoT network. The ToN_IoT datasets were collected from IoT/IIoT networks and are generated from operating system logs and IoT network traffic. They were obtained from a realistic IoT lab at UNSW Canberra consisting of a cloud layer, an edge layer and a fog layer. The datasets contain the "Label" label, which indicates whether a record is normal or an attack, and the "type" label, which indicates the subclasses of the attacks. Scanning, DoS, DDoS, ransomware, backdoor, data injection, Cross-site Scripting (XSS), password cracking and Man-in-The-Middle (MITM) attacks are listed under the "type" label for multi-class classification [5,11,12]. In this study, the fridge, garage door, GPS tracker, motion light, Modbus, thermostat and weather datasets obtained from IoT sensor data were studied. The unnecessary "date" and "time" features were removed from all datasets.

LGBM Classifier
The LGBM classifier, developed to improve the training time performance of the XGBoost algorithm, uses a leaf-wise tree growth strategy. The leaf-wise growth method used by LGBM is summarized in Figure 3 [13,14].
In the leaf-wise growth strategy shown in Figure 3, the tree is grown vertically: at each step the leaf whose split yields the largest loss reduction is split, so a single branch can be deepened as far as it can go. When the maximum depth is reached on that branch, growth continues vertically on another branch, starting from the top. In this study, the LGBM classifier was used for classification due to its high performance.
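The leaf-wise strategy can be illustrated with a small best-first simulation. This is a toy sketch under stated assumptions, not LightGBM's actual implementation: the `GAINS` table of split gains is made up for illustration, nodes are identified by their path from the root, and the function name `grow_leaf_wise` is hypothetical.

```python
import heapq

# Hypothetical split gains (loss reductions), keyed by the node's path from
# the root: "" is the root, "L"/"R" its children, and so on.
# Nodes that are not listed are assumed to have zero gain.
GAINS = {"": 10.0, "L": 6.0, "R": 2.0, "LL": 5.0, "LR": 1.0,
         "LLL": 0.5, "RL": 0.4}


def grow_leaf_wise(gains, num_leaves, max_depth):
    """Return the order in which nodes are split under leaf-wise growth."""
    # heapq is a min-heap, so gains are negated to pop the largest gain first.
    heap = [(-gains.get("", 0.0), "")]
    split_order = []
    n_leaves = 1
    while heap and n_leaves < num_leaves:
        neg_gain, path = heapq.heappop(heap)
        if -neg_gain <= 0.0:
            break  # no remaining leaf offers a positive gain
        split_order.append(path)
        n_leaves += 1  # one leaf is replaced by two children
        for side in "LR":
            child = path + side
            # Children at the maximum depth stay leaves and are never split.
            if len(child) < max_depth:
                heapq.heappush(heap, (-gains.get(child, 0.0), child))
    return split_order


print(grow_leaf_wise(GAINS, num_leaves=5, max_depth=3))
# → ['', 'L', 'LL', 'R']
```

With these made-up gains, the left branch is deepened until the depth limit is reached before the right branch is opened. A level-wise strategy would instead split both children of the root before going deeper.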

3.RESULTS
In the study, the sensor datasets from ToN_IoT are considered. Multi-class classification was performed according to the "type" label in these datasets. The code of the proposed method was written in a Python 3.7 environment using the scikit-learn and matplotlib libraries. Accuracy, precision, recall and F-Score values were calculated for each dataset. These performance metrics are calculated as in Equations 1, 2, 3 and 4, respectively [11,15]. The results obtained are given in Table 2.
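As an illustration of how the four metrics can be derived from a multi-class confusion matrix, the sketch below uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN) and F-Score = 2 · precision · recall / (precision + recall), computed per class, with accuracy as the fraction of correctly classified samples. The macro averaging over classes, the function name `classification_metrics` and the example matrix are assumptions made for illustration; the averaging scheme used in the study is not stated here.

```python
def classification_metrics(cm):
    """Compute accuracy and macro-averaged precision, recall and F-Score.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precisions, recalls, f_scores = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average (assumption)
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]

metrics = classification_metrics(cm)
print(metrics)
```

scikit-learn offers equivalent routines such as `accuracy_score` and `precision_recall_fscore_support`.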
Confusion matrices obtained for all datasets are given in Figure 4. The numbers in the matrices correspond to the encoded classes in Table 1.

4.DISCUSSION
In this section, the performance of the proposed method is compared with existing studies in the literature. The comparative analysis is given in Table 3. Only the accuracy values are compared in the table.

5.CONCLUSIONS
In this study, a method has been proposed for the detection of cyber attacks on IoT/IIoT networks, which we encounter in almost every field. The proposed method has been applied to the ToN_IoT dataset, which represents a realistic IoT/IIoT network. In future studies, it is planned in the first stage to obtain a new IoT dataset using the Cooja simulator. In the second stage, the aim is to establish a new IoT laboratory and to create a new IDS dataset for IoT networks by applying various attack scenarios in this laboratory.

DECLARATION OF ETHICAL STANDARDS
The authors of this article declare that the materials and methods used in their work do not require ethical committee approval and/or legal-specific permission.

AUTHORS' CONTRIBUTIONS
İlhan Fırat KILINÇER: He conducted the experiments, analyzed the results and performed the writing process.
Oğuzhan KATAR: He conducted the experiments, analyzed the results and performed the writing process.