Design of an Intrusion Detection Model for IoT-Enabled Smart Home

Machine learning (ML) provides effective solutions for developing efficient intrusion detection systems (IDS) for various environments. In the present paper, a diversified study of various ensemble ML algorithms has been carried out to propose the design of an effective and time-efficient IDS for Internet of Things (IoT) enabled environments. Data captured from network traffic and real-time sensors of an IoT-enabled smart environment has been analyzed to classify and predict various types of network attacks. The performance of Logistic Regression, Random Forest, Extreme Gradient Boosting, and Light Gradient Boosting Machine classifiers has been benchmarked on the open-source, largely imbalanced dataset 'DS2OS', which consists of 'normal' and 'anomalous' network traffic. An intrusion detection model, 'LGB-IDS', has been proposed using the LGBM library of ML after validating its superiority over the other algorithms using ensemble techniques and majority voting. The performance of the proposed intrusion detection system is validated using standard machine learning metrics such as train and test accuracy, time efficiency, error-rate, true-positive rate (TPR), and false-positive rate (FPR). The experimental results reveal that XGB and LGBM have almost equal accuracy, but the time efficiency of LGBM is much better than that of the RF and XGB classifiers. The main objective of the present paper is to propose the design of an efficient intrusion detection model with high accuracy, better time efficiency, and a reduced false alarm rate. The experimental results show that the proposed model achieves an accuracy of 99.92%, with a time efficiency much higher than that of models based on other prevalent algorithms, and a threat detection rate between 90% and 100%. The time complexity of LGBM is also very low compared to other ML algorithms.

I. INTRODUCTION

The detection rate of threats lags far behind the rate at which new threats emerge. Most of the connected IoT devices around the world generally exhibit no symptoms of attack, so people do not realize that anomalous events are being launched in their smart network. IoT systems can be victimized through certain prevalent vulnerabilities. An intrusion detection system (IDS) is one of the crucial components of the cybercrime investigation process [3]. An IDS model scrutinizes system activities to predict and detect attack patterns and monitors every user's behavior in order to prevent any kind of security violation.
A well-developed intrusion detection system can efficiently deal with various forms of cyber-attacks triggered in a smart home environment. The present paper introduces a fast and efficient intrusion detection model that has been evaluated on the IoT dataset 'DS2OS' [4]. This dataset was created in a smart home environment and accommodates traces captured from various smart home devices, namely light controllers, batteries, washing machines, thermometers, smartphones, smart doors, and movement sensors. The dataset also captures the communication between the various IoT devices. The traced patterns of network traffic make it possible to detect various attack behaviors ('DoS attack', 'data probing', 'malitious control', 'malitious operation', 'scan', 'spying', 'wrong setup', spelled as labeled in the dataset) as well as 'normal' behavior.
Machine learning is an extensively used technique for harmonizing IDS with intelligent information systems in order to detect various types of malicious activity in a smart network [5]. Network traffic passing through the nodes of an IoT infrastructure needs to be discriminated into benign and malicious traffic. In most cases, the majority of network traffic exhibits normal (benign) behavior; if malicious traffic also appears to behave normally, it is even more dangerous, as it makes it difficult to achieve a high attack detection rate with a low false alarm rate (FAR) [6].
The performance of ML algorithms can be strengthened through various problem optimization techniques. The ensemble technique of ML is an advanced and convincing tool that can upgrade the performance of existing forecasting models [7] in different application areas. In ML, ensemble-based learning methods perform classification by creating and integrating multiple models to solve a problem [8]. Ensemble-based ML models combine multiple base models and provide better prediction performance than conventional classification models.
Besides accuracy, latency and the true positive rate must also be considered essential metrics for evaluating any proposed predictive model. The authors of the present paper aim to develop an intrusion detection model for IoT systems using the light gradient boosting machine (LGBM), a fast gradient boosting ensemble library of supervised machine learning [9]. This model aims to identify and classify the various types of attacks existing in an IoT network. The following are the major research contributions of the present paper: • To study various machine learning classifiers and to identify the best classifier using the 'voting' method of the ensemble technique.
• To design a time-efficient realistic intrusion detection system using an ML-based light gradient boosting machine (LGBM) ensemble classifier by predicting network traffic behavior in an IoT-enabled smart home environment using benchmark dataset 'DS2OS'.
• To remove irrelevant and repetitive features using dimensionality reduction and feature-reduction approaches.
• To evaluate and compare the performance of the proposed intrusion detection model with related state-of-the-art approaches in terms of train and test accuracy, TPR, FPR, error-rate, and, above all, time efficiency.
A gradient boosting-based machine learning (ML) ensemble algorithm is a good approach for classifying and predicting the behavior of network traffic. The information captured from connected nodes, real-time sensors, and network traffic can become a source of evidence that may help cyber forensic investigators identify the sources of threats. In the present paper, the proposed intrusion detection model has been trained and tested using the 'DS2OS' dataset, which is also known as the 'mainSimulationAccessTraces' IoT dataset [4]. The metrics used to assess the performance of the proposed model are training and testing accuracy, prediction error, time efficiency, true positive rate, false positive rate, etc. The remainder of the paper is organized as follows. Section II reviews related research and the state of the art. Section III presents a detailed discussion of the machine learning multiclass classifiers used to achieve the objective. Section IV presents the research methodology of the proposed model for conducting intrusion detection in an IoT-based smart home environment. Section V presents the construction methodology of the ensemble-based IDS. Section VI discusses the results of the various operations performed for the proposed intrusion detection model. Finally, section VII presents the conclusions of the study and the research work.

II. RELATED WORK
So far, several pieces of research have been conducted to resolve problems related to intrusion detection in various application areas [10]. Models built using machine learning can efficiently deal with large and complex data, and several ML techniques have been used to achieve remarkable results. The literature surveyed here can be further classified on the basis of the feature selection methods used and whether single-class or multiclass classification is performed.
In 2020, Htwe et al. [11] applied the classification and regression tree (CART) method in their proposed intrusion detection architecture, which was applied to the IDS dataset 'N-BaIoT'. The authors state that the results obtained by their proposed classifier are better than those of Naïve Bayes classifiers. The main shortcoming of their proposed system is that the model was evaluated on only one metric, i.e., accuracy.
In 2020, Zhou et al. [12] proposed an intelligent IDS framework using feature-selection and ensemble learning mechanisms. They also proposed the CFS-BA heuristic algorithm for dimensionality reduction in order to select the most relevant and distinct feature subsets and to expose correlations between features. The proposed ensemble approach was based on the C4.5, Random Forest, and Forest by Penalizing Attributes algorithms with an average-of-probabilities rule. The probability distributions of the base learners were combined using voting techniques for better attack-recognition performance. The results of the proposed system were evaluated using the NSL-KDD, AWID, and CIC-IDS2017 datasets, but the authors did not address time efficiency.
In 2020, a comprehensive study was carried out by Verma et al. [13] on ML classifiers for the development of anomaly-based IDS to secure IoT systems against DoS attacks. The researchers assessed the reliability of the anomaly-based intrusion detection model by evaluating it on several existing IoT datasets, including CIDDS-001, NSL-KDD, and UNSW-NB15. A Raspberry Pi was utilized to perform a statistical significance test in order to examine the response time of the classifiers as required by the applications. In this study, the authors compared the single classifiers CART and MLP with the ensemble classifiers RF, AdaBoost, Gradient Boosted Machine, Extreme Gradient Boosting, and Extremely Randomized Trees to identify an optimal model for IDS.
In 2021, Hadem et al. [14] proposed an IDS using SVM for software-defined networking. The main focus of this approach is maximizing detection accuracy with less computation overhead and improved memory savings. The accuracy of the proposed scheme has been recorded as 95.98% on the full 'NSL-KDD' dataset and 87.74% on the selected-features dataset.
In 2021, Kumar et al. [15] proposed a cyber-attack detection system using the Random Forest, KNN, and XGBoost algorithms. The proposed work was evaluated on the BoT-IoT and DS2OS datasets and claims to achieve a 90% to 100% detection rate. However, it is not possible to detect every threat: there may be hidden threats that cannot be recognized, so there is always the possibility of false alarms. XGBoost is a leading method widely used for designing IDS models due to its high accuracy, but in this paper too the authors ignored the latency factor; Random Forest, KNN, and XGB are highly time-consuming.
In 2021, Huč et al. [16] proposed an anomaly detection model for edge devices. The authors analyzed different machine learning algorithms and evaluated the performance of their intrusion detection model on the largely imbalanced 'DS2OS' dataset. In this study, the entire dataset was divided into training (80%) and test (20%) sets, and a new, smaller dataset was then created by selecting samples randomly to balance the data. Models based on different ML algorithms (LR, DT, SVM, RF) were compared with ANN-based models, but the performance of the models was evaluated only in terms of accuracy, with classification results presented using a confusion matrix. In addition, smaller balanced training datasets were derived with clustering and their performance was compared with that of the bigger imbalanced datasets.
In 2022, Devprasad et al. [17] proposed a context-adaptive classification mechanism utilizing hierarchy-based chi-square and bat algorithms. For experimentation, the authors used Decision Tree (DT) and SVM algorithms, deployed and tested on the 'NSL-KDD' and 'UNSW-NB15' datasets. The ensemble classifier returned 89.43% prediction accuracy and a 3.215% FPR. The accuracy is not very good, and the authors did not consider execution time.
In 2022, Gupta et al. [18] proposed a cost-sensitive network-based IDS for handling class imbalance using deep learning and ensemble algorithms, evaluated on CIDDS-001, NSL-KDD, and CICIDS2017. The proposed system focused on identifying new and infrequent attacks in computer networks with a high detection rate. The work is divided into three stages: the first stage utilizes a deep neural network to separate normal and suspicious network traffic; in the second stage, XGB is deployed for the classification of major attacks; and in the third stage, Random Forest is deployed for the classification of minor attacks. The proposed model returned 99% accuracy for NSL-KDD, 96% for CIDDS-001, and 92% for CICIDS2017.
In 2022, Çetin [19] proposed a model for imbalanced network attack traces. Its performance was evaluated on several imbalanced datasets, namely DARPA98, NSL-KDD, KDD99, UNSW-NB15, and CAIDA DDoS. Besides being imbalanced, these datasets are also highly dimensional, and the proposed model was targeted at overcoming high dimensionality while obtaining high accuracy. The model was also tested on the CICIDS2017 dataset using a genetic algorithm, and performance was evaluated on the F1-score and G-mean metrics in order to select the most effective classifiers. The authors of this paper also did not consider run-time.
In 2022, Xu and Fan [20] proposed an IDS based on XGB and a logarithmic auto-encoder. The proposed model was evaluated on the CICIDS2017 and UNSW-NB15 datasets, with accuracy scores recorded as 99.92% and 95.11%, respectively. In this study, the researchers also evaluated the run-time performance of the different classifiers.
In 2022, Saheed et al. [21] also proposed an IDS using machine learning algorithms for detecting network attacks in an IoT environment. In the first step, the min-max scheme was used for normalization of the UNSW-NB15 dataset, and in the next step dimensionality reduction was applied using PCA. The dataset was trained using various machine learning classifiers, including the XGBoost, CatBoost, KNN, SVM, and naïve Bayes algorithms. The experimental analysis was also carried out on the BoT-IoT dataset and compared with the results obtained on UNSW-NB15. The accuracy scores of the boosting algorithms (XGB and CatBoost) were higher than those of the other ML classifiers. Due to the substantial growth of the Industrial IoT (IIoT), this area creates major opportunities for cybercriminals to perform easy attacks.
In 2022, Le et al. [22] proposed a multiclass-classification-based IDS for an imbalanced IIoT dataset. An ML-based XGBoost classifier was used to detect abnormal network traffic behavior in order to protect the network from cyber-attacks of similar behavior. Two modern IIoT datasets, TON_IoT and X-IIoTDS, reflecting the characteristics of modern network traffic, were used in this study. The proposed model performed well, with attack detection rates recorded as 99.9% and 99.87% on the two datasets, respectively.
In 2023, Mohamed et al. [23] also proposed an IDS using fog and cloud computing. The authors used a Gated Recurrent Unit and a Bidirectional LSTM to recognize the existence of network attacks. Before the emergence of IoT, many expert digital forensic approaches had been developed for botnet detection in computers and similar systems. The authors calculated the run-time, but the accuracy achieved by the model was only 96%.
In 2023, Awajan [24] proposed an IDS for IoT devices using deep learning, addressing five classes of intrusion. The deep-neural-network-based model is highly dependent on the considered dataset, and a further drawback is that it requires retraining for every new IoT network. Its average accuracy of 93.74% is also not very good. In contrast, the IDS model proposed in the present paper addresses 7 attack classes plus one benign class with an accuracy of more than 96%.
In 2023, Sharma et al. [25] also proposed a network IDS for IoT attacks, likewise designed using deep learning. The authors used filter-based feature selection, implemented by dropping highly correlated features. However, the accuracy achieved by this model was only 84%; after resolving the data-imbalance issues, it reached 91%, which is still not very good.
In contrast to the related work, the intrusion detection approach proposed in the present paper focuses strongly on time efficiency. The present research work has been conducted as part of a cybercrime investigation in an IoT environment, and reducing the run time increases the efficiency of the remaining utilized resources. A comparative analysis of the models reviewed in this section against the proposed model is presented in table 10, given at the bottom of this paper.

III. CLASSIFICATION ALGORITHMS
There exist many decision-tree-based classification and predictive machine learning algorithms, which can be divided into single and ensemble classifiers. In the present paper, the authors perform a simulation on a single classifier and several ensemble classifiers: logistic regression (LR) [26] is a type of single classifier, while Random Forest (RF), gradient boosting (GB), XGBoost (XGB), and the light gradient boosting machine (LGBM) are ensemble classifiers. An ensemble learning classifier refers to a classification technique that combines multiple base models and provides results as a single optimum classification model. Utami et al. [27] performed a systematic comparison of single and ensemble classifiers in order to characterize their performance. Sub-classifiers are derived from randomly selected samples (sub-features), which are further aggregated in order to make better decisions. Figure 1 illustrates the workflow of single and ensemble classifiers. In this paper, the performance of various ML classifiers has been analyzed to predict intrusions in the IoT ecosystem.

A. SINGLE CLASSIFIER
This section briefly discusses a single classifier, in which decision-making is conducted on a single set of features. The parameters of the resulting single-classifier-based model are estimated using linear optimization, and its performance can be evaluated using statistical algorithms.

1) LOGISTIC REGRESSION
Logistic regression (LR) is built upon the sigmoid function, a statistical function initially developed to model the growth of a population in an environment; its inverse is known as the logit() function. LR is a supervised ML classification algorithm that is commonly used to calculate the probability of a malicious event and to predict categorical or discrete values. By default, it performs binary logistic regression; another type, multinomial logistic regression, is used for solving multi-class classification problems [28]. Multinomial logistic regression is an extension of traditional logistic regression [26] that can handle more than two feasible discrete outcomes. LR is a precise model whose performance depends on the quality of the dataset. Suppose X is a given set of distinct features; using logistic regression, the probability estimation can be done with the mathematical formula given in equation 1.
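A standard reconstruction of this probability estimate, consistent with the sigmoid form described above (with $\beta_0$ and $\beta$ denoting the fitted model parameters), is:

$P(y = 1 \mid X) = \dfrac{1}{1 + e^{-(\beta_0 + \beta^{T}X)}}$  (1)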

B. ENSEMBLE CLASSIFIER
Ensemble classifiers are expected to provide better prediction outcomes than single classifiers such as logistic regression. Rather than depending on a single decision tree, ensemble classifiers perform predictions by aggregating the outcomes of multiple contributing trees constructed from the given data points. In the ensemble learning approach, the prediction depends upon the combination of important features of two or more contributing models, and the approach attempts to extract the complementary information from its contributing models. There are two main types of ensemble methods: bagging and boosting [29]. Bagging algorithms reduce variance while boosting algorithms reduce bias. In bagging, all base models are constructed simultaneously, while in boosting, each subsequent model is built only after learning the errors of its predecessor. The aim of an ensemble model is to reduce errors. The proposed model in the present paper performs classification and prediction using different machine learning ensemble algorithms, which are analyzed on traffic patterns collected using a simulator of different sensor-enabled devices in an IoT home environment. The model has been designed on the basis of the optimal outcomes obtained from the different classification algorithms evaluated on the 'DS2OS' training dataset, and the network traffic has been classified on the basis of its behavioral patterns.

1) RANDOM FOREST
Random Forest (RF) is a bagging ensemble technique of ML in which classification relies on multiple decision trees that collectively construct a forest. RF builds multiple decision trees on different subsets of a given dataset in order to improve predictive accuracy; instead of depending on a single decision tree, it aggregates the prediction of every tree, and accuracy generally increases with the number of trees. Unlike many other ML models, it can be trained and tested on network traffic from real-time datasets. The following conditions need to be satisfied before using this process:
• The dataset must contain actual values for the features, so that results are accurate rather than guessed.
• There must be a low correlation among the predictions of the individual trees.
The main properties of RF are independent, fast learning over datasets of distinct natures. It is an ensemble classifier composed of multiple decision trees generated using two distinct randomization sources [30]. At the split of every node, a randomly selected subset of input variables is chosen to find the best split. However, due to the multiple trees, the complexity of RF may become very high, which may cause more power utilization.
Random Forest Algorithm:
1) Select K random data points.
2) Construct a decision tree using the chosen data points.
3) Recognize the behavior of the tree and give it a class label.
4) Repeat steps 1-3 until n trees are built.
5) Obtain the prediction of every decision tree; the winner decided by 'majority voting' is taken as the final output class.
The basic structure of a random forest classifier with the 'majority voting' concept is shown in Figure 2. This structure simulates the process of predicting the behavior of network traffic. Consider a given dataset 'D' that contains the traffic logs stored in a '.CSV' file. Each record in the dataset can be considered a data point, and from subsets of such records multiple decision trees are constructed (which collectively form a forest-like structure). The arrows and green nodes indicate the direction of level-wise tree growth. Different decision trees produce different outputs depending on the network traffic behavior they observe. The decision trees here yield two types of output: trees dominated by malicious traffic nodes are labeled as the 'anomaly' class, while the others are labeled as the 'normal' class. Applying the majority voting operation to the decision trees yields the best result. Suppose a random forest operates n decision trees, where n >= 2, of which n-1 trees predict anomalous behavior and one predicts normal behavior; the majority vote then goes to the anomaly class. Because multiple decision trees vote on the prediction, there is less possibility of over-fitting. A random forest can run efficiently and with high accuracy on larger, high-dimensional datasets, but it may not return good results for small, low-dimensional datasets. The random forest classifier works like a black box that cannot be fully controlled by users and does not provide complete visibility. Its computations may also become far more complex; hence it is not easily interpretable.

2) GRADIENT BOOSTING
Gradient Boosting (GB) is also a widely used decision tree classifier algorithm that is also known as Gradient Boosting Decision Tree (GBDT). Boosting is a technique to integrate multiple weak learners (base classifiers) to construct a strong learner using certain machine learning algorithms [31]. GB classifiers are used to analyze the abnormal behavior of devices in different technological scenarios [26].
It is supported by strong theoretical results showing how powerful predictors can be constructed by integrating multiple base models through a greedy approach that corresponds to gradient descent in function space. GB-based feature selection enhances the detection rate as well as the execution speed [32]. In the GB approach, classification and prediction are performed on the basis of the residuals obtained from previous iterations, and performance can be improved by reducing over-fitting. The model builds a strong classifier by applying an ensemble technique to many weak classifiers. It is optimized through the shrinkage process, i.e., by minimizing the loss function L(ϕ) and constructing the additive approximation F*(X). Residuals are computed from the loss function L(ϕ), which can be optimized by gradient descent (GD). The additive approximation as a weighted sum is presented in equation 2.
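In its standard form, consistent with the weight and function symbols defined below, the additive approximation can be reconstructed as:

$F_L(X) = \sum_{l=1}^{L} \rho_l \, h_l(X)$  (2)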
Here, $\rho_l$ is the weight of the l-th function $h_l(X)$. The approximation starts from a constant approximation $F_0(X)$, and each iteration adds a further term; because of this iterative combination, the model is a type of ensemble classifier. However, the core gradient boosting model has certain major drawbacks, such as high power consumption and long training time. It is also prone to over-fitting, which can be reduced by proper regularization of the hyper-parameters.

3) EXTREME GRADIENT BOOSTING
Extreme gradient boosting (XGB) is an advanced implementation of gradient boosting (GB) created to improve performance, with better accuracy and a reduced false alarm rate [33]. XGB helps in designing a stronger classification model that enables data to be classified more accurately as it enters a network. XGB is one of the most promising boosting ensemble approaches and comes with competitive outcomes. XGB can effectively deal with over-fitting challenges when the system is flooded with data; in this case, the classifier must be fast enough to adapt to such a large number of data entries. XGB is strengthened by extensive hyper-parameter tuning and regularization. It is a decision-tree-based gradient-boosting ML algorithm that can upgrade the efficiency, accuracy, and attainability of an ensemble-based IDS by tuning the hyper-parameters, and it can steadily handle bias-variance trade-offs. High memory usage and slow running speed are the main drawbacks of this model. Over-fitting in boosting methods can be reduced by carefully tuning the hyper-parameters [34]. XGBoost is calculated using the mathematical formula given in equation 3.
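Following the usual boosted-ensemble formulation and the symbols defined below, equation 3 can be reconstructed as:

$F(X, W) = \sum_{l=1}^{L} \alpha_l \, h_l(X, w_l)$  (3)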
Here, X is the set of inputs and W represents the corresponding weights. F(X, W) is the target model obtained through the computation, $h_l$ is a single tree, and $\alpha_l$ is the weight of the l-th of the L trees. The model is optimized by minimizing the loss function. A model constructed with the XGB algorithm offers high training speed thanks to its large number of tunable hyper-parameters, but its major drawbacks are high cost, run-time, memory usage, and power consumption due to the long training time. The model also suffers from interpretability problems.

4) LIGHT GRADIENT BOOSTING MACHINE (LGBM)
Light Gradient Boosting is a histogram-based decision tree algorithm that improves the efficiency of the model as well as reduces the execution time and memory usage of a machine.
LGBM is greatly optimized over other boosting ensemble decision tree algorithms [35]. It is a faster, more distributed, more powerful, and highly improved learning algorithm.
LGBM can handle large data traffic efficiently [36]. Like many other boosting algorithms, LGBM uses a histogram-based algorithm and a pre-sorted algorithm for decision tree learning and for computing the best split [37]. LGBM makes use of two novel mechanisms: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). GOSS down-samples the instances based on the size of their gradients in order to segregate the data samples for locating a split value: samples with small gradients (which are already well-trained) are largely discarded, while samples with large gradients (which are under-trained) are retained for the model. This method provides more accuracy than uniform random sampling. EFB, on the other hand, overcomes a limitation of traditional histogram-based algorithms. The LGBM algorithm also expands trees leaf-wise (best-first), while most other decision tree algorithms grow level-wise.
Leaf-wise growth of the decision tree is shown in figure 3, where the green nodes represent the points at which tree growth takes place. The leaf with the maximum delta loss is selected for leaf-wise expansion. LGBM can provide solutions for several ML problems such as classification, regression, and decision-making [38]. Leaf-wise tree growth reduces loss and error more than level-wise tree growth.

5) LGBM METHODS
Here, an analysis of LGBM is obtained using the variance gain at splitting features with the help of the GOSS and EFB techniques. It provides the following advantages: • It outperforms many other classification algorithms in terms of time efficiency.
• It provides good training and test accuracy.
• Besides classification, it can also be used for regression.
• The overfitting can be reduced by setting a suitable value for the 'max_depth' hyperparameter.

Gradient-based One-Side Sampling (GOSS) for LGBM
The GOSS method is developed by modifying the gradient boosting method to focus on those training samples that produce larger gradients. The gradients used in this method speed up learning by reducing the computational complexity of the learning method. A significant proportion of the samples with smaller gradients is excluded, while the samples with larger gradients are used for the information-gain computation. GOSS provides an accurate estimation with smaller data sizes.
Suppose a training dataset with m samples is given, where X = {X_1, X_2, ..., X_m} and each X_i is a vector of dimension d. In a gradient boosting iteration, the negative gradients of the loss function are denoted {g_1, g_2, ..., g_m}. The training samples are sorted in descending order of their absolute gradient values |g_i|. The subset T is obtained by keeping the top a × 100% of samples with large gradients; from the remaining samples with small gradients, b × 100% are randomly sampled to form the subset U.
The split of samples is then evaluated according to the estimated variance gain $\tilde{V}_j(d)$ over the subsets T and U, given in equation 4 below, which presents the LGBM split criterion under GOSS analysis [39]. Here, d is the partition point of the dataset in dimension j chosen to obtain the optimal variance gain, and the coefficient (1-a)/b in equation 4 normalizes the sum of gradients over the sampled subset U. Each feature of X_i is used to compute the candidate training-data splits across all the trees. In this way, GOSS uses a subset with fewer samples to estimate the cost reduction when determining the split point [35].
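Reconstructed here following the GOSS variance-gain formulation cited as [39], with $T_l$, $U_l$ (respectively $T_r$, $U_r$) denoting the samples of T and U that fall to the left (respectively right) of split point d in dimension j, and $m_l^j(d)$, $m_r^j(d)$ the corresponding sample counts:

$\tilde{V}_j(d) = \frac{1}{m}\left(\frac{\left(\sum_{X_i \in T_l} g_i + \frac{1-a}{b}\sum_{X_i \in U_l} g_i\right)^2}{m_l^j(d)} + \frac{\left(\sum_{X_i \in T_r} g_i + \frac{1-a}{b}\sum_{X_i \in U_r} g_i\right)^2}{m_r^j(d)}\right)$  (4)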
Exclusive Feature Bundling (EFB) for LGBM
EFB is a mechanism for bundling sparse, mutually exclusive features. High-dimensional data is generally sparse, which makes it possible to design an approximately lossless mechanism of feature reduction. A sparse feature space contains numerous mutually exclusive features, i.e., features that rarely take nonzero values concurrently. The exclusive features are selected and put into a bundle, a process called exclusive feature bundling; it is a form of automatic feature reduction. The complexity of histogram building thus changes from O(#data × #feature) to O(#data × #bundle), improving training speed without reducing accuracy [37].
Ordinary decision trees work with discrete data; if the given data is continuous, it needs to be transformed into a discrete form to fit a decision tree. The main challenge is to identify the optimal splitting points when selecting features. A small number of split points may lose information but also reduces overfitting; conversely, a large number of splitting points may be lossless but increases training time. The algorithm iterates over the data points until the optimal split point is found, providing more information gain with reduced variance. The LGBM approach divides the data into a fixed number of split points of uniform length.

IV. RESEARCH METHODOLOGY
This section outlines the design, analysis, and implementation of the proposed model. Initially, all the project dependencies need to be downloaded and the required libraries need to be installed on the system. The entire research has been divided into the following phases:

A. MODEL DESIGNING
This section presents the design of an intrusion detection model that characterizes the normal (benign) and intrusive behavior of network traffic data obtained from real-time IoT sensors, IoT devices, and networks. This phase is composed of three sub-phases. Data used in this paper is openly available and can be extracted from Kaggle [4]. The algorithm of the entire intrusion detection modeling process is explained in Algorithm 1.
The present paper aims to build an intrusion detection model to identify attacks in IoT-enabled smart environments. The model has been divided into three phases: data preparation and preprocessing, classifier training, and decision-making. The entire process is executed on a feasible dataset; in the present research work, the 'DS2OS' dataset has been used for the implementation of the proposed work. This dataset contains traces collected from different sensor-enabled smart devices configured in a smart home environment. The detailed construction process of the proposed intrusion detection model is shown in figure 6.

1) DATA COLLECTION (DATASET EXPLORATION AND EXTRACTION)
The first phase is the designing phase, which includes dataset exploration and extraction and is followed by data preprocessing and preparation. Initially, during dataset exploration, appropriate datasets are considered and explored according to the interest of the researchers. Many related datasets are publicly available, some of which are either outdated or unfeasible for current research environments. The required dataset needs to be made available on the system where the whole process is to be accomplished. Till now, many datasets have been generated for training and testing IoT network traffic data. The classifiers have been trained and tested using the 'DS2OS' dataset, which contains network traffic data with normal and anomalous behavior [4]. The traces contained in the 'DS2OS' dataset have been captured from various smart devices installed at different locations in IoT home environments. The data has been captured using four simulators designed for smart home environments with different types of services. Figure 4 visualizes the source location and source type, whereas figure 5 shows the destination location and destination type with respect to their numbers of occurrences. The 'DS2OS' dataset contains a total of 13 features.
The following fragment of Algorithm 1, as recovered from the original layout, summarizes the training and ensembling steps:
Divide the dataset D into train data D_R and test data D_T
Train S samples of input data using the learning algorithm and test (S-1) samples using the same algorithm until optimal prediction
Repeat until all considered algorithms are analysed
End For
Ensemble and compare the prediction results obtained from training and testing
Step 5: Final Model
Prepare the final IDS model on the basis of the final results and majority voting
End
Here, the sources are the entry points through which the attacker entered the smart home network, and the destinations are the target or victim devices that have been compromised for malicious activities. Location refers to the area where the devices are located. Different types of IoT-enabled smart devices have been used in the scenario. Feasible samples (features) are selected from the total number of captured records, and each identical sample record has been assigned a class label (malicious or normal). Table 1 presents the fundamental details of the 'DS2OS' dataset. Two types of behavior have been noted for the connected nodes: 'normal' and 'malicious'. Nodes showing malicious behavior are affected by various types of attacks, which have been classified according to the properties of the records in the dataset. Some well-known attacks identified by the training process have been labeled as 'DoSattack', 'dataProbing', 'malitiousControl', 'malitiousOperation', 'scan', 'spying', and 'wrongSetUp'. The numbers of occurrences of nodes showing normal and malicious behavior are shown in figure 7.

C. DATA PREPARATION
In the second phase, the raw dataset is transformed into a suitable format and prepared for preprocessing and analysis. This dataset is largely imbalanced and is prepared using data cleaning.

1) DATA CLEANING AND TRANSFORMATION
Initially, the available dataset might not be in an appropriate format; data cleaning and transformation can make it more useful. Handling a dataset can be expensive, time-consuming, and tedious if it contains incomplete, superfluous, missing, noisy, or repeated entries. Such entries are identified and either removed or replaced with new values, and there are many ways to fill and replace them. Inconsistent data values are converted using transformation techniques, which can handle missing data, categorical data, skewed data, and data in the form of strings or other non-alphanumeric values. The label encoder and the one-hot encoder are the two most useful categorical encoding techniques. Filling null or missing values and cleaning noisy data are important data-cleaning operations.
• Filling null or missing values: Null or missing values can be replaced either with 'zero' or with the most suitable values in order to prevent errors. In figure 8, line no. 53 shows the replacement of irrelevant values with the most relevant values at different places in the dataset, and in line 5 the data.value.astype(float) method converts specific numeric data values to float type.
• Data cleaning: Noisy data can be removed or replaced by relevant, suitably fitted data selected from the existing data in the dataset. In figure 8, line no. 6 uses the data.drop() function to remove unnecessary columns from the dataset. Some examples of the implementation of these data preparation techniques in Python code are shown in figure 8, illustrating how a missing or noisy value can be repaired or replaced and how an irrelevant feature can be removed explicitly.
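The snippet below is a hedged reconstruction of the kind of cleaning steps that figure 8 is described as performing; the exact line numbers and column choices there are not reproduced, so the column names used here are illustrative assumptions.

```python
import pandas as pd

data = pd.read_csv("mainSimulationAccessTraces.csv")

# Replace missing or irrelevant entries with the most suitable values
# (column name is an assumption for illustration).
data["accessedNodeType"] = data["accessedNodeType"].fillna("Malicious")

# Convert numeric readings stored as strings to float; non-numeric
# leftovers become NaN and are then filled with zero.
data["value"] = pd.to_numeric(data["value"], errors="coerce").fillna(0).astype(float)

# Drop a column that is unnecessary for classification.
data = data.drop(columns=["timestamp"])
```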

D. DATA PREPROCESSING
Data preprocessing is the most necessary phase to prepare a raw dataset for experiments. It is an essential phase for any project to improve the performance [40]. Generally, adequate time and effort are spent during data preprocessing. This process is also essential for ensuring a fitted labeled dataset that is to be created for reducing the complexity and ensuring the premium quality results of analysis [41]. Models developed using ML techniques can produce highly accurate and quick results. During the data preprocessing phase, the following operations are performed.

1) FEATURE ENGINEERING
High-dimensional datasets can increase time and space complexity; to overcome these complexity issues, the feature selection approach is highly recommended. Feature engineering is the process of creating a new set of features from the current features according to the requirements of the project [42]; its main activities are feature extraction, feature scaling, and capturing feature relationships. Processing the whole collected dataset is too time-consuming, so the most significant features or attributes are selected from the total set of features to perform the analysis. Features that are not useful for the model can be ignored during processing [43], and entries containing irrelevant or null values can be dropped. This phase reduces the complexity of the analysis, with the most promising features selected dynamically from the entire dataset. Dimensionality reduction is significant in ML and predictive modeling for data compression, reducing storage space and computation time; it refers to the process of reducing the number of features by obtaining a set of principal instances.
• Feature selection: The main principle of feature selection is to select the most important features (attributes) from the dataset and remove the least important features, on which the performance of the model does not depend. The performance of the model can suffer if irrelevant or less useful features are selected. Good feature selection can improve model accuracy (by reducing misleading data), reduce overfitting (less redundancy also reduces noise), and reduce training time (lower algorithm complexity). Several methods exist for feature selection, such as univariate selection, feature importance, and a correlation matrix with a heat map.
F1 Score: The F-measure or F1 score represents the harmonic mean of precision and recall. The computation formula of the F1 score is presented in equation 9.
Here, True Positive (TP) represents the number of malicious nodes correctly characterized as malicious. True Negative (TN) represents the number of benign nodes correctly characterized as benign. False Negative (FN) represents the number of malicious nodes wrongly characterized as benign. False Positive (FP) represents the number of benign nodes wrongly characterized as malicious.
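From these counts, precision and recall follow in the standard way, and the F1 score of equation 9 is their harmonic mean:

$\text{Precision} = \dfrac{TP}{TP + FP}, \qquad \text{Recall} = \dfrac{TP}{TP + FN}$

$F1 = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$  (9)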

V. CONSTRUCTION PROCEDURE OF THE PROPOSED INTRUSION DETECTION MODEL
After preprocessing and classification of the samples, the considered dataset needs to be trained and tested with the selected classifier. The dataset is split into a training set and a test set to calculate the accuracy and error scores of the proposed model, and the features (independent variables) 'X' and the target variable 'y' need to be initialized. The LGBM classifier has been implemented using the 'LGBMClassifier()' method, an ensemble learning method shown in figure 9. The usage of the most important parameters of the LGBM classifier, illustrated in the sketch below, is as follows:
1. boosting_type: The value 'gbdt', i.e., Gradient Boosting Decision Tree; combined with exclusive feature bundling, it overcomes the limitations of the plain histogram-based approach.
2. max_depth: A value assigned to limit the depth of the tree. This hyperparameter is effective in controlling over-fitting and should always be set for that purpose; in the present scenario, the value is set to 1.
3. n_estimators: The number of estimators can be varied to optimize the results; here it is set to 400.
4. num_leaves: The number of leaves used to construct a tree; its value should be kept below 2^max_depth.
5. learning_rate: The value can lie in the range 0.1 to 1.0; in the current scenario, 0.1 gives the best result.
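A sketch of the LGBMClassifier configuration listed above is given below; the parameter values are those stated in the text, while X_train, X_test, and y_train are assumed to be the prepared splits from the previous phase.

```python
from lightgbm import LGBMClassifier

model = LGBMClassifier(
    boosting_type="gbdt",  # gradient boosting decision tree
    n_estimators=400,      # number of boosting rounds
    learning_rate=0.1,     # rate that gave the best result here
    max_depth=1,           # limits tree depth to control over-fitting
    num_leaves=2,          # should not exceed 2**max_depth
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```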
The LGBM classifier method, with the necessary hyperparameters tuned to relevant values in order to obtain optimal results, is shown in figure 9. The classifier predicts the results for the test set. The model is fit, and the X and y data of the 'DS2OS' ('mainSimulationAccessTraces') dataset are trained with the LGBM classifier. With default hyperparameters, the LGBM-based intrusion detection model achieves 93.4% classification accuracy. The LGB-IDS model, an ensemble-based intrusion detection system constructed using the LGBM ensemble classifier with leaf-wise tree expansion, is illustrated in figure 10.
In LGBM, the tree grows leaf-wise, following the best-first technique: the leaf with the maximum delta loss is chosen to grow. Leaf-wise tree growth is not suitable for small amounts of data. Although the tree grows leaf-wise, the 'max_depth' parameter is still specified to limit and control the depth of the tree.
Here, train and test accuracy, run-time, TPR, and FPR are calculated for final prediction.
LGBM optimizes the prediction model in the following ways:
• Speed and memory optimization: LGBM uses histogram-based algorithms, which place continuous feature values in discrete bins. If the number of bins is small, a small data type can be used for storing the training data. This approach speeds up training and reduces memory usage.
• Reduced gain-computation cost for every split: other bagging and boosting algorithms that rely on pre-sorted feature values suffer from time-complexity problems. Building the histogram has O(#data) time complexity, but once the histogram is built, split finding costs only O(#bins); since #bins is much smaller than #data, the computing cost is reduced.
• Optimized accuracy: for a fixed number of leaves, leaf-wise tree growth provides better accuracy than level-wise growth.
• Optimal categorical feature split: one-hot encoding is commonly used to represent categorical features, but this approach may lead to unbalanced trees that need to grow very deep to achieve good accuracy. A better alternative is to split by partitioning the feature's categories into two subsets; LGBM sorts the histogram according to the accumulated values of the categorical features and splits on the sorted histogram.
• Network optimization: LGBM uses collective communication algorithms instead of point-to-point communication, which reduces communication overhead.

VI. RESULTS AND DISCUSSIONS
The designed model has been examined by deploying it on the selected dataset 'DS2OS'. A portion of the dataset, or the whole dataset, is used for the statistical classification operation in order to strongly validate the proposed model and divide the data into meaningful categories. The data was first split into train and test sets in a ratio of 80% to 20%; in another round, a 70% / 30% ratio of train to test data was considered. However, no substantial difference was observed in the results.

A. SPLIT DATASET: TRAINING AND TESTING
The dataset 'DS2OS' has been split into training data and test data, as sketched below. The labeled training data is used to train the machine learning model, and the results are validated on the unseen test data.
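A hedged sketch of this split follows: 80% train / 20% test by default (a 70/30 split gave similar results, per the text). X and y are assumed to be the prepared features and 'normality' labels, and the stratification option is our own addition to preserve class proportions in this imbalanced dataset.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)
```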

B. DATA EVALUATION AND RESULT DISCUSSION
In this section, a substantial comparative analysis of the different ML classifiers has been carried out. The model mainly focuses on gradient-boosting algorithms for classification and prediction. The algorithms are evaluated by calculating certain performance metrics (accuracy score, speed (run-time), and error determination) on an appropriate ratio of training and test data. For every parametric performance, the results are obtained by optimizing the hyper-parameters with their best values. The prediction performance has been evaluated using the following metrics.

1) CLASSIFICATION ACCURACY
The train and test accuracy scores are the most important parameters to consider when building an ideal prediction model. The classification accuracy score has been evaluated on the 'DS2OS' dataset using n samples for the train data and the remaining samples for the test data. The train and test accuracy scores of the LR, RF, XGB, and LGBM classifiers are shown in Table 2. The ratio of train data can be varied to obtain the best results; in most cases, the train and test data are taken as 80% and 20% respectively. Accuracy is computed as in equation 5:

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$  (5)

Here, the train and test accuracy scores of XGB reach 99.99%, which is a very good value, and LGBM follows closely with train and test accuracy scores of 99.92%.

2) ERROR-RATE PERFORMANCE
The error-rate computation results of LR, RF, XGB, and LGBM are shown in Table 3. The mean absolute error (MAE) is the average absolute difference between predicted and actual values, the mean squared error (MSE) is the average squared difference between predicted and actual values, and the root mean squared error (RMSE) is the square root of the MSE. These parameters determine the error-rate analysis, and a minimum error rate is a favorable outcome for prediction accuracy; the best-fit model is the one with the lowest error values. Here, the error-rates of the XGB and LGBM classifiers are both minimal and equal.

3) ROC-AUC SCORE
The ROC-AUC (receiver operating characteristic, area under the curve) scores are shown in Table 4.

4) ROC-AUC (TPR VS. FPR)
True positive rate (TPR) and false positive rate (FPR) determine the detection rates of events. TPR is also referred to as sensitivity or recall [44]. TPR and FPR are determined by the formulas of equations 10 and 11, respectively, where TPR represents the accurate outcomes and FPR represents the false alarm rate:

$TPR = \dfrac{TP}{TP + FN}$  (10)

$FPR = \dfrac{FP}{FP + TN}$  (11)

Figure 11 consists of four parts that present the ROC curves of the LR, RF, XGB, and LGBM classifiers: figure 11(a) shows multiclass classification using logistic regression, figure 11(b) using RF, figure 11(c) using XGB, and figure 11(d) using LGBM. These models predict seven attack classes and one normal class, and the curves lie close to the upper limit of the AUC range. The different patterns of network traffic present the 'DoS attack', 'data probing', 'malitious control', 'malitious operation', 'scan', 'spying', and 'wrong setup' attack behaviors as well as the 'normal' behavior. The performance of XGB and LGBM is the highest in terms of accuracy and error metrics, and the TPR and FPR of the LGBM classifier are also strongly acceptable. For stronger validation, LGBM is further compared with other gradient-boosting ensemble algorithms below.

5) VALIDATION
Cross-validation (CV) is a technique for validating the efficiency of a model by training it on a set of input data and testing it on a set of previously unseen input data; this differs from an ordinary train-test split. The CV technique is also used to check the stability of a model and to compare the performance of different models: the model developer cannot fit the model solely on the basis of the training dataset. For this purpose, a particular sample of the dataset that was not used for training is reserved, and the test operation is performed on this reserved sample before deployment. There are several methods of validating the efficiency of a model, of which k-fold cross-validation and stratified k-fold cross-validation are the most useful. 'K-fold' cross-validation divides the input dataset into k equal-sized subsets of samples called folds, but this method sometimes yields a noisy estimate of model performance when splitting a large dataset; the noise can be reduced by increasing the value of k. 'RepeatedK-fold' CV repeats k-fold validation n times with a distinct random state each time. 'RepeatedStratifiedK-fold' CV works on the stratification concept, meaning the data is rearranged so that each fold is a good representative of the whole dataset; it performs the splitting in a stratified way instead of a random way and is the best approach for dealing with bias and variance. Table 6 shows the mean accuracy, standard deviation, and run-time of the cross-validation scores. The hyper-parameter values of the RepeatedStratifiedKfold() validation method can be tuned to obtain optimal mean accuracy, standard deviation, and run time; here, the hyper-parameters n_splits, n_repeats, and random_state have been set to 5, 2, and 0, respectively, for every model constructed using the LR, RF, XGB, and LGBM classifiers.
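A sketch of this repeated stratified k-fold validation, using the stated hyper-parameters n_splits=5, n_repeats=2, and random_state=0, is shown below; X and y are assumed to be the prepared features and labels, and the LGBM model stands in for any of the four classifiers.

```python
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from lightgbm import LGBMClassifier

# Each of the 2 repeats re-shuffles the data and splits it into 5
# stratified folds, giving 10 accuracy scores in total.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
scores = cross_val_score(LGBMClassifier(), X, y, scoring="accuracy", cv=cv)
print("Mean accuracy: %.4f (std %.4f)" % (scores.mean(), scores.std()))
```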

6) TIME EFFICIENCY
Speed, or run-time efficiency, is also one of the most important metrics for selecting a classifier when developing a model. The time efficiency, or prediction latency, of the underlying classifiers is shown in Table 7. The results have been validated using the 'RepeatedStratifiedKfold' cross-validation (CV) method.
Compared to the other works in the literature, the intrusion detection system proposed in the present paper achieves better performance in terms of run-time. The accuracy of LGBM is also very good; in some cases, other algorithms show 100% accuracy, which may indicate an overfitting problem. The considered dataset contains 7 malicious classes, namely 'DoSattack' (DoS), 'dataProbing' (Probe), 'malitiousControl' (MC), 'malitiousOperation' (MO), 'scan', 'spying' (Spy), and 'wrongSetUp' (WS), plus one 'normal' class. The proposed method primarily highlights the computation time taken by the proposed IDS for classification and prediction: the LGB-IDS model takes 6.465 seconds to execute.

7) THREAT PREDICTION AND DETECTION RATE
The threat detection rate of each class can be measured on the basis of the predicted (P) results and the actual results; the actual detection (AD) rate can differ from the predicted results. The dataset has been trained and tested by selecting different data samples. There are a total of 7 threat classes and one normal class. Table 7 presents the class-wise average threat prediction (P) results and actual detection (AD) rates in %, calculated on the basis of the precision, recall, f1-score, and support of each class using the LR, RF, XGB, and LGBM algorithms. The detection rates for all attacks calculated by XGB and LGBM are almost equal, lying between 99% and 100% (in most cases 99%). LR has not performed very well in detecting some threats, although RF has performed better than LR. The overall performance of XGB and LGBM is better than the others: their true positive and true negative rates are high, and their false alarm rate (FAR) is very low. An existing threat that is not identified or predicted counts as a false negative and thus degrades the detection rate. The threat prediction (P) and detection (D) of the LGB-IDS model have been compared with the performance of some other models in table 8.

8) OVERALL PERFORMANCE OF PROPOSED IDS
The performance of the proposed IDS in terms of accuracy, precision, recall, F1 score, and support is showcased in table 8. The total number of samples is 357941, out of which 71589 samples have been used for testing the proposed model. The average accuracy is more than 99%. Class-wise precision, recall, and f1-score are shown in table 9. Here, precision is the percentage of samples classified as attacks that are truly attacks, and recall is the percentage of the total attack samples that have been classified correctly as malicious.
For the DoS attack, the precision is 98%, the recall is 64%, and the f1-score is 77%. For the other attack classes and for the normal class, precision, recall, and f1-score are all 100%. The intrusion detection rates are illustrated in Figure 12 through precision, recall, and f1-score, with the threat detection rates shown in percentage. Support represents the number of samples actually belonging to each class.
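For reference, these scores follow the standard per-class definitions, where TP, FP, and FN denote the true positives, false positives, and false negatives of a class:

```latex
\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP+FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}
```

Plugging in the DoS figures above gives F1 = 2(0.98)(0.64)/(0.98 + 0.64) ≈ 0.77, consistent with the reported score of 77%.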
The classification of the threat detection rate can be better illustrated through a confusion matrix, shown in Figure 13 to evaluate the classification accuracy. Each observation is grouped by its predicted value and its actually detected value, which shows where the errors of the proposed model occurred. Columns in the confusion matrix represent the predictions made by the model, while rows represent the actual expected values.
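Continuing the same sketch, such a matrix can be produced with scikit-learn, whose convention matches the description above (rows are actual labels, columns are predictions):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Rows: actual classes; columns: predicted classes.
cm = confusion_matrix(y_te, y_pred)
ConfusionMatrixDisplay(cm, display_labels=class_names).plot(xticks_rotation=45)
plt.tight_layout()
plt.show()
```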
The proposed ensemble-based model for IDS shows remarkable performance in terms of accuracy, TPR, and time efficiency. Among the works surveyed, only the models presented by Huč et al. [16], Xu and Fan [20], and Mahamed et al. [23] report the time efficiency of their models; the model proposed in the present paper shows outstanding performance in this regard. The comparative analysis of the proposed model and existing intrusion detection models is presented in Table 10. Based on the results evaluated in this paper, the proposed LGB-IDS model developed using the LGBM ensemble technique promises a high prediction rate as well as low prediction latency, which improves both the security and the accuracy of intrusion detection. Most of the models studied in this paper have not been evaluated in terms of time efficiency, an omission that can have serious consequences in real-time deployments. All the observations performed in this study indicate that LGBM is the most suitable classifier for developing an IDS. The run-time of LGBM is the lowest, approximately ten times lower than that of many other algorithms. In the present study, the speed performance has been reported in seconds rather than milliseconds because the evaluation has been performed on thousands of samples taken from the dataset. The development of a model does not depend on a single parameter; it must be evaluated on multiple parameters to ensure valid conclusions. In IoT, an intrusion is a serious problem, and it must be detected by an accurate and time-efficient IDS that has been validated along different dimensions. Along with prediction accuracy, run-time is also important to utilize the available resources and deliver fast results.
LGBM is a fast, high-performance gradient-boosting algorithm derived from widely used boosting machine learning algorithms.
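For completeness, a minimal LGB-IDS-style classifier can be instantiated through LightGBM's scikit-learn wrapper. The hyper-parameter values below are LightGBM's common defaults, shown only for illustration; the paper does not disclose the exact LGB-IDS configuration.

```python
from lightgbm import LGBMClassifier

# Illustrative defaults only; not the paper's reported LGB-IDS settings.
lgb_ids = LGBMClassifier(
    num_leaves=31,       # leaf-wise tree growth, LightGBM's main speed lever
    learning_rate=0.1,
    n_estimators=100,
    random_state=0,
)
lgb_ids.fit(X_tr, y_tr)          # multiclass objective is inferred from y
y_pred = lgb_ids.predict(X_te)
```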

VII. CONCLUSION
The main objective of the present paper is to propose an effective and time-efficient intrusion detection system (IDS) with strong attack detection capability. The present work has been evaluated on the smart home dataset 'DS2OS', which exhibits modern traffic in an IoT-enabled environment. The proposed intrusion detection model is a novel approach designed using ML-based ensemble algorithms. Feature selection methods have been applied to the dataset to reduce the model's prediction latency; feature selection reduces the size of the input data, which lowers time complexity and space complexity and improves memory efficiency. The model has been evaluated using certain important metrics such as accuracy, time, error rate, TPR, and FPR. The results in the present paper indicate that no single algorithm can be considered decisively superior to the others. However, the extreme gradient boosting (XGB) and light gradient boosting (LGBM) based ensemble models outperform the others in terms of accuracy and error rate, and the LGBM-based IDS additionally showcases a lower threat prediction latency than models based on the other algorithms. Other benefits of the proposed model include low power consumption, high accuracy, and reduced over-fitting. The proposed model efficiently balances the input dataset and detects the behavior of traces for intrusion detection. The TPR and TNR of the LGB-IDS model also exhibit good performance. The 'DS2OS' dataset contains more than two classes; therefore, instead of binary classification, multi-class classification has been performed in the present model to segregate the normal (benign) class and certain types of anomalous classes. These classes have been identified on the basis of decision trees constructed from the behavioral patterns of the connected nodes. The proposed work may be quite helpful for classification, prediction, and decision-making in large, complex, and high-dimensional datasets as well as small, low-dimensional ones, not only in various IoT environments but also in other real-time scenarios. Time efficiency is the most important feature of the proposed model. In the future, the proposed model may prove significant for investigating cybercrimes in different IoT-enabled environments with less time consumption, low memory usage, high accuracy, and a low error rate. A fast and efficient IDS will be helpful to combat various security threats in different IoT environments. The results obtained using the proposed model may help to target various sources of attacks in much less time and to block further attack attempts in various scenarios deployed in the government and private sectors.

PREETI GULIA received the Ph.D. degree in computer science in 2013. Since 2009, she has been serving the Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India, where she is currently an Associate Professor. She has published more than 80 research papers indexed in SCI, SCIE, and Scopus. She has presented papers at national and international conferences. She has guided four scholars and is guiding five more. Her research interests include data mining, big data, machine learning, deep learning, the Internet of Things (IoT), and software engineering. She is an active Professional Member of IAENG, CSI, and ACM. She also serves as an editorial board member and an active reviewer of international/national journals.
FABIO ARENA received the bachelor's and master's degrees in telecommunication engineering from the University of Catania, in 2006 and 2010, respectively, and the Ph.D. degree in civil infrastructures for the territory from the Kore University of Enna. He is currently an Assistant Professor with the Faculty of Engineering and Architecture, Kore University of Enna, Italy. He has published several scientific articles in various international journals and collaborates with some of them. His current research interests include ITS, driverless vehicles, big data, smart cities, and network architecture.
GIOVANNI PAU (Member, IEEE) received the bachelor's degree in telematic engineering from the University of Catania, Italy, and the master's (cum laude) and Ph.D. degrees in telematic engineering from the Kore University of Enna, Italy. He is currently an Associate Professor with the Faculty of Engineering and Architecture, Kore University of Enna. He has authored or coauthored more than 95 refereed papers published in journals and conference proceedings. His research interests include wireless sensor networks, fuzzy logic controllers, intelligent transportation systems, the Internet of Things, smart homes, and network security. He has been involved in several international conferences as a session co-chair and a technical program committee member. He serves/served as a leading guest editor for special issues of several international journals. He is an Editorial Board Member and an Associate Editor of several journals, such as IEEE ACCESS, Wireless Networks (Springer), the EURASIP Journal on Wireless Communications and Networking (Springer), Wireless Communications and Mobile Computing (Hindawi), Sensors (MDPI), and Future Internet (MDPI), to name a few.

Open Access funding provided by 'Università degli Studi di Enna "KORE"' within the CRUI CARE Agreement.