Ensemble learning-based IDS for sensors telemetry data in IoT networks

The Internet of Things (IoT) is a paradigm that connects a range of physical smart devices to provide ubiquitous services to individuals and automate their daily tasks. IoT devices collect data from the surrounding environment and communicate with other devices using different communication protocols such as CoAP, MQTT, and DDS. Studies show that these protocols are vulnerable to attack and pose a significant threat to IoT telemetry data. Within a network, IoT devices are interdependent: the behaviour of one device depends on the data coming from another device. An intruder can exploit this interdependence and alter the telemetry data to indirectly control the behaviour of other dependent devices in the network. Therefore, securing IoT devices has become a significant concern in IoT networks. The research community has proposed intrusion detection systems (IDS) based on different techniques, among which machine learning (ML) based intrusion detection is one of the most widely adopted. This study proposes a stacking-based ensemble model that makes IoT devices more intelligent at detecting unusual behaviour in IoT networks. The ToN-IoT (2020) dataset is used to assess the effectiveness of the proposed model. The proposed model achieves significant improvements in accuracy and other evaluation measures in binary and multi-class classification scenarios for most of the sensors compared to traditional ML algorithms and other ensemble techniques.


Introduction
The Internet of Things (IoT) is influencing our lifestyle, from the way we act to the way we behave. It is a collaborative network of connected smart devices. These devices are equipped with sensors, actuators, storage, computational, and communication capabilities to gather sensitive telemetry data from remote locations and share this data with receiving systems for analysis [1,2].
In an IoT network, the devices are interconnected and communicate with each other through some protocols to provide a service. Thus, the behaviour or decision of one device depends on the data (telemetry) coming from another device, and that implicit relationship is device interdependence.
Today, IoT applications are growing rapidly in all fields to automate traditional systems in industry, retail, home and building, health, agriculture, etc. IoT smart objects (sensors and actuators) add intelligence to standard devices by blending computing and network capabilities [3,4]. As a result, smart appliances can be operated automatically, e.g., a smart fridge can monitor its temperature and adjust it according to a threshold value, and a smart door can change its state (open/close) based on the detection of movement [5].
According to [6][7][8], more than 25 billion devices are currently connected to the internet, and up to 50 billion connected devices were expected by 2022 [9]. With the dramatic growth of IoT applications, IoT devices have become an attractive target for attackers. For example, in 2015 the Ukrainian government reported a power outage caused by cyber-attacks that affected approximately 225,000 customers. An attacker penetrated the monitoring system of IoT devices because of a poor security mechanism, which resulted in a power blackout [10]. The requirements for IoT applications vary by industry. Thus, a multi-layered network security system must also take into account the security challenges at each layer [11].
Available security solutions such as encryption, firewalls, and intrusion detection cannot be directly installed on IoT applications because IoT devices have unique features such as interdependence, constraints (energy [12][13][14][15], storage, etc.), diversity, intimacy, etc. [16]. These features also affect the security and privacy [17] of IoT networks. For example, in an automated smart light system, an intelligent device (sensor) senses the light level and compares it with a threshold value. If the light level is below the threshold, the system automatically turns on the smart bulb to restore the light level. In this scenario, the activation of the smart bulb depends on the data from the light-sensing device. Similarly, another example (illustrated in Figure 1) is an intelligent room that consists of multiple IoT devices, i.e., a smart plug, smart window, thermometer, and air-conditioner (AC). When the thermometer detects that the temperature inside the room has risen over a threshold and, simultaneously, the smart plug detects that the AC is turned off, the window automatically opens to stabilize the room's temperature. Here, if an intruder wants access to the room, he does not need to attack the smart window directly. He can create a physical security breach by gaining access to the smart plug and using it to switch off the AC, which raises the temperature in the room and causes the window to open automatically [18].
Such interdependence among IoT devices in IoT applications opens a path for an intruder to indirectly control the targeted device by altering the data coming from another device. Therefore, to secure the sensitive telemetry data of IoT devices and ensure secure interdependency among them, an intrusion detection system (IDS) that specifically meets the requirements of IoT applications is needed.
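The smart-room scenario can be sketched as a simple automation rule. This is a minimal illustration, not the implementation of any real product: the function name `window_should_open`, the 28 °C threshold, and the readings are all hypothetical.

```python
# Hypothetical automation rule for the smart room scenario.
TEMP_THRESHOLD_C = 28.0  # assumed threshold, chosen only for illustration


def window_should_open(temperature_c: float, ac_is_on: bool) -> bool:
    """Open the window when the room is too hot and the AC is off."""
    return temperature_c > TEMP_THRESHOLD_C and not ac_is_on


# Normal operation: the AC is running, so the window stays closed.
normal = window_should_open(temperature_c=30.0, ac_is_on=True)

# Attack: the intruder switches the AC off through the compromised smart plug.
# The window rule only sees telemetry, so it opens the window by itself.
attacked = window_should_open(temperature_c=30.0, ac_is_on=False)

print(normal, attacked)
```

The window is never attacked directly; it simply trusts the telemetry reported by the plug and the thermometer, which is why tampered telemetry is enough to change its behaviour.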

Intrusion detection system
An IDS is a software application or monitoring device that keeps track of the network to be secured and alerts the administrator if any suspicious activity is detected. It can distinguish malicious attack data from normal data, including attacks that might not be identified by traditional security mechanisms such as firewalls [19]. In the literature [19,20], multiple dimensions, such as the source of information, identification method, detection method, and learning approach, are used to classify IDSs. Figure 2 illustrates the classification of IDS.

Information source based IDS
This type of IDS is mainly divided into host-based (HIDS) and network-based (NIDS) IDS. A HIDS is usually installed on a local host (i.e., an IoT device or system) to secure it from malicious activities. It monitors the host's file system, system calls, and network events, and informs the system when an intruder performs any suspicious activity. A HIDS is installed on a single computer or device and is only capable of detecting attacks on the system on which it is installed [21]. A NIDS, in contrast, keeps an eye on network-related data, such as IP addresses, network packets, protocols [22], traffic volume, etc., to detect intrusions [23]. To achieve real-time intrusion detection, both HIDS and NIDS are used.

Detection based IDS
In general, Signature-based IDS (SIDS) and Anomaly-based IDS (AIDS) are primary intrusion detection techniques.

Signature based IDS (SIDS)
SIDS (misuse detection) is a database-driven mechanism that stores known attack signatures. The signatures are attack patterns extracted from the features of network packets. SIDS monitors the packets of network traffic and compares them with the signatures previously stored in the database. If a pattern matches any existing signature in the database, SIDS generates an alarm. It only detects known attacks and is unable to identify zero-day (unknown) attacks [24]. Thus, the system's security requires the continuous identification of new signatures and updating of the database with expert intervention. The performance of a SIDS is contingent upon the rules deployed, and modifications to the rules may affect the IDS's performance. The most common example of SIDS is the Snort tool [25].
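The matching step can be sketched as follows. This is a toy illustration under stated assumptions, not Snort's rule language or engine: the signature names, packet fields, and the helper `match_signature` are hypothetical.

```python
from typing import Optional

# Toy signature database: every field listed in a signature must match.
# Signature names and field values are hypothetical.
SIGNATURES = {
    "telnet-bruteforce": {"protocol": "tcp", "dst_port": 23, "flag": "SYN"},
    "mqtt-empty-flood": {"protocol": "tcp", "dst_port": 1883, "payload_len": 0},
}


def match_signature(packet: dict) -> Optional[str]:
    """Return the name of the first matching signature, or None."""
    for name, pattern in SIGNATURES.items():
        if all(packet.get(key) == value for key, value in pattern.items()):
            return name
    return None


known = {"protocol": "tcp", "dst_port": 23, "flag": "SYN", "src": "10.0.0.5"}
unknown = {"protocol": "udp", "dst_port": 5683, "flag": None, "src": "10.0.0.9"}

alert_known = match_signature(known)      # matches a stored signature
alert_unknown = match_signature(unknown)  # no signature: passes undetected

print(alert_known, alert_unknown)
```

The second packet shows the limitation noted above: traffic with no stored signature raises no alarm, whether or not it is malicious.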

Anomaly based IDS
An AIDS is also referred to as behaviour-based detection and deals with unknown attacks. It observes the network and system behaviour and establishes a baseline of typical behavioural patterns using a variety of methods. Any considerable deviation of an inspected pattern from the baseline is labelled as an anomaly [26].
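One simple way to realize the baseline-and-deviation idea is a z-score test. The sketch below is only an illustration: the three-standard-deviation threshold, the helper names, and the temperature readings are assumptions, not values from this study.

```python
from statistics import mean, stdev


def build_baseline(normal_readings):
    """Summarize typical behaviour as (mean, standard deviation)."""
    return mean(normal_readings), stdev(normal_readings)


def is_anomaly(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return abs(value - mu) > threshold * sigma


# Hypothetical thermometer telemetry collected under normal conditions.
normal_temps = [21.8, 22.1, 22.0, 21.9, 22.3, 22.2, 21.7, 22.0]
baseline = build_baseline(normal_temps)

print(is_anomaly(22.4, baseline))  # small deviation from the baseline
print(is_anomaly(35.0, baseline))  # large deviation from the baseline
```

Unlike signature matching, this check needs no prior knowledge of the attack, only a model of normal behaviour.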

IDS placement schemes
In an IoT network, multiple devices communicate and share data through a medium like a router.
There are a few techniques for deploying intelligent IDS to secure these data-sharing nodes.

Distributed IDS placement
In distributed placement, intrusion detection systems are deployed in every physical object or node, each of which is resource-constrained. These resource constraints must be considered when developing an IDS. The authors of [27] addressed this issue and proposed a lightweight IDS. The nodes in a distributed deployment may also be responsible for keeping an eye on their neighbours. Moreover, a distributed IDS comprises several IDSs dispersed around a vast IoT ecosystem and linked to one another or to a central server.

Centralized IDS placement
In centralized IDS placement, the intrusion detection system is deployed in a central device, such as a router or dedicated host. It is also known as NIDS. The network router or dedicated host receives all the data collected by IoT devices and sends it to the internet. As a result, an IDS installed in a router or dedicated host can examine all traffic flowing between the nodes and the Internet [28].

Hybrid IDS placement
Hybrid IDS placement blends the centralized and distributed placement techniques to take advantage of their strengths while avoiding their disadvantages.

Validation strategies of IDS
Validation ensures that the proposed or developed model behaves accurately within its scope and meets the stated objectives. There are many validation strategies for IDS. According to [29], the primary strategies followed by researchers in the literature for validating proposed techniques include:
• Simulation: methods for simulating IoT scenarios using software such as MATLAB, IoTIFY, etc.
• Empirical: verification of a model's accuracy through systematic experimental data collected from an operational setting or context.

The interdependence among IoT devices allows intruders to alter the data in the network by compromising another dependent IoT device. Without sufficient security and intelligence in smart objects, attackers manipulate the telemetry data with the help of man-in-the-middle (MitM) attacks [30] and code injection attacks. The intruder attempts to send messages between two dependent devices (nodes) in a network and control the behaviour of other devices.
Therefore, the features of IoT devices (interdependence, resource constraints, etc.) and new diverse attacks on telemetry data motivate us to develop an IDS that meets the specific requirements of IoT devices and detects tampered telemetry data during communication between objects. The literature shows that ML has proven effective in different application areas, including intrusion detection systems for IoT [31][32][33][34]. Other solutions such as deep learning (DL) and artificial neural networks (ANN) are also helpful, but they need more computation power and resources than ML. Deep learning solutions also require large amounts of training data. In a survey, the authors of [35] also highlighted the security and privacy issues of DL in recent years with respect to two types of attacks. Some IoT devices, such as sensors, are resource-constrained and require a lightweight solution in terms of computation power and resources [36]. Therefore, ML classifiers have been selected for designing an IDS for telemetry data security. However, the detection accuracy of a single ML classifier is not always good. For example, LR alone does not perform well on non-linear data, and classification by DT tends to favour the majority class, which causes overfitting. To improve detection rates while lowering training and generalization errors, more sophisticated strategies are needed, which drives us to apply ensemble methods to detect attacks in telemetry data. Consequently, this study proposes a stacking-based ensemble model built on ML classifiers to improve the detection rate. In the proposed model, Linear Discriminant Analysis (LDA), Naive Bayes (NB), Random Forest (RF), and Linear SVC are ensembled as base models, and Logistic Regression (LR) is selected as the meta-model. The model is applied to the ToN-IoT telemetry dataset for binary and multiclass classification and achieves a significant improvement in accuracy.
The proposed model handles the variance and bias issues of single classifiers. The selected meta-model has low variance, which addresses the overfitting problem. The proposed model detects an attack and also identifies the type of attack.
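A stacking ensemble with the same base learners and meta-model can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions, not the configuration used in this study: the data is synthetic (not ToN-IoT), the Gaussian variant of NB is assumed, and all hyperparameters (`n_estimators`, `max_iter`, `cv=5`) are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

# Synthetic stand-in for sensor telemetry (assumption: NOT the ToN-IoT data).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Base learners named in the text; GaussianNB is an assumed NB variant.
base_learners = [
    ("lda", LinearDiscriminantAnalysis()),
    ("nb", GaussianNB()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", LinearSVC(max_iter=5000, random_state=0)),
]

# Logistic regression is trained on cross-validated base-learner outputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
accuracy = stack.score(X_test, y_test)
print(f"stacked accuracy on held-out data: {accuracy:.3f}")
```

The same object handles multiclass labels without changes, since every estimator listed supports more than two classes.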

Contribution of study
The novel contributions of this study are:
• A stacking-based ensemble intrusion detection (ID) model is proposed for telemetry data protection.
• The proposed model is evaluated for both binary and multiclass classification scenarios on single IoT device data and merged IoT device data.
• A cross comparison of homogeneous and heterogeneous ensemble models is performed for telemetry data.

Structure of paper
The remainder of the paper is organized as follows: Section II reviews existing ML and EL approaches for IDS. Section III provides an overview of various supervised ML techniques. Section IV describes the flow of our research methodology, followed by a detailed overview of the dataset and experimental setting. Section V discusses the experimental outcomes. Section VI concludes the study.

Literature review
This section focuses on the prevailing IDS techniques based on ML and Ensemble Learning (EL) used by the research community to identify intrusions across various IoT applications. In the digital era of the smart environment, various application fields, including industrial processes, public safety, energy consumption, home automation, environmental monitoring, healthcare, etc., could benefit significantly from IoT systems [37]. In a smart environment, different sensors are deployed to control or execute operations using multiple techniques [38,39]. Smart things become more effective when IoT systems and smart environments are integrated. However, IoT systems are subject to several security vulnerabilities because of their interconnectivity. An attack against a smart environment, such as household appliances, could harm the end-user (family). According to [40], the security problems of any IoT system can be divided into four categories: authentication and physical threats, confidentiality hazards, data integrity/aggregation [41] issues, and privacy concerns. Several studies [22,42,43] provide efficient and secure data aggregation solutions for wireless sensor networks.
A smart home contains many smart appliances, such as door locks, power switches, light bulbs, smoke alarms, etc. A variety of network-based attacks can target these devices. The study in [44] describes how IoT devices are susceptible to attacks with an example. The author performed a test with three standard smart home devices: the Nest smoke alarm, the Philips Hue light bulb, and the Belkin WeMo power switch. The author observed the network activities of these IoT devices and illustrated the ease with which security and privacy can be compromised. One of the results of this investigation was the discovery of flaws in the request-response message passing between the bridge and the Philips Hue light bulb app. The attacker can easily capture the user name and bridge IP address. A simple Python script was then used to send HTTP PUT requests and gain complete control of the bridge. The author proposed to solve this problem by restricting access at the network level, i.e., the cloud service provider offers security as an overlay service that does not affect connected home devices. The proposed security solution, namely home network access control rules, is thus a feasible way of ensuring privacy and security [45] in smart homes.
Another discussion about the smart home environment was presented in [46]. The author addressed privacy challenges and ramifications with connected devices. The article covers the data collected from smart home users. This discussion led to the conclusion that malware management was a significant research gap.
A detection model based on deep learning techniques was proposed by [47]. The experiments were conducted by collecting data from the Remote Telemetry Unit (RTU) of a gas pipeline system. The information relevant to the experiments was then extracted from this data. The proposed model outperformed other detection algorithms regarding detection rate, precision, and false-positive rate. Its results for the identification of zero-day attacks were good. To accurately design and evaluate IoT/IIoT defense systems, the authors of [48] generated a new telemetry-based dataset using heterogeneous devices. Diverse cyber-attacks were launched to populate the dataset. The performance of attack classification over IoT telemetry data was investigated using a variety of ML and deep learning algorithms. CART obtained the highest accuracy in most cases.
A study based on supervised IDS for a smart home (consisting of 8 IoT devices) IoT network is conducted by [49]. The suggested IDS architecture comprises three levels to identify prevalent network-based cyber-attacks. The proposed IDS detects malicious activity and determines its type. Three tests were run with nine different classifiers over different layers. The result shows that the J48 classifier had the best performance in terms of f-measure in all three experiments, with 98%, 96.2%, and 90%. The system's flaw was that it required the integration of all three layers to identify an intrusion. The entire system will be affected in case of failure in any layer.
The authors of [50] used an SVM-based classifier with three kernel functions to create a lightweight IDS. That study solely considered the forms of attack that impact traffic intensity. According to the given results, the proposed IDS only evaluates a packet arrival rate attribute. The experiments were performed using MATLAB version 2018b. The classifier was evaluated with several kernel functions: linear, polynomial, and radial-basis. Because only a few significant inputs were used, SVM processing time and complexity were lowered. Traditional evaluation measures, i.e., accuracy, TPR, FPR, and FDR, were selected to evaluate the model. The inability to recognize intrusions without a corresponding influence on traffic intensity was a weakness of this technique.
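The evaluation measures named above follow directly from the confusion-matrix counts. The sketch below uses their standard definitions; the label vectors are made up for illustration and are not results from [50].

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Compute accuracy, TPR, FPR and FDR with 1 = attack, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "tpr": safe_div(tp, tp + fn),  # true positive rate (detection rate)
        "fpr": safe_div(fp, fp + tn),  # false positive rate
        "fdr": safe_div(fp, fp + tp),  # false discovery rate
    }


# Made-up labels: 4 attacks and 6 normal samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```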
In another ML approach, [51] worked on the KDD Cup 99 dataset for classification. The 22 types of attack in the dataset were categorized into four classes, namely DDoS, Probe, U2R, and R2L. The authors applied the Sequential Minimal Optimization (SMO), NB, Bayes Net (BN), Multilayer Perceptron (MLP), and RF algorithms to the dataset. The results of the algorithms were compared with one another in terms of false rate, precision, recall, f-measure, and accuracy. The experiments were performed in the Weka software.
Another comprehensive study conducted by [52] investigated 14 different ML algorithms that were applied for IDS in diverse situations. The experiments were conducted on the KDD99 Cup dataset using the MLA approach. The study showed that algorithms such as RF, ANN, and decision tree provide better performance than others for identifying attacks. It also revealed that the area of application and the algorithm influence the FPR, detection rate, and accuracy.
Furthermore, a comparative evaluation of commonly used ML algorithms, namely LR, MNB, G-NB, B-NB, KNN, DT, AdaBoost, RF, MLP, and GB, was carried out on the UNSW-NB15 dataset in [53]. The study reveals that the RF classifier exceeds the other classifiers in terms of accuracy, positive predictive value, and f-measure, with 87, 98, and 84%, respectively.
Ensemble approaches have gained considerable attention in intrusion detection in recent years. In [54], an XGBoost ensemble model was proposed to examine botnet attacks against three protocols: DNS, HTTP, and MQTT. The authors claim that the proposed model is built on tree boosting ML methods, which smooth out the "bias-variance" trade-off. The proposed IDS was evaluated on the KDDCup99 dataset in terms of precision, accuracy, recall, F1-measure, and support. The proposed approach attained a precision of 99.95%. However, the KDDCup99 dataset was obtained from traditional networks and does not include sensor telemetry data. Therefore, it cannot serve as an adequate IoT benchmark dataset.
Three standard ensemble techniques, bagging, boosting, and stacking, were explored in [55] for a NIDS. The selected ML classifiers were NB, ANN, J48, and REPTree, and each classifier was employed as a base model in the bagging and boosting techniques. In the stacking model, REPTree was used as the base learner and an ANN as the meta-learner. The stacking model achieved the highest accuracy, 87.92%, compared with bagging and boosting. All tests were conducted on the UNSW-NB15 dataset using the Weka data mining tool. The results showed that J48 and REPTree outperformed NB and ANN by achieving the highest accuracy rates.
Another study [56] presented an AdaBoost ensemble IDS for mitigating botnet attacks against the DNS, HTTP, and MQTT protocols used in IoT networks. The AdaBoost model was implemented using DT, NB, and ANN. A new statistical flow feature technique was presented to generate new features from the protocols, which were then evaluated for malicious activity detection. The evaluation was conducted on the UNSW-NB15 and NIMS botnet datasets in terms of accuracy, DR, and FPR. The new method outperformed three existing methods, SVM, Markov chain (MC), and Bayesian network (BN), in DR and FPR.
An AIDS model for IIoT networks with two phases was presented in [57]. In the first phase, SVM and NB classifiers were ensembled, and k-fold cross-validation was used to build the train and test datasets. The second phase used the ANN and RF results as input for model classification. The best classifiers were determined based on the results. The models were validated using three publicly available datasets: WUSTL IIOT-2018, N-BaIoT, and Bot-IoT. The selected evaluation metrics, precision, recall, and accuracy, reach up to 98% on all datasets.
The author of [58] introduced an architecture, ELNIDS, to detect routing attacks against the IPv6 Routing Protocol based on the signature technique. Four ensemble models, boosted trees, bagged trees, subspace discriminant, and RUSBoosted trees, were implemented for this study. The dataset RPL-NIDDS17 was developed using the NetSim tool. The expanded dataset contained traces of routing attacks on the RPL protocol. The simulation was carried out in MATLAB 2017b. According to the reported results, the ensemble of boosted trees achieved the highest accuracy, 94.5%, among the four models.
To determine the effectiveness of ensemble learning for NIDS, [59] constructed both homogeneous and heterogeneous ensemble techniques. RF was selected as the homogeneous method, and the NB, KNN, RIPPER, and DT classifiers were ensembled for the heterogeneous model. Binary and multiclass classification was performed on the UNSW-NB15 dataset. The reported results show that LSTM achieves accuracy up to 80% for binary and 72% for multiclass classification. In contrast, the homogeneous RF model achieved higher accuracy than LSTM: 98% for binary and 87.4% for multiclass classification. The results of the heterogeneous model were comparable to those of the homogeneous model.
The authors of [60] proposed an ensemble model based on a Bayesian network and RF as base classifiers, along with Vote and RandomCommittee as meta classifiers. The proposed model was evaluated on the KDDcup99 dataset with 10-fold cross-validation, using accuracy and ROC curves as evaluation metrics. The proposed model achieved better results (0.99%) in terms of AUC than a single Bayesian network and random tree for the classification of all attacks (Probe, DoS, U2R, R2L).
To develop an AIDS, a three-tier paradigm was proposed in [61]. The tiers include feature selection, modeling of the classifier, and validation. The first tier uses a hybrid technique based on three evolutionary search algorithms to select important features. In the second tier, rotation tree and bagging models were trained for classification. In the last tier, validation was performed with 10-fold cross-validation. The study was conducted on the UNSW-NB15 and NSL-KDD datasets. Precision, accuracy, FPR, and recall were selected for evaluation. The model achieves an accuracy of 85.8%, a sensitivity of 86.8%, and a detection rate of 88.0% on the NSL-KDD dataset.
To increase the efficiency of AIDS in IoT networks, [62] employed an RF classifier. Parameter tuning was performed with different ensemble sizes (numbers of trees). The proposed methodology was evaluated in terms of FAR and accuracy on the publicly available UNSW-NB15, GPRS, and NSL-KDD datasets. The authors used statistical tests such as the Friedman and Nemenyi tests. The results indicate that the suggested model works better on the NSL-KDD dataset than on the GPRS dataset. It is demonstrated that the proposed strategy does not produce satisfactory results when dealing with sensor-related data. The authors of [48] tested several ML algorithms on their proposed ToN-IoT dataset, where random forest was selected to test the performance of EL on the telemetry data of IoT devices.
Table 1 compares the reviewed literature on Ensemble Learning techniques.

Limitations:
The following limitations have been extracted from the literature:
• Old datasets were used with respect to telemetry data.
• The selected datasets do not have the specific characteristics of IoT and IIoT applications.
• Most of the studies focused on network-based security rather than on the security of sensor telemetry data.
• An evaluation of ensemble techniques for securing sensor telemetry data is not available in the literature.

Supervised models for IDS
The literature shows that researchers have used a variety of machine learning algorithms to detect intrusions. It is difficult and risky to share network data with the research community since it may contain sensitive communications or personal information. As a result, the lack of training data makes it challenging to use ML for intrusion detection. This section presents the basic mathematics behind some classical machine learning models used as members of our proposed ensemble-based model. When applying multiple models to different problems, a comprehensive understanding of what happens under the hood is often unnecessary. However, understanding the fundamentals of each algorithm is beneficial when choosing a model and adjusting parameters to improve model performance.

Logistic Regression (LR)
LR is a classification algorithm that deals with a discrete set of classes. LR estimates the probability that a test sample belongs to a particular class, and this value can be mapped to two or more discrete classes using the logistic sigmoid function [64]. The sigmoid function maps any real value to a value between 0 and 1. During classification, the model predicts that an instance belongs to the negative class if the estimated probability is below 50%; otherwise, the instance belongs to the positive class [65]. The mathematical representation is:

$$g(z) = \frac{1}{1 + e^{-z}}$$

where z is the function's input, e is the base of the natural logarithm, and g(z) is the output, a probability estimate between 0 and 1. The cost function for logistic regression is the average log loss over all training cases.
The cost function is written as follows:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i \log h\!\left(z(\theta)^{(i)}\right) + (1 - y_i)\log\!\left(1 - h\!\left(z(\theta)^{(i)}\right)\right)\right]$$

where m is the number of training samples, $y_i$ is the actual label of the i-th training sample, and $h(z(\theta)^{(i)})$ is the model's prediction for the i-th training sample. LR performs better on simple datasets with linearly separable classes. LR underperforms on increasingly complicated datasets, but regularization methods can mitigate this limitation.
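The sigmoid mapping, the 50% decision rule, and the average log loss can be sketched in a few lines. The weights, bias, and samples below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def sigmoid(z):
    """Map any real value to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(theta, bias, features):
    """Compute z as the weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(theta, features)) + bias


def predict(theta, bias, features):
    """Positive class (1) when the estimated probability is at least 50%."""
    return 1 if sigmoid(linear_score(theta, bias, features)) >= 0.5 else 0


def log_loss(theta, bias, X, y):
    """Average log loss over all training samples."""
    total = 0.0
    for features, label in zip(X, y):
        h = sigmoid(linear_score(theta, bias, features))
        total += label * math.log(h) + (1 - label) * math.log(1 - h)
    return -total / len(X)


# Hypothetical one-feature model and training samples.
theta, bias = [2.0], -1.0
X = [[0.0], [1.0], [2.0]]
y = [0, 1, 1]

labels = [predict(theta, bias, row) for row in X]
loss = log_loss(theta, bias, X, y)
print(labels, round(loss, 4))
```

A model that predicted 0.5 for every sample would have a loss of ln 2 (about 0.693), so any useful fit should score below that.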

Linear support vector machine
SVM stands for Support Vector Machine and is a supervised learning technique that deals with classification and regression problems. In ML, however, it is mostly used for classification problems. The number of features determines the number of dimensions in SVM, and the value of each attribute is the value of the corresponding coordinate. Each data element is marked as a point in n-dimensional space. Classification is then accomplished by locating the optimal hyperplane (best decision boundary) that distinctly separates the data points. Linear SVM classifies linearly separable data points by using a straight line [66]. Multiple hyperplanes can be created via kernels such as linear, polynomial, Gaussian Radial Basis Function (RBF), etc. The mathematical representation of a single hyperplane is:

$$w^{T}x + b = 0$$

The equation is derived for two-dimensional vectors, where w is the weight vector, x is the input vector, and b is the bias. The hyperplane is used for prediction through a hypothesis function that is defined as:

$$h(x) = \begin{cases} +1, & \text{if } w^{T}x + b \geq 0 \\ -1, & \text{if } w^{T}x + b < 0 \end{cases}$$

The data elements above or on the hyperplane are classified as the positive class +1, and the data points below the hyperplane are classified as the negative class -1.
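The decision rule of a linear SVM reduces to the sign of a dot product plus the bias. In the sketch below the weight vector and bias are hypothetical values, not trained parameters; training (finding the maximum-margin w and b) is out of scope.

```python
def decision_value(w, x, b):
    """Signed score of x relative to the hyperplane defined by w and b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Return +1 for points on or above the hyperplane, -1 otherwise."""
    return 1 if decision_value(w, x, b) >= 0 else -1


# Hypothetical hyperplane x1 - x2 = 0 in two dimensions.
w, b = [1.0, -1.0], 0.0

above = classify(w, [3.0, 1.0], b)     # score  2.0
below = classify(w, [1.0, 3.0], b)     # score -2.0
on_plane = classify(w, [2.0, 2.0], b)  # score  0.0, counted as positive

print(above, below, on_plane)
```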

Naive bayes
The Naive Bayes algorithm is a supervised classification technique based on Bayes' theorem. It is not a rule-based classifier; it relies on probability theory for classification. NB is a simplified Bayesian probability model that assumes that all attributes are independent of one another given the class. This means that the probability calculated for one feature does not affect the others [67]. The three main types of NB are Gaussian, Multinomial and Bernoulli. NB combines the prior probability and the conditional probabilities into a single formula, which is mathematically expressed as follows:

$$P(L \mid M) = \frac{P(M \mid L)\,P(L)}{P(M)}, \qquad P(M \mid L) = \prod_{i=1}^{n} P(m_i \mid L)$$

where P(M | L) represents the probability calculated for each instance M for the targeted class L. For the Gaussian variant, the conditional probability is mathematically expressed as:

$$P(m_i \mid L_j) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(m_i - \mu)^{2}}{2\sigma^{2}}\right)$$

where µ and σ² represent the mean and variance of a value for a given class. The values of i and j can be 1, 2, 3, ..., n. Naive Bayes is a popular technique that is frequently used by the research community for classification and feature reduction [68,69]. Its limitations are the assumption that all attributes are mutually independent and its dependence on all attributes being categorical.
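A Gaussian variant of this classifier can be sketched from scratch: estimate a prior and a per-feature mean and variance for each class, then pick the class with the highest log-posterior. The telemetry values and the small variance floor `1e-9` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature (mean, variance) for each class."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mu = sum(column) / len(column)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mu) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mu, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def log_gaussian(x, mu, var):
    """Log of the Gaussian density, which avoids numerical underflow."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, features):
    """Return the class with the largest log prior plus log likelihoods."""
    scores = {}
    for label, (prior, stats) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, mu, var) for x, (mu, var) in zip(features, stats)
        )
    return max(scores, key=scores.get)


# Hypothetical telemetry rows: [temperature, humidity].
X = [[22.0, 40.0], [21.5, 42.0], [22.5, 41.0],
     [35.0, 10.0], [36.0, 12.0], [34.0, 11.0]]
y = ["normal"] * 3 + ["attack"] * 3

model = fit_gaussian_nb(X, y)
print(predict(model, [22.1, 41.0]), predict(model, [35.5, 11.0]))
```

Summing log-probabilities per feature is the code-level form of the independence assumption: each feature contributes to the score on its own.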

Random Forest (RF)
RF, or random decision forests, is an EL method in which a considerable number of de-correlated trees are built and then averaged [70]. RF generates a forest of decision trees from random samples of the dataset. An individual decision tree is created for each independent random sample [71]. To classify test data, predictions from each tree are obtained, and the final class is assigned by majority voting or averaging [48].
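The idea can be sketched with a deliberately simplified forest. As an assumption for brevity, each "tree" here is a one-split stump on a single randomly chosen feature, trained on a bootstrap sample; a real random forest grows deeper trees and samples features at every split. The data and helper names are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """One-split tree: threshold halfway between the two class means."""
    ones = [r[feature] for r, l in zip(rows, labels) if l == 1]
    zeros = [r[feature] for r, l in zip(rows, labels) if l == 0]
    if not ones or not zeros:  # bootstrap sample contained a single class
        constant = 1 if ones else 0
        return lambda row: constant
    mean_one, mean_zero = sum(ones) / len(ones), sum(zeros) / len(zeros)
    threshold = (mean_one + mean_zero) / 2
    if mean_one >= mean_zero:
        return lambda row: 1 if row[feature] > threshold else 0
    return lambda row: 1 if row[feature] < threshold else 0


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # with replacement
        feature = rng.randrange(len(rows[0]))           # de-correlates trees
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Assign the class chosen by the majority of trees."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Hypothetical telemetry rows: [temperature, humidity]; 1 = attack.
rows = [[22, 40], [21, 42], [23, 41], [22, 39],
        [35, 10], [36, 12], [34, 11], [37, 9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(rows, labels)
print(predict(forest, [22.5, 40.5]), predict(forest, [35.5, 10.5]))
```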

Linear Discriminant Analysis (LDA)
At the stage of data pre-processing, LDA is a well-known linear algorithm that is widely used as a dimensionality reduction [72] and pattern classification method. However, in this study, LDA is used as a classification approach for developing an IDS. LDA has several advantages: it is easy to apply, efficient, and has a low computation cost, which makes it a good choice for creating an IDS [73]. The main steps of LDA are listed as follows:
• Estimate the d-dimensional mean vectors for the distinct classes from the dataset. The mean vector of a class matrix is represented as:

$$m_a = [\,m_{a_1}, m_{a_2}, m_{a_3}, \ldots, m_{a_n}\,]$$

where $m_{a_i}$ represents the mean of the i-th attribute of the class matrix.
• Compute the covariance matrix for the multivariate features from the training data as:

$$\Sigma = \frac{1}{n}\,\mathrm{Cls}_{mc}^{T}\,\mathrm{Cls}_{mc}$$

where $\mathrm{Cls}_{mc}$ represents the mean-corrected class matrix and n is the number of rows, indexed by i = 1, ..., n.
• Based on the Bayes theorem and the probability of each class, estimate the probability of the output class (attacked/normal) for a given observation.
• Make the final prediction by using the discriminant function, which is expressed as:

$$f_k(x) = x^{T}\Sigma^{-1}\mu_k - \frac{1}{2}\,\mu_k^{T}\Sigma^{-1}\mu_k + \ln P(k)$$

where $f_k(x)$ is the discriminant function for class k given observation x, Σ denotes the covariance matrix, $\mu_k$ denotes the mean of class k, and P(k) denotes the estimated probability of class k.
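The steps above can be sketched with NumPy. This is a minimal sketch assuming the common LDA formulation with one covariance matrix pooled over all classes and a linear discriminant score per class; the telemetry values and helper names are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Estimate class means, priors and the inverse pooled covariance."""
    classes = np.unique(y)
    n, d = X.shape
    means, priors = {}, {}
    pooled = np.zeros((d, d))
    for k in classes:
        X_k = X[y == k]
        means[k] = X_k.mean(axis=0)
        priors[k] = len(X_k) / n
        centered = X_k - means[k]        # mean-corrected class matrix
        pooled += centered.T @ centered
    pooled /= n                          # divide by the number of rows
    return classes, means, priors, np.linalg.inv(pooled)


def discriminant(x, mean, prior, cov_inv):
    """Linear discriminant score of observation x for one class."""
    return x @ cov_inv @ mean - 0.5 * mean @ cov_inv @ mean + np.log(prior)


def predict(model, x):
    """Pick the class with the largest discriminant score."""
    classes, means, priors, cov_inv = model
    scores = [discriminant(x, means[k], priors[k], cov_inv) for k in classes]
    return classes[int(np.argmax(scores))]


# Hypothetical telemetry rows: [temperature, humidity]; 1 = attack.
X = np.array([[22.0, 40.0], [21.0, 42.0], [23.0, 41.0], [22.5, 39.0],
              [35.0, 10.0], [36.0, 12.0], [34.0, 11.0], [35.5, 9.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = fit_lda(X, y)
print(predict(model, np.array([22.0, 41.0])),
      predict(model, np.array([35.0, 10.5])))
```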

Decision Tree (DT)
DT is one of the most widely used techniques for classification and intrusion detection. A DT consists of three basic components: a decision node (which identifies the test attribute), a branch (a possible choice based on the test attribute value), and a leaf node (the class of which the instance is a member) [74]. In the DT algorithm, the dataset is first learned and modeled, and then a tree is formed. When test data is given to the DT, it is classified according to the classification procedure learned from the prior dataset. Starting at the root node, a test is performed using the test attribute value and a decision procedure. The class (normal, attack) is assigned to the test data when a leaf node is reached. DT performs better on huge datasets [26]. DT has the advantages of better detection performance, generalization accuracy, etc.

Each ML algorithm has its own constraints in terms of bias and variance. EL addresses the limitations of standalone machine learning techniques. The concept of the EL technique was introduced in 1979 by [75] to improve the performance of independent ML algorithms. EL is an ML paradigm that merges a diverse range of models (weak learners) to produce one optimal predictive model that gives more accurate results than a single model. The modular structure of EL avoids the overfitting issues associated with high variance. The algorithms combined in the EL technique should be selected based upon their computational cost to achieve better performance.

Mathematical Biosciences and Engineering
Volume 19, Issue 10, 10550-10580.
The visual representation of EL is given in Figure 3. In the ensemble learning process, the dataset is first split into training and testing data. The training data is then divided into multiple subsets using various techniques (i.e., sampling with replacement or without replacement). In the next step, the subsets are given to the selected models to train them. The unseen test data is then provided as input to these trained models for prediction. Finally, the predictions made by each model (base learner) are combined by majority vote or by averaging [76]. Mathematically, the output $E_{n,m}$ of the ensemble model of the $N$ machine learning classifiers is the aggregate of the predictions $P$ of each classifier.
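The final combination step can be sketched as follows, assuming hard (label) voting; the function name `majority_vote` and the three prediction lists are hypothetical and serve only as an illustration.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per test instance.

    predictions_per_model: list of equal-length lists, one list per base learner.
    On a tie, the label that was counted first wins.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

# Three hypothetical base learners predicting 0 (normal) or 1 (attack).
model_a = [0, 1, 1, 0]
model_b = [0, 1, 0, 0]
model_c = [1, 1, 1, 0]
ensemble = majority_vote([model_a, model_b, model_c])
```

Averaging works the same way, except that class probabilities are averaged across models before the most probable class is taken.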

Types of EL
EL is categorized into two main types: homogeneous EL and heterogeneous EL. In the homogeneous EL approach, base classifiers (weak learners) of the same kind are trained on different subsets of the data. The results of the classifiers are aggregated to improve the precision. Homogeneous EL uses the same feature selection method for all training data. This type of EL is suitable for large datasets. Bagging and boosting are the most common types of homogeneous EL. In heterogeneous EL, by contrast, different base classifiers are trained on the same data. This technique works well for small datasets. The feature selection technique differs for each base learner on the same data. An example of heterogeneous EL is stacking.

Bagging
The term "bagging" is short for "bootstrap aggregation." It is a parallel ensemble process that reduces variance by averaging multiple trees, thereby increasing prediction accuracy [77]. Random forest is considered an advanced version of bagging. Bootstrap aggregation works as follows:
• Random samples of data are selected from the training dataset with replacement (which means that an individual instance can be chosen multiple times).
• In the next step, the base models are trained individually and in parallel.
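The sampling step can be sketched in a few lines. The helper `bootstrap_sample`, the fixed seed, and the stand-in data are assumptions made for illustration and are not taken from the study.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]

rng = random.Random(0)           # fixed seed so the sketch is repeatable
training_rows = list(range(10))  # stand-in for ten training instances
# One bootstrap subset per base model; each model would be trained on its own subset.
subsets = [bootstrap_sample(training_rows, rng) for _ in range(3)]
```

Because sampling is done with replacement, a subset typically repeats some instances and omits others, which is what makes the base models differ from one another.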

Proposed methodology
The workflow of our methodology is illustrated in Figure 4.

Dataset: TON IoT
The available datasets used for intrusion detection have some limitations. Their telemetry data is outdated, and they lack the characteristics of specific IoT and IIoT applications. The datasets frequently used in the IDS domain for IoT (KDD-CUP99, UNSWNB (2015), CIC-IDS (2017), and BOT IOT (2018)) are compared based on several parameters. Study [48] shows that these datasets do not cover a variety of sensors together with diverse new attacks, which could result in a lack of telemetry data. Consequently, the TON IoT dataset was chosen for the evaluation of EL approaches. The most recent effort to create an IDS dataset was made by UNSW (Australia) in 2020, which developed the telemetry data named TON IoT. It is a publicly available dataset [78]. The name TON indicates that the dataset was collected from Telemetry data of IoT applications, Operating system logs, and Network traffic of IoT networks. A testbed based on a realistic representation of medium-scale networks was set up to collect data from cyber range and IoT labs. The testbed was built using three layers (edge, fog, and cloud), in which multiple IoT applications and network elements interacted with each other. The dataset comprises seven CSV files that hold sensor telemetry data. The features of each file are presented in Table 4.
The type and label features are common to every file. The label feature in each file takes the values 0 (indicating a normal instance) and 1 (indicating an attack instance). The label feature is used for binary classification, whereas the type feature contains categorical values (indicating the attack type) and is used for multiclass classification. The dataset contains nine diverse attacks, including scanning, password cracking, data injection, XSS, backdoor, and ransomware. Table 3 gives a detailed description of the dataset. The seven files are combined into one CSV file that holds 401119 instances, of which 245000 are normal and 156119 are attack instances. The dataset is well balanced, with normal records accounting for 61% of the total and attack records accounting for 39%. Before the model can be evaluated on the ToN IoT dataset, the data must be pre-processed. Some frequently used methods (cleaning, encoding, and scaling) have been employed to pre-process ToN IoT. The details of each method are given below:

Method for feature selection
Selecting the essential features from the pool of features in the dataset is a vital step. The chosen dataset contains seven files, each of which comprises up to six parts. The integrated file (the seven files combined) holds 22 features in total. The chi-square feature selection technique has been employed to select the top 19 features. The time, timestamp, and date features were removed because they caused overfitting at training time [48].
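A minimal sketch of chi-square selection is given here, assuming scikit-learn's `SelectKBest` with the `chi2` score function. The data is synthetic and merely has the same shape as described (22 features reduced to 19); it is not the ToN IoT data, and the study's exact procedure may differ.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
# Synthetic stand-in for the merged file: 200 rows, 22 non-negative features.
X = rng.integers(0, 10, size=(200, 22)).astype(float)
y = rng.integers(0, 2, size=200)  # 0 = normal, 1 = attack

# Keep the 19 features with the highest chi-square score against the label.
selector = SelectKBest(score_func=chi2, k=19)
X_selected = selector.fit_transform(X, y)
kept_columns = selector.get_support(indices=True)  # indices of the retained features
```

The chi-square score requires non-negative feature values, so any scaling or encoding applied beforehand must preserve that property.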

Method for data cleaning
This step involves handling missing and null values in the dataset. For ToN IoT, the median value is calculated for each feature, and all missing or null values are then filled with that median. The median was chosen because it is less sensitive to outliers than mean imputation [79].
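The effect of median imputation can be sketched with a small example. The helper `impute_median` and the temperature readings, including the deliberately extreme value, are made up for illustration.

```python
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

temperature = [21.0, None, 23.0, 250.0, 22.0]  # 250.0 is a made-up outlier
cleaned = impute_median(temperature)
# The gap is filled with 22.5; the mean of the observed values would be 79.0.
```

The example shows why the median is preferred: a single outlier pulls the mean far away from typical readings but leaves the median almost unchanged.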

Method for data encoding
ML deals with numeric data; therefore, features in categorical form should be converted into numeric values. This conversion of categorical values into numeric values is called data encoding. Six features in the selected dataset contained categorical values: temperature-condition, light-status, phone-signals, door-state, thermostat-status, and type. The robust "Label Encoding" method has been used to perform the encoding. Label encoding transforms categorical values into numbers in ascending order. The categorical features with the values on/off, true/false, high/low, and open/close were converted into the binary values 1/0. The "type" feature, which contains nine different attack names, was converted into the numerical values 1 to 9.
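A minimal sketch of label encoding follows. The helper `label_encode` is hypothetical and assigns codes in sorted order starting at 0, which is also how scikit-learn's `LabelEncoder` behaves; the 1 to 9 numbering reported for the "type" feature would therefore need an offset of one.

```python
def label_encode(values):
    """Map each distinct category to an integer, in ascending (sorted) order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

door_state = ["open", "closed", "open", "closed"]  # illustrative values only
encoded, mapping = label_encode(door_state)
```

Returning the mapping alongside the encoded column makes it possible to translate predictions back into category names later.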

Method for normalization
The dataset contains multiple features with different ranges of values. Some features were binary (1/0), whereas others had values in the hundreds or thousands, which can lead to inaccurate results. The values therefore needed to be rescaled to a common range. To do this, the Min-Max scaler method has been applied to the concerned features before modeling. The Min-Max scaler is one of the most widely used scaling algorithms. In this method, the minimum of the data in a column of the dataset is subtracted from each value, and the result is divided by the difference between the maximum and minimum values. The equation of the Min-Max scaler is:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
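The min-max rescaling described here can be sketched for a single column. The helper name and the handling of a constant column are assumptions made for this illustration.

```python
def min_max_scale(column):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

scaled = min_max_scale([100.0, 550.0, 1000.0])
```

In practice the minimum and maximum would be computed on the training split only and then reused for the test split, so that no information leaks from the test data.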

Training of model
At the training stage, the dataset has been divided into two parts (training data and testing data) with a ratio of 80:20. The 80:20 split ratio is commonly recommended in the literature [79,80] because it gives data a fair chance of being selected for both the training and testing sets. Therefore, 80% of the data has been used to train the model and 20% to test it, using the train-test-split method of the sklearn library.
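The 80:20 split can be sketched with `train_test_split` from scikit-learn. The toy arrays and the `random_state` value are assumptions; the study does not report a seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)  # 100 toy instances with 2 features
y = np.array([0, 1] * 50)           # alternating normal/attack labels

# test_size=0.20 reserves 20% of the instances for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)
```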

Evaluation metrics
To compare the performance of the EL methods, widely used quantitative evaluation metrics were selected: precision (PRE), recall (REC), F1-score (FS), and accuracy (ACC). The details of each metric are presented in Table 5.
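For reference, the four metrics can be computed from confusion-matrix counts as sketched below. The standard textbook definitions are assumed, and the counts are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts: 40 attacks detected, 10 false alarms, 45 normal, 5 missed attacks.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```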

Proposed stacking-based ensemble model
Stacking is a type of heterogeneous EL paradigm used for classification problems. It is also known as stacked generalization [81]. In the proposed stacking ensemble model, two levels of estimators were used to classify activities as intrusion or normal. At the first level, all base learners were trained on subsets of the dataset to predict the output. Four classifiers were selected as base learners: Naive Bayes (NB) [55,63,82], Linear Discriminant Analysis (LDA) [48,73], Random Forest (RF) [70,71], and Linear SVC [66]. At the next level, Logistic Regression (LR) [64] was selected as a meta-model that takes the predictions of all the base learners as input and produces a final prediction that is more accurate than that of homogeneous EL models. LR is mostly used for binary classification problems and, with the default setting, cannot be used for multi-class problems directly. Therefore, the well-known heuristic method one-vs-rest (OvR) has been set in the "multi-class" parameter of LR to perform multiclass classification. The OvR scheme divides the multi-class data into several binary class problems, the model is trained on each binary class problem, and the probability of each class is calculated. RF was chosen as a classifier because it performs well on telemetry data [48]. Studies [55] and [82] also claim that DT and NB, respectively, perform better in IDS. The model is presented to improve on the standalone performance of NB, LDA, and Linear SVM on telemetry data by means of ensemble techniques.
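The two-level architecture can be sketched with scikit-learn's `StackingClassifier`. This is an illustration under stated assumptions, not the authors' code: the data is synthetic, all hyperparameters are placeholders, and OvR is obtained by wrapping LR in `OneVsRestClassifier` rather than through LR's own multi-class parameter, so that the sketch does not depend on a particular library version.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

# Synthetic three-class data standing in for the telemetry features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Level 1: the four heterogeneous base learners named in the text.
base_learners = [
    ("nb", GaussianNB()),
    ("lda", LinearDiscriminantAnalysis()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", LinearSVC(max_iter=5000)),
]
# Level 2: logistic regression meta-model in a one-vs-rest scheme.
meta_model = OneVsRestClassifier(LogisticRegression(max_iter=1000))

stack = StackingClassifier(estimators=base_learners, final_estimator=meta_model, cv=5)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)  # fraction of correctly classified test rows
```

`StackingClassifier` trains the meta-model on out-of-fold predictions of the base learners (here with `cv=5`), which keeps the meta-model from simply memorising base-learner outputs on data they have already seen.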

Results and evaluation
The experiments were performed on a system with an Intel(R) Core(TM) i5 CPU running at 3.20 GHz, 16.0 GB of RAM, and Windows 10 Pro. On the software side, Jupyter Notebook (Anaconda3) has been used to implement the proposed model in Python version 3.7.3. Several libraries, such as Pandas and sklearn, have been used for pre-processing and modeling.
This section presents the findings and compares our suggested model with classic ML models such as LDA, RF, NB, LR, and Linear SVC, and with LSTM [48]. We applied K-fold cross-validation (K = 4, 5, 7) to the given model. The proposed model achieves its best results with K = 5. All the results were therefore obtained using the average values of accuracy, precision, recall, and F-measure. The proposed model is evaluated on each IoT sensor using binary and multi-attack classification. To test the performance of the model on heterogeneous telemetry data (data from different sensors), all seven dataset files are integrated into one CSV file. The proposed model outperforms the compared models in both binary and multiclass classification.
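The cross-validation procedure can be sketched as follows, assuming scikit-learn's `KFold` and `cross_validate`. A single Gaussian NB classifier on synthetic binary data stands in for the proposed model, and shuffling with a fixed seed is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=350, n_features=8, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1"]

mean_scores = {}
for k in (4, 5, 7):  # the three fold counts mentioned in the text
    folds = KFold(n_splits=k, shuffle=True, random_state=0)
    result = cross_validate(GaussianNB(), X, y, cv=folds, scoring=scoring)
    # Average each metric over the k folds.
    mean_scores[k] = {m: result[f"test_{m}"].mean() for m in scoring}
```

Comparing `mean_scores` across the three values of K mirrors the selection of K = 5 reported in the text.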

Performance of binary classification on per-device (sensor) data files
Binary classification was applied to the dataset using the state-of-the-art models RF, LR, NB, LDA, and SVM, and the proposed stacking-based ensemble model (Stacking). Table 6 compares the binary classification results of the proposed model with those of the state-of-the-art methods. The proposed model outperforms the traditional ML algorithms on most individual IoT sensor data files. The experimental results show that no single ML algorithm provides the most accurate results for all IoT sensor data files. The accuracy of the proposed model is 100% for the fridge sensor and garage door files, compared to 57, 77, 50, and 81% for the linear classifiers LR, LDA, NB, and SVM, respectively. The 100% result is explained by the fact that both files contain discrete values, which are easier to work with. On the GPS tracker and Modbus sensor data files, the model outperforms the ML models and LSTM (a DL model), with 95 and 96%, respectively. The model gave an accuracy of 97% for the weather sensor data. In contrast, for the garage door and light motion files, the accuracy and other evaluation metrics (precision, recall, and F1-score) achieved by stacking EL are comparable to those of the ML models, as presented in Table 7, due to the heterogeneity of data sources in the ToN-IoT dataset. The data of each sensor file is automatically combined into a single CSV file (known as the merged IoT data) by a Python script. The three features date, time, and timestamp were removed from the feature vector due to the over-fitting problem. From the results in Table 8, it is evident that the model achieves 86% accuracy, compared to LR (61%), LDA (68%), NB (62%), SVM (61%), and LSTM (81%). The proposed model has better accuracy, precision, recall, and F1-score than LSTM on the merged file. The success of our model is due to its ensemble nature, which combines the individual classifiers and boosts the results.

Cross comparison of proposed model and homogeneous ensemble model (bagging)
Bagging is a homogeneous EL technique that uses the same base model for classification. Different base models, such as Decision Tree, KNN, SVC, and Logistic Regression, were used in bagging to conduct the experiments. Finally, bagging with a Decision Tree as the base estimator was selected for comparison because it achieved better results than the others. The proposed model, which is heterogeneous, is compared with bagging (with DT). The comparison of the homogeneous technique and the proposed heterogeneous technique for binary classification, for each file and for the merged file, is presented in Table 9. The results show that the proposed heterogeneous ensemble model outperforms bagging.

The proposed model is applied to multi-class classification with all 19 features of the dataset. The time, timestamp, and date features are excluded for the reason mentioned above. The TON IoT dataset contains a feature named "type." The type feature includes the names of seven different IoT attacks, which helps identify the attack type. The evaluation of the proposed model on the multi-class classification problem is presented in this section. The proposed model uses LR as a meta-model, which is mainly used for binary classification problems. LR with the default setting cannot be used for multi-class problems directly. Therefore, the well-known heuristic method one-vs-rest (OvR) has been set in the multi-class parameter of LR. The OvR scheme divides the multi-class data into several binary class problems, and the model is then trained on each binary class problem. The probability of each class is calculated. Finally, the class with the highest probability is selected as the final prediction [79]. The multi-class classification results for the individual IoT sensor data files are shown in Table 10. The results show that the proposed model achieves good results for the Modbus and weather files, with all metrics up to 98 and 99%, respectively. The model shows similar results for the fridge, light motion, and garage door sensors; the accuracy score for these files is 66%. In multi-class classification, the proposed model performs better on data files that contain heterogeneous sensor data than on data files that contain binary data.

Performance of multi classification for merged IoT telemetry data
The summary of multi-class classification for the merged file is presented in Table 11. The results show that single ML classifiers do not perform well for multi-class classification. RF has a good score of 71% accuracy, whereas the accuracy of the other classifiers (i.e., LR, LDA, NB, SVM) does not exceed 62%. In contrast, our proposed model outperforms them with 77% accuracy, 80% precision, and 75% recall. In multi-class classification, the proposed model and bagging (DT) achieve almost similar results, presented in Table 12, for the individual files as well as for the merged file in terms of accuracy (bagging = 76% and stacking = 77%).

Discussion
In an IoT network, devices are interdependent in order to make smart decisions. Attackers capture the data of IoT devices to tamper with it and alter the decisions of other IoT objects within the network. This study examines the performance of ensemble learning for an IDS designed for telemetry data. The analysis of the experimental results shows that the proposed model gives the best performance compared to single ML models (i.e., LR, LDA, RF, NB, SVM), the deep learning model LSTM, and the homogeneous ensemble model (bagging with DT). It is evident from the results that a single classifier does not produce accurate results for all sensors. The proposed model can detect both binary and multi-class attacks for single-sensor data and for the merged data of multiple sensors. To test the performance of the proposed model on heterogeneous data, the data from all sensors is integrated into one file. Moreover, we applied K-fold cross-validation (with K = 4, K = 5, K = 7) to the given model. The proposed model achieves its best results with K = 5. The RF in the proposed model boosts the performance of LDA and SVM. Therefore, our model achieves a significant increase in accuracy for both binary and multi-class attack detection.

Conclusions
This study investigates the performance of ensemble learning techniques for an IDS for sensor telemetry data in IoT networks using ML classifiers. A stacking-based ensemble model has been proposed to make the devices more intelligent. The proposed stacking ensemble integrates NB, LDA, Linear SVC, and RF as base classifiers, with LR selected as the meta-classifier. The experiments were evaluated on the recent, publicly available ToN-IoT telemetry dataset. The ToN-IoT dataset is an accurate representation of telemetry data. The experimental results reveal that the stacking ensemble technique achieves high accuracy for all sensor data compared to the stand-alone ML classifiers and the bagging ensemble model. For binary classification, the accuracy of the proposed model compared to the state-of-the-art classifiers was 100, 95, and 97% for the fridge device, GPS tracker device, and weather sensors, respectively. However, the accuracy for the garage door and thermostat devices was comparable to that of the ensemble model. The proposed model achieved an accuracy of 86% for the merged file in binary classification and 77% in multi-class classification.
In future work, a deep learning model will be used and extended to improve the proposed model's accuracy in identifying unusual and novel types of IoT intrusions in IoT devices.