M-IDM: A Multi-Classification Based Intrusion Detection Model in Healthcare IoT

Abstract: In recent years, smart city applications in the healthcare sector built on IoT systems have continued to grow rapidly, and various advanced network intrusions have emerged as these IoT devices are connected. Previous studies focused on security threat detection and blocking technologies that rely on testbed data obtained from a single medical IoT device or on simulation using a well-known dataset, such as the NSL-KDD dataset. However, such approaches do not reflect the features that exist in real medical scenarios, so potential threats can go undetected. To address this problem, we propose a novel intrusion classification architecture, the Multi-class Classification based Intrusion Detection Model (M-IDM), which relies on data collected by real devices and on convolutional neural networks, which exhibit better performance than conventional machine learning algorithms such as naïve Bayes and support vector machine (SVM). Unlike existing studies, the proposed architecture employs the actual healthcare IoT environment of the National Cancer Center in South Korea and actual network data from real medical devices, such as patient monitors (i.e., electrocardiograms and thermometers). The proposed architecture classifies the data into multiple classes (critical, informal, major, and minor) for intrusion detection. Further, we experimentally evaluated the neural network-based model and compared its performance with those of conventional machine learning algorithms, including naïve Bayes, SVM, and logistic regression.


Introduction
Nowadays, information and communication technology is increasingly applied to the healthcare sector in smart city infrastructure, the foundation of which is network technology for data transmission and reception. Network flows in such infrastructure are also increasing. Previous studies mostly focused on security threat detection and blocking technology (based on testbed data composed of a single medical IoT device or simulator) [5][6][7][8][9][10][11][12]. However, such approaches do not reflect features that exist in the real world.
Therefore, in this study, machine learning technology was applied to classify network events into four different classes (critical, informal, major, and minor) using data collected by real devices in order to sufficiently reflect the complex network flow and characteristics of the actual healthcare IoT environment. We built real-world data-based models using a neural network-based multi-class intrusion classification algorithm for these classes.
To address the above problems in healthcare IoT, we propose a Multi-class classification based Intrusion Detection Model (M-IDM) for healthcare IoT in a smart city that relies on machine learning techniques. The contributions of this paper are as follows:
• We propose a novel intrusion classification architecture based on machine learning techniques to overcome problems related to the detection of unknown attacks in healthcare IoT.
• A service scenario is presented to classify the security event in the network as "normal" or "anomaly (critical, major, minor)" based on various features.
• We experimentally evaluated and analyzed the proposed model architecture using a large amount of data to demonstrate its practicability and feasibility.
The structure of the rest of this paper is as follows. Section 2 discusses related works on intrusion detection and machine learning. Section 3 proposes a prediction model using machine learning algorithms for intelligent network intrusion detection. Section 4 provides analysis and comparison of the existing and proposed models for network intrusion detection. Finally, Section 5 summarizes the main findings of this study and the concluding remarks.

Intrusion Classification
Intrusion detection is divided into the network intrusion detection system (NIDS) and host-based intrusion detection system (HIDS) according to the detection location. The NIDS analyzes the network traffic, and the result is combined with other technologies to increase detection performance and prediction speed. In particular, artificial neural network-based intrusion detection systems can recognize intrusion patterns more efficiently, which helps them analyze large amounts of data. Meanwhile, the HIDS monitors important operating system files and the inbound and outbound packets of the device and also sends alerts in cases of suspicious activity.
Classification techniques can be divided into signature-based and anomaly-based methods. Signature-based methods search for specific patterns, such as byte sequences in network traffic or known malicious instruction sequences used by malware. They can easily detect known attacks but show poor detection performance for new attacks for which no pattern is available. In contrast, anomaly-based methods are primarily used to detect unknown attacks, which matter because of the rapid development of malicious code. Essentially, a machine learning algorithm is used to create a model of reliable behavior, and observed operations are then compared against it. Although unknown attacks can be detected, this method may also result in false positives. An efficient feature selection algorithm must be used to enhance the reliability of classification [13][14][15][16][17][18][19].
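The contrast between the two families can be illustrated with a minimal sketch. The signature list, the packet-size baseline, and the three-standard-deviation threshold below are hypothetical values chosen for illustration only; they are not taken from any of the cited systems.

```python
from statistics import mean, stdev

# Hypothetical byte patterns standing in for a signature database.
SIGNATURES = [b"/etc/passwd", b"' OR 1=1"]

def signature_match(payload: bytes) -> bool:
    """Signature-based: flag a payload only if it contains a known pattern."""
    return any(sig in payload for sig in SIGNATURES)

def anomaly_score(value: float, baseline: list) -> float:
    """Anomaly-based: distance from the baseline mean in standard deviations."""
    return abs(value - mean(baseline)) / stdev(baseline)

def is_anomaly(value: float, baseline: list, threshold: float = 3.0) -> bool:
    """Flag an observation that deviates strongly from normal behavior."""
    return anomaly_score(value, baseline) > threshold

# Hypothetical packet sizes observed during normal operation.
baseline_sizes = [500, 520, 480, 510, 490, 505, 495]

known_attack = b"GET /index.php?id=' OR 1=1"
novel_attack = b"GET /index.php?id=unseen-exploit"

print(signature_match(known_attack))    # True: the pattern is in the database
print(signature_match(novel_attack))    # False: no signature exists yet
print(is_anomaly(5000, baseline_sizes)) # True: far outside the baseline
print(is_anomaly(515, baseline_sizes))  # False: within normal variation
```

The signature check misses the unseen payload, whereas the statistical check can flag traffic it has never seen, at the cost of possible false positives when legitimate traffic deviates from the baseline.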

Machine Learning
In theoretical terms, machine learning is a field of artificial intelligence in which algorithms are developed that enable machines to learn and execute operations that are not specified in code. Representation and generalization are the key elements among the many features that are involved in machine learning. Representation refers to the evaluation of given data, whereas generalization refers to the processing of unknown data. In practice, the three key elements of machine learning are the training set, model, and inference. The training set refers to data used for learning, the model is the output obtained through the training set, and the inference is the prediction that the trained model produces for input values from actual data [20,21]. Fig. 1 summarizes the above descriptions. In a conventional program, data are input and the program presents the results of processing the input data. However, when machine learning processes the data, the model (algorithm) developed from the training dataset provides the prediction results for the input values in the test dataset. Hence, machine learning algorithms are suitable for solving problems where it is difficult to explain the sequence or reasoning clearly [20]. The machine learning model is selected based on whether the data are labeled or not; if the data are labeled, supervised learning models are used to perform classification and prediction, whereas if the data are unlabeled, unsupervised learning models are used to perform clustering.
The two models are different, but when applying actual data to the model, a harmonized methodology is used because labeled data are rare [20][21][22].

Existing Research
Kabir et al. [23] proposed an algorithm that selects representative samples from sub-groups so that the samples faithfully reflect the entire dataset. In the optimal allocation technique, least-squares support vector machine (SVM) is applied to the extracted sample to detect intrusion after generation based on the diversity of observations in the subgroup.
Wang et al. [24] proposed an effective intrusion detection framework with improved functionality based on SVMs, emphasizing that high-quality training data are important for enhancing detection performance. In this framework, log marginal density ratio conversion is implemented to achieve high-quality SVM detection.
Farnaaz et al. [25] constructed a model for an intrusion detection system using a random forest (RF)-based classifier. The RF algorithm is used to detect four types of attacks: denial of service (DoS), probe, U2R, and R2L attacks. Cross-validation is adopted to achieve accurate classification, and a feature selection algorithm is applied to the dataset to remove redundant or irrelevant features.
Swarnkar et al. [26] proposed a novel and efficient data structure called the probability tree structure. If a short sequence is not found in the database in the test phase, or if the probability of packet occurrence is not found in the training phase, then the short sequence is treated as an anomaly. The possibility of an abnormal short sequence is used to generate the class label for the test packet. Some intelligence algorithms are utilized to optimize the parameters of machine learning techniques for feature selection or feature weighting in network intrusions. In this regard, Yang et al. [27] presented a modified naïve Bayes algorithm based on the artificial bee colony algorithm.
For the search strategy, Khammassi et al. [28] applied a wrapper method based on a genetic algorithm, whereas for the learning algorithm for network intrusion detection, they used a method that selects the best subset of functions by applying logistic regression.
Caminero et al. [29] first applied adversarial reinforcement learning to intrusion detection and proposed a novel technique that integrates the behavior of the environment into the learning process of the modified reinforcement learning algorithm. The researchers demonstrated that the proposed algorithm is appropriate for supervised learning based on labeled datasets and verified its performance through comparisons with other well-known machine learning models for two datasets.
To identify a variety of unauthorized use, misuse, and abuse of computer systems, Liu et al. [30] proposed an adaptive network intrusion detection technique based on the selective ensemble of a kernel extreme learning machine with random functions.
Handling redundant or irrelevant features in high-dimensional datasets has been a long-term challenge in network anomaly detection. Removing these features through spectral information not only speeds up the classification process but also helps classifiers make accurate decisions during instances of attack recognition.
Salo et al. [31] proposed a new hybrid dimension reduction technique, namely the principal component analysis-ensemble technique, using an ensemble classifier based on information gain, an SVM, an instance-based learning algorithm, and a multi-layer perceptron.
Divyasree et al. [32] proposed an efficient intrusion detection system using the ensemble core vector machine (CVM) method. The CVM algorithm, which is based on the minimum enclosing ball concept, detects attacks such as U2R, R2L, probe, and DoS attacks. CVM classifiers are modeled for each type of attack; chi-square tests are used to select the relevant function for each attack, and the functions are weighted for dimension reduction.
Al-Jarrah et al. [33] presented a semi-supervised multi-layer clustering (SMLC) model for network intrusion detection and prevention. SMLC, which achieves a detection performance similar to that of supervised ML-based intrusion detection systems (IDS) and intrusion prevention systems (IPS), performs learning using partially classified data. SMLC's performance is comparable to those of the algorithms that make up the layers of the well-known semi-supervised model (tri-training) and of the supervised RF, bagging, and AdaboostM1 machine learning models.
Hady et al. [34] built a real-time testbed to monitor patient biometrics and collect network flow metrics. They combined network flow data with a patient's biometric data to improve system performance and used it as a training dataset. The proposed system improved the area under curve (AUC) by up to 25%. The aforementioned system used four machine learning methods: RF, K-nearest neighbors, SVM, and artificial neural network.
Gao et al. [35] developed a feature set specifically for implanted medical devices and conducted experiments to test the performance of different learning algorithms including decision tree, SVM, and K-means algorithms. The study showed that decision-tree based algorithms achieved the highest detection accuracy, low false-positive rate, and fast training and prediction speed compared with other algorithms. In addition, several other researchers discussed intrusion detection from different perspectives, including distributed DoS attacks, deep packet inspection, emotion classification, and network sub-slicing [36][37][38][39].
In this paper, we demonstrate that a model created using machine learning on actual data extracted from the hospital environment can respond to the security threats of IoT medical devices, which are otherwise difficult to manage. Moreover, classifying detailed risk levels makes it possible to focus on serious events among the large volume of events produced by heterogeneous IoT medical devices, and the results show that threats can be classified into four labels, beyond simple binary classification, with high accuracy.
In summary, existing studies demonstrated that machine learning is a good approach to support network intrusion detection in communication and distributed infrastructure. Thus, this paper presents an M-IDM to respond to the security threats of IoT medical devices, which are difficult to manage, through a model trained on actual data extracted from the hospital environment. The proposed model shows that it is possible to classify threats into four labels, beyond simple binary classification, with high accuracy.

Multi-Class Intrusion Classification Model (M-IDM)
The proposed security model M-IDM relies on the concept of intrusion classification in which a machine learning model is trained over the baseline dataset to distinguish anomalous behaviors from legitimate ones. Unlike existing studies, the proposed M-IDM uses the actual healthcare IoT environment of the National Cancer Center, South Korea, and actual network data from real medical devices, such as a patient's monitor, including electrocardiogram and thermometers. Moreover, it employs a convolutional neural network (CNN), which exhibits better performance compared with conventional machine learning algorithms such as naïve Bayes and SVM, to classify the data into multiple classes (critical, informal, major, and minor) for intrusion detection.
This section describes the architectural design overview of the M-IDM, including major module data description, data preprocessing, and service scenario.

Proposed Model Architecture
The architectural design overview of the proposed M-IDM is shown in Fig. 2. It consists of five stages: input data, preprocessing, feature extraction, classification, and output.
During the input stage, raw data is accumulated, which includes network traffic, logs, scans from internal medical sources, vulnerability databases, threat feeds from technical sources, and social media, forums, and the dark web from human sources. Preprocessing eliminates inappropriate, multifunctional, or noisy data that might be present in the raw data. The feature extraction component provides extraction and specification of the relevant features, including network security event data such as the IP, port, protocol, and severity from heterogeneous medical devices, to support security threat classification in the healthcare IoT environment. The classification module is responsible for creating a trained model with relevant features from the preprocessed data. It uses various machine learning algorithms for classification purposes.
Here, the processed data is divided into training and test data. The classification model is trained using only the training data. The trained model is then repeatedly validated using the validation data. Based on the validation results, the process either proceeds to the next stage or corrects the parameters, learning method, etc., and training is repeated. The model is completed through this process. In the output stage, the actual values are input into the model completed in the previous stage to confirm the classification. The classes are normal and anomaly (critical, major, minor).

Data Description
Unlike previous studies, the proposed M-IDM architecture uses the actual healthcare IoT environment of the National Cancer Center, South Korea, and actual network data from real medical devices. The dataset was collected from a total of six medical devices in the same IP band (a patient's monitor, an electrocardiogram, a thermometer, a sphygmomanometer, a hygrometer, and a fall prevention bed with an alarm watch), which are used in an isolated wireless network reserved for internal medical devices. A network tap device configured using the mirror method transmits and receives all traffic between the medical IoT devices and the gateway. We obtained monthly logs of all traffic going through the firewall to the gateway. Out of the 300,000 cases collected (12 months), 100,000 cases (4 months, approximately 833/day) were selected in an even distribution. For the data label, four risk labels defined in the firewall were used: normal, critical, major, and minor.
The network event data consists of 11 features: one target variable and ten explanatory variables for machine learning, as listed in Tab. 1. The target variable is the severity classification value of each event, that is, normal, critical, major, or minor. The type of source/destination IP refers to the type of IP that attempts to access or receive access from the device, that is, private-internal or public-external. The date variable was recorded as year, month, day, hour, and second based on when the event was created. Flag is the TCP flag, that is, URG, ACK, PSH, RST, SYN, or FIN.

Data Preprocessing
There are two types of dataset attributes in the proposed M-IDM: symbolic and numeric. Numeric attributes can be processed directly, whereas symbolic attributes cannot. Thus, it is necessary to convert symbolic data to numeric data. Tab. 2 lists symbolic attributes and their associated values. In the table, the two redefined attributes "Working hour" and "Type of source/destination IP" have a value of 0 or 1; in this case, these can be processed in the same way as the numeric attributes. Furthermore, the "Protocol" and "Flag" attributes were converted to sequences of integers (Protocol: 1-17, Flag: 1-6) after being represented as one-hot vectors. The protocol attribute has 17 unique values; similarly, the flag attribute is defined with 6 unique values. Many approaches have been proposed for handling symbolic attributes. In an experiment conducted as part of this study, we employed a method that uses conditional probability and dummy indicator variables to process protocol and flag properties [40,41]. However, using only this method increases the dimension of the dataset; thus, we clustered similar types in symbolic attributes. In Tabs. 2 and 3, it can be observed that the dimension is reduced by clustering into different classes for different protocol properties. The study in [42] also performed clustering into a similar type of symbolic attribute. Then, we converted these classes into indicator variables as presented in Tabs. 3 and 4. Data scaling was performed because normalized data is required to perform classification.
(Tab. 3, surviving row: Remaining protocols: All other services.)
In this experimental evaluation of the proposed M-IDM architecture, the selected data (i.e., 100,000 cases or instances) were randomly sampled and divided into training (labeled) data and testing (unlabeled) data. The ratio of the training and testing datasets was 90:10, where 90% (i.e., 90,000 instances) is training data and the remaining 10% (i.e., 10,000) is testing data.
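The preprocessing steps described above (binary redefined attributes, one-hot encoding of symbolic attributes, scaling, and a 90:10 random split) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the office-hour boundaries (09:00 to 18:00), the use of min-max scaling, the fixed random seed, and all function names are hypothetical, since the paper does not specify them.

```python
import ipaddress
import random

# The six TCP flag values named in the data description.
FLAGS = ["URG", "ACK", "PSH", "RST", "SYN", "FIN"]

def one_hot(value, categories):
    """Represent a symbolic value as a one-hot vector over its categories."""
    return [1 if value == c else 0 for c in categories]

def is_private(ip: str) -> int:
    """1 for a private (internal) address, 0 for a public (external) one."""
    return int(ipaddress.ip_address(ip).is_private)

def working_hour(hour: int, start: int = 9, end: int = 18) -> int:
    """1 inside the assumed office hours, 0 otherwise (boundaries assumed)."""
    return int(start <= hour < end)

def min_max_scale(column):
    """Scale a numeric column to [0, 1]; one possible normalization choice."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split(rows, test_ratio=0.1, seed=0):
    """Shuffle the rows and split them into training and testing parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

print(one_hot("SYN", FLAGS))                             # [0, 0, 0, 0, 1, 0]
print(is_private("192.168.1.10"), is_private("8.8.8.8")) # 1 0
print(min_max_scale([10, 20, 30]))                       # [0.0, 0.5, 1.0]

train, test = train_test_split(range(100000))
print(len(train), len(test))                             # 90000 10000
```

With 100,000 instances, the split reproduces the 90,000 training and 10,000 testing instances reported above.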

Service Scenario
This section describes the service scenario of the proposed M-IDM, which classifies the security event data into classes of "normal" or "anomaly (critical, major, minor)." Fig. 3 shows a schematic diagram of the service scenario for the proposed model. This section describes details of the procedures for each stage from (1) to (4).

Figure 3: Service scenario of proposed M-IDM model
The details of the service scenario are as follows:
(1) Data separation: All security event data collected on the healthcare network are randomly sampled and divided into training and test data. The separated data are used to generate the model through learning and to validate the reliability of the model.
(2) Model training: The learning algorithm is selected considering various conditions; then, the parameters are adjusted according to the algorithm and learning is performed using only the training data from the data separated in (1). After assessing the precision of the learning model using the test data, this process is repeated by applying different parameters, algorithms, and other methods until the desired result is obtained. The processes in (1) and (2) are performed in batch form.
(3) Real-time classification 1: The model generated in (2) is applied to the classifier; then, the real IoT medical device network security event data (which do not overlap with the data in (1)) are input in real-time. The input data are first classified as "normal" or "anomaly" using a trained model that is not based on rules.
(4) Real-time classification 2: The IoT medical device security event data classified as "anomaly" in (3) are further classified as "critical," "major," or "minor." The processes in (3) and (4) are performed in real-time.
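The two real-time classification stages can be sketched as a small pipeline. This is a minimal illustration, not the M-IDM implementation: the two callables stand in for trained models, and the `score` field and its thresholds are hypothetical placeholders.

```python
from typing import Callable, Dict

Event = Dict[str, float]

def classify_event(event: Event,
                   is_anomalous: Callable[[Event], bool],
                   severity_of: Callable[[Event], str]) -> str:
    """Stage 1 separates normal from anomaly; stage 2 grades the anomaly."""
    if not is_anomalous(event):
        return "normal"
    return severity_of(event)

# Hypothetical stand-ins for the trained models of the two stages.
def demo_is_anomalous(event: Event) -> bool:
    return event["score"] > 0.5

def demo_severity(event: Event) -> str:
    if event["score"] > 0.9:
        return "critical"
    if event["score"] > 0.7:
        return "major"
    return "minor"

for score in (0.2, 0.6, 0.8, 0.95):
    label = classify_event({"score": score}, demo_is_anomalous, demo_severity)
    print(score, label)
```

Keeping the two stages as separate callables means the severity model only ever sees events that were already judged anomalous, which mirrors the order of the real-time steps in the scenario.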

Experiment and Performance Evaluation
In this study, we experimentally evaluated the performance of the proposed M-IDM, which was developed by employing CNN algorithms in a Python 3.7.0 environment with Orange. We selected a CNN after comparing its classification performance with those of conventional machine learning algorithms such as naïve Bayes and SVM. The CNN has the structure Input → Conv → Maxpool → Fully Connected → Output, where the weight and bias parameters are w1 = (10, 1, 3, 3) and b1 = (10, 1) for the Conv layer, w2 = (1960, 128) and b2 = (1, 128) for the fully connected layer, and w3 = (128, 10) and b3 = (1, 10) for the output layer. We set the other training parameters (i.e., learning rate, number of epochs, and number of iterations) to 0.5, 1, and 1500, respectively.
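The reported weight shapes can be checked for internal consistency with a short calculation. The sketch below assumes a single-channel 28 × 28 input, a 3 × 3 convolution with padding 1 and stride 1, and 2 × 2 max pooling; none of these values is stated in the paper, and they are chosen only because they reproduce the 1960 inputs of the fully connected layer.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial output size of non-overlapping max pooling."""
    return size // window

side = 28     # assumed input height and width (not stated in the paper)
filters = 10  # from the Conv weight shape (10, 1, 3, 3)

after_conv = conv_out(side)        # 28 with padding 1
after_pool = pool_out(after_conv)  # 14
flattened = filters * after_pool * after_pool

# Parameter counts implied by the reported weight and bias shapes.
params = {
    "conv": 10 * 1 * 3 * 3 + 10,
    "fully_connected": 1960 * 128 + 128,
    "output": 128 * 10 + 10,
}

print(flattened)             # 1960
print(sum(params.values()))  # 252398
```

Under these assumptions the flattened feature map has 10 × 14 × 14 = 1960 values, which matches the first dimension of the fully connected weight matrix.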
The specifications of the PC used for the experimental setup are as follows: CPU i7-8700 3.2 GHz, memory 8 GB, and graphic card RTX 2060 4 GB. Several standard measures, such as precision, recall, area under the receiver operating characteristic curve (AUC), and F1-score were used.
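Precision, recall, and F1-score can all be derived from a confusion matrix, and averaging them over classes gives the "average over classes" figures used for multi-class comparison. The following is a minimal sketch; the confusion-matrix counts are hypothetical and are not the paper's results, and AUC is omitted because it requires per-event scores rather than counts.

```python
def per_class_metrics(matrix, labels):
    """matrix[i][j] counts events of true class i predicted as class j."""
    results = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # missed events of this class
        fp = sum(row[i] for row in matrix) - tp  # other classes predicted as it
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results

def macro_average(results, key):
    """Unweighted mean of one metric over all classes."""
    return sum(m[key] for m in results.values()) / len(results)

# Hypothetical counts for illustration only.
labels = ["normal", "critical", "major", "minor"]
matrix = [
    [94, 1, 3, 2],
    [1, 98, 1, 0],
    [6, 2, 88, 4],
    [1, 0, 2, 97],
]

metrics = per_class_metrics(matrix, labels)
for label in labels:
    print(label, {k: round(v, 3) for k, v in metrics[label].items()})
print("macro F1:", round(macro_average(metrics, "f1"), 3))
```

The per-class recall computed this way corresponds to a per-class prediction rate, that is, the share of events of a class that the model labels correctly.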

Effect of Number of Instances
To achieve an objective comparison of the proposed algorithm against existing conventional algorithms, the precision, recall, AUC, and F1-score [43][44][45] were investigated for different numbers of instances. When using the same data (sampling type: 10-fold cross-validation, target class: average over classes), the CNN exhibited the best performance in all items for every number of instances, as presented in Tab. 5. Excluding the SVM, in which the precision was significantly reduced, the naïve Bayes and logistic regression approaches (N = 100,000) both yielded AUCs of at least 0.932. Tab. 6 presents the detailed classification results for each class. The prediction rate for each class of the M-IDM, which exhibited the best performance (N = 100,000), was confirmed using a confusion matrix. According to the results, the "major" class had a relatively low prediction rate of 87.7% compared with the other classes (with prediction rates of 94.3%-98.8%). Figs. 4 and 5 also show the AUC for each machine learning method for the same data (N = 100,000). Compared with other ML methods, the proposed method showed better performance for each class. In particular, the SVM showed a large deviation of approximately 0.5 across labels. The threshold for each method and label was set to 0.5.
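The 10-fold cross-validation sampling used for this comparison can be sketched as an index generator. This is a generic illustration with a hypothetical function name and seed, not the sampling code of the toolkit used in the experiments.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Small demonstration; the same call works for n = 100000.
for fold_number, (train, test) in enumerate(k_fold_indices(100, k=10), start=1):
    print(fold_number, len(train), len(test))
```

Each model is trained on nine folds and evaluated on the remaining one, and the reported measure is the average over the ten runs.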

Impact of Class
This section describes the effect of the number of labels on the prediction. The same data and conditions were used in these tests as those used for the M-IDM algorithm (N = 100,000), which exhibited the best performance, as presented in Section 4.1. The accuracy for each label was confirmed as the number of labels was increased from two to four. Fig. 6 shows the accuracy of each algorithm in terms of predicting a certain label based on the number of classes. The following rates were observed: anomaly 99.3% and normal 94.4% at two classes; critical 93.5%, major+minor 85.3%, and informal 95.9% at three classes; and critical 98.6%, major 87.7%, minor 97.7%, and informal 94.3% at four classes.
All the algorithms showed good accuracy of 85.3%-99.3%. At four classes, the accuracy by class ranged from 87.7% to 98.6%, where "major" had a relatively low accuracy of 87.7% compared with the other classes.
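The two-, three-, and four-class settings can be derived from the same four-label data by merging labels. The mapping below is a minimal sketch that follows the class groupings listed above; the function name is hypothetical.

```python
def merge_label(label: str, n_classes: int) -> str:
    """Map a four-class severity label onto the 2-, 3-, or 4-class setting."""
    if n_classes == 4:
        return label
    if n_classes == 3:
        # Major and minor are merged into one class.
        return "major+minor" if label in ("major", "minor") else label
    if n_classes == 2:
        # Every severity level counts as an anomaly.
        return "normal" if label == "informal" else "anomaly"
    raise ValueError("n_classes must be 2, 3, or 4")

four_class = ["critical", "major", "minor", "informal"]
for n in (2, 3, 4):
    print(n, [merge_label(label, n) for label in four_class])
```

Because only the target labels change, the same features and the same training procedure can be reused for each setting.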

Analysis of M-IDM
We compared the findings of this study with those obtained in existing studies based on various aspects. Tab. 7 summarizes the result of the comparison based on 10 aspects: methodology, number of feature/record/class/hidden layers, minimum/maximum AUC, verification, data source, number of device types, and detection range. In Tab. 7, "Methodology" indicates the main method used in each study, and "Number of feature/record/class/hidden layers" indicates basic information of the data or learning model. Furthermore, "Min/Max AUC" denotes the method used to perform model learning. "Validation" is the task of confirming that the results of a learning model have enough fidelity. "Data source" is the environment from which the data was extracted, and "Number of device types" is the number of devices used to generate training data. "Detection range" indicates the range of detection from sensor to server.
In existing studies, binary classification is mainly used and only simple classification is possible. Moreover, because the devices used for data acquisition and generation belong to a testbed, it is difficult to reflect the characteristics that occur in a mixed environment of heterogeneous devices. In contrast, this study classifies multiple classes while considering the constraints of the IoT environment by acquiring traffic logs of the communication of multiple actual IoT medical devices and learning from data from an environment in which heterogeneous IoT medical devices are mixed.
(Tab. 7, surviving row: Detection range (sensor-gateway-server): Gateway-server | Gateway-server | Gateway-server | Sensor-gateway (edge node).)

Computational Complexity
We evaluated the computational complexity of the proposed model. As shown in Fig. 7, we observed the average use of calculation resources (CPU and memory) for each data size. As the data volume increased, more average calculation resources were required. Additionally, the ratio of data growth to computational resource use was compared. When the volume of data doubled, the average computational resource usage increased by up to a factor of 1.3, and when the data volume increased by five times, it increased by up to a factor of 1.8. Because resource usage grows more slowly than the data volume, the computational overhead of the proposed model is not a significant problem.
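One way to read the two reported ratios is as a power-law scaling exponent. The sketch below assumes that resource usage grows roughly as (data volume)^p and treats the reported "up to" factors as point values, which is a simplification; the function name is hypothetical.

```python
import math

def scaling_exponent(data_factor: float, resource_factor: float) -> float:
    """Exponent p such that resource_factor = data_factor ** p."""
    return math.log(resource_factor) / math.log(data_factor)

exp_2x = scaling_exponent(2, 1.3)  # data doubled, resources x1.3
exp_5x = scaling_exponent(5, 1.8)  # data x5, resources x1.8

print(round(exp_2x, 2), round(exp_5x, 2))  # 0.38 0.37
```

Both exponents are well below 1, which is consistent with sublinear growth of resource usage over the measured range; extrapolation beyond that range would need further measurements.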

Conclusion
In this study, we proposed a multi-class security event classification model based on machine learning. The proposed model was built using real-world data and a neural network-based multi-class intrusion classification algorithm for four classes. This work sufficiently reflects the complex network flow and characteristics of a real healthcare IoT environment, and machine learning technology was applied using data from real devices to classify network events into four different classes. In future work, more meaningful features should be found in security event data before refining to enhance the performance of the proposed approach, and methods should be developed to improve the somewhat low accuracy for rare classes to address the problem of data imbalance between the classes.

Conflicts of Interest:
The authors declare that there is no conflict of interest to report regarding the present study.