Hardening the Security of Multi-Access Edge Computing through Bio-Inspired VM Introspection

Abstract: The extreme bandwidth and performance of 5G mobile networks change the way we develop and use digital services. Within a few years, 5G will not only affect technology and applications but also dramatically change the economy, society, and individual life. One of the emerging technologies enabling the evolution to 5G by bringing cloud capabilities closer to end users is Edge Computing, also known as Multi-Access Edge Computing (MEC). This evolution also entails growth in the threat landscape and increased privacy concerns in different application areas; hence, security and privacy play a central role in the evolution towards 5G. Since MEC applications are instantiated in a virtualized infrastructure, in this paper we present a distributed application that constantly introspects multiple virtual machines (VMs) in order to detect malicious activities based on their anomalous behavior. Once suspicious processes are detected, our IDS notifies the system administrator about the potential threat in real time. The developed software is able to detect keyloggers, rootkits, trojans, process hiding, and other intrusion artifacts via agent-less operation, operating remotely or directly from the host machine. Remote memory introspection means there is no software to install and no warning that would let malware evacuate or destroy data. Experimental results of remote VMI on more than 50 different malicious code samples demonstrate an average anomaly detection rate close to 97%. We have established a wide testbed environment connecting the networks of two universities, Kyushu Institute of Technology and The City College of New York, through a secure GRE tunnel. Experiments conducted on this testbed show that the proposed system responds quickly.


Introduction
The main idea of the MEC initiative is to bring the cloud closer to the edge of the network and to the users. According to the Cisco Annual Internet Report [1], 5G devices and IoT connections will account for over 10% of global mobile devices by 2023. Using the MEC architecture as a forefront for 5G mobile networks (Figure 1) enables technological advancement for both cloud service providers (CSPs) and businesses. However, these remarkable benefits do not come without cost [2]. Due to its decentralized computational architecture, resources in an MEC environment are spread across different geographical regions [3]. The use of edge servers and cloud computing poses a number of security risks in areas such as Application Programming Interfaces (APIs), virtualization and containerization, physical machines, and others [4]. One of the most critical types of attack in such a decentralized environment is the insider attack. In this context, insiders can be CSP employees who have access to the physical servers on which user data are stored. Mitigating the risk of insider attacks might require the CSP to run well-coordinated routine background checks on its employees, but this will not be enough if someone has already uploaded a keylogger or installed a rootkit. Tackling these types of attacks requires an "out of the box" intrusion detection system (IDS), which we present in this paper by implementing remote virtual machine introspection (VMI). The proposed IDS operates outside of the infected VM, which virtually eliminates the chance of the IDS itself being compromised, even by sophisticated malware.
In MEC, the adversaries compromising Virtual Machines (VMs) are mostly malicious insiders with administrative privileges or applications that operate with escalated privileges [4]. In this situation, multiple attacks can be performed against the VM, including attacks on communication links through continuous eavesdropping on data exchanged between edge nodes and IoT devices [5]. These threats open up the affected VMs to numerous other potential attacks, such as logic bombs, trojans, spyware, and other malicious applications, which could compromise the security of other data centers when the infected VM migrates to a different physical location. Data encryption is a common mechanism that can be applied here to protect data confidentiality, but it typically causes a significant reduction in available computational resources [6].
The proposed work is a real-time, artificial immune system (AIS) based intrusion detection and mitigation solution for MEC servers, which aims to provide autonomous security and constant virtual machine introspection with a high degree of detection accuracy. The developed IDS constantly introspects multiple VMs by tracking events such as system calls, interrupts, memory reads/writes, the number of open files, network activity, and other logs. The built-in VMI tool, KVMonitor, remotely accesses VM memory and gathers data, eliminating the need to install any software in the VM. Remote VMI together with network traffic monitoring provides an efficient protection mechanism against many new or existing attacks such as spyware, keyloggers, worms, rootkits, or trojans. The result is a fully automated, real-time, IR-like discovery task built directly into the cloud fabric through volatile VM introspection. In this paper, we present experimental results from our testbed, which connects the networks of Kyushu Institute of Technology and The City College of New York through a secure GRE tunnel. This project is part of the worldwide development of an international networking and wireless platform that federates US research testbeds, including COSMOS, ORBIT, FABRIC, and PEERING, with exploratory facilities in Ireland, Greece, Brazil, and Japan [7][8][9][10]. The global platform will enhance experimental research on a wide range of optical, wireless, SDN and NFV, blockchain, inter-domain routing, and edge computing experiments at a global scale. The conducted experiments show a fast response time and a high detection rate for the proposed system. This paper is organized as follows: Section 2 provides an overview of related work in intrusion detection for MEC servers and the security of virtual machines. Section 3 gives a brief background on MEC and outlines potential threat vectors in this domain. Section 4 explains the artificial immune system based IDS and the way a negative selection algorithm (NSA) is applied in this work. Section 5 presents our intrusion detection system for the MEC environment. Section 6 provides a comprehensive performance evaluation of the proposed security approach. Section 7 draws conclusions and discusses future work.

Related Work
With the increasing number of cyber security incidents caused by large attack surfaces, intrusion detection and mitigation is becoming an even more important research topic. Existing articles on Fog Computing and MEC security mainly focus on authentication, access control, intrusion detection, and user privacy. Dsouza et al. [11] first described the advantages of Fog Computing as a new paradigm and underlined the security issues in several scenarios, including Man-In-The-Middle (MIM) attacks and privacy issues. Pacheco, Benitez et al. [12] propose a methodology for developing an intrusion detection system based on anomaly behavior analysis to detect when a Fog node has been compromised. The crucial component of their architecture is an Artificial Neural Network (ANN), which requires a dataset of features and constant offline training. In [13], Evmorfos et al. suggested a technique that uses Long Short-Term Memory (LSTM) and Random Neural Networks (RNN) to tackle SYN flooding attacks in large cloud-based networks. Their detection tactics include capturing and saving network traffic as pcap files for further processing.
Since MEC applications are instantiated in the virtualized infrastructure of an MEC host, research on intrusion detection systems for Virtual Machines (VMs) is paramount in this domain. To prevent information leakage from virtual devices and tampering with their I/O data, Futagami et al. [14] proposed a nested virtualization application (VSBypass) that runs the entire virtualized system in an outer VM. Another paper, by Inokuchi and Kourai [15], addresses user privacy in virtualized environments. They propose a strong binding of users to VMs by decrypting the VM's encrypted disk inside the trusted hypervisor. Sethi et al. [16] proposed an intrusion detection system for cloud infrastructure based on Deep Reinforcement Learning. They performed extensive experiments using the benchmark UNSW-NB15 dataset [17]. This publicly available dataset consists of normal traffic and nine types of attack traffic, including DoS, DDoS, fuzzing, backdoor, analysis, exploit, worm, and shellcode. A limitation of this dataset is the lack of sufficient samples for some attack types.
The adaptability of bio-inspired algorithms allows many researchers and practitioners to use these techniques for security-related cloud computing issues. In general, biologically inspired algorithms can be classified into four categories: Evolutionary algorithms, Swarm algorithms, Immune algorithms, and Neural algorithms [18]. Chiba et al. [19] describe a network intrusion detection system for cloud environments that applies a Back Propagation Neural Network and a Genetic Algorithm. The anomaly detection technique builds a model from normal behavior, and any deviation from the normal model is considered an outlier/attack. Obinna et al. [20] proposed Denial-of-Service attack detection based on an Artificial Immune System. Their method implements a negative selection algorithm that classifies data as self or nonself, providing a clear distinction between the normal profile and abnormal network activity (i.e., a DoS attack).
Some of the existing works on cloud-based IDS and VM introspection used publicly available datasets. However, these datasets were not designed for Software Defined Networks (SDN). Many of the datasets described here were created by recording and processing pcap files with different tools. Other listed works focused on applying bio-inspired algorithms only to the detection of network-based attacks through local VM introspection. Therefore, there is a high possibility that those systems will not be able to detect anomalies in already compromised MEC servers. An important idea of the remote VMI presented in this work is that an IDS residing on a client machine is able to access multiple VMs running on different hosts across the world. Therefore, the proposed system can introspect many VMs and at the same time cannot be compromised by a malicious application launched on a remote host or guest machine.

Security Threats in Multi-Access Edge Computing
MEC technology aims to reduce communication latency between IoT devices and the centralized cloud by storing data at the edge of the network rather than in distant data centers [21]. The key aspects of an MEC implementation are massive bandwidth, compute and storage availability at remote locations, reduced latency, minimized network traffic, and less computation on the device. Shifting computing requirements from devices to the edge nodes will reduce energy consumption and enable new devices, such as portable augmented/virtual reality (AR/VR) headsets. In addition, organizations will benefit from MEC by developing scalable and efficient IoT capabilities known as Mobile Internet of Things (MIoT) [22].
The MEC architecture is designed for optimal softwarization of functions and efficient infrastructure utilization. As shown in Figure 2, all MEC applications and application platform services are software applications running on hardware components that host multiple virtual machines. This design lowers the cost of hardware components by combining off-the-shelf elements with function virtualization. For example, the MEC virtualization manager layer shown in Figure 2 provides Infrastructure as a Service (IaaS) facilities, which provide a flexible and efficient multi-tenancy, run-time, and hosting environment for MEC application platform services. Compared to conventional on-premises data centers, the diversity of the MEC architecture and its distributed nature give rise to different types of vulnerabilities and privacy concerns. We have classified these threats into three main categories based on attack vectors: virtualization, application, and network based threats. Table 1 summarizes the categorized data security challenges in edge computing.
• Virtualization threats. Virtualization is an integral attribute of the MEC architecture. The hypervisor, being the control mechanism of the virtualization layer, controls access to hardware resources. This allows each VM to run on shared hardware while hiding the presence of the others. While virtual machines provide an isolated, secure environment, there are several types of vulnerabilities they can be exposed to. If an attacker succeeds in taking control of the hypervisor, either as a malicious insider or by injecting a fraudulent hypervisor, this is considered a hyperjacking attack. The spurious hypervisor will manage the entire system, and regular security measures will be ineffective in detecting this adversary. VM escape is another threat, in which a process "breaks out" of the VM where it is running and gets access to the host machine [24]. MEC infrastructure can also be affected by a VM manipulation attack, where software with escalated privileges or a malicious insider takes control of a VM and modifies it at will. In addition, arbitrary container access manipulation can lead to a control takeover attack on the container, and there is a possibility of data manipulation or data leakage through open API vulnerabilities in MEC applications.
• Application threats. Third-party applications running on MEC servers can pose fatal security threats by exposing virtual machines to different malicious applications. With current AI-driven technologies, cybercriminals can develop more sophisticated malware to perform data-tampering attacks by injecting a malevolent client. Injection attacks occur when malicious code is embedded into unsecured software. SQL injection and XSS (cross-site scripting) are well-known examples of this type of attack. Attacks targeting hypervisor software or container engines can lead to data leakage within MEC applications [21]. Often, attacks launched indirectly through remotely controlled software or other malware-infected applications can spread to many MEC applications. Keyloggers, rootkits, spyware/adware, worms, ransomware, trojans, and other threats pose potential risks for virtual machines. Hackers can often exploit a vulnerability in software before developers detect it; such an exploit is known as a zero-day attack, and all three layers in Table 1 can be vulnerable to it.
• Networking threats. The Denial of Service (DoS) attack is one of the most common and long-standing threats in network infrastructure. DoS can shut down a machine or network by flooding the target with traffic or by sending it information that triggers a crash [20]. A further type of DoS attack is the Distributed Denial of Service (DDoS) attack, in which multiple systems orchestrate a synchronized DoS attack on a single target. The essential difference is that instead of being attacked from one location, the target is attacked from multiple locations at once. In MEC systems, DoS attacks can cause only limited damage to the network, as described in [4], because the localized architecture of edge data centers prevents major damage to core components. MEC systems are also vulnerable to Man-in-the-Middle (MitM) attacks, in which a hacker or malicious agent intercepts and alters the communication of two or more parties while they believe that they are communicating with each other directly [25,26]. Software Defined Networking (SDN) opens up an avenue for DNS spoofing attacks, in which an adversary diverts traffic to an IP address other than the one to which it was originally directed. Since the MEC architecture relies on virtualization, this type of attack can disrupt not only multiple connected VMs but also all other elements of the infrastructure.
The proposed work mainly focuses on detecting security threats in MEC from the application and networking perspectives. Our approach is based on remote virtual machine introspection (VMI) followed by screening with artificial immune system (AIS) algorithms. Inspired by theoretical immunology, AIS has found application in various Computer Science problems [27]. Evolutionary algorithms provide an efficient anomaly detection mechanism through continuous training on normal models and the production of a set of patterns (detectors).

Artificial Immune System Based Intrusion Detection
IDSes are usually classified by their approach to detecting attacks. The two main categories are signature-based detection and anomaly-based detection. One of the largest drawbacks of signature-based IDSes is that they rely mainly on a signature database to detect attacks. Therefore, they may not recognize a new type of attack if it is not listed in their database. Anomaly-based IDSes, on the other hand, typically work from a baseline of normal traffic, and any deviation from the normal is considered a threat [4]. However, anomaly-based IDSes can produce a large number of false positives compared to signature-based detection techniques. The efficiency of anomaly-based IDSes therefore depends on multiple factors, such as the types of implemented algorithms, the targeted systems, the amount of generated input data, the type of selected features, application complexity, response time, and so on.
Artificial Immune System (AIS) is an area of artificial intelligence that focuses on algorithms abstracted from models that exist in immunology [28]. These computational models are inspired by the principles of the human immune system (HIS) and have the characteristics of learning, adaptation, self-organization, memory, and scalability. By imitating the HIS, these algorithms have developed into an effective solution for scientific computing and engineering applications [27]. Their applications include data mining, pattern recognition, anomaly detection, predictive analytics, industrial control systems, and IoT.

Negative Selection Algorithm
There are several classes of algorithms inspired by various immunological theories. The most common are the Clonal Selection Algorithm, the Immune Network Algorithm, the Negative Selection Algorithm (NSA), and the Dendritic Cell Algorithm. These algorithms are generally used for classification and pattern recognition problems, where the problem space is modeled based on available knowledge. NSA is inspired by the positive (self) and negative (nonself) selection processes of the human immune system (HIS). Forrest et al. [29] describe the initial step of NSA as the random generation of detectors, analogous to B cells in the HIS. These detectors are later matched against incoming data for anomaly detection. Many variants of the Negative Selection Algorithm have been developed since the original version was introduced [29], yet the original NSA remains popular. The main idea in NSA is that a given shape-space $U$ is divided into two disjoint sets, a self set $S$ and a nonself set $N$:
$U = S \cup N, \quad S \cap N = \emptyset$
NSA consists of two phases: a detector generation phase and a nonself detection phase. The generation of detectors (Figure 3) involves screening the entire system to obtain its normal profile. This is considered one of the challenges in NSA, because self elements do not remain unchanged over time. Therefore, continuous learning and building of a self profile is an important part of the algorithm. Once the normal profile is obtained, we use a Genetic Algorithm to generate candidate detectors that differ from the normal set [27]. The nonself detection phase is a separate process that constantly uses the previously built set of detectors to identify potential anomalies. Algorithm 1 gives pseudocode for a basic negative selection algorithm. A detector is defined as an $m$-dimensional point that corresponds to the center of a hyper-sphere with radius $r_d \in \mathbb{R}$. For the generic NSA shown in Algorithm 1, $r_d = r_s$.
Algorithm 1. Basic negative selection algorithm.
Input: $S$, the set of normal/self profiles; $T_{max}$, the maximum number of detectors; $r_s$, the matching threshold.
    D ← ∅
    while |D| < T_max do
        Generate a random detector d
        if d does not match any element in S then
            D ← D ∪ {d}
        end if
    end while
    for all new incoming samples ν ∈ U do
        if ν matches any element in D then
            Classify ν as a nonself sample
        end if
    end for
The block diagram in Figure 4 illustrates the two NSA phases. During detector generation, any candidates that match self samples are removed. Once the number of obtained detectors is sufficient, the algorithm terminates the generation process [27].
Figure 3. NSA detector generation phase.
The distance between detectors and self features was calculated using the squared Euclidean distance. Any other real-valued distance measure could be used instead.
In the proposed work, we constantly apply NSA to the data obtained from a VM through virtual machine introspection. If at any point the IDS detects a match, this counts as a potential anomaly. Given a self set $S$ that is a subset of $[0, 1]^m$, we can describe a feature vector as $x = (x_1, x_2, \ldots, x_m)$ in $[0, 1]^m$. Using a Genetic Algorithm, an initial set of candidates, called candidate detectors, is generated randomly. These detectors then evolve to cover more of the area around the self set. In each iteration, the radius of each detector is calculated as $r_d = dValue - r_s$, where $r_s$ is the variable distance around a self sample (Figure 5). During the iterative process, detectors are moved away from the self data and from the other generated detectors. Depending on coverage, detectors of different sizes were produced and evaluated in each generation to eliminate the gap between self and nonself data. A clone of a detector is generated by moving the center of the original detector by a fixed distance within its proximity. In addition, new random detectors are introduced to explore new areas of the nonself space. The overlap between two detectors is also computed from the distance $dValue$ between their centers and their radii. The detector generation process terminates when a set of mature detectors has evolved that provides significant coverage of the nonself space. Figure 6 shows the flow diagram of the Genetic Algorithm process for generating variable-sized negative detectors. In the detection stage, the list of stored detectors is used to check whether new incoming samples correspond to self or nonself instances. If an input sample matches a detector, it is identified as nonself, which indicates that an anomaly/change has occurred (see Figure 4b).
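The two phases of the generic NSA (Algorithm 1, with $r_d = r_s$) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: detectors are drawn uniformly at random rather than evolved by a Genetic Algorithm, and the function names, the `max_tries` guard, and the synthetic self set are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def generate_detectors(self_set, t_max, r_s, dim, rng, max_tries=100_000):
    """Detector generation phase: keep random points that match no self sample."""
    detectors = []
    tries = 0
    while len(detectors) < t_max and tries < max_tries:
        tries += 1
        d = tuple(rng.random() for _ in range(dim))
        # A candidate "matches" self if it lies within radius r_s of a self point.
        if all(squared_distance(d, s) > r_s ** 2 for s in self_set):
            detectors.append(d)
    return detectors


def is_nonself(sample, detectors, r_d):
    """Nonself detection phase: a sample is anomalous if any detector covers it."""
    return any(squared_distance(sample, d) <= r_d ** 2 for d in detectors)


# Synthetic self profile (assumption): a small cluster in the unit square.
rng = random.Random(0)
self_set = [(0.1 + 0.05 * rng.random(), 0.1 + 0.05 * rng.random())
            for _ in range(50)]
detectors = generate_detectors(self_set, t_max=500, r_s=0.1, dim=2, rng=rng)
```

With $r_d = r_s$, no self sample can be covered by a detector, because every accepted detector lies farther than $r_s$ from all self samples. Whether a given nonself point is flagged depends on how densely the detectors cover the nonself space, which is the coverage problem the variable-sized detectors above are meant to address.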

Proposed Security Approach
The proposed IDS provides security in MEC environments through automated, bio-inspired analysis of network flows, VM system calls, and memory readings. The intrusion detection process is based on two main components: KVMonitor, a lightweight and secure VMI module that gathers data from virtual machines, and an Artificial Immune System based IDS that employs KVMonitor and analyzes the collected data. The collected information is constantly compared to the list of detectors, and any match is directly reported as an anomaly.
The first step of the approach is a detailed study of many different malicious applications in order to discover which features of the system they affect. Every process, whether a normal system application or a hidden malicious script, leaves a fingerprint in the system by constantly changing several system-wide parameters. As a simple example, every running process has some memory consumption that can be tracked. At the same time, we can count how many times a particular process accessed the network or communicated with any drivers, how many child processes it has, and so on. Table 2 lists eight different features and their values for four normal and abnormal processes running in a Linux-based VM. The first column contains the process ID numbers (PIDs) of the running processes. All other columns are system/network based features: the system calls Write() and Read(); RssFile, the size of resident file mappings; Open Files, the number of open files; Socket, the number of attached file descriptors; the number of TCP and UDP connections; and the system call Send(), the number of bytes sent by the process. A person can tell the normal and abnormal processes in Table 2 apart from the listed values alone, without being told which is which. We automated this process using the Negative Selection Algorithm (NSA), giving it prior knowledge about benign and malignant behavior. To generate detectors, features are converted to binary tuples in accordance with predetermined string matching rules [27].
Algorithm 2 demonstrates the conversion of values of the system call Write() to binary form. Depending on the feature, the conversion can be capped once the value reaches a certain defined constant. Applying similar transformation rules to every other feature yields the final binary representation shown in Table 3. The Feature column in this table is the concatenation of all features after conversion to binary form. Lines colored red are detectors, since they belong to abnormal processes. In contrast, green lines are features that belong to normal processes. In this example, the length of the detectors is $l = 20$, and PID = 1876 (nonself) consists of the following tuples: . For any $C_i[s]$ and partial matching threshold $r$: Given a list of detectors and a collection of self-strings $S$ as input, along with a matching rule and a partial matching threshold, the present algorithm recursively compares all the bits of each tuple and returns whether a match occurs or not (the nonself detection phase, Figure 4b).
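As an illustration of the binary encoding and partial matching described above, the following sketch caps a counter at a constant, encodes it as a fixed-width bit string, and compares two bit strings with an r-contiguous-bits rule. Both the encoding scheme and the r-contiguous rule are assumptions chosen for illustration (the latter is a matching rule commonly used with NSA); they are not taken from Algorithm 2 or Table 3, and the function names are hypothetical.

```python
def to_bits(value, width, cap):
    """Clamp a non-negative counter to `cap` and encode it as a fixed-width bit string."""
    if cap > 2 ** width - 1:
        raise ValueError("cap does not fit in the requested width")
    clamped = max(0, min(int(value), cap))
    return format(clamped, "0{}b".format(width))


def r_contiguous_match(a, b, r):
    """Return True if equal-length bit strings agree on at least r consecutive positions."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    if r < 1:
        raise ValueError("r must be at least 1")
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0  # extend or reset the current run of agreement
        if run >= r:
            return True
    return False


# Example: a Write() count capped at 15 and stored in 4 bits (hypothetical values).
write_bits = to_bits(100, width=4, cap=15)
```

Raising `r` makes matching stricter, so fewer samples are flagged; lowering it makes detectors more general at the cost of more false positives.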
To generate binary detectors, that is, sets of features that correspond to abnormal processes, the proposed application uses the DEAP (Distributed Evolutionary Algorithms in Python) package [30,31].
The overall architecture of the proposed application, shown in Figure 7, implements security in the MEC environment through remote analysis of the VM's network and the collection of system-level process information. Using an Evolutionary Algorithm within the DEAP framework, fed in advance with a set of normal features, the application generates a sufficient number of detectors. During this process, the squared Euclidean distance (Equation (2)) is used as the fitness function to measure the distance between self features and randomly generated nonself features.
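For reference, the squared Euclidean distance between a candidate detector $d$ and a self feature vector $s$, both in $[0, 1]^m$, is conventionally written as follows. The symbol names are assumed here for illustration and may differ from those used in the original Equation (2).

```latex
D(d, s) = \sum_{i=1}^{m} (d_i - s_i)^2
```

Because the square root is monotonic, comparing squared distances against a squared threshold gives the same match decisions as comparing Euclidean distances against the threshold itself, while avoiding the square root computation.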
KVMonitor is a pivotal component of the proposed IDS that provides continuous, remote VMI. VM memory introspection is performed by accessing a memory file within the host machine, without any interruption to the running system. By executing the cr3 command using the QEMU Machine Protocol (QMP), KVMonitor connects to QEMU-KVM and can then send the necessary QMP commands, as shown in Figure 8 [32]. Through KVMonitor, the proposed IDS can also introspect the network of a VM by constantly capturing packets from virtual NICs. To introspect a virtual disk, KVMonitor links an available loopback device (e.g., /dev/loop0) to a disk image with the losetup command, then produces device maps and mounts them, in a similar way to a disk image in qcow2 format [32].
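To make the QMP exchange concrete, the sketch below shows the basic handshake a client performs before it can issue commands: QEMU sends a greeting, the client replies with `qmp_capabilities`, and only then are other commands accepted. This is a minimal sketch and not KVMonitor's code; the class name, the `connect_unix` helper, and the socket path in the usage note are hypothetical, and it uses the standard `query-status` command rather than reproducing the cr3 step mentioned above.

```python
import json
import socket


class QmpClient:
    """Tiny QMP client over an already-connected stream socket (illustrative only)."""

    def __init__(self, sock):
        # Separate reader and writer objects so buffered input is never discarded.
        self._reader = sock.makefile("r", encoding="utf-8", newline="\n")
        self._writer = sock.makefile("w", encoding="utf-8", newline="\n")
        self.greeting = self._read()        # the server speaks first
        self.execute("qmp_capabilities")    # leave capabilities negotiation mode

    def _read(self):
        line = self._reader.readline()
        if not line:
            raise ConnectionError("QMP peer closed the connection")
        return json.loads(line)

    def execute(self, command, **arguments):
        request = {"execute": command}
        if arguments:
            request["arguments"] = arguments
        self._writer.write(json.dumps(request) + "\n")
        self._writer.flush()
        while True:
            reply = self._read()
            if "event" in reply:            # asynchronous events may interleave
                continue
            if "error" in reply:
                raise RuntimeError(reply["error"].get("desc", "QMP error"))
            return reply["return"]


def connect_unix(path):
    """Connect to a QMP UNIX socket; the path is whatever QEMU was started with."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return QmpClient(sock)
```

Assuming QEMU was started with a QMP UNIX socket at a path such as `/tmp/qmp.sock` (a placeholder), `connect_unix("/tmp/qmp.sock").execute("query-status")` would return the VM's run state as a dictionary.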
Version September 19, 2021 submitted to Big Data Cogn. Comput.

Detector Generation Using Genetic Algorithm
A detection rule is considered sufficient if it does not cover any positive samples (the self features) and covers a large area of the nonself space. Considering the self-space $S$ as a subset of $[0, 1]^n$ and a feature vector $x = (x_1, x_2, \ldots, x_n)$ in $[0, 1]^n$, a detector can be represented as a "detector rule" of the form
$R^i$: if $x_1 \in [low^i_1, high^i_1]$ and ... and $x_n \in [low^i_n, high^i_n]$ then nonself, for $i = 1, \ldots, m$.
Here $m$ is the number of detection rules and $n$ the dimension of the Euclidean space [27]. Pseudocode for the Genetic Algorithm that generates detectors is given in Algorithm 3. In this approach we assume that all detectors have the same shape and size, namely hyper-spheres of fixed radius $r$ in an $n$-dimensional space.
Algorithm 3. Genetic Algorithm for detector generation.
    initialize the population with random individuals
    for i = 1 to number of generations do
        for j = 1 to populationSize/2 do
            select two individuals (with uniform probability) as parent1 and parent2
            apply crossover to generate an offspring (child)
            mutate child
            calculate distances d1 and d2 between child and parent1 and parent2, respectively
            f_c = fitness(child)
            f_p1 = fitness(parent1)
            f_p2 = fitness(parent2)
            if (d1 < d2) and (f_c > f_p1) then
                replace parent1 with child
            else if (d2 ≤ d1) and (f_c > f_p2) then
                replace parent2 with child
            end if
        end for
    end for
    return best (highly fitted) individuals
The volume of the effective coverage of a detector was approximated as the volume of the inscribed hypercube. In the fitness function for a rule $R$, the parameter $\alpha$ denotes a coefficient of sensitivity, which for a specific rule determines the trade-off between the volume covered by the rule and its intersection with the self-set. Overall fitness is therefore calculated as the sum of the fitness values of all evolved rules minus the overlaps between the hypercubes defined by the rules.
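The replacement scheme in Algorithm 3, where a child competes only with the parent it most resembles, can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the DEAP-based implementation used in this work: the fitness function is a hypothetical stand-in (clearance between a fixed-radius detector and the nearest self sample) and does not include the coverage volume, the $\alpha$ trade-off, or the overlap penalty; the crossover, mutation, and parameter values are likewise illustrative.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(center, self_set, radius):
    # Hypothetical stand-in: clearance between the detector sphere and the nearest self sample.
    return min(distance(center, s) for s in self_set) - radius


def crossover(p1, p2, rng):
    # Uniform crossover: each coordinate is taken from one of the two parents.
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(p1, p2))


def mutate(ind, rng, sigma=0.05):
    # Gaussian perturbation, clipped to the unit hypercube.
    return tuple(min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in ind)


def evolve(self_set, radius, dim, pop_size=40, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [tuple(rng.random() for _ in range(dim)) for _ in range(pop_size)]
    for _ in range(generations):
        for _ in range(pop_size // 2):
            i, j = rng.sample(range(pop_size), 2)
            child = mutate(crossover(pop[i], pop[j], rng), rng)
            d1, d2 = distance(child, pop[i]), distance(child, pop[j])
            fc = fitness(child, self_set, radius)
            # The child replaces the closer parent, and only if it is fitter.
            if d1 < d2 and fc > fitness(pop[i], self_set, radius):
                pop[i] = child
            elif d2 <= d1 and fc > fitness(pop[j], self_set, radius):
                pop[j] = child
    # Keep only detectors whose sphere covers no self sample.
    return [ind for ind in pop if fitness(ind, self_set, radius) > 0]


# Synthetic self profile (assumption): a small cluster in the unit square.
rng = random.Random(1)
self_set = [(0.2 + 0.05 * rng.random(), 0.2 + 0.05 * rng.random())
            for _ in range(30)]
detectors = evolve(self_set, radius=0.1, dim=2)
```

Because a child can only displace the parent nearest to it, individuals in different regions of the space do not compete with each other, which helps the population stay spread over the nonself space instead of collapsing onto a single optimum.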

Nonself Detection
The list of generated detectors is an integral part of the nonself detection phase. The results of constant VM introspection are passed to the IDS as a hash table with process IDs as keys and features as values. The application then matches the received data against the list of nonself detectors on the fly. A simplified version of the matching algorithm is given in Algorithm 4. The algorithm compares the set of incoming features with the set of nonself detectors by taking the intersection of the two sets. The result of the intersection, x, holds the list of matched features. If the number of matched features is equal to or greater than the matching threshold $r_s$, the process is added to the resulting dictionary. Passing the matching threshold as an argument gives users the flexibility to manually increase or decrease detection sensitivity. For instance, setting it lower than half the overall number of features increases the number of false positives returned by the application. This is useful for observing how close certain normal processes are to abnormal ones in terms of their behavior.
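The matching step described above can be sketched as follows. This is a minimal illustration, not a reproduction of Algorithm 4: the function name, the representation of each encoded feature as a (name, bits) pair, and the sample values are assumptions, and PID 412 is a made-up normal process.

```python
def match_nonself(incoming, detectors, r_s):
    """Return {pid: matched features} for processes with at least r_s detector matches."""
    detector_set = set(detectors)
    result = {}
    for pid, features in incoming.items():
        x = set(features) & detector_set    # features shared with the nonself detectors
        if len(x) >= r_s:
            result[pid] = sorted(x)
    return result


# Hypothetical encoded features: (feature name, bit string) pairs.
detectors = {("write", "1111"), ("socket", "0011"), ("send", "1110")}
incoming = {
    1876: [("write", "1111"), ("socket", "0011"), ("send", "1110")],
    412: [("write", "0001"), ("socket", "0011"), ("send", "0000")],
}
flagged = match_nonself(incoming, detectors, r_s=2)
```

With `r_s=2` only PID 1876 is reported. Lowering the threshold to 1 also reports PID 412, which shares a single feature with the detectors, illustrating how a low threshold raises the number of false positives.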

Experimental Evaluation
In this section, we provide an experimental evaluation of the proposed security approach using a simulation environment similar to an SDN-managed IoT network. We performed experiments to evaluate the distributed intrusion detection and mitigation model in terms of its effect on VM system and network parameters during various attack scenarios. The experiments were conducted on a client machine with an Intel Core i7-8750H @ 2.20 GHz processor and 16 GB RAM, which introspected a remote VM with an Intel Xeon Silver 4114 @ 2.20 GHz processor and 8 GB RAM. The VM ran on a remote host based on Ubuntu 18.04 LTS with the same processor on 8 cores and 131 GB RAM.
For deployment of the proposed IDS, the testbed setup illustrated in Figure 9 was used. A secure GRE tunnel was established between the labs of two universities: Kyushu Institute of Technology in Japan, where the host OS and VM were deployed, and The City College of New York, where the client machine with the IDS was launched. The maximum bandwidth of each link in the network was limited to 100 Mb per second. A modern Cisco Gigabit Smart Switch with 50 ports was used to manage the network.

To evaluate the detection rate and system/network performance, we conducted experiments with more than 50 different malicious applications: trojans, adware, worms, keyloggers, spyware, and a hyperjacking attack using a preinstalled rootkit on a VM. In this paper we provide the results of experiments conducted on four different open-source malicious applications listed in Table 4.
Table 4. Malware used in experiments and detection results.

| Name | Detection rate (%) | Description |
|---|---|---|
| Logkeys | 99 | Multi-functional GNU/Linux keylogger. Logs all common character and function keys [33]. |
| Rootkit | 94 | Linux-based rootkit that listens on certain ports, gives escalated privileges to the user, has a built-in keylogger, and is able to hide from common scanners [34]. |
| Stitch | 97 | Cross-platform keylogger with a remote administration tool. The user can select a payload to bind to a specific IP and port; it listens for a connection on that port, and has options to send an email with system information when the system boots and to start the keylogger on boot [35]. |
| TrojanX | 94 | Basic Trojan application written in Swift with a minimal GUI client on Mac OS. Uses shell scripts to set up the system to use a SOCKS proxy [36]. |
The proposed application successfully detected all listed malicious applications and classified every flow entry based on the triggered features. The VM introspection time for KVMonitor in this experiment was set to 10 s. Within this time frame, the intrusion detection application periodically received data from KVMonitor and performed the nonself detection process. When an anomaly was detected, the application triggered push notifications and sent an email with detailed information about the potential threat. Figure 10 shows the action taken upon detection of the Logkeys keylogger [33] running on the remote VM. The output information varies depending on the threat model. In the following subsections, performance measurements and the detection rate of our IDS are reported.

Network Performance Results
The maximum available bandwidth of all the links between the switch and hosts in our network was set to 100 Mb per second. We used the perfSONAR toolkit (Performance Service-Oriented Network monitoring Architecture) [37] to measure, identify, and isolate network problems in the established testbed. The built-in iPerf3 tool was used to measure the available bandwidth between the two ends of the GRE tunnel. The packet sending rate was 1000 packets per second and the payload of each packet was 1000 bytes.
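As a quick sanity check on these settings, the offered payload rate can be worked out directly. The short sketch below assumes that 1 Mb means 10^6 bits and ignores protocol header overhead; the function name `offered_load_mbps` is introduced only for this illustration.

```python
PACKETS_PER_SECOND = 1000   # sending rate stated in the text
PAYLOAD_BYTES = 1000        # payload size stated in the text
LINK_CAPACITY_MBPS = 100    # link limit stated in the text


def offered_load_mbps(packets_per_second: int, payload_bytes: int) -> float:
    """Payload bit rate in Mb/s (1 Mb = 10**6 bits, headers not counted)."""
    return packets_per_second * payload_bytes * 8 / 1_000_000


load = offered_load_mbps(PACKETS_PER_SECOND, PAYLOAD_BYTES)
share = load / LINK_CAPACITY_MBPS
print(f"offered payload rate: {load:.1f} Mb/s ({share:.0%} of the link)")
```

Under these assumptions the test traffic amounts to about 8 Mb/s of payload, well below the 100 Mb/s link limit.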
We measured the feature retrieval time taken by KVMonitor from the remote host machine with respect to the data flow in the switch using our IDS application.
Figure 11.
One of the important measurements we conducted is the time the application takes to retrieve features from the VM. It is paramount to retrieve features quickly in order to detect potential attacks. It is also important that the process of retrieving features does not affect productivity on the client machine. Figure 12 shows flow entry collection by KVMonitor for up to 20,000 flow entries in the R2 switch. Even at that load, the IDS application collected features for all flows in 416.4 milliseconds, which does not cause much overhead for the application on the client side.
We observe that the feature retrieval time increases linearly with the number of flow entries in the switch. However, our IDS performs feature processing on the fly and does not wait for every flow entry in the switch to be collected before taking action. Once data is received, the application calculates the feature vector, converting raw values into binary form, and then performs classification; all of this takes 54 milliseconds when the switch has 1000 flow entries.

Memory and CPU Measurements
Most of the listed malware is deliberately designed to consume few system resources and to act quietly. Keyloggers and rootkits in particular behave as unprivileged programs and surreptitiously eavesdrop on all keystrokes typed by the user. A performance comparison of KVMonitor with Xen described in [32,38] shows that KVMonitor was 48 times faster than Xen in accessing VM memory and during introspection. This is mainly because Xen repeats mapping and unmapping for each memory page of the VM. KVMonitor, in contrast, can map the whole memory up front and access it like heap memory [32].
Our remote host has eight cores, and we used the Linux iostat and top commands to measure average CPU utilization for different processes. Under normal conditions, CPU usage of the host was 0% most of the time. Figure 13 shows the CPU utilization on the host machine during the normal state and while KVMonitor accesses the VM memory file. Remote offloading was 8.5% slower than local operation due to network delay and additional booting time using VNC and SSH. During intervals of VM introspection, CPU utilization of the host machine stayed around 2.8-3.4%, which is expected because KVMonitor only accesses the QEMU-KVM memory file stored on the host in qcow2 format. KVMonitor acquires access to the VM by using a cr3 command that comes with the QEMU Monitor Protocol (QMP). QMP provides a user-friendly JSON-based structure that enables access to certain functionality. This provides a fast and reliable way to access the memory of a virtual machine and extract the necessary data [32].
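To illustrate the JSON-based structure of QMP mentioned above, the following sketch builds two standard QMP messages: `qmp_capabilities`, which a client sends after the server greeting to enter command mode, and `query-status`, which asks for the VM run state. The helper name `qmp_message` is hypothetical, and the cr3-related command used by KVMonitor is deliberately not shown because its name and arguments are not given here.

```python
import json
from typing import Any, Dict, Optional


def qmp_message(command: str, arguments: Optional[Dict[str, Any]] = None) -> bytes:
    """Serialize one QMP command as a newline-terminated JSON object."""
    message: Dict[str, Any] = {"execute": command}
    if arguments:
        message["arguments"] = arguments
    return json.dumps(message).encode("utf-8") + b"\n"


# Sent once after the server greeting to leave capabilities negotiation mode.
handshake = qmp_message("qmp_capabilities")
# A standard query that returns the run state of the VM.
status_query = qmp_message("query-status")
```

In a real deployment these bytes would be written to the QMP socket exposed by QEMU; the sketch stops at message construction so that it stays independent of any particular host setup.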
The main processing load falls on the client machine, where the IDS constantly receives raw data from KVMonitor and carries out the nonself detection phase. In this experiment, we used a client machine with six cores and two threads per core. As shown in Figure 14, CPU utilization of feature retrieval on the client machine was around 35-45% during the period of active processing.
19, 2021 submitted to Big Data Cogn. Comput.

Figure 14. CPU utilization in the client machine during normal state and IDS activity periods.

Detector Generation and Nonself Detection
For the detector generation process, we used a set of 200 self samples (records) from different categories as input to generate nonself records. Using a Genetic Algorithm (GA) within the Python DEAP framework [30], we generated close to 61,000 detectors; a sample is shown in Figure 15.
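The principle behind detector generation, keeping only candidates that do not match any self sample, can be sketched with plain random negative selection. Note that this is a simplified stand-in and not the DEAP-based GA used in our experiments: the position-wise agreement rule, the threshold `r = 18`, the random self samples, and all function names are assumptions made for illustration. Only the detector size of 24 and the 200 self samples come from the text.

```python
import random
from typing import List, Tuple

Bits = Tuple[int, ...]
DETECTOR_SIZE = 24  # detector length stated in the text


def agreement(a: Bits, b: Bits) -> int:
    """Number of positions at which two bit strings hold the same value."""
    return sum(1 for x, y in zip(a, b) if x == y)


def generate_detectors(
    self_samples: List[Bits],
    count: int,
    r: int,
    rng: random.Random,
    max_tries: int = 100_000,
) -> List[Bits]:
    """Collect random candidates that agree with every self sample in < r positions."""
    detectors: List[Bits] = []
    seen = set()
    for _ in range(max_tries):
        if len(detectors) == count:
            break
        candidate = tuple(rng.randint(0, 1) for _ in range(DETECTOR_SIZE))
        if candidate in seen:
            continue  # skip duplicates
        # Negative selection: reject anything that looks like "self".
        if all(agreement(candidate, s) < r for s in self_samples):
            detectors.append(candidate)
            seen.add(candidate)
    return detectors


rng = random.Random(42)
# Hypothetical self set: 200 random bit strings standing in for real self records.
self_samples = [
    tuple(rng.randint(0, 1) for _ in range(DETECTOR_SIZE)) for _ in range(200)
]
detectors = generate_detectors(self_samples, count=50, r=18, rng=rng)
```

A GA replaces the blind random draw with selection, crossover, and mutation over a population of candidates, which is what makes generating tens of thousands of detectors practical.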
The application responsible for generating detectors uses the multiprocessing package, which offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads [30]. This significantly reduces the running time of the evolutionary algorithm, which takes 4-6 s on average to generate the list of 61,000 detectors. The constant GA parameters provided in Figure 16 are: detector size 24, initial population of 500 random detectors, 200 generations, 4 pool workers for multiprocessing, and a constant memory page size of 4096.
The average F1 score (detection rate) of nonself detection for the malware listed in Table 4 was 96.86%. We divided the experiments into two parts. First, we exposed the remote VM to each of the listed malicious applications separately and measured performance along with the detection time. Second, we exposed the remote VM to all four listed malware samples simultaneously and then launched our IDS. In both cases, anomalies were detected at nearly the same rate and the IDS responded on time. Figure 17 shows the detection rate for each malicious application. With only 8 primarily system-wide features, after disabling the network features (total number of sent packets, source/destination ports, TCP/UDP usage, and so on), the detection process runs four times faster, but the average F1 score drops to 82.48%.
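For reference, the F1 score used above is the harmonic mean of precision and recall. The sketch below shows the standard computation; the counts in the example are hypothetical and are not taken from our experiments.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing was detected."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one malware sample.
example = f1_score(true_pos=97, false_pos=3, false_neg=3)
print(f"F1 = {example:.2%}")
```

Because the score penalizes both missed detections and false alarms, it is a stricter summary than raw detection rate alone.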

Conclusions
In this work, we proposed an automated, intelligent intrusion detection and mitigation approach for MEC servers, which aims to provide explainable security in the IoT networks of the 5G era. The proposed approach relies on Artificial Immune System based intrusion detection with a built-in automated virtual machine introspection module. It has been successfully tested on a remote VM over an established GRE-based secure testbed between Kyushu Institute of Technology and The City College of New York. The presented results show the performance evaluation as well as the accuracy of intrusion detection on various types of malicious applications. Because it resides outside the guest machine, the described IDS cannot be subverted by malware running in the VM. This also keeps the performance impact on the VM and the host machine low.
The proposed security approach is promising for achieving real-time, highly accurate detection and mitigation of attacks on MEC-based servers, which will be in widespread use in the 5G and beyond era. Our future work will include an extension of the current classification mechanism to more attack types and network topologies. We aim to run more experiments by increasing the number of remote virtual machines within our testbed environment. Through continuous collaboration with different universities, we plan to expand our testbed across the world, creating a cloud-based simulation environment with multiple availability zones. Continued research on the Linux kernel to explore more features will increase detection accuracy. In parallel, we are continuously working on improving performance by developing a more secure and reliable application.

Data Availability Statement:
The data supporting the reported results in the present study will be available on request from the corresponding author or the first author.