Computer

The enormous number of network packets transferred in modern networks, together with the high speed of transmissions, hampers the implementation of successful IT security mechanisms. In addition, virtual networks create highly dynamic and flexible environments which differ widely from the well-known infrastructures of the past decade. Network forensic investigation that aims at detecting covert channels, malware usage or anomalies is faced with new problems and is thus a time-consuming, error-prone and complex process. Machine learning provides advanced techniques to perform this work faster, more precisely and, simultaneously, with fewer errors. Depending on the learning technique, algorithms work nearly without any interaction to detect relevant events in the transferred network packets. Current algorithms work well in static environments, but the highly dynamic environments of virtual networks create additional events which might confuse anomaly detection algorithms. This paper analyzes highly flexible networks and their inherent on-demand changes, like the migration of virtual machines, SDN programmability or user customization, and the resulting effect on the detection rate of anomalies in the environment. Our research shows the need for adapted pre-processing of the network data and improved cooperation between IT security and IT administration departments.


Introduction
Nowadays, IT environments are a key factor in modern life, and modern data centers form the basis of our everyday digital life. Digital services play an important role in private life (e.g. as a backup for photos, videos and files or as a shared online calendar) as well as in many professional areas such as the financial sector, research and development or the office environment. With the evolution of cloud computing and the ubiquitous use of computers, digital services and resources have become more and more common in everyday life, which has led to a demand for faster connections and higher data rates. To fulfill these demands, modern data centers require a highly flexible infrastructure that is adaptable without any further administrative work. The introduction of various virtualization layers like virtual machines (VM), virtual networks (VN) and virtual storage provides such an on-demand infrastructure. This evolution led to the implementation of advanced techniques like container-based environments, which create a dynamic infrastructure such as the different cloud services Software-as-a-Service, Platform-as-a-Service or Infrastructure-as-a-Service [1].
Cloud service providers (CSP) use virtualization inside their data centers to comply with the demands of their customers. In particular, the use of VMs or containers improves the on-demand provision of new systems. These systems create additional flexibility in the environment. Not only does the life cycle of containers or swarms have an impact on the internal dynamics; the migration of a VM also increases this adaptability. In addition, user customization of the VM and its assigned networks creates extra benefits. But the connection of these systems with a hardware-based network hampers the necessary adaptability. Only with the use of VNs does the data center play to its strengths. These VNs work on an additional layer in the environment, which led to the designation of an underlay and an overlay network as shown in Fig. 1. The overlay networks perform various tasks. On the one hand, a VM needs access to the internet and possibly to different internal or external networks. On the other hand, the same VM has to be separated from the VMs of other customers to ensure an isolated environment. Typically, this separation is done with virtual networks, which run on top of the hardware-based underlay network.
✩ This paper is an extended version of Spiekermann and Keller (2020). Revised December 2020. * Corresponding author.
Protocols that create the overlay network are so-called virtual network protocols, which are used for the interconnection of the different VMs in modern infrastructures. With these protocols, the VMs of one customer are connected in a logical subnet, which is separated from the subnets of other customers. While the hardware-based underlay network and its addressing scheme, routing rule set or security features could be implemented in a static manner, the virtual network provides the flexibility and dynamics needed to interconnect the VMs. Various protocols exist to implement a VN, each of them with a different focus. The first technique to implement a virtual subnet was the use of Virtual LAN (VLAN) [2]. The increasing demand in data centers led to the development of adapted protocols such as Generic Network Virtualization Encapsulation (GENEVE) [3], Network Virtualization using Generic Routing Encapsulation (NVGRE) [4], Stateless Transport Tunneling (STT) [5] and Virtual Extensible LAN (VXLAN) [6].
The easiest way to implement a virtual network is to use the well-known VLAN protocol, which is, however, limited to only 4096 subnets¹, which is not sufficient in modern networks. The most notable protocol to implement these virtual networks is VXLAN, which is similar to the VLAN protocol but extends its features and adds some new ones. VXLAN increases the maximum number of subnets to $2^{24} = 16{,}777{,}216$ networks by using a 24-bit virtual network identifier (VNI). Fig. 2 shows the VXLAN header, its position in the entire frame and the use of UDP as the encapsulating protocol.
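The role of the 24-bit VNI can be illustrated with a minimal Python sketch that packs and unpacks the 8-byte VXLAN header. The layout (8 flag bits with the "I" flag 0x08, 24 reserved bits, 24-bit VNI, 8 reserved bits) follows RFC 7348; the function names and the example VNI 5001 are illustrative assumptions and not part of our test environment.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag defined in RFC 7348


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI (hypothetical helper)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit into 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header (hypothetical helper)."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8


header = pack_vxlan_header(5001)      # example VNI, chosen arbitrarily
print(header.hex(), unpack_vni(header))
print(2**12, 2**24)                   # VLAN ID space vs. VNI space
```

The last line only contrasts the two identifier spaces: 12 bits for VLAN and 24 bits for VXLAN.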
Due to the evolution of these dynamic environments, the protection of the network and the internal services gains more and more importance. Therefore, the detection of attacks or the occurrence of anomalies in the environment is a relevant part of IT security implementations. A modern network is the target of various attacks, from either inside or outside attackers. The types of attacks vary, from exploited misconfigurations like in 2018, when Amazon S3 buckets with more than 70 million records were leaked due to poor configuration [7], to ransomware attacks like in 2019, when CyrusOne, one of the biggest US data center providers, was attacked [8]. A virtual network introduces new vulnerabilities of its own, so various attacks against the virtual infrastructure exist. [9] presents Network Harvester, an implementation of an attack on the isolation of network devices in SDN environments. The authors evaluated their attack with common SDN controllers like ONOS and Floodlight. [10] categorized different attacks against SDN devices and network function virtualisation (NFV) and summarized them as related to network function virtualisation, virtual layer, orchestrator manager and virtualized infrastructure manager. [11] describes the implementation of covert channels inside a virtual environment and analyzes the possibility of data hiding in network protocols like VXLAN or GENEVE. These attacks show the need for an effective protection of the environments. By using different security mechanisms like firewalls or intrusion detection systems, providers try to increase their overall IT security and counter these kinds of attacks. But the increasing number of network packets transferred inside the environment, in combination with the high speed of connections as well as the large number of attacks, makes this task complex and expensive.
¹ VLAN headers use 12-bit VLAN IDs to implement up to $2^{12} = 4096$ subnets.
Advanced techniques like machine learning (ML) try to support the provider by the detection of anomalies in the network traffic which might be an indicator for unknown attacks against their infrastructure.
ML and its impact on cyber security is a fast-growing research area, which results in the definition of different algorithms and an improved analysis of unknown data. One of the most important tasks for ML in networks is the detection of anomalies, which [12] defines as "... the problem of finding patterns in data that do not conform to expected behavior."
The detection of anomalies in the network is part of classification problems [13]. This type of problem can be described by classifying data points to given categories [14]. A wide-spread classification in IT security is the detection of SPAM [15]; here an incoming e-mail is checked against a set of features. To analyze an e-mail correctly, the classifier has to be trained with benign and malicious data, hereby it learns parameters which indicate SPAM mails.
Anomaly detection in networks tries to find changes concerning, e.g., the mix of packets in a network. Such a change might be an indicator for the beginning of an attack or a current data leakage [16]. The detection of covert channels with the help of ML is a recent research area [17]. Modern malware uses covert channels to transfer its payload or to exfiltrate sensitive data [18]. The detection of such attacks requires good knowledge of the traffic which is typical in this environment. An outlier from this known traffic might be an indicator for a security issue and therefore calls for additional investigation of such traffic.
Because of this, machine learning algorithms heavily depend on an environment with some mostly static parameters. Common classifiers use network flows or parts of protocol headers to create a benchmark data-set, which is used to train the algorithm. If a certain level of deviations is reached, e.g. if a threshold is exceeded, the classifier will detect an anomaly and start a pre-defined process like logging or alerting.
Unfortunately, changes are inherent in virtual networks, so a machine learning algorithm will produce various false-positive messages, which in turn demand a reconfiguration of the training sets. This is a common task in company networks and not a new problem in IT security [19]. But changes in virtual networks occur much more frequently. Additionally, these changes are mostly unpredictable, because both administrator and customer are able to adapt their part of the network. The administrator might change some parameters of the virtual environment, like the deployed protocol or only some protocol fields. The customer might change the internal IP addresses used in the assigned logical subnet. Because of the separation of the layers, neither change should have any impact on the other part of the network.
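The threshold mechanism and the resulting false positives can be illustrated with a minimal sketch. It assumes a deliberately simple detector (a z-score on the frame length with a threshold of 3) and invented baseline values; neither is the classifier or the data used in our evaluation. The only fixed quantity is the VXLAN overhead of 50 bytes for an untagged IPv4 underlay.

```python
from statistics import mean, pstdev

# Outer Ethernet (14) + IPv4 (20) + UDP (8) + VXLAN (8) headers, in bytes.
VXLAN_OVERHEAD = 14 + 20 + 8 + 8


def fit_baseline(lengths):
    """Learn mean and standard deviation of benign frame lengths."""
    return mean(lengths), pstdev(lengths)


def is_anomaly(length, baseline, threshold=3.0):
    """Flag a frame whose length deviates more than `threshold` sigmas."""
    mu, sigma = baseline
    if sigma == 0:
        return length != mu
    return abs(length - mu) / sigma > threshold


# Invented benign frame lengths captured without encapsulation.
baseline = fit_baseline([96, 98, 100, 98, 97, 99, 98, 101, 95, 98])

print(is_anomaly(99, baseline))                   # ordinary frame
print(is_anomaly(98 + VXLAN_OVERHEAD, baseline))  # same frame after encapsulation
```

The second call flags a benign frame only because the capture point or the overlay protocol changed, which is the kind of false positive described above.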
Hence, we investigate the impact of virtual networks on the detection capabilities of machine learning algorithms for malware detection by finding network anomalies. We do this by creating a cloud computing environment based on OpenStack with various virtual network protocols that transfer specific information in this test environment. We capture and analyze the traffic with different ML algorithms focusing on anomaly detection.
Our contributions can be summarized as follows:
• We point out challenges for machine learning algorithms when used for forensic anomaly detection in virtual networks.
• We identify changes in virtual overlay and underlay networks that may point to anomalies like malware.
• We perform simulations of virtual networks to create data sets of captured network packets with above changes that can be used to parameterize, train and evaluate forensic machine learning algorithms for virtual networks.
• We evaluate forensic machine learning in virtual networks with above data sets.
The remainder of this paper is structured as follows. In Section 2 we summarize related work from virtual network forensics and machine learning for anomaly detection. Section 3 describes the methodology for data collection and analysis, together with relevant changes in virtual networks that define the data to collect. The implementation of the deployed ML algorithm is discussed in Section 4. In Section 5 we evaluate the different situations and their impact on anomaly detection based on IsolationForest and LocalOutlierFactor. Section 6 concludes this paper and gives an outlook to our future research.

Related work
Virtual networks and modern data centers necessitate a change of the well-known methods of digital investigation in hardware-based networks, as discussed in [20,21]. [22] describes the problems arising for network forensic investigation in virtual networks. [23] defines an SDN model usable to perform secure network forensic investigation in today's data centers, especially when they are distributed over different locations. The need for a special process to implement valid investigation in modern environments is discussed in [24]. The problems of packet captures for law enforcement in modern data centers are discussed further in [25]. An important task in a data center's security strategy is the detection of abnormal behavior inside the transmitted network packets [26]. So, anomaly detection is a part of forensic investigation, but in some cases the results of such monitoring have to be accessible much faster. [27] discusses the use of machine learning as an implementation of automated network forensics. [28] discusses problems and countermeasures of anomaly detection in big data networks. The most notable protocol in modern data centers apart from Ethernet is the Internet Protocol. [29] analyzes various sources of network data like routing or management protocols and special network probes.
The research of anomaly detection in virtual networks is thin. [30] describes the detection of distributed denial of service (DDoS) attacks in virtual networks with the help of the network analyzer Bro. The analysis results are used to configure parts of the virtual network with the help of OpenFlow. Anomaly detection in modern networks is discussed in various papers. [31] describes the detection of anomalies in a small test environment based on pre-defined network packets and measures the impact of different changes in the network. [32] proposes the use of IP-flow records for anomaly detection with Support Vector Machines (SVM), which improves the performance in high speed networks. The detection of anomalies or outliers is a crucial task in modern networks, and as an internal component a detection system might be faced with adversarial attacks. [33] presents a survey of adversarial attacks against intrusion detection systems. The authors define six different goals, namely evasion, poisoning, over-stimulation, denial of service, response hijacking and reverse engineering. [34] defines evasion and poisoning as the most relevant attack models against machine learning algorithms.
In contrast to the aforementioned research, our present research tries to identify changes in virtual networks originating from anomalies such as malware in a systematic manner, and to create data sets of captured network packets in virtual networks. We use those data sets to evaluate forensic machine learning in virtual networks.

Anomaly detection in virtual networks
Anomaly detection is a critical task in modern networks, for example to detect advanced attacks like covert channels or DDoS. Anomalies occurring in the network might be an indicator for malicious behavior, so using ML to detect such behavior is a common technique. Whereas anomaly detection in traditional networks is well researched, detection algorithms are faced with new challenges in virtual networks.
For a structured procedure, we first introduce the process model that we follow. Our research focuses on the impact of virtualization in a network and of the inherent changes on ML algorithms that are used to detect anomalies in the network. A virtual environment provides a huge flexibility on different layers. Both overlay and underlay might create relevant changes in the network infrastructure, which therefore lead to a measurable impact on anomaly detection. Hence, we also analyze possible changes in overlay and underlay networks, which allows us to define the data collection in Section 4.

Process model
A successful implementation of ML for network forensic investigation like anomaly detection, malware analysis [35] or event reconstruction [36] depends on various parameters like valid packet captures, correct data extraction and the use of a suitable ML algorithm. An established method to ensure the correctness of a digital investigation is the use of so-called process models or frameworks which define the necessary steps, mostly separated into different phases. Whereas different frameworks for anomaly detection in networks exist [37,38], there is no specific framework that takes the dynamics of a virtual network into account. As shown in [39], digital investigations in a virtual network require adapted frameworks which are able to manage the flexibility of the environment.
We propose the use of the process model defined in [40]. The authors define six steps to implement ML for network analysis:

• Problem formulation
In this phase the investigator defines various parameters of the analysis. ML algorithms are often time-consuming, so a detailed definition of necessary input data and ML categories is quite relevant for the subsequent steps.

• Data collection
In this phase the relevant packets are captured. In traditional networks, this step is easy to implement [20,22], but virtual networks increase the complexity of network packet capture processes [41].

• Data analysis
This phase comprises all necessary steps to transform the captured packets into a usable format for the subsequent steps. The captured data might be stored in raw, pcap or pcap-ng formats² and transformed into formats like Netflow, sFlow, csv, json or other user-defined structures. Typically, these techniques do not store the entire network packet but various header information from different layers of the OSI model. Whereas Netflow and sFlow use layer 3 and layer 4 protocols, the other formats might extract information from all other layers. The definition of the necessary data depends on the intended analysis.
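As an illustration of this transformation, the following sketch maps a raw Ethernet II / IPv4 frame to a flat record and serializes it as json. The frame is built by hand so that the example is self-contained; the function name and the selected fields are assumptions for illustration only, with keys borrowed from Wireshark display-filter names, and do not reproduce our actual extraction pipeline.

```python
import json
import socket
import struct


def extract_fields(frame: bytes) -> dict:
    """Map an Ethernet II / IPv4 frame to a flat record of header fields."""
    dst_mac, src_mac, eth_type = struct.unpack("!6s6sH", frame[:14])
    record = {
        "frame.len": len(frame),
        "eth.src": src_mac.hex(":"),
        "eth.dst": dst_mac.hex(":"),
        "eth.type": eth_type,
    }
    if eth_type == 0x0800:  # IPv4 follows the Ethernet header
        (_ver_ihl, _tos, total_len, _ident, _frag, _ttl,
         proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
        record.update({
            "ip.len": total_len,
            "ip.proto": proto,
            "ip.src": socket.inet_ntoa(src),
            "ip.dst": socket.inet_ntoa(dst),
        })
    return record


# Hand-built example frame: Ethernet + IPv4 + 8 bytes of ICMP-sized payload.
# The IP checksum is left at 0, which is sufficient for this parsing sketch.
eth = bytes.fromhex("020000000002") + bytes.fromhex("020000000001") + struct.pack("!H", 0x0800)
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 1, 0, 64, 1, 0,
                 socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
frame = eth + ip + bytes(8)

record = extract_fields(frame)
print(json.dumps(record))
```

Only header fields are stored, so the payload never reaches the feature set.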

• Model construction
The construction involves the training, testing and tuning of the learning model.

• Model validation
In this phase the model is validated to ensure its quality. If errors occur or improvements are needed, all prior tasks are involved to eradicate these issues.

• Deployment and inference
This phase comprises all relevant steps to implement the ML process in the operational environment with a focus on resource usage, accuracy and performance.
Problem formulation is covered in the next subsections. Data collection is described in detail in Section 4. Model construction and validation are done in Section 5. We did not focus on the last phase of deployment and inference in depth, because we limit our approach to the analysis of different changes without an optimization of the deployed algorithm or the selection of the best algorithms.

Changes in overlay network
The overlay part is defined by the internal networks, which are under the administration of the user. Thus, the customer is able to change different settings of a VM or the assigned network on his own, which might result in relevant changes.

• Internal IP addressing
Typically, the internal network uses private IP addresses from a predefined subnet as described in [42]. The user of this network is able to change this internal addressing scheme without involving the CSP or any administrator of the cloud environment.

• VM Life Cycle
A VM runs inside the virtual environment and is under control of the customer. So, the customer is able to start and stop virtual machines on his own, which leads to an unpredictable behavior affecting every IT security feature focused on this part of the network.

• Addition or Deletion of VMs
A customer is able to start new VMs within seconds and to connect them to the internal network. Typically, this task is initiated by using the web interface of the cloud environment. If a new VM is started, the internal network changes.

Changes in underlay network
A cloud service provider (CSP) is free to change the underlying network whenever needed, which might lead to a fully different network behavior without affecting the virtual network of the users.

• Overlay protocol
Overlay protocols like VXLAN, GENEVE or STT are used to create the different isolated and separated networks of the different customers. If the CSP changes the separation protocol of the internal virtual networks from VXLAN to GENEVE, this only needs a quick change of the internal transfer mechanism. A possible change is the use of an improved protocol like STT, which implements the use of special features of the network interface card. By this the resource usage of the CPU is reduced, which improves the overall performance of the network.

• Migration
A huge benefit of virtual environments is the migration of VMs. This can be done for availability reasons, e.g. when the hosting server has a critical problem which requires a reboot, or for providing an improved environment, e.g. when different VMs on a single host demand higher CPU usage. If a VM is migrated, the system is moved from one hosting server to another. In case of such a migration, some parameters of the VM (like the IP address or MAC address) do not change, but the underlying structure of the network needs some adaptations to create this flexibility. This might include the creation of a separate tunnel, depending on the protocol, as well as the deletion of existing ones.

• Programmability of the network
The use of SDN inside a virtual environment provides additional flexibility in the infrastructure. By implementing new or adapted features, the network is able to change its behavior on demand. SDN decouples the traffic management from the traffic forwarding. An SDN controller manages the flows of network traffic inside the environment by communicating with connected devices via the so-called southbound API. The most notable protocol for this communication is OpenFlow. By adding or deleting OpenFlow rules, a packet flow inside the network is adjusted, which provides huge flexibility and dynamics in the environment. This programmability is used to implement different applications like network management [43], security [44], quality of service [45] or network forensics [39].
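The effect of adding or deleting rules can be sketched with a toy match-action table. This is a strongly simplified model under our own assumptions: the class, its methods and the action strings are hypothetical and correspond neither to the OpenFlow wire protocol nor to the API of any SDN controller.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTable:
    """Toy match-action table (hypothetical, not an OpenFlow implementation)."""
    rules: dict = field(default_factory=dict)  # (src, dst) -> action string

    def add_rule(self, src, dst, action):
        self.rules[(src, dst)] = action

    def delete_rule(self, src, dst):
        self.rules.pop((src, dst), None)

    def forward(self, src, dst):
        # Assumption: unmatched packets are handed to the controller,
        # which is one common way to configure a table miss.
        return self.rules.get((src, dst), "send_to_controller")


table = FlowTable()
table.add_rule("10.0.0.1", "10.0.0.2", "output:2")
before = table.forward("10.0.0.1", "10.0.0.2")
table.delete_rule("10.0.0.1", "10.0.0.2")
after = table.forward("10.0.0.1", "10.0.0.2")
print(before, after)
```

The same pair of endpoints is treated differently before and after the rule change, although neither endpoint changed its own configuration.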

Implementation
This section describes the collection of relevant network data, the definition of our packet based feature set and the algorithms used for anomaly detection.

Data collection
In [31] we analyzed the impact of virtual networks when using machine learning for anomaly detection inside a simple network infrastructure. To measure the impact in a more realistic scenario, we extended the original test environment and created a cloud environment based on OpenStack with two user networks as shown in Fig. 3. This environment provides all relevant aspects which facilitate the analysis of the aforementioned occurrences. The virtual environment is installed on three Dell PowerEdge R6415 servers, each with 32 GB RAM and three network interface cards (NIC). One NIC of each server is dedicated to the tenant network; the connection between the servers is done with a Cisco WS-C2960X-24PS-L gigabit Ethernet switch. We installed Ubuntu 18.04 LTS as the underlying operating system with OpenStack release Train. As the SDN controller, we use OpenDaylight. The deployed protocols VXLAN, GRE and GENEVE are the relevant ones of the most notable cloud environment solution OpenStack [46].
In the green network in Fig. 3, we installed a webserver based on nginx as the frontend and a MySQL database as the backend. The webserver hosts two different files, a single HTML document and a PHP script that collects some information from the MySQL database running on a different system. To increase the number of packets in the network, we added a dedicated VM providing the php-fpm functionality. With this setup, a single request for the PHP script results in communication with the php-fpm service and, after that, with the database.
The blue network contains a single VM that repeatedly, with random intervals in between, requests an internet site from the top 500 websites³ via curl. This produces common network traffic, which is normal in cloud environments.
All administrative work is done via the web interface of the OpenStack installation. We focus on the so-called tenant networks, which define the networks for the VMs and the customers. All other networks like backup or management are not in the scope of this paper. This infrastructure provides a low-level environment, which helps to focus on the impact of the changes. We assume that an implementation of networks with more VMs or connections would not result in different detection rates, but would hamper the analysis due to the existence of more network packets stored during the collection process.
At first, we validated the usability of the environment by capturing the network data on two different layers of the network. A capture process in the overlay network results in a packet capture without any overlay information, whereas a packet capture performed in the underlay network gathers all involved protocols, including the overlay protocol. Fig. 4 shows a packet capture without any encapsulation information. In this case, information of layer 2 (starting with Ethernet II), layer 3 (starting with Internet Protocol Version 4 (IPv4)) and the transmitted data (in this case the Internet Control Message Protocol) is available. Fig. 5 displays a packet capture of the same network packet, but this time encapsulated with VXLAN, which is transported over UDP. As a result, there are two Ethernet headers and two IPv4 headers in one captured packet.
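The relation between the two capture points can be sketched as follows: an underlay frame is the overlay frame preceded by an outer Ethernet, IPv4, UDP and VXLAN header. The sketch builds such a frame by hand and strips the outer headers again. The helper names, addresses and the VNI 42 are illustrative assumptions; the UDP destination port 4789 is the IANA-assigned VXLAN port, and checksums are left at 0 because they are not needed here.

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned UDP port for VXLAN


def ethernet_header():
    return bytes(6) + bytes(6) + struct.pack("!H", 0x0800)


def ipv4_header(proto, payload_len):
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0, 64,
                       proto, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))


def decapsulate_vxlan(frame: bytes):
    """Return (vni, inner_frame) for Ethernet/IPv4/UDP/VXLAN, else None."""
    if struct.unpack("!H", frame[12:14])[0] != 0x0800:
        return None
    ihl = (frame[14] & 0x0F) * 4            # outer IPv4 header length in bytes
    if frame[14 + 9] != 17:                 # outer IP protocol must be UDP
        return None
    udp_start = 14 + ihl
    dst_port = struct.unpack("!H", frame[udp_start + 2:udp_start + 4])[0]
    if dst_port != VXLAN_PORT:
        return None
    vxlan_start = udp_start + 8
    vni = int.from_bytes(frame[vxlan_start + 4:vxlan_start + 7], "big")
    return vni, frame[vxlan_start + 8:]


# Overlay view: Ethernet + IPv4 + 8 bytes standing in for an ICMP message.
inner = ethernet_header() + ipv4_header(1, 8) + bytes(8)
# Underlay view: the same frame behind outer Ethernet/IPv4/UDP/VXLAN headers.
vxlan = struct.pack("!II", 0x08 << 24, 42 << 8)
udp = struct.pack("!HHHH", 49152, VXLAN_PORT, 8 + len(vxlan) + len(inner), 0)
outer = (ethernet_header() + ipv4_header(17, len(udp) + len(vxlan) + len(inner))
         + udp + vxlan + inner)

vni, recovered = decapsulate_vxlan(outer)
print(vni, recovered == inner, len(outer) - len(inner))
```

The length difference printed at the end is the per-packet encapsulation overhead that distinguishes the two captures.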
Whereas some connection-specific details are different, the timestamp of the ICMP packet is the same in both captures. So, the capture processes create two files with the same network information encapsulated in different layers. For the further processing of the data, we performed some data sanitization for the per-packet analysis. To remove irrelevant packets, we removed all ARP packets from the capture files due to the fact that these do not contain any relevant information.
To create some anomalies, we injected some crafted network packets, which use illegal combinations of network headers. For creating those packets, we used scapy⁴, a Python framework for the creation, manipulation and capturing of network packets. The packets were designed with the following variations (alone and in combination). We injected these packets into the network using tcpreplay, so that they are recorded during the capture process.
The use of the crafted network packets provides a simplified implementation of the possible changes as mentioned in Section 3.2. Table 1 shows the implementation details, which are used to create a specific event to produce a change inside the network.

Feature set
The definition of the correct features to train the ML algorithms heavily depends on the goal of the analysis. The selection of significant parameters which distinguish benign from malicious traffic is therefore still a difficult process, which depends on various parameters and aspects of the network. Different research uses feature extraction algorithms based on the network traffic to define a usable feature set. [47] describes feature selection with the help of the Poisson Moving Average (PMA), and [48] describes the use of Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA). But the use of these algorithms might have some drawbacks; in particular, a small data set might be insufficient for them. Because of this, we focus on a manual selection of the features to be used.
The information used in the feature set is variable, and might contain various information like packet details of the different layers, results of deep packet inspection [49] and time or flow-specific values [50].
The use of network flows improves the results and the time needed for the analyses by limiting the amount of information used for the algorithm. A flow is a group of packets with common protocol details like IP addresses or port numbers. A flow does not contain any application data, which limits the traffic that has to be analyzed in contrast to a per-packet analysis [51]. Common analyses based on network flows use the following five fields [52]:
• Source IP address
• Destination IP address
• Source port number
• Destination port number
• Layer 3 protocol type
In contrast to this, [53] adds statistical parameters to the feature set. Especially the number of transmitted bytes is considered relevant. We use a mostly predefined communication scheme, so the amount of traffic is static to align the different scenarios. Timing parameters are relevant in productive networks, but our test bed is based on an internal LAN, so timing parameters like the duration of the flow or the delta between packets are negligible. Because of this, we focus on the use of packet details.
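For comparison with the packet-based approach, the grouping of packets into flows by these five fields can be sketched as follows. The record layout, the key names and the sample packets are assumptions made for this illustration; byte and packet counters stand in for the statistical parameters mentioned above.

```python
from collections import defaultdict


def group_into_flows(packets):
    """Aggregate per-packet records into flows keyed by the five fields."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"], p["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["length"]
    return dict(flows)


# Invented sample records: two packets of one flow and one packet of another.
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 40000,
     "dst_port": 80, "proto": 6, "length": 74},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 40000,
     "dst_port": 80, "proto": 6, "length": 1500},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.3", "src_port": 40001,
     "dst_port": 3306, "proto": 6, "length": 90},
]

flows = group_into_flows(packets)
for key, stats in flows.items():
    print(key, stats)
```

Individual header values of single packets are lost in this aggregation, which is why small per-packet changes are not visible at flow level.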
In [54] a first feature set of 23 parameters is considered to detect DDoS attacks. After measuring the importance a set of eight parameters were defined as necessary. In contrast, [55] uses only four features consisting only of the parameters source and destination IP address and combinations of them. Advanced attacks like covert channels do not only use application protocols like HTTP or DNS, but implement their information in lower level protocols like IP or TCP.
So, all aspects of Ethernet, IP and TCP/UDP as well as generic values like the length of the frame are relevant. These fields cover modifications on the underlay network, while changes inside the virtual networks typically concern the IP addresses used. The use of length parameters of the protocols like frame.len and ip.len is useful for the detection of various covert channels as discussed in [56]. Still, there is no defined list of relevant protocol fields to be used for anomaly detection in network forensic investigation. In contrast to our research in [31], we changed the deployed feature set. The first approach was reduced to application information related to the ICMP protocol used. The analyses in this research focus on a more realistic environment, so we need to adapt the features used. Similar to our first analysis, our network is based on an internal LAN without any effects from external networks like packet loss, changing transmission times or different routes between the involved hosts, so all timing parameters as well as the time-to-live of the packets were discarded. Table 2 lists our feature set; the name of each feature derives from the display-filter name used by Wireshark.
We implement a packet-based anomaly detection in contrast to a flow-based approach because we want to measure the impact of small changes in the network. A reconstruction of the flows in a virtual network requires a focus on the traffic inside the tenant network; otherwise, the UDP-based encapsulation of the VXLAN protocol leads to similar flows, which might confuse the algorithms.

Outlier detection
Anomaly detection with ML is often framed as a supervised learning task and can be done by classifying the different values [57]. The algorithms use different methods to calculate the anomaly score of the different changes. Common algorithms are Decision Trees, Support Vector Machines, k-nearest Neighbors (k-NN) and probabilistic methods [14,58]. The detection of outliers in network traffic can be performed in two different ways.
• The first approach defines normal or benign traffic and then compares the network data to this baseline. k-NN algorithms implement this kind of calculation by computing the local density deviation of a given data point with respect to its neighbors. An outlier has a lower density than its neighbors. LocalOutlierFactor (LOF) [59] is an algorithm which uses this density.
• Algorithms like IsolationForest (IF) [60] define anomalies as 'few and different', which makes them more susceptible to isolation than normal points. Instead of trying to build a model of normal instances, IF explicitly isolates anomalous points in the data set. As a variant of Random Forests, IF detects outliers by randomly selecting a feature and then splitting on a randomly chosen value between the minimum and the maximum values of this feature.
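The isolation idea can be illustrated with a minimal pure-Python sketch. It is not the scikit-learn implementation: it omits subsampling, uses a fixed depth limit, and draws the random splits on the fly along the path of one point. The function names and the toy data (frame length and IP protocol number) are assumptions made for illustration. The score follows the usual normalization 2^(-E[h(x)]/c(n)), so values close to 1 indicate points that are isolated after very few splits.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful binary-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def path_length(x, data, rng, depth=0, limit=10):
    """Number of random splits needed until x is separated from the other points."""
    if depth >= limit or len(data) <= 1:
        return depth + c(len(data))
    feature = rng.randrange(len(x))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:  # constant feature: no further split possible here
        return depth + c(len(data))
    split = rng.uniform(lo, hi)
    # Keep only the points that fall on the same side of the split as x.
    same_side = [row for row in data
                 if (row[feature] < split) == (x[feature] < split)]
    return path_length(x, same_side, rng, depth + 1, limit)


def anomaly_score(x, data, trees=200, seed=0):
    """Score in (0, 1]; higher means easier to isolate, i.e. more anomalous."""
    rng = random.Random(seed)
    mean_depth = sum(path_length(x, data, rng) for _ in range(trees)) / trees
    return 2.0 ** (-mean_depth / c(len(data)))


# Toy data: 50 TCP packets (protocol 6) with similar lengths, one ICMP packet.
normal = [(1400 + i, 6) for i in range(50)]
icmp = (98, 1)
data = normal + [icmp]
icmp_score = anomaly_score(icmp, data)
tcp_score = anomaly_score(normal[25], data)
```

In this toy setting the single ICMP packet is usually separated by the first split, so its score is higher than that of a packet from the dense TCP cluster.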

Evaluation
To evaluate our results, we used capture files produced by the data collection phase. The number of packets in each scenario is listed in Table 3.
We use the IsolationForest algorithm of the scikit-learn framework (see footnote 5), which provides the function IsolationForest.predict() to indicate whether a packet is classified as an anomaly or as normal behavior. This function marks an anomaly with the label -1 and normal behavior with 1.
As an application of the first approach, we implement LOF, which calculates a value of 1 for normal behavior and a value between 1.2 and 2.0 for an anomaly. sklearn provides LocalOutlierFactor.negative_outlier_factor_ as an indicator for the outlier detection. IF and LOF are unsupervised algorithms, so we do not need any training data, and each packet of each capture file was analyzed by IsolationForest.predict() and LocalOutlierFactor.negative_outlier_factor_. In addition, we focus on the detection of changes in the network and their identification as anomalies. The impact of changes in a virtual network, and the detection of an anomaly as a result of such a change, is the main part of this evaluation. We therefore did not focus on improving the detection algorithms and did not evaluate the correct classification of every network packet.
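To make the meaning of the LOF value concrete, the following is a minimal pure-Python sketch of the local outlier factor. It is a simplified illustration, not the scikit-learn implementation: ties in the k-distance are ignored, the function name lof_scores and the toy data are assumptions, and duplicate points are not handled. scikit-learn exposes the negated form of this score as negative_outlier_factor_.

```python
import math


def lof_scores(points, k=3):
    """Return the local outlier factor of every point (about 1 means inlier)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours and k-distance of each point (the point itself excluded).
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


# Toy data: ten evenly spaced values and one value far away from them.
points = [(float(i),) for i in range(10)] + [(30.0,)]
scores = lof_scores(points, k=3)
```

On this toy data the evenly spaced points obtain scores close to 1, while the isolated point obtains a score well above 1 because its local density is much lower than that of its neighbours.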
To validate the detection, we repeated the research of [31] and analyzed the detection of outliers based on the ICMP messages without any overlay protocols. Both algorithms, IF shown in Fig. 6 as well as LOF shown in Fig. 7, successfully detect the ICMP messages as outliers, which validates our implementation. Fig. 6 shows the result of IsolationForest.predict(); the bar around −0.2 indicates the existence of abnormal packets. Fig. 7 marks abnormal packets with a red circle, which identifies values that have a lower density than their neighbors. The x-axis represents the network packets based on their values, and the y-axis gives the anomaly value based on the density. The radius of the red circle is the outlier score calculated by LOF. The calculation is done with [...].
IF provides two scores to classify a value as an anomaly. The IsolationForest.predict() function marks an anomaly as -1. The use of IsolationForest.score_samples() results in a negative value: the larger the absolute value, the more abnormal the data point. The analysis of the ICMP packets shows an anomaly score of -0.59377377, which indicates an abnormal network packet. As the value is not close to −1, it might be an indicator that this network traffic is not easily detected as an anomaly.
In scikit-learn, LOF labels anomalies with -1 and normal points with 1 (via LocalOutlierFactor.fit_predict()), while LocalOutlierFactor.negative_outlier_factor_ holds the negated LOF score [61]. Thus, the network packets with the IP protocol number 1 (which identifies ICMP packets) are marked as anomalies with -1, whereas the TCP segments of the layer 4 protocol are marked as normal packets with a value of 1. Both algorithms detect the anomalies in our data set correctly. In addition, the crafted network packets were successfully identified as anomalies by both algorithms, too. Therefore, we focus on the use of IF and the resulting graphs for the next steps.
5 Details can be found at: https://scikit-learn.org/stable/.
The next step was the analysis of the data with overlay protocols, but without any changes in the structure as discussed in Section 3.2. Both techniques successfully detect the ICMP packets and the crafted packets as anomalies, with only small differences from the aforementioned analysis, as shown in Fig. 8 (the bars between −0.2 and −0.15 indicate the existence of abnormal packets) and Fig. 9.
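The relation between the two IF outputs can be sketched as follows. In scikit-learn, the label is derived from the score by subtracting an offset and testing the sign; with contamination set to "auto", this offset is -0.5. Whether this setting was used in the experiments is not stated, so the default offset below is an assumption, and the helper name label_from_score is hypothetical.

```python
def label_from_score(score, offset=-0.5):
    """Map an IsolationForest score to a label: -1 for anomaly, 1 for normal.

    Assumption: offset=-0.5 corresponds to contamination="auto" in scikit-learn;
    other contamination settings lead to a different, data-dependent offset.
    """
    decision = score - offset  # corresponds to the decision function
    return -1 if decision < 0 else 1


icmp_label = label_from_score(-0.59377377)  # score reported for the ICMP packets
example_normal_label = label_from_score(-0.45)  # hypothetical score above the offset
```

Under this assumption, the reported score of -0.59377377 lies only slightly below the offset, which is consistent with the observation that the traffic is classified as anomalous but not by a wide margin.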
The next experiments analyze the different changes as discussed in Section 3.2.

• VM migration
Because live migration was not available in our OpenStack setup, we created a snapshot of the VM and restarted the frozen VM on another host. There were no detectable changes in the overlay network, but the underlay network creates a new VXLAN tunnel, whereby a new virtual tunnel endpoint (VTEP) is created. This new VTEP acts as an additional system appearing in the network capture of the underlay network. This results in a small impact on the lower values shown in Fig. 10. As described for Fig. 6, bars at lower values indicate the existence of anomalous packets. The injection of the crafted network packets has a smaller impact on the detection, as shown in Fig. 11, because some of the values in these network packets intentionally collide with the effects of the additional system. In particular, the additional IP addresses used by the crafted packets might confuse the algorithm.

• VM Life Cycle
The life cycle of a VM changes the internal structure of the virtual customer network. Even when a VM is stopped or rebooted, internal identifiers like the IP address or MAC address remain the same. The only necessary adaptation is done in the overlay network when the VM is connected to the network again. In this case, the network traffic is similar to the traffic related to VM migration. Because a VM might appear as the same device with respect to its IP address and MAC address after a shutdown or reboot, this process has no detectable impact.

• Addition or Deletion of VMs
In contrast to the VM life cycle, the starting or permanent deletion of a VM results in a change within the network. A deletion might lead to a change in the mapping between MAC address and IP address; if another VM is started, this system might get the previously assigned IP address. This process is similar to the VM migration, because the location of the new VM in the network is random. In detail, some aspects differ from the VM migration process and affect different parameters of the communication. For example, a new location of a migrated VM might result in changed network details like timing parameters or traffic volume. However, these values are not detectable by our packet-based approach.

• Change of the protocol
To simulate the change of the deployed overlay protocol, we changed from VXLAN to GENEVE on the fly during the packet transmission. As shown in [31], such a change has a measurable impact on anomaly detection. The result is again a detection of anomalies in the underlay network, shown in Fig. 12. To clarify the anomaly, Fig. 13 shows the detection of anomalies when the protocol is changed back from GENEVE to VXLAN on the fly. The different heights of the bars might result from the processing of IF, which uses different values at the beginning of the calculation.

• SDN-programmability
The programmability of the network based on SDN provides great flexibility. As a result of reprogramming the network, flows inside the network might be redirected, reconfigured or replaced with other packets. To evaluate the SDN functionality, we implement a simple configuration change, which redirects the traffic originally sent to the web server to an additional web server that acts as a simple proxy system. This alters the network traffic, but the arising changes are the same as for VM addition or deletion, i.e. a new system is available in the network.
Different scenarios as discussed in Section 3.2 might appear in virtual environments and on different layers of the network, but their impact on the process of anomaly detection is sometimes similar. In particular, all processes that create additional systems (VMs as well as VTEPs) have a similar impact on the ML algorithms. As we focus on the detection of anomalies, we did not calculate any further metrics like precision, recall and F-measure, which are typically used in the evaluation of machine learning algorithms [62,63].
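For the protocol change scenario listed above, a minimal sketch shows why a switch of the overlay protocol stands out in a packet-based feature set. VXLAN and GENEVE use the IANA-assigned UDP destination ports 4789 and 6081, so the value of udp.dstport in the underlay changes abruptly. The function names and the toy capture are assumptions made for illustration and do not reproduce the experiment.

```python
from collections import Counter

# IANA-assigned UDP destination ports of the two overlay protocols.
OVERLAY_PORTS = {4789: "VXLAN", 6081: "GENEVE"}


def overlay_protocol(udp_dstport):
    """Name the overlay protocol indicated by the outer UDP destination port."""
    return OVERLAY_PORTS.get(udp_dstport, "other")


def protocol_share(dstports):
    """Return the fraction of packets per overlay protocol in a capture."""
    counts = Counter(overlay_protocol(port) for port in dstports)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Toy capture: 95 VXLAN packets, then 5 GENEVE packets after the switch.
capture = [4789] * 95 + [6081] * 5
share = protocol_share(capture)
```

Directly after the switch, the GENEVE packets are few and differ from the baseline, which matches the kind of data point that an isolation-based algorithm separates quickly, even though the change itself is benign.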

D. Spiekermann and J. Keller
All changes in virtual networks result in a modification of the environment. Depending on the intended analysis, an occurring change is therefore detected as an anomaly, which calls for an adapted method of anomaly detection. Only in this way can the alleged anomalies be eliminated so that they do not hamper the detection of relevant anomalies.

Conclusion and future work
Cloud environments are heavily used networks and provide great flexibility for both the user and the provider. Modern networks provide various virtualization techniques to create a highly flexible and dynamic environment. Whereas this customizability benefits the administrator and the user, it hampers IT security processes that use ML. These ML processes require a valid data set for training the ML algorithms, but this is not guaranteed in modern networks. The appearance of various changes in the environment might lead to different effects like false positives or false negatives. Changes like VM migration or user customization mask other issues in the network and endanger the detection of real anomalies. This paper analyzes unsupervised packet-based anomaly detection with two different algorithms. IsolationForest as well as LocalOutlierFactor detect arising changes in the network and are therefore suitable techniques to detect outliers in heavily used networks. We defined possible changes on two different layers of the virtual environment and evaluated the algorithms in a realistic scenario based on an OpenStack cloud.
Changes in this network are relevant and might occur quite often. As shown in Section 5, these benign changes are detected as anomalies and therefore impede the detection of real attacks or anomalies in the network. To address this problem, the provider needs to adapt its implemented algorithms and focus on the possible changes. As the changes might occur on all levels of the OSI model, a limitation to specific parameters is not feasible. Thus, a periodic redefinition or sanitization of the network traffic is necessary to overcome these issues.
Our future work will focus on the analysis of large packet captures like UNSW-NB15 [64] in virtual environments and the comparison of packet-based and flow-based feature sets. Modern networks provide high-speed connections; therefore, real-time packet-based anomaly detection has to be fast enough to capture and analyze each network packet in time. In our scenario, the network speed was 1 gigabit per second, but the network traffic was limited to only a few involved systems. In addition, we first captured the data in the so-called online phase and analyzed this data in a subsequent offline phase. As a result, the performance of the analyzing system is not relevant for the intended process. We will improve the analyzing part of the approach through further investigations to filter the network traffic and analyze only relevant information. Furthermore, as a simulated scenario, our approach is not protected against any model of adversarial attacks, so we will harden future implementations against attacks like evasion or poisoning.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.