
1 Introduction

In recent years, research into new methods of anomaly detection within a network has grown in prominence. The need for a fast, reliable method of identifying possible malicious activity has grown in response to emerging threats. Protecting confidential and proprietary data is of paramount importance to any organization in order to meet both legal and contractual obligations. In addition, the data stored may not necessarily be the property of the company storing and handling it. Malicious activity such as botnets and port scans is increasing in frequency. These attacks, while simple, have the potential to allow unauthorized access to the network.

Within any organization, it is commonplace to use network monitoring and analysis tools to help detect anomalous behaviour on the network. Tools such as McAfee ePO and TippingPoint, for example, are signature-based: they require a known example of a threat to be catalogued and a signature generated. The signature-based model, while highly precise, fails when a novel attack occurs (e.g. a zero-day vulnerability), since no previous signature exists. This limitation gave rise to anomaly-based detection mechanisms. These methods require no signature database; instead, they model the “normal” traffic on a network and alert on any activity that falls outside these normality bounds.

While much research has been conducted on anomaly-based systems using a variety of approaches [1,2,3,4,5], key limitations apply when attempting to adapt these approaches to a real-time system, most notably computational cost. Within commonly used distance-based outlier detection mechanisms, the distance calculations required for each new sample can be overwhelming for high-volume data. This gave rise to optimized algorithms designed to mitigate this limitation. Algorithms such as Fast Local Outlier Factor (FastLOF) [6] and Micro-Cluster based Outlier Detection (MCOD) [7] reduce the overall cost of the range queries with varying degrees of success.

It has been shown that applying time-series analysis can be beneficial in the detection of network anomalies. Applying time series over time windows of increasing size has been shown to normalize typical behaviours over time. However, at smaller time intervals, abnormal behaviour may traverse time-window boundaries, allowing it to escape detection as a false negative. This is particularly prevalent among anomalies that generate low numbers of NetFlow records. Such activity can become hidden within other network traffic as time progresses, making detection increasingly unlikely.

Moreover, while a large increase may occur, small deviations in established traffic behaviour may also indicate unauthorized activity. For example, an increase in failed login attempts may produce little difference with respect to NetFlow volume, yet it may be indicative of someone trying to guess a password. Detecting such an instance would be of paramount importance, particularly if it is followed by a successful login attempt [8]. This paper proposes a solution to this problem. By implementing parallel clustering algorithms, it is possible to gain a higher level of granularity while retaining the normalization gained from an incrementally increasing time window. Concurrent algorithm instances can detect minor deviations from established behaviour regardless of whether they occur while traversing time-window boundaries.

In Sect. 2, an overview of related work is given. In Sect. 3, a brief overview of the technology used is presented. In Sect. 4, we propose a framework for the identification of anomalies within NetFlow data; the architecture is presented along with an overview of the algorithms created and used. In Sect. 5, the testing methodology is presented. In Sect. 6, results obtained from live data are presented and analysed. In Sect. 7, conclusions and future work are presented.

2 Related Works

Recent research into anomaly detection has largely focused on applying anomaly detection mechanisms to network data in order to identify anomalous behaviour. Many problems still exist, however. Performance is a key factor when trying to utilize anomaly detection techniques, and there are many examples where this is apparent. Methods such as principal component analysis [3], k-nearest neighbour [9] and ensemble techniques [10, 11] have been applied to this task with varying degrees of success.

However, their comparatively expensive operations have led to a rise in clustering techniques that mitigate the calculations needed when applying anomaly detection to big data [12, 13]. Aggregated data, such as NetFlow, can be used to reduce the calculations further [14]. Limitations are present with these techniques, however. Kumari et al. [15] examine a clustering technique for anomaly detection over a network, setting a distance-based threshold as the 100th farthest data point from the obtained cluster centroids. This kind of threshold is a common theme across multiple anomaly detection solutions [16,17,18]. However, it can be argued that using a common threshold over all clusters within real data is non-representative of the various forms of traffic created; for example, traffic from different applications does not behave in a similar manner.

This brings forward an interesting problem. While distance-based outliers have been shown to be of significance in a plethora of works [19,20,21], we propose another indicator of possible anomalous behaviour. By monitoring cluster density over a time series, changes in underlying behaviour can be detected. Rather than focusing only on samples that are anomalous according to distance-based calculations, changes in the density of activity over time are also monitored.

Asmuss et al. [18] demonstrate the use and effectiveness of a time-series based approach over live data. The aggregation of traffic is highly beneficial in this case, as it reduces the computational resources needed. Furthermore, it also provides a tangible benefit when comparing results across clusters. This idea of time-aware analysis has been used elsewhere as well [22], and has been shown to be a valuable tool in mapping continuous behaviour.

This ability to generate a normalized view of traffic over time has, practically speaking, some limitations. The potential for an anomaly to traverse through sequential time windows can lead to the anomaly threshold not being broken, thus producing a false negative reading. Presented in Sect. 4 is an architecture that utilizes concurrent time windows in order to mitigate the risk of this happening. Anomalies can be gathered from individual instances of the clustering algorithm, while the correlation of clustering behaviours across instances can also indicate anomalous activity.

3 Technology Overview

3.1 Cisco NetFlow

Cisco NetFlow is a system that amalgamates network traffic information into a format that describes the communications occurring on a network. Through a NetFlow enabled device (Fig. 1), packet traces are identified and stored as a single flow representation of a specific set of communications between two devices. These flows are used for multiple tasks, such as network performance monitoring, and as a means of security evaluation once an incident has been detected. Visualization of NetFlow has also proven to be of tangible benefit [23]. Using this aggregated data for anomaly detection has numerous benefits, such as a reduced data size for processing and storage. NetFlow was used for the analysis in this paper because NetFlow-based network monitoring was found to be highly common in the area [24] and, in this case, the current infrastructure of NPD allowed for the collection of NetFlow with relative ease.

Fig. 1. NetFlow example

3.2 MCOD

MCOD is a distance-based outlier detection mechanism that utilizes clustering to reduce the number of distance calculations needed to identify possible anomalies. The reduced computational cost of the algorithm makes it an ideal candidate for real-time anomaly detection over data streams when compared to other distance-based algorithms such as Local Outlier Factor (LOF). As described in [7], MCOD uses a sliding-window approach to identify outliers over a data stream. By using an expiring data set, the algorithm can be tuned to retain only a data set that is large enough for satisfactory anomalies to be detected, while keeping the number of calculations low. Distance is calculated between the cluster centroids and the NetFlow sample being queried. If the point is within this range of a cluster centroid and the cluster has the specified density, k, then the point is determined to be a non-anomaly, or inlier.
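
To make the inlier criterion concrete, the following minimal Python sketch illustrates the test described above; the radius R, the density threshold k and the plain Euclidean distance are illustrative assumptions rather than a reproduction of the exact MCOD implementation.

import math

def is_inlier(sample, centroids, densities, R, k):
    """Illustrative micro-cluster inlier test in the spirit of MCOD.

    A sample is treated as an inlier if it lies within distance R of a
    micro-cluster centroid whose density (member count) is at least k.
    """
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    for centroid, density in zip(centroids, densities):
        if euclidean(sample, centroid) <= R and density >= k:
            return True   # close to a sufficiently dense micro-cluster
    return False          # candidate outlier, pending further evidence in the window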

4 Proposed Framework

The architecture is a two-step approach that involves monitoring traffic at different levels of abstraction. Firstly, an adaptation of the MCOD algorithm is applied to the NetFlow data in sequential time windows (Fig. 2). This stage can outline distance-based outliers contained within the NetFlow information. Following this, the clusters generated within each time window are correlated to identify those representing similar traffic. The density values (how many input samples are contained within the cluster) are then gathered at the end of each 5-minute window period.

Fig. 2. Architecture overview

4.1 Anomaly Detection

When using algorithms such as k-nearest neighbour and MCOD, a common variable that must be tuned to the dataset is k. While its exact use may change, the principle remains consistent: k is used to describe the limit of normality within anomaly detection frameworks. It is a single variable used to tune the classification or detection rate within an algorithm. Within MCOD, it is the value at which a micro-cluster is generated and an outlier is classified as an inlier. Within the context of NetFlow anomaly detection, this is a severe limitation, as it assumes that all network traffic activity has an equal distribution across the network, which is an extremely unlikely assumption (discussed further in Sect. 5).

In order to mitigate this limitation, all cluster densities are monitored irrespective of the value of k. This allows abnormal increases and decreases in cluster densities to be detected with respect to the time window being analysed.

4.2 Density Normality Measurement Generation

Within the proposed architecture, MCODT is initialized with a time window of size t. MCODT clusters the data within this window, identifying anomalies using distance-based calculations. The clusters generated within this time window contain samples that correspond to a type of network behaviour (Table 5). In parallel, further windows are initialized in time steps of St using the same configuration as the initial algorithm (Fig. 3).

Fig. 3. Commonly used notation
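
The staggered initialization can be illustrated with the following hypothetical Python sketch, in which the one-minute step and five-minute window mirror the configuration used later in Sect. 5.2; the class and parameter names are our own and are not taken from the original implementation.

from dataclasses import dataclass, field

@dataclass
class MCODTInstance:
    """Hypothetical wrapper for one time-windowed MCOD instance."""
    start_offset_min: int           # offset of this instance's first window
    window_min: int = 5             # window size t (5 minutes in our tests)
    clusters: dict = field(default_factory=dict)  # persisted cluster id -> density history

def init_parallel_instances(n_instances=5, step_min=1, window_min=5):
    # Instance 0 starts at t = 0; each subsequent instance is delayed by step_min,
    # so the windows overlap and an event is never more than step_min away
    # from a window boundary.
    return [MCODTInstance(start_offset_min=i * step_min, window_min=window_min)
            for i in range(n_instances)]

instances = init_parallel_instances()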

The positions of the clusters generated are then correlated with each subsequent MCODT instance to capture the clusters' activity across the other MCODT instances. This correlative step is needed because, due to the cluster generation step of MCODT, a cluster might not be in exactly the same position throughout the sequential time windows, even though it represents the same class of network traffic (Fig. 4).

Fig. 4. MCODT definition
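
A simple way to perform this correlation is to match each newly generated cluster to the nearest previously persisted centroid, as in the sketch below; the tolerance eps and the nearest-centroid matching rule are assumptions made for illustration.

import math

def correlate_clusters(prev_centroids, new_centroids, eps=0.05):
    """Match each new cluster to the nearest previously persisted cluster.

    prev_centroids / new_centroids: dict of cluster_id -> centroid (tuple).
    Returns a mapping new_id -> persisted_id, or None when no persisted
    centroid is within eps, in which case a new persistent ID is created.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    mapping = {}
    for new_id, c_new in new_centroids.items():
        best_id, best_d = None, float("inf")
        for old_id, c_old in prev_centroids.items():
            d = dist(c_new, c_old)
            if d < best_d:
                best_id, best_d = old_id, d
        mapping[new_id] = best_id if best_d <= eps else None
    return mapping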

Once the clusters have been correlated and shown to be representative of the same class of network traffic, the cluster is persisted and given an ID. At the end of each instance of MCODT, the cluster's density is measured and compared to its own historical activity and its activity in the other instances (Fig. 5).

Fig. 5. Cluster correlation

This allows anomalies within the persisted cluster to be identified using the 3-standard-deviation rule. Furthermore, when a cluster is not generated in all the instances of MCOD, it is indicative of non-homogeneous network activity. This specific type of traffic (as shown in Sect. 6) is highly irregular, and corresponds to network traffic that is extremely uncommon within the testing environment (Fig. 6).

Fig. 6. Residual calculation
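
The density check can be sketched as follows, assuming the per-window densities of a persisted cluster are available as a simple list; the 3-standard-deviation test mirrors the rule stated above, while the handling of short histories is an assumption.

import statistics

def is_density_anomaly(density_history, current_density):
    """Flag a persisted cluster whose current window density deviates by more
    than three standard deviations from its historical mean (3-sigma rule)."""
    if len(density_history) < 2:
        return False   # not enough history to estimate the spread
    mean = statistics.mean(density_history)
    std = statistics.stdev(density_history)
    if std == 0:
        return current_density != mean
    return abs(current_density - mean) > 3 * std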

5 Methodology

5.1 Collected Data

To test the proposed method, NetFlow was collected over a 24-hour period from within NPD. This was live data, and no previous insight about this 24-hour period was held; it was unknown whether it contained anomalies or not, simulating real-world conditions. The NetFlow contained all communications, both internal and external, during this period, totalling 151,995,634 NetFlow samples. From these samples, 8 attributes were extracted (Table 1), and from these, 6 attributes were selected for use in the anomaly detection calculations.

Table 1. Selected NetFlow attributes
  • Source/Destination IP
  • Destination Port
  • Source Port
  • Destination Bytes
  • Source Bytes
  • Protocol ID

These attributes were normalized using theoretical maximums as well as observed maximums over a 3-month period (Table 2).

Table 2. Maximum values used for normalization
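
As an illustration, the normalization can be expressed as a simple division by the per-attribute maximum, as in the sketch below; the byte-count maxima shown are placeholders, since the actual observed values of Table 2 are not reproduced here.

# Illustrative max-based normalization; the real maxima come from Table 2
# (theoretical limits, e.g. 65535 for ports, and 3-month observed maxima for
# byte counts). The byte maxima below are placeholders, not the real values.
MAXIMA = {
    "dst_port": 65535,        # theoretical maximum for a TCP/UDP port
    "src_port": 65535,
    "protocol_id": 255,       # theoretical maximum for the IP protocol field
    "dst_bytes": 1_000_000,   # placeholder observed maximum
    "src_bytes": 1_000_000,   # placeholder observed maximum
}

def normalize(flow):
    """Scale each selected attribute into [0, 1] by its maximum, clipping at 1."""
    return {k: min(flow[k] / MAXIMA[k], 1.0) for k in MAXIMA}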

The remaining collected attributes were excluded from the anomaly calculations for various reasons. The IP addresses were excluded because the IP address leases allocated by the DHCP servers were inconsistent both in maintaining the allocated IPs and in the time-out periods for leased IPs. This would lead to inconsistent results within networks, as IP addresses could be re-allocated in as little as 30 minutes, drastically changing their perceived normal traffic pattern. Instead, IP addresses were categorized as either internal or external, in order to develop separate clusters in feature space representing internal-to-internal and external-to-external traffic types. These attributes, along with the Start Time attribute, were collected for the investigation.
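
The internal/external categorization can be sketched as a membership test against the organization's address ranges; the RFC 1918 private ranges used below are a stand-in assumption, as the real address plan of NPD is not reproduced here.

import ipaddress

# RFC 1918 private ranges used here as a stand-in for "internal"; a real
# deployment would use the organization's own address plan instead.
INTERNAL_NETS = [ipaddress.ip_network(n) for n in
                 ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def ip_category(addr):
    """Return 1 for internal addresses, 0 for external ones."""
    ip = ipaddress.ip_address(addr)
    return int(any(ip in net for net in INTERNAL_NETS))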

5.2 Program Configurations

For testing, a 24-hour example was chosen with no specific preference, and no previous assumptions existed about this data before testing. The architecture was configured with an initial time window size of 5 minutes, and parallel instances were configured to start at one-minute intervals after this, leading to a total of 5 MCODT instances processing the data in parallel. The configured variables are outlined in Table 3.

Table 3. Program configurations
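
For reference, the textual configuration can be summarized as in the sketch below; the distance radius and density threshold values are placeholders, since the actual values of Table 3 are not reproduced here.

# Configuration mirroring the textual description in Sect. 5.2; the distance
# radius R and density threshold k are placeholders, not the values of Table 3.
CONFIG = {
    "window_minutes": 5,     # initial MCODT time window size
    "step_minutes": 1,       # offset between parallel instances
    "n_instances": 5,        # total parallel MCODT instances
    "radius_R": 0.05,        # placeholder distance threshold
    "density_k": 30,         # placeholder micro-cluster density threshold
}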

6 Results

In this section, we discuss the results of the proposed method of anomaly detection. Anomalous samples that were identified at both stages of the proposed architecture are outlined and analysed. Examples of normal activity of various types will also be presented.

6.1 Point Anomalies

Due to the two-stage architecture of the proposed method, anomalies may be detected in two different manners. Firstly, anomalies outlined by the distance-based calculations are reported at the end of every time window. An anomaly outlined at this stage is a point that never meets the required density, irrespective of time. These are regarded as point anomalies. Due to the relatively small time window of the MCODT instances, the number of point anomalies detected within the first hour of processing was vast; the number of additional anomalies fell rapidly over the course of the analysis. Because of this, focus was placed on point anomalies that were detected after the initial 12 hours of analysis. Table 4 outlines two such samples that were correctly identified as anomalies.

Table 4. Sample point anomalies

Sample A represents a simple DNS request, which at first seemed like a false positive. However, upon investigation, it was shown that this DNS request was from an internal asset to an external DNS server. This incident was of interest to security technicians within the NPD Group. Sample B was an unauthorized UPnP (Universal Plug and Play) device connected to the network. It has been well documented how network security can be affected by having a UPnP device hosted on a network [25]. Upon detection, the device was disconnected from the network.

6.2 Cluster Density Analysis

Cluster densities were measured at regular intervals. Five instances of MCODT were run in parallel, each with the same configuration settings (Table 3). The initial instance, MCODTN0, was initialized at 00:00 on the day in question. One minute after this, MCODTN1 was initialized, followed by MCODTN2 and so on. This low-level analysis allows for the detection of possible malicious activity in as little as one minute after an incident. Table 5 outlines sample clusters, selected on the basis of being classified as anomalies, and their activity over the course of one hour within the scope of all the independent instances. In total, 1440 densities were measured for the 1532 clusters generated and persisted over the course of testing.

Table 5. Detected anomalies

Anomalies were shown in both the independent analysis and the correlative analysis. Of particular interest is the anomaly detected in Table 5, Sample 9. This anomaly appears in all instances of the algorithm and shows significant divergence from observed normal behaviours. Once investigated, the rapid increase was attributed to a single asset that repeatedly attempted to open a connection to an external IP but never received any response. This was of interest to the security team within NPD, and was swiftly resolved. Sample 4 was also anomalous within the test. The cluster appeared in only one window, showing the extremely short-lived nature of the event; the other instances did not detect sufficient activity to generate a cluster. It represents an extremely short burst of activity to an external device.

7 Conclusions and Future Work

This paper proposed an architecture designed to detect anomalies within NetFlow data. To achieve this at a micro level, a clustering algorithm was run in parallel instances to determine anomalies in cluster activity over a time series. The architecture was shown to be able to detect anomalies in live data without any previous knowledge of the data, and these anomalies were investigated and shown to be of security interest. This result is notable given that the testing was conducted on real-world, live data, with actionable anomalies found. Future work will include refining the extensibility of the algorithm. Due to the abstraction of the density monitoring, it is possible to add attributes to MCODT's feature space in order to monitor changes not only in network traffic, but also in other metrics that could contribute to the risk of a malicious attack, such as the vulnerability of an asset as determined by an external program.