1 Introduction

An essential component of an organization’s information technology (IT) infrastructure is the Security Operations Center (SOC), a centralized unit that employs people, processes, and technology to continuously monitor and manage the organization’s security posture. One of its key functions is to identify, investigate, and resolve cyber threats and attacks. This is achieved through active monitoring, which involves a continuous scan of the network for abnormalities or suspicious activities. The SOC collects and analyzes data from various sources, such as security devices, logs, and other data sources, to detect and respond to potential security incidents. By providing continuous monitoring and response capabilities, the SOC helps organizations maintain their security posture and protect against cyber threats.

In network monitoring, situational awareness refers to best-practice engineering approaches used to continuously monitor IT processes and operations to improve service provision in the form of automatic action recommendation and decision support [1]. Dealing with large volumes of data online and in real time is a challenge. Much of these data are unlabeled [2] or not significant for detection purposes. Another challenge is the large quantity of alerts whose evaluation unnecessarily consumes the time of security analysts and thus makes it harder to notice real alerts. Various tools such as firewalls, virtual private networks, intrusion detection systems (IDSs), and intrusion prevention systems (IPSs) are used to ensure network security. Early detection, alerting, and rapid response are crucial components of a proactive security approach, helping prevent potential problems from impacting the network. Other challenges [3] in detecting network anomalies are a dynamic and fast-changing environment, the diversity of resources and tools, and the security nature of the domain, which prevents the sharing of logs among SOCs.

Artificial intelligence (AI), specifically its subfields of neural networks (NNs) and deep learning (DL) used in an unsupervised way, aids in this process. This involves processing telemetry data from a monitored network, representing them as time sequences, and training an intelligent model to predict the most probable future states (called the baseline) based on previous observations [4, 5]. In typical network operation, alarms are triggered when a monitored value exceeds or falls below predefined thresholds. However, setting these thresholds accurately can be challenging over time [6, 7]. If the thresholds are too narrow, a large number of alarms can be triggered, exceeding the analysis capacity of network managers. If the thresholds are too wide, real anomalies can be overlooked, leaving the network vulnerable to potential threats. Dynamically defined thresholds are therefore becoming increasingly important. Such a nuanced soft baseline enables a complex system overview and helps to indicate possible anomalies, allowing security systems to continuously refine and adapt to changing situations over time and online. However, real data from real production are hard to obtain due to the security nature of the domain. There are a number of public, synthesized, and artificially generated datasets, but real scenarios are rarely seen.

The motivation of this work is to deploy an AI module for proactive network monitoring in real time to improve security protection. In this field, the implementation aspect of AI deployment has not yet been thoroughly investigated. By cooperating with IDS/IPS, the intelligent AI module is designed to monitor the infrastructure network in production and identify potential threats. To achieve this, the latest trend in multihorizon and multivariate time-series prediction is used, along with DL techniques in an unsupervised way, to simultaneously model multiple monitoring channels. This approach is expected to provide a more comprehensive and accurate view of the network’s state, enabling proactive measures to be taken to improve network security.

The work presented in this paper makes several key contributions to the field of AI for IT operations (AIOps) research, specifically in the context of online data stream monitoring, including:

  • The development of a DL-based multihorizon forecast model that simultaneously models multiple monitoring channels in an unsupervised manner, enabling a comprehensive view of network activities.

  • The deployment of an AIOps module containing the DL-based trained model for proactive network security monitoring that cooperates in real-time with IDS in production and allows a dynamic nuanced baseline to improve the effectiveness of threat detection.

  • The automation of the entire software stack using declarative composition, which helps to simplify AIOps deployment and management over time.

These contributions demonstrate the potential of the AI and advanced modeling techniques presented in this work to improve network security and the efficiency of IT operations.

The paper is further structured as follows: Section 2 surveys the related work. Section 3 provides the main idea about the cooperation architecture in Sect. 3.1, the learning development phase in Sect. 3.2, the online model deployment in Sect. 3.3, and the online anomaly detection in Sect. 3.4. Section 4 goes through the deployment of the architecture in Sect. 4.1, online data processing in Sect. 4.2, the quality forecast in Sect. 4.3, and anomaly detection results in Sect. 4.4. Section 5 concludes the main points of the work and its future extension.

2 Related work

2.1 Network behavior analysis

Network anomaly detection, real-time monitoring, behavior baselines, analysis dashboards, network security measures, and access to threat intelligence are essential components of a network security strategy. By implementing these measures, organizations ensure that their networks are secure and protected against cyber threats. There are several types of IDS, including network IDS, host IDS, wireless IDS, and their combinations. An IDS only alerts against network threats and vulnerabilities, and further intervention is needed to act on these alerts. Beyond an IDS, an IPS both alerts and blocks network threats and vulnerabilities based on a signature approach. Automated threat response is based on user-predefined thresholds.

Most current IDS/IPS are capable of responding in real time or near real time to network activities. This is achieved primarily by reactive solutions, typically a set of rules. These rules are used to identify known patterns of malicious behavior, such as known attack signatures, and to trigger an alert or take an action to block the activity in real time [8]. However, the effectiveness of these reactive solutions is limited by the system’s ability to recognize previously unknown or novel attack patterns. To address this limitation, machine learning (ML) algorithms are emerging that learn from historical data and proactively detect and prevent potential threats [9].

In this context, network behavior analysis (NBA) [10] is a technique used in cybersecurity to monitor and analyze network traffic for suspicious behavior (Table 1). Unlike misuse-based detection methods that rely on known attack patterns, NBA is designed to detect anomalous behavior that may indicate a new or unknown threat. NBA involves analyzing network traffic in real time to identify patterns of behavior that deviate from normal usage. This approach can help detect a wide range of attacks, including malware infections, data exfiltration, and insider threats. Examples of cybersecurity platforms and tools [11, 12] that can incorporate NBA techniques are IDS/IPS software such as ZEEK, Snort, Nessus, Suricata, Zabbix, OSSEC, FlowMon, and Rapid7.

Table 1 Cybersecurity tools and platforms that can incorporate NBA techniques

2.2 Predictive analysis in cybersecurity

A data-driven approach and intelligent data modeling extract information from large volumes of data for informed decision making, using advanced analytics to identify patterns, predict trends, and respond to threats and opportunities [13]. There are two main types of network analysis:

  • Misuse-based detection, also called the knowledge-based or signature-based approach, aims to detect known attacks by using attack signatures. Its strength is high accuracy, and it can identify the type of attack. However, it is a reactive solution and requires a data background of attacks. There is a lack of real labeled datasets of known patterns due to the diversity of IDS/IPS working in real production.

  • Anomaly-based detection considers an anomaly as a deviation from a known behavior (in other words: profile, baseline, threshold) representing the normal or expected behaviors derived from monitoring regular activities over a period of time [14]. It is a proactive solution, predicting in advance and in time, works for any type of attack, and does not require a data background of attacks. However, the approach may provide low accuracy in dynamic environments where all events change dynamically.

In the context of misuse-based detection, the works [15,16,17] present a range of techniques for data preprocessing, data filtering, feature learning, and representation, spanning multiple topics including semi-supervised anomaly detection and insider threat detection in network traffic [18]. The authors compare and select from a variety of ML methods such as logistic regression (LR), Naïve Bayes (NB), k-nearest neighbor (kNN), support vector machine (SVM), decision tree (DT), random forest (RF), extreme learning machines (ELMs), and multilayer perceptron (MLP). These works utilize various techniques to calculate anomaly scores, such as the local outlier factor, one-class SVM, isolation forest, histogram-based outlier score, or distance distribution-based anomaly detection. To calculate distances between feature spaces, dimensionality reduction techniques such as principal component analysis (PCA) or random projection are often used.

Anomaly-based detection is also a widely used technique to identify situations in data that deviate from expected behavior. It finds applications in various domains, including fraud detection in credit card transactions, insurance, and traffic flows, intrusion detection in the cybersecurity industry, and fault detection in industrial analytics [19]. In time series, the objective is to recognize temporal patterns in past data and to use these results for forecasting. The Kalman filter was one of the early models capable of addressing regression concerns and minimizing variance to achieve optimal results. Later, the auto-regressive integrated moving average (ARIMA) model became a well-known and standard framework for predicting short-term data flow. Numerous modifications to the ARIMA model have been implemented, with improved performance as a result [20]. Unsupervised real-time anomaly detection for streaming data was investigated in [21], which evaluates the performance of hierarchical temporal memory networks on the Numenta Anomaly Benchmark (NAB) corpus, which contains a single metric with timestamps.

In the works [22, 23], the authors characterize dynamic monitoring and propose incremental DL to handle concept drift in simulated evolving data stream scenarios. Comparisons between traditional approaches such as ARIMA and DL showed that DL outperformed the traditional approaches. DL methods reported in the literature include NNs, deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) with incremental learning [24], gated recurrent units (GRUs), autoencoders (AEDs), and variational autoencoders (VAEDs), as well as their combinations. Further examples are restricted Boltzmann machines (RBMs), deep Boltzmann machines (DBMs), deep belief networks (DBNs) [25, 26], and, more recently, transformers [27, 28] for problems with multiple timestamped metrics treated as sequences.

Table 2 Deep learning models for multihorizon multichannel modeling

These works provide valuable information on the strengths and weaknesses of different approaches, helping researchers and practitioners make informed decisions about the methods to apply to specific use cases and identify inherent obstacles to applying ML/DL in practical domains [30]. The authors report a significant interest in validating the effectiveness of their network intrusion detection approaches [31] by simulating a wide range of scenarios that mirror real-world conditions as closely as possible. Using various techniques such as time series, graph networks, DL, and their combinations [32], the authors aim to develop robust and reliable systems to detect anomalies in network traffic flows.

DL has become increasingly popular in anomaly detection with multihorizon time-series forecasting (Table 2), showing significant performance improvements compared to traditional time-series methods [33, 34]. Multihorizon forecasting is a technique in time-series data analysis [35]. Unlike one-step-ahead forecasting, which predicts the value of a variable only one time period ahead, multihorizon forecasting allows the estimation of future trends over multiple time periods [36]. This capability is especially valuable for situational awareness and decision support, as it enables optimization of actions in multiple steps in advance.

This surge in the use of DL has led to various experiments aimed at developing intelligent modules that address intrusion detection in a complex way. In this setting, patterns that deviate from a baseline estimated only from normal traffic are indicated as anomalous. There is also a discussion of the technical challenges of applying deep ensemble models [37, 38], covering issues that arise during the data management, model learning, model verification, and model deployment stages, as well as considerations that affect the entire deployment pipeline in production [39].

2.3 AI for IT operations

AIOps is the term used to describe the integration of AI and ML algorithms into IT operations to improve the efficiency and reliability of IT services [40]. It represents a significant step forward in the field of IT operations, enabling organizations to take advantage of the power of AI to improve their IT operations and provide better services.

In this context and for cybersecurity intrusion detection research, several artificially generated, public, and synthesized datasets [17, 25, 41] are widely recognized as valuable resources. The most well-known are presented in Table 3. These datasets provide a valuable resource for researchers and practitioners to test and evaluate new approaches to cybersecurity challenges, allowing the development of more effective and robust cybersecurity solutions.

Table 3 Cybersecurity datasets: public, generated, and synthesized

Furthermore, an annotation approach is presented to generate artificial alerts in [49], which includes a pioneering automatic scheduling scheme. This method enables efficient and effective monitoring of data streams, providing timely alerts for potential anomalies or other important events. Similarly, in the work [50], an automated adaptation strategy for stream learning is presented, which has been tested on 36 publicly available datasets and five ensemble ML methods such as the simple adaptive batch learning ensemble (SABLE), dynamic weighted majority (DWM), paired learner (PL), leveraged bagging (LB), and the best last method (BLAST) for stream data analysis. These experiments demonstrate the potential of ML/DL to enable real-time analysis of data streams, helping to detect important patterns and anomalies.

As mentioned in Sects. 2.1 and 2.2, in a wide range of studies, including those mentioned and many others, researchers have conducted experiments that demonstrate promising results in the development and testing of various methods for data analysis and monitoring [51]. These studies have used a variety of public datasets containing various monitoring information, allowing rigorous testing and comparison of different approaches. The positive results of these experiments suggest that these methods have the potential to be useful in a wide range of applications, from anomaly detection to predictive modeling and beyond. Although these works have made important contributions to cybersecurity research, their focus has often been on predictive analysis (that is, data analytics with ML methods but without real deployment), using publicly available, synthesized, or artificially generated datasets (Table 4).

Table 4 Predictive analysis in cybersecurity

Although such datasets can provide valuable information on the performance of different approaches, it is important to recognize that real-world cybersecurity challenges often involve unique and complex data that are not fully represented in them. To develop robust and effective cybersecurity solutions, it is crucial to validate them on real data to ensure their effectiveness in practical settings.

Network intrusion detection based on anomalies is an area of great interest in cybersecurity research, but its real-world application poses a significant challenge. Although many studies have focused on developing intelligent approaches to detect anomalies in network traffic, deploying these methods in real production with real stream data and flows is difficult and rarely achieved due to the security nature of the domain, as presented in Table 4.

3 Method description

To bridge the gap between research and real production, this work addresses the AIOps integration process and the challenges involved in deploying intelligent approaches for anomaly detection over stream data while the underlying technologies are already in operation [53]. The starting point of our work is the integration of an AI/DL model that cooperates with the ZEEK IDS, which supervises the infrastructure network, on an online real-time data stream for anomaly detection.

3.1 Cooperation architecture

Network anomaly detection is critical to ensuring the security of computer networks. Real-time monitoring is essential to detect unusual activity or behavior that may indicate a security breach. To establish a baseline for network behavior, it is crucial to understand what normal activity looks like. This is where the behavior baseline comes in, providing a clear picture of expected network activity.

To make sense of the large amounts of data generated by real-time monitoring and behavior baseline, analysis dashboards are essential. These dashboards provide network administrators with insight into network performance, traffic patterns, and potential security threats.

The cooperation architecture (Fig. 1) in this work is designed with the main components and technologies presented in Table 5. The combination of ZEEK, Apache Kafka, Elasticsearch, Kibana, and Docker with Docker Compose creates a system architecture that is capable of integrating an intelligent module to improve the detection capacity of ZEEK. It can also provide a dashboard for visualization analysis with a dynamic behavior baseline produced by the module.

Fig. 1: Cooperation architecture of proactive network monitoring deployment

The detailed description of each software component in our cooperation architecture with its purpose and key features is as follows:

Table 5 Main software components and technologies of the cooperation architecture

ZEEK is an open-source IDS that continuously monitors network traffic to detect and identify potential intrusions in real-time. It does so by parsing network traffic and extracting application-level information to analyze and match input traffic patterns with stored signatures. ZEEK is known for its speed and efficiency, and it is capable of performing detection without dropping any network packets. Additionally, it includes an extensive set of scripts designed to minimize false alarms and support the detection of signature-based and event-oriented attacks.

Apache Kafka is an open-source distributed event streaming platform that enables high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. It operates as a cluster of one or more servers communicating through a high-performance transmission control protocol (TCP) network protocol and can span multiple data centers or cloud regions for enhanced scalability and fault tolerance.

It provides a variety of application programming interfaces (APIs) in Java and Scala, including higher-level libraries for Go, Python, and C/C++, and a Representational State Transfer (REST) API. The primary concepts of Kafka include topics, partitions, producers, consumers, brokers, and ZooKeeper. The main Kafka concepts (Fig. 2) used in our work are listed below, followed by a minimal producer/consumer sketch:

  • Topics represent a stream of records or events that are stored in a distributed manner across partitions. Topics are partitioned, i.e., their data are spread over a number of buckets located on different Kafka brokers.

  • Producers publish data to topics; they are the client applications that publish (write) events to Kafka.

  • Consumers subscribe to topics to consume the data; they are the clients that read and process these events.

  • Brokers serve as intermediaries between producers and consumers, managing the storage and replication of data across partitions.

  • ZooKeeper is a centralized service for maintaining configuration information and providing distributed synchronization. Kafka uses it to monitor the status of cluster nodes, as well as topics and partitions. ZooKeeper stores the list of existing topics, the number of partitions for each topic, the location of replicas, and access control lists (ACLs).
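
For illustration, a minimal Python sketch of the producer and consumer roles is given below; it assumes the third-party kafka-python client, a local broker address, and example topic/key names, none of which correspond to the exact production configuration.

```python
# Minimal illustration of the Kafka producer/consumer roles described above.
# Assumptions: the kafka-python package, a broker at localhost:9092, and a
# topic/key naming scheme ("zeek1" / "conn") chosen only for this example.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# Publish one parsed ZEEK conn.log record under topic "zeek1", key "conn".
producer.send("zeek1", key="conn", value={"ts": 1690000000.0, "orig_bytes": 512})
producer.flush()

consumer = KafkaConsumer(
    "zeek1",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    key_deserializer=lambda k: k.decode("utf-8"),
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:        # blocks, reading the stream of events
    record = message.value      # parsed ZEEK log record
    print(message.key, record)
    break                       # stop after one message in this sketch
```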

Fig. 2: Kafka concept in the ZEEK log stream context

Elasticsearch is a distributed RESTful search and analysis engine, designed to handle a wide range of use cases. As the core component of the Elastic Stack, it provides lightning-fast search capabilities, fine-tuned relevancy, and robust analytics that can easily scale to handle even the largest datasets. Elasticsearch is the most popular enterprise search engine based on Lucene, offering a distributed multi-tenant capable full-text search engine with a user-friendly hypertext transfer protocol (HTTP) interface and schema-free JavaScript Object Notation (JSON) documents.

Kibana is an open-source front-end application that works seamlessly with the Elastic Stack. Kibana provides search and data visualization capabilities, allowing users to easily explore and analyze data indexed in Elasticsearch. It was originally part of the Elasticsearch-Logstash-Kibana (ELK) stack. Kibana also serves as a user interface to manage, monitor, and secure an Elastic Stack cluster. Its aim is to configure and monitor Elasticsearch indices, visualize data using pre-built dashboards and graphs, and manage security settings to ensure data privacy and access control.

Docker packages software into standardized units called containers. It is a set of platform-as-a-service products that use operating system (OS) level virtualization to deliver software in packages (containerization). Developers and operations teams use Docker to create and automate the deployment of applications in lightweight containers, for example, on virtual machines, to ensure that applications work efficiently in multiple environments. Using OS-level virtualization, Docker provides a consistent and predictable environment in which applications can run, regardless of the underlying infrastructure [55].

Docker Compose is used to define and run multi-container Docker applications, creating and starting all services from a configuration file written in YAML Ain’t Markup Language (YAML) with a single command. Docker Compose runs multiple containers as a single service. Each container runs in isolation but can interact with the others when required.

The description and the online state of our monitoring stream data are in Sect. 4.2. The snapshot with details of our architecture deployed as the software stack running in its production testing environment is in Sect. 4.1.

In the following sections, the two phases of our proposed intelligent module designed for the cooperation architecture are presented as follows:

  • The development of our multihorizon multichannel model is described in Sect. 3.2.

  • The online deployment of the model is presented in Sect. 3.3.

The online anomaly detection principle is presented in Sect. 3.4.

3.2 Multihorizon multichannel modeling as dynamic baselines

Complex network monitoring is considered as a data function y of the time point t, denoted \(y_t\). It has the form of a multivariate vector containing the variables of m monitoring channels \(( y^1_t, y^2_t, \dots , y^m_t )\) for proactive modeling of network activity:

$$\begin{aligned} y_t = \left( y^1_t, y^2_t, \dots , y^m_t \right) \end{aligned}$$
(1)

at time t. The mathematical basis for proactively forecasting the value of y at the next q time points \((t+1),~\dots ,(t+q)\) is the values of y at the previous p time points \((t-p+1),~\dots ,t\), with additive error terms.

In this work, we build on the time-series forecasting background [56, 57] and extend the forecast from one horizon (one time point), as in [58], to a forecast starting k horizons ahead with a direct forecast of q horizons (Fig. 3), as follows (Eq. 2):

$$\begin{aligned} \begin{pmatrix} y^1_{t-p+1} & \cdots & y^1_{t} \\ y^2_{t-p+1} & \cdots & y^2_{t} \\ \vdots \\ y^m_{t-p+1} & \cdots & y^m_{t} \\ \end{pmatrix} \rightarrow \begin{pmatrix} y^1_{t+k} & \cdots & y^1_{t+k+q-1} \\ y^2_{t+k} & \cdots & y^2_{t+k+q-1} \\ \vdots \\ y^m_{t+k} & \cdots & y^m_{t+k+q-1} \\ \end{pmatrix} \end{aligned}$$
(2)

as vector form (Eq. 3):

$$\begin{aligned} y_{t-p+1},~\dots ,~y_{t} \rightarrow y_{t+k},~\dots ,~y_{t+k+q-1} \end{aligned}$$
(3)

where:

\(1 \le k < p\);

\(1 \le q < p\);

k indicates the time points ahead between the current time t and the start time point \((t+k)\) of the forecast;

q indicates the multihorizon forecast from time point \((t+k)\) to time point \((t+k+q-1)\).

The modeled values of \(y_t\), called the predictions \(\widehat{y_t}\), are used as the known behavior (in other words, a dynamic baseline or dynamic threshold) representing the normal behavior of regular network activities over a period of time t for anomaly detection purposes (more details in Sect. 3.4).
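
For illustration, the mapping of Eqs. (2) and (3) from an input window of p time points to a direct forecast of q horizons offset by k can be sketched as follows; the array shapes, the helper name make_windows, and the parameter values are assumptions made only for this example.

```python
# Sketch of building (input window, multihorizon target) pairs per Eqs. (2)-(3).
# Assumptions: data is a NumPy array of shape (T, m) with T time steps and
# m monitoring channels; the values of p, k, q are chosen only for illustration.
import numpy as np

def make_windows(data: np.ndarray, p: int, k: int, q: int):
    """Return X with shape (N, p, m) and Y with shape (N, q, m)."""
    X, Y = [], []
    # last usable start index: t such that t + k + q - 1 <= T - 1
    for start in range(data.shape[0] - p - k - q + 2):
        t = start + p - 1                      # current time index
        X.append(data[start:start + p])        # y_{t-p+1}, ..., y_t
        Y.append(data[t + k:t + k + q])        # y_{t+k}, ..., y_{t+k+q-1}
    return np.stack(X), np.stack(Y)

series = np.random.rand(1000, 5)               # 1000 steps, 5 channels (dummy data)
X, Y = make_windows(series, p=36, k=1, q=10)
print(X.shape, Y.shape)                        # (955, 36, 5), (955, 10, 5)
```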

Fig. 3: Multihorizon multichannel modeling

The direct forecasting strategy involves training models on time-series values that have been shifted by the desired number of time periods into the future. It can be improved by using a multichannel forecasting strategy, also known as a multiple-output strategy. With this approach, a single model is developed that can predict the entire forecast sequence in one go. This makes the AIOps deployment easier and more flexible compared to making separate predictions for each time period using multiple models.

Data preprocessing is a crucial step before modeling and prediction. An important aspect of preprocessing is feature engineering, which involves transforming raw logs into time-series data that can be used for modeling. To ensure the quality of the transformed data, the network protocols are checked for white noise, randomness, and unit roots [59] using the augmented Dickey–Fuller (ADF) test [60]. A unit root is a stochastic trend in a time series, sometimes called a random walk with drift. If a time series has a unit root, it shows a systematic pattern that is unpredictable. The original Dickey–Fuller test tests the model (Eq. 4):

$$\begin{aligned} \Delta y_t = y_t - y_{t-1} = \alpha + \beta t + \gamma y_{t-1} + e_t \end{aligned}$$
(4)

ADF is the augmented version of the Dickey–Fuller test. It allows for higher-order autoregressive processes by including \(\Delta y_{t-p}\) in the model (Eq. 5).

$$\begin{aligned} \Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta _1 \Delta y_{t-1} + \delta _2 \Delta y_{t-2} + \cdots \end{aligned}$$
(5)

For both tests, the null hypothesis (\(H_0\)) is that a unit root is present in the data. The alternative hypothesis (\(H_1\)) is that no unit root is present. Mathematically, \(H_0\) corresponds to \(\gamma = 0\), in which case y is a random walk (non-stationary) process. If the ADF test returns a \(p\_value\) below the 0.05 threshold, then y is considered stationary. The goal of these tests as part of data preprocessing is to identify and filter out protocols that contain excessive noise or randomness, as they are not suitable for modeling. Unlike most other forecasting algorithms, DL architectures are capable of learning non-linearities and long-term dependencies in the sequence, so stationarity is less of a concern for DL models. By performing this data filtering, the quality of the data is improved, leading to more accurate predictions [61].
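
A minimal sketch of this filtering step is shown below, assuming the statsmodels implementation of the ADF test and a pandas DataFrame with one column per aggregated channel; the column names and dummy data are illustrative.

```python
# Sketch: keep only channels whose ADF test rejects the unit-root hypothesis
# (p < 0.05), as described above. Column names and data are examples only.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def filter_stationary(df: pd.DataFrame, alpha: float = 0.05) -> list:
    """Return the columns whose ADF test rejects the unit-root hypothesis H0."""
    kept = []
    for column in df.columns:
        p_value = adfuller(df[column].dropna())[1]  # index 1 is the p value
        if p_value < alpha:                         # reject H0 -> stationary
            kept.append(column)
    return kept

# Dummy data: a stationary noise channel and a random-walk (unit-root) channel.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "conn_in": rng.normal(size=500),
    "drifting": np.cumsum(rng.normal(size=500)),
})
print(filter_stationary(df))   # typically ["conn_in"]
```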

To model the complex behavior of network traffic, we employ state-of-the-art DL architectures (Table 2) that can handle online stream data. These DL models were investigated, customized to our domain data, and compared (Sect. 4.3.2) to select the best one, i.e., the GRU model, for online deployment (Sect. 3.3).

The forecast quality is evaluated using a separate error matrix (Eq. 6) for multihorizon modeling (k horizons ahead before the direct forecast of q horizons) of multiple channels (m monitoring channels). We denote

$$\begin{aligned} E = \begin{pmatrix} e^1_{t+k} & e^1_{t+k+1} & \cdots & e^1_{t+k+q-1} \\ e^2_{t+k} & e^2_{t+k+1} & \cdots & e^2_{t+k+q-1} \\ \vdots \\ e^m_{t+k} & e^m_{t+k+1} & \cdots & e^m_{t+k+q-1} \\ \end{pmatrix} = \begin{bmatrix} e^1 \\ e^2 \\ \vdots \\ e^m \end{bmatrix} \end{aligned}$$
(6)

The average error value \(\text {avg}(e^j)\) (Eq. 8) of the error series \(e^j\) of each communication protocol j (Eq. 7) is calculated for the m monitoring channels with a direct forecast of q horizons.

$$\begin{aligned} e^j & = \begin{bmatrix} e^j_{t+k} & e^j_{t+k+1} & \cdots & e^j_{t+k+q-1} \end{bmatrix} \end{aligned}$$
(7)
$$\begin{aligned} \text {avg}(e^j) & = \frac{1}{q} \sum _{i=t+k}^{t+k+q-1}e^j_{i} \end{aligned}$$
(8)

Errors are reported mainly using the symmetric mean absolute percentage error (SMAPE) (Eq. 9). SMAPE avoids division by zero and is less sensitive to near-zero values than the mean absolute percentage error (MAPE) (Eq. 10).

$$\begin{aligned} \text {SMAPE} & = 100\% \frac{1}{n} \sum _{i=1}^n \frac{|y_i - \widehat{y_i} | }{ \frac{1}{2}(|y_i| + |\widehat{y_i}|)} \end{aligned}$$
(9)
$$\begin{aligned} \text {MAPE} & = 100\% \frac{1}{n} \sum _{i=1}^n \left|\frac{(y_i - \widehat{y_i})}{ y_i } \right|\end{aligned}$$
(10)

The root mean square error (RMSE) (Eq. 11) and the coefficient of determination (\(R^2\)) (Eq. 12) are used as supporting metrics.

$$\begin{aligned} \text {RMSE} & = \sqrt{\frac{1}{n} \sum _{i=1}^n (y_i - \widehat{y_i})^2 } \end{aligned}$$
(11)
$$\begin{aligned} R^2 & = 1 - \frac{ \sum _{i=1}^n \Bigl ( y_i - \widehat{y_i} \Bigr )^2}{\sum _{i=1}^n \Bigl ( y_i - \frac{1}{n} \sum _{j=1}^n y_j \Bigr )^2 } \end{aligned}$$
(12)

All the aforementioned metrics measure the inconsistency between the ground truth y and the prediction \(\widehat{y}\) over n observations. The mean squared error (MSE) (Eq. 13) and mean absolute error (MAE) (Eq. 14) are used as loss functions in model training.

$$\begin{aligned} \text {MSE} & = \frac{1}{n} \sum _{i=1}^n (y_i - \widehat{y_i})^2 \end{aligned}$$
(13)
$$\begin{aligned} \text {MAE} & = \frac{1}{n} \sum _{i=1}^n |y_i - \widehat{y_i}| \end{aligned}$$
(14)
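
The evaluation metrics of Eqs. (9)–(14) can be computed directly, for example with NumPy as sketched below; the sample vectors are illustrative only.

```python
# NumPy implementations of the evaluation metrics in Eqs. (9)-(14).
import numpy as np

def smape(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs(y - y_hat) / (0.5 * (np.abs(y) + np.abs(y_hat))))

def mape(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def r2(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def mse(y, y_hat):
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def mae(y, y_hat):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

y_true = np.array([10.0, 12.0, 9.0, 11.0])   # dummy ground truth
y_pred = np.array([11.0, 11.5, 9.5, 10.0])   # dummy prediction
print(round(smape(y_true, y_pred), 2), round(rmse(y_true, y_pred), 2))
```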

The hyperparameter settings for our DL architectures are presented in Table 7 (Sect. 4.3.1). All features are transformed by normalization scaling [62]. The dropout layer works well with our data. The main advantage of using unsupervised (self-supervised) methods is that we do not need to label the data. The main disadvantage is that detected patterns may not be anomalous, but rather intrinsic to the dataset.

3.3 Model online deployment

Teacher forcing is a method for quickly and efficiently training and retraining neural network models using ground-truth values. It is a critical training method used in DL for natural language modeling, such as machine translation, text summarization, image captioning, and many other applications. In some contexts, teacher forcing is also referred to interchangeably as incremental online learning. After the model is first trained, it is used to start the process and produces the predicted output at a predefined interval. The recursive output-as-input approach can be used when training the model for the first time, but it can result in problems such as model instability or poor skill in a dynamically changing environment. Teacher forcing is used to maintain model skill and stability regularly over time [63].

Algorithm 1: Online model execution over streamed data with teacher-forcing periodic model updates

DL comes with more complex architectures, which promise more accurate learning, but at the cost of model complexity and computing resources. Our choice of the GRU architecture is based on a comparative analysis of DL architectures [58] and on the analysis performed in this work for implementing online learning with real-time stream data (Sect. 4.3). The GRU behaves well in the development phase, and teacher forcing is supported by production-grade TensorFlow for deployment.
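
As a rough illustration, a GRU-based multihorizon multichannel forecaster can be expressed in TensorFlow/Keras as sketched below; the layer sizes, dropout rate, optimizer, and window parameters are assumptions for this example and do not reproduce the hyperparameters of Table 7.

```python
# Sketch of a GRU forecaster mapping a (p, m) input window to a (q, m) forecast.
# The layer sizes, dropout rate, and optimizer settings are assumptions for
# illustration only; the deployed model's hyperparameters are given in Table 7.
import tensorflow as tf

p, q, m = 36, 10, 5   # input steps, forecast horizons, monitoring channels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(p, m)),
    tf.keras.layers.GRU(64),              # summarizes the input window
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(q * m),         # direct forecast of all q horizons
    tf.keras.layers.Reshape((q, m)),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# X: (N, p, m) input windows, Y: (N, q, m) targets, e.g., produced by a
# windowing routine such as the earlier make_windows sketch:
# model.fit(X, Y, epochs=10, batch_size=32, validation_split=0.1)
```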

Due to the dynamic nature of the monitoring data, model adaptation and its updating are needed during the deployment, especially when the model is run for a longer period of time. This process is described in (Algorithm 1). The prediction model is deployed within a Kafka consumer that consumes two streams online:

  • \(topic\_in\): used for the prediction, and

  • \(topic\_benign\): used to update the model.

Both streams are processed using a sliding window approach, but each with different length and step settings. The resolution of time (that is, the time step) is set to 10 min according to the domain experience and the delay effect described in Sect. 4.2.2 with the statistical results presented in Fig. 6.

Regarding the prediction task (Algorithm 1, lines 9–15), the sliding window defines the prediction context. The length of the sliding window is \(context\_len\) and corresponds to the number of recent time steps that the model sees in its input. The step \(slide\_step\) equals the model forecast horizon; in our case, it is one time step corresponding to 10 min (Table 7).

Predictions are made continuously at each sliding window step (Algorithm 1, line 13) by feeding the model with a sliding window content; that is, a multivariate time series of features \(features\_in\). The forecasting results in the form of a feature vector \(features\_out\) enriched with model metadata (for example, model name) are sent to the output stream \(topic\_pred\) with the help of a Kafka producer (Algorithm 1, line 14), where they can be further processed by other consumers, such as a consumer feeding Elasticsearch (Fig. 5).

Updates to the online model are made periodically if sufficient training data are received from the benign data stream (Algorithm 1, line 20). In our case, for the 24-h update period and the training data for 24 h, the sliding window length (\(update\_context\_len\)) and its step (\(update\_period\)) are set to 144 (24 h / 10 min = 144).

The complexity of ExecOnlineModel (Algorithm 1) is O(n), where n is the number of messages read by Kafka’s consumer (Algorithm 1, line 7). The algorithm performs inference using the pre-trained model in production, which is not computationally or communication intensive. The model works in place with a regular amount of online stream data from the monitoring log flow (Algorithm 1, lines 2 and 3) in the context of one current sliding window (Algorithm 1, lines 7 to 14).
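
A condensed Python sketch of the execution loop described by Algorithm 1 is given below; the Kafka client usage, the JSON message layout, and the helper make_update_batch are assumptions for illustration rather than the production implementation, while the topic names and the 144-step update window follow the description above.

```python
# Condensed sketch of Algorithm 1: online inference over streamed features with
# periodic teacher-forcing model updates. Kafka usage, the JSON message layout,
# and the default context_len are assumptions made only for this illustration.
import json
from collections import deque
import numpy as np
from kafka import KafkaConsumer, KafkaProducer


def exec_online_model(model, make_update_batch,
                      context_len=36, update_context_len=144,
                      brokers="localhost:9092"):
    """model: trained Keras forecaster; make_update_batch: callable turning a
    (update_context_len, m) array into (X, Y) teacher-forcing training windows."""
    consumer = KafkaConsumer("topic_in", "topic_benign",
                             bootstrap_servers=brokers,
                             value_deserializer=lambda v: json.loads(v.decode()))
    producer = KafkaProducer(bootstrap_servers=brokers,
                             value_serializer=lambda v: json.dumps(v).encode())
    window = deque(maxlen=context_len)          # sliding prediction context
    benign = deque(maxlen=update_context_len)   # benign data for model updates

    for message in consumer:                    # endless stream-processing loop
        features = np.asarray(message.value["features"], dtype=float)
        if message.topic == "topic_benign":
            benign.append(features)
            if len(benign) == update_context_len:        # 24 h of benign data
                X, Y = make_update_batch(np.stack(benign))
                model.fit(X, Y, epochs=1, verbose=0)     # teacher-forcing update
                benign.clear()
            continue
        window.append(features)                          # topic_in message
        if len(window) == context_len:                   # full context available
            forecast = model.predict(np.stack(window)[None, ...], verbose=0)[0]
            producer.send("topic_pred",
                          value={"model": "gru", "forecast": forecast.tolist()})
```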

3.4 Online anomaly detection

Anomalies in network monitoring are identified as significant deviations from the values of the normal activity level [14]. The most important point is the precision of the predicted values \(\widehat{y}\) (Sect. 3.2) compared to the real values y, so that alert warnings are kept at an acceptable level for network administration. The overall anomaly detection score for the m monitoring channels at time t is calculated as:

$$\begin{aligned} score_t = cosine\left( \widehat{y}_{t,upper}, y_t \right) \end{aligned}$$
(15)

where:

\(\widehat{y}^i _{t,upper} = (1 + \alpha ^i)~\widehat{y}^i_t\);

model is the model architecture in production;

\(y_t\) is a vector of real values of m channels at time t;

\(\widehat{y}_t\) is predicted values of \(y_t\) calculated at time \((t-k)\);

\(\widehat{y}_{t,upper}\) is an upper boundary for \(y_t\);

\(\alpha ^i\) is a threshold coefficient for the \(i\)-th channel, expressed in percent, where \(\alpha ^{i} \ge \frac{1}{2}\,\text {SMAPE}^{i}\) for the model in production.

A warning of an abnormal state at time t is raised by the anomaly detection rule applied to multichannel monitoring (Eq. 16), which is activated when

$$\begin{aligned} score_t > \vartheta \ge 0 \end{aligned}$$
(16)

where \(\vartheta\) is a reserve threshold constant.
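
For illustration, the detection rule of Eqs. (15) and (16) can be sketched as follows; here the cosine score is interpreted as the cosine distance (one minus the cosine similarity) between the upper-bounded forecast and the observed vector, and this interpretation, the value of \(\vartheta\), and the channel vectors are assumptions made only for the sketch, while \(\alpha = 0.5\) follows the default used in Sect. 4.4.

```python
# Sketch of the anomaly detection rule in Eqs. (15)-(16). The "cosine" score is
# interpreted here as the cosine distance (1 - cosine similarity) between the
# upper-bounded forecast and the observed vector; this interpretation, the
# theta value, and the channel vectors below are assumptions for illustration.
import numpy as np
from scipy.spatial.distance import cosine


def anomaly_score(y_hat, y, alpha):
    """y_hat, y, alpha: arrays of length m (one entry per monitoring channel)."""
    y_hat_upper = (1.0 + np.asarray(alpha)) * np.asarray(y_hat)   # upper baseline
    return cosine(y_hat_upper, np.asarray(y))                     # Eq. (15)


def is_anomalous(y_hat, y, alpha, theta=0.01):
    return anomaly_score(y_hat, y, alpha) > theta                 # Eq. (16)


y_hat = np.array([120.0, 40.0, 15.0])      # forecast baseline per channel
alpha = np.array([0.5, 0.5, 0.5])          # per-channel threshold coefficients
print(is_anomalous(y_hat, y=np.array([125.0, 42.0, 16.0]), alpha=alpha))  # False
print(is_anomalous(y_hat, y=np.array([900.0, 45.0, 16.0]), alpha=alpha))  # True
```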

Algorithm 2: Online anomaly detection

Conventional threshold-based methods provide a horizontal baseline. DL data modeling provides a nuanced soft baseline in the form of the prediction \(\widehat{y}\) during the monitoring time of the network (Fig. 3). Algorithm 2 works well under IDS production conditions. The core idea is to obtain the predicted values \(\widehat{y}_t\) and use them as a flexible baseline (or threshold). The result is indicated as the blue line in Fig. 10 (Sect. 4.4), where the green line indicates the real values y.

4 Experiments and evaluation

The cooperation architecture (Fig. 1) based on the Kafka concept for ZEEK log stream processing (Fig. 2) is realized as a production testing environment as follows:

  • Deployment of the cooperation architecture (Sect. 3.1) as a running software stack (Sect. 4.1) with online model deployment for multihorizon multichannel forecasting.

  • Online stream data in cooperation with ZEEK supervising network activities (Sect. 4.2);

The details of the experiments carried out in the production testing environment are presented as follows:

  • Stream data processing (Sect. 4.2.1);

  • Online delayed log effect (Sect. 4.2.2);

  • Forecasting quality (Sect. 4.3);

  • Anomaly detection results (Sect. 4.4).

4.1 Software stack deployment

The deployment [64] of the complete software stack for data streams in our work follows automation and orchestration with declarative Ansible semantics. A snapshot of the deployment is shown in Fig. 4, with the component serving instances listed in Table 6.

Table 6 Component serving instances

The Ansible YAML file uses a declarative composition to easily compose, deploy, and scale the software stack (Fig. 4) for our deployment into the testing environment. YAML is a human-readable data serialization language that serves the same purpose as Extensible Markup Language (XML) but with Python-style indentation for nesting. Compared to XML, YAML has a more compact format.

Fig. 4: Snapshot of the complex software deployment stack running in its production testing environment

4.2 Online stream data

The next part of our work is cooperation with one instance of ZEEK working in production. The instance is installed on the network routing machine. ZEEK produces two kinds of logs:

  • Compressed log files: ZEEK collects metadata on ongoing activity within the monitored network. This activity is recorded as transaction logs, which are collated into compressed textual log files grouped by date, protocol, and capture time within specific directories under $PREFIX/logs. These compressed logs are kept for long-term storage [65] and can be easily searched and analyzed to identify patterns and trends in network traffic [58].

  • Online stream log files: ZEEK analyzes traffic according to the predefined policy setting and writes the results of real-time network activity into a dedicated directory $PREFIX/logs/current on the ZEEK monitoring server. These incremental logs are stored as uncompressed log files [66], which serve as the source of streaming data for subsequent analysis and modeling in this work.

By default, ZEEK regularly takes all logs from $PREFIX/logs/current, compresses them by date and time, and archives them to $PREFIX/logs. The frequency is set to every hour by default and can be changed in the configuration file ($PREFIX denotes the ZEEK installation directory).

Online logs are in a human-readable (ASCII) format, and the data are organized into tab-delimited columns. Logs that deal with the analysis of a network protocol often start with:

  • a timestamp,

  • a unique connection identifier (UID),

  • a connection 4-tuple (originator host/port and responder host/port),

  • the protocol-dependent activity that is occurring.

Apart from the conventional network protocol-specific log files, ZEEK also generates other important log files based on network traffic statistics, interesting activity captured in the traffic, and detection-focused information, for example, conn.log, notice.log, known_services.log, and weird.log.

4.2.1 Data stream processing

Before any further analysis, an initial analysis of the transaction logs was performed to filter out white noise protocols with stochastic characteristics (Sect. 3.2). This filtering process helps to remove irrelevant or noisy data from the dataset, improving the accuracy and efficiency of subsequent analysis steps. For network protocols that pass data validation tests and are used for modeling, it is important to recognize that time-series data have characteristics of ordered time-dependency sequences, where each observation has a temporal dependency on previous observations. Preserving this temporal dependency is crucial for accurate modeling.

For incremental filtered log files conn.log, dns.log, http.log, sip.log, ssh.log, ssl.log of network protocols, dedicated Kafka producers are launched to monitor them for new records in real time. New records are treated as live streams (Fig. 5) and parsed as soon as they appear in the log files.

Fig. 5: High-level view of the stream data flow

The parsed logs are immediately sent to the Kafka cluster. They are organized by topic and key (Fig. 2), where the topic refers to message origin (a particular ZEEK instance) and the key refers to originating log files (e.g., conn, http).

In our work, aggregators are used to consume ZEEK messages from Kafka and compute statistics over predefined sliding windows (10 min, 30 min, or 1 h) to preprocess the streaming data. The aggregated statistics are pushed back to Kafka as new messages under dedicated topics (e.g., ’zeek1-agg1-10 m’) and keys (e.g., conn, http). Aggregations are computed according to a specified set of declarative rules. There are two types of aggregation rules:

  • The agg rule computes the number of total and unique connections (for instance, for the conn log file), the sum of received and sent bytes, and the average duration of connections within the sliding window.

  • The groupby rule groups data within the sliding window by a specified column before computing the aggregation statistics.

The groupby aggregation rule is useful if we want to compute statistics for particular column values. An example of such statistics is counting the total number of true and false values of the column AA within the sliding window, as follows.

  • The rule leads to the creation of two columns in the resulting record; i.e., dns_AA_T_count for the true value count and dns_AA_F_count for the false value count.

  • Column names are automatically generated according to the original log file and the expansion of the aggregation rules.

Before applying the aggregation rules, the data are grouped within the sliding window by direction of communication into three groups:

  • in,

  • out,

  • internal.

The direction is resolved by IP address, where applicable, and the result is taken from the perspective of the monitored network. It should be noted that, before applying any aggregation rules, there is an option to prepare the consumed data using declarative data cleaning rules.
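
To make the two rule types concrete, the following pandas sketch computes agg-type statistics for one 10-min window and the groupby-type counts of the dns AA flag mentioned above; the input layout, column names, and dummy values are assumptions modeled on the description, not the production implementation.

```python
# Sketch of the two aggregation rule types over one 10-min sliding window.
# Column names and the input layout are illustrative assumptions.
import pandas as pd

# Parsed conn.log records that fell into the current window (dummy values).
conn = pd.DataFrame({
    "uid": ["C1", "C2", "C2", "C3"],
    "orig_bytes": [100, 250, 40, 0],
    "resp_bytes": [900, 120, 10, 0],
    "duration": [1.2, 0.4, 0.1, 3.0],
})
# agg rule: totals, unique connections, byte sums, average duration.
agg_record = {
    "conn_total_count": len(conn),
    "conn_unique_count": conn["uid"].nunique(),
    "conn_sent_bytes_sum": int(conn["orig_bytes"].sum()),
    "conn_recv_bytes_sum": int(conn["resp_bytes"].sum()),
    "conn_duration_avg": float(conn["duration"].mean()),
}

# groupby rule: count true/false values of the dns AA flag in the window.
dns = pd.DataFrame({"AA": [True, False, True, True]})
counts = dns["AA"].value_counts()
agg_record["dns_AA_T_count"] = int(counts.get(True, 0))
agg_record["dns_AA_F_count"] = int(counts.get(False, 0))
print(agg_record)   # this record would be pushed back to Kafka as a new message
```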

4.2.2 Online delayed logging effect

The analysis of more than 2.1 billion records collected over a 32-month period revealed that the logs in the incremental files are sometimes delayed. Delays present a problem for online aggregation, because delayed logs are not included in their aggregator sliding window. Figure 6 illustrates the analysis of the conn.log data, with the graph showing how long an aggregator should wait for delayed logs to receive the desired percentage of incoming logs for the relevant sliding window; e.g., a 300-s waiting interval is needed to obtain \(97\%\) of the logs for the actual sliding window.

Fig. 6: Statistics of delayed logs in conn.log for a monitoring period of 32 months

The conn.log file is one of the most important ZEEK logs. It contains logs of TCP/UDP/ICMP connections that have been completed [65]. Log records are written to the incremental log file at the time of completing a connection; i.e., when the connection is closed gracefully or abruptly. There is also a third option when a connection is considered completed, despite the fact that it might still be alive. It is the case when it exceeds the inactivity timer set by ZEEK. This timer is set to 1 h by default, so longer connections are incorrectly broken into multiple sessions.

However, the timer is not the main cause of delays, as up to \(97\%\) of logs have delays below this timer. User datagram protocol (UDP) and internet control message protocol (ICMP) connections are interpreted by ZEEK using flow semantics; that is, the sequence of packets from a source host/port to a destination host/port is grouped. Each log contains ts and duration fields, where ts is the time of the first packet and duration is the elapsed time between the first and last packets in a session. For 3-way and 4-way connection teardowns, the final acknowledgment (ACK) packet is not included in the duration computation, which could be a reason for the delays.

The delays in the analysis were calculated as follows: the logs are iterated chronologically from the beginning, as they were recorded, while computing the end timestamp ts_end of each transaction log as \(\texttt {ts} + \texttt {duration}\) (Eq. 17).

$$\begin{aligned} ts\_end = ts + duration \end{aligned}$$
(17)

Logs with missing duration were skipped. The difference between consecutive logs is computed as in Eq. 18. A negative difference diff is considered a delay.

$$\begin{aligned} \textit{diff} = ts\_end_t - ts\_end_{t-1} \end{aligned}$$
(18)
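
The delay computation of Eqs. (17) and (18) can be sketched as follows, assuming the conn.log records are available as a pandas DataFrame with ts and duration fields; the dummy values are illustrative.

```python
# Sketch of the delayed-log analysis in Eqs. (17)-(18): iterate logs in recorded
# order, compute ts_end = ts + duration, and treat negative differences between
# consecutive logs as delays. The input DataFrame layout is an assumption.
import pandas as pd

conn = pd.DataFrame({                       # logs in the order they were written
    "ts":       [100.0, 130.0, 118.0, 125.0, 140.0],
    "duration": [5.0,   2.0,   2.0,   None,  0.5],
})
conn = conn.dropna(subset=["duration"])     # skip logs with missing duration
conn["ts_end"] = conn["ts"] + conn["duration"]          # Eq. (17)
conn["diff"] = conn["ts_end"].diff()                    # Eq. (18)
delays = -conn.loc[conn["diff"] < 0, "diff"]            # negative diff = delay
print(delays.describe())   # quantiles indicate the waiting time for 97% coverage
```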

Based on our experience and experiments with the domain data, an optimal sliding window of 10 min with a 10-min step is set for data aggregation. The waiting time for delayed logs is set to 5 min, which according to the analysis presented above covers up to \(97\%\) of the actual window logs. With these settings, the data aggregator is deployed as a Kafka consumer/producer, which reads new Kafka logs and performs the data aggregation.

4.3 Forecasting quality

In addition to DL data modeling (Sect. 3.2), cross-validation based on a rolling window is used; that is, the data are divided into fixed-size sliding windows. This approach allows us to train the model on one subset of the data and then test it on a different subset. By repeating this process, we assess the model performance and make sure that it generalizes well to new data without overfitting and makes accurate predictions on incoming online stream data.
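
A minimal sketch of such rolling-window validation using scikit-learn's TimeSeriesSplit is given below; the number of splits and the dummy data are assumptions for illustration.

```python
# Sketch of rolling-window cross-validation for the time-ordered windows
# described above. TimeSeriesSplit preserves temporal order, so each test
# fold lies strictly after its training fold; split counts are assumptions.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.random.rand(1000, 36, 5)    # dummy input windows (N, p, m)
Y = np.random.rand(1000, 10, 5)    # dummy multihorizon targets (N, q, m)

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    X_train, Y_train = X[train_idx], Y[train_idx]   # earlier windows only
    X_test, Y_test = X[test_idx], Y[test_idx]       # later, unseen windows
    # model.fit(X_train, Y_train); score = model.evaluate(X_test, Y_test)
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
```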

4.3.1 Hyperparameter setting

The complete setting of the hyperparameters is presented in Table 7. Hyperparameter tuning is performed on the basis of the Bayesian optimization sequential design strategy [67] in combination with preliminary exploratory online data analysis (Sect. 4.2.2) and our previous experience [68] with data and DL modeling in the domain.

Table 7 Hyperparameter setting

4.3.2 Model performance

The deployed model works well, simultaneously monitoring the \(conn\_in\), \(conn\_out\), \(dns\_out\), \(http\_in\), and \(ssl\_in\) network protocols. SMAPE and cosine are used as the main metrics for measuring model performance; the rest are auxiliary supporting metrics. In practice, a larger number of metrics is used during model development and deployment. It is appropriate to mention that the range of SMAPE is \([0\%, 200\%]\) (Eq. 9).

  • The forecasting quality for multiple (1–10) horizons in one go with SMAPE (Tables 8, 9). The best model performance values (lower values indicate better quality) for the combinations of protocol and horizon are highlighted in bold across these two tables for the DL models MLP, CNN, GRU, s2s-GRU, CNN-GRU, and transformer.

  • The forecasting quality for multiple (1–10) horizons in one go with cosine (Tables 10, 11). The best model performance values (higher values indicate better quality) for the combinations of protocol and horizon are highlighted in bold across these two tables for the DL models MLP, CNN, GRU, s2s-GRU, CNN-GRU, and transformer.

  • The forecasting stability for one-step-ahead of GRU with SMAPE (Table 12, Fig. 8) and cosine (Table 13, Fig. 9).

Table 8 MLP, CNN, GRU: model performance by SMAPE for multihorizons (1–10)
Table 9 s2s-GRU, CNN-GRU, Transformer: model performance by SMAPE for multihorizons (1–10)
Table 10 MLP, CNN, GRU: model performance by cosine for multihorizons (1–10)
Table 11 s2s-GRU, CNN-GRU, Transformer: model performance (cosine) for multihorizons (1–10)

Experiments were carried out to measure the quality of direct multihorizon forecasts for multiple channels. The selected results are presented as follows: Tables 8, 9, 10 and 11 present the model quality for multiple horizons (1–10) forecast simultaneously in one go with the SMAPE and cosine metrics. The model works simultaneously for all selected network protocols. The main key findings based on the evaluations of the experiments are as follows:

  • The forecasting quality is not equal for all protocols due to their dynamic nature in the monitored network environment. Specifically, \(dns\_out\) and \(http\_in\) have sharper and spikier profiles (Fig. 10), and their fluctuations are more strongly reflected in the metrics, especially in SMAPE.

  • The cosine metric corresponds more or less to SMAPE for all monitored protocols. The average error values for the multihorizon forecast in one go are higher than for the one-horizon forecast.

  • The result of the multichannel forecast for one horizon in one go is slightly better than the multihorizon multichannel forecast in one go (Tables 12, 13).

  • The quality of the model decreases gracefully with time, which is expected and is the reason why teacher forcing is used for online deployment.

Fig. 7: GRU versus transformer performance for multihorizon forecast in one go. Lower values indicate better model quality by SMAPE in (a, b); higher values indicate better quality by cosine in (c, d)

Tables 12 and 13 present the prediction stability of the n-step-ahead forecast with the GRU model. The values are within stable ranges for the various metrics (as visualized in Figs. 8, 9); therefore, we do not emphasize ranges or highlight the best values in bold as in the other tables.

Table 12 GRU forecasting stability in n-step ahead forecast (SMAPE)
Fig. 8: GRU forecasting stability with SMAPE for multiple-step-ahead forecasting

Table 13 GRU forecasting stability in n-step ahead forecast (cosine)
Fig. 9: GRU forecasting stability with cosine for multiple-step-ahead forecasting

Regarding the selection of the best trained model to deploy, the following key findings are concluded:

  • GRU, CNN-GRU, and transformer are the top three best models;

  • MLP model can be considered as the baseline for performance comparison, and there is no big difference between MLP and CNN models;

  • s2s-GRU acts as an encoder–decoder with GRU blocks. The difference between the performance of GRU and that of s2s-GRU is small, slightly in favor of GRU;

  • CNN-GRU can be seen as an improvement that combines the advantages of both CNN and RNN. It is better than CNN and nearly as good as GRU;

  • The transformer gives interesting and impressive performance. Its values in the four tables are averages over 10 training runs. In some runs, the model is superior and outperforms the other models; in other runs, the training is not stable, which can be caused by the dropout rate and normalization layers. As a consequence, teacher forcing for the transformer is more complicated than for the other models, in particular GRU, requiring, for example, learning-rate warm-up and linear learning-rate decay to ensure model stability.

In summary, GRU belongs among the best models trained with our data. We chose GRU over the transformer for deployment after considering and comparing the possible deployment obstacles of both models. GRU (Fig. 7a, c) has simpler requirements with comparable or better performance compared to the transformer (Fig. 7b, d).

The effect of the direct multihorizon forecast in network monitoring: multihorizon forecasts (Tables 8, 9, 10, 11) have slightly higher errors compared to the forecast for one horizon (Tables 12, 13). This effect is expected due to the stochastic character of network monitoring and the more complex computational models of multihorizon forecasting on multiple channels (Figs. 8, 9). It also reflects the averaging of errors over multiple horizons (Sect. 3.2).

4.4 Anomaly detection results

Figure 10 illustrates the proactive prediction of the normal state as the blue soft line, which is produced 10 min in advance of the real monitoring values of the http and ssl protocols (green lines). The http protocol has a clear pattern of connections, which is modeled well by the DL approach. Such patterns are difficult to model with traditional approaches (Sect. 2).

Fig. 10: Kibana dashboard for online monitoring (HTTP and SSL protocols) backed by Kafka and Elasticsearch in production with the GRU model deployed on stream data. The blue line is the forecast used as the baseline over time. The green line shows the real monitoring values, with anomalies indicated as significant excesses relative to the baseline (color figure online)

The abnormal state in proactive network monitoring is also illustrated in Fig. 10, where the spikes of the green lines (real monitoring values) significantly exceed the blue soft line of the proactively predicted states. The difference between the two lines (blue and green) is calculated in Sect. 3.4 as the angle between two vectors (Eq. 15) with the threshold rule (Eq. 16).

It is worth noting that monitoring systems usually do not show two lines (blue, green) as in the Kibana dashboard in Fig. 10, but only a single line of real monitoring values.

The next detected anomaly is shown in Fig. 11, where the alerts are displayed as red alert areas. The proactive effect is 10 min in advance (with \(k=1\), that is, one-horizon-ahead forecasting, Fig. 11), in favor of the network administrators, who gain time to react to the abnormal situation.

Fig. 11: Alert warning in the Kibana production dashboard with the GRU model deployed: the forecast baseline (gray) and the actual state (blue), i.e., the real number of unique incoming connections within a sliding 10-min window. Further inspection of the alert-flagged windows (red) revealed excessive port scanning activity during the Russian–Ukrainian conflict from the outside to the monitored network (color figure online)

However, the concrete type of anomaly, excessive port scanning, is not identified by our anomaly-based detection; it was reported as feedback from the network administrator after log examination. Our anomaly-based detection only alarms on the anomalous state (red alert area) in network security according to the high deviation of the actual connection number (gray area) from the baseline (blue area), using the anomaly detection rule (Sect. 3.4, Eq. 16).

The proactive time effect is configurable (with the steps-ahead value \(k > 1\)), which can give network administrators more time to react. The number of warning alerts is controlled by the value \(\alpha\) described with Eq. 15. The higher the value of \(\alpha\) (in the range of 0 to 1), the lower the number of anomaly warning alerts. Our default value of \(\alpha\) is set to 0.5 for the testing production environment (Fig. 4).

The deployed dashboard (Fig. 10) allows us to add a dynamic adaptive threshold baseline to improve situational awareness. The development of the entire software stack using declarative composition simplifies AIOps deployment in practice.

Currently, control over the benign data stream is up to the SOC operators, who filter malicious data out of this stream based on the monitoring results. In the event of an anomaly detected by the rule (Eq. 16), the update of the model could also be performed automatically, without manual intervention, in the automatic teacher forcing process (update and exchange of the model). The initial model, which is used as the starting model for online prediction, is always trained on benign historical data. When the model is already running online, its quality can be validated retrospectively. If the prediction deviation stays below a threshold, or the anomaly level of the data does not exceed the predefined threshold (the parameter \(\alpha\) in Sect. 3.4), the data can be considered benign and used to update the model. If the deviation is above the threshold, an anomaly alert is raised and the monitoring data are excluded from the teacher forcing training process (Fig. 5).

We prefer to deploy a single model over multiple or ensemble models. Even the deployment of a single DL model is not an easy task; it requires building a complex pipeline for every model retraining. Combining more models in production further increases the deployment and production costs of online, real-time model and data curation.

5 Conclusion

This paper has presented the development of a DL-based multihorizon forecast model that simultaneously models multiple monitoring channels in an unsupervised manner, allowing a comprehensive view of network activities. With the trained DL model, the deployment of an AIOps module for proactive network security monitoring is realized in cooperation with an IDS working in real-time production.

The deployment of an intelligent module into a production environment is often accompanied by various obstacles, which are presented in this work. Moving forward, there are several directions for future work. A promising approach involves integrating innovative concepts for hybrid detection that combine anomaly-based detection with misuse-based detection for attack classification. It is also imperative to validate anomaly detection scores using well-defined labeled reference datasets. Data curation and model maintenance in production can be time intensive and require thorough verification, especially in the context of evolving data distribution and scale. Another direction is distributed and federated learning for interoperability to reduce the sensitive data sharing silos in cybersecurity.