Automated root cause identification of security alerts: Evaluation in a SaaS Cloud
Introduction
Critical computing infrastructures are equipped with a variety of independent security monitors, such as intrusion detection systems, network audit, vulnerability scans, and event logs. The analysis of the security alerts generated by the monitors at node- and network-level provides a goldmine of information to detect attacks and to pinpoint potential system misuse [1], [2], [3], [4]. However, comprehensive system monitoring, which guarantees high coverage at detecting suspicious system activities, causes the generation of large volumes of alerts and false positives [5], [6], [3].
The analysis of the security alerts is a process that consists of several phases and involves a number of professional roles in a security team, as shown in Fig. 1. The alerts generated by the security monitors are first analyzed by the security admins: when further investigation is needed, a ticket is created in a ticket management system. The ticket management team analyzes the ticket and identifies a response action. The response team implements the action and initiates a corrective task, such as the reboot of a critical system service, a software reconfiguration, or the cleanup of the file system of a given node. The alert management process is strongly human-intensive. As a result, the ever-increasing volume and heterogeneity of the collected alerts prevent the operations team from conducting timely security analysis and forensic activities.
Finance systems, corporate networks, datacenter facilities, and industrial systems are intertwined in many modern human activities. Recent attacks resulting in the sabotage of physical devices, e.g., Stuxnet (2010), and data breach episodes that leaked credentials and credit card numbers, e.g., LinkedIn (2012), Global Payments (2012), TripAdvisor (2014), emphasize the strong technical advances achieved by the attackers’ community over the past decade and pose new societal challenges. More importantly, the popularity gained by the Software as a Service (SaaS) paradigm to deliver business-critical applications [7] has made the Cloud an increasingly sensitive security target. In the near future, the massive user bases and the number of applications and companies handled by Cloud services will represent a sensitive source of confidential data to feed the underground economy trading sensitive information over the Internet [8].
This paper proposes a framework consisting of a filter and a decision tree to address large volumes of security alerts and to support the automated identification of the root causes of the alerts.
The proposed framework leverages a key observation that holds for any security dataset: most of the alerts are generated by regular system operations rather than by actual incidents [5], [6], [9], [10], [11]. As a result, we use a term weighting approach to filter the alerts. Term weighting computes the relevance of a term, i.e., a sequence of characters delimited by whitespace, across a given dataset [12]: the smaller the relevance, the higher the chance that the term is generated by regular system operation. We investigate three term weighting schemes, i.e., term frequency (tf), term frequency-inverse document frequency (tf.idf), and logarithmic entropy (log.entropy), to discard irrelevant alerts.
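To make the three schemes concrete, the following is a minimal sketch based on the common textbook definitions of tf, tf.idf, and log-entropy weighting. It is not the formulation of Section 4.1, which is not reproduced here. The function name term_weights, the use of natural logarithms, and the choice of what counts as a "document" (for instance, the alert text of one node in one day) are assumptions made for illustration.

```python
import math
from collections import Counter


def term_weights(documents):
    """Return (tf, tfidf, log_entropy): one {term: weight} dict per document.

    Assumption: textbook definitions, not necessarily those of the paper.
    """
    counts = [Counter(doc.split()) for doc in documents]  # whitespace-delimited terms
    n_docs = len(counts)

    doc_freq = Counter()     # number of documents containing each term
    global_freq = Counter()  # total occurrences of each term in the dataset
    for c in counts:
        doc_freq.update(c.keys())
        global_freq.update(c)

    # Global entropy weight: 0 for a term spread evenly over all documents,
    # 1 for a term that occurs in a single document only.
    entropy = {}
    for term, gf in global_freq.items():
        h = sum((c[term] / gf) * math.log(c[term] / gf) for c in counts if term in c)
        entropy[term] = 1.0 + h / math.log(n_docs) if n_docs > 1 else 1.0

    tf = [dict(c) for c in counts]
    tfidf = [{t: n * math.log(n_docs / doc_freq[t]) for t, n in c.items()} for c in counts]
    log_entropy = [{t: entropy[t] * math.log(1 + n) for t, n in c.items()} for c in counts]
    return tf, tfidf, log_entropy


# Hypothetical toy input: three "documents" of alert text.
tf, tfidf, log_entropy = term_weights(["login failed root", "login ok", "login ok"])
```

In this toy input, "login" occurs once in every document, so both its tf.idf and its log-entropy weights are zero, which matches the intuition that terms produced by regular operation carry little relevance. The term "root" occurs in one document only and keeps a positive weight.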
The framework encompasses a decision tree to support the identification of the root causes of the alerts retained by the filter. A root cause is a human-understandable description of the events that triggered an alert; moreover, the root cause is contextualized with a list of actions needed to deal with the problem (the response) and with the list and count of the alerts pertaining to that root cause. The identification of the root causes is accomplished through a conceptual clustering approach [5], which identifies the general structure of the alert, i.e., the cluster the alert belongs to. Each cluster is explained by one root cause. A cluster is manually added to the decision tree by the security analyst at the first occurrence of a given alert: future occurrences of the same alert are automatically assigned to its cluster, and the root cause is established by traversing the tree.
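The workflow described above (manual insertion of a cluster at the first occurrence, automatic assignment afterwards) can be illustrated with a small sketch. It is a simplification under stated assumptions, not the tree of Section 4: the generalization rules (replacing IP addresses and numbers with placeholders), the token-by-token tree layout, and the names generalize, Node, and RootCauseTree are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical generalization rules: variable fields become placeholders.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
NUM_RE = re.compile(r"\b\d+\b")


def generalize(message):
    """Reduce an alert message to its general structure (a tuple of tokens)."""
    text = IP_RE.sub("<IP>", message.lower())
    text = NUM_RE.sub("<NUM>", text)
    return tuple(text.split())


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    root_cause: Optional[str] = None


class RootCauseTree:
    def __init__(self):
        self.root = Node()
        self.counts = Counter()  # alerts assigned to each root cause

    def add_cluster(self, sample_alert, root_cause):
        """Manual step: the analyst labels the first occurrence of an alert."""
        node = self.root
        for token in generalize(sample_alert):
            node = node.children.setdefault(token, Node())
        node.root_cause = root_cause

    def classify(self, alert):
        """Automatic step: traverse the tree; None means a new cluster is needed."""
        node = self.root
        for token in generalize(alert):
            node = node.children.get(token)
            if node is None:
                return None
        if node.root_cause is not None:
            self.counts[node.root_cause] += 1
        return node.root_cause


tree = RootCauseTree()
tree.add_cluster("Failed password for root from 10.0.0.5 port 22", "authentication failure")
cause = tree.classify("Failed password for root from 192.168.1.9 port 2222")
```

Here the second alert differs from the labeled sample only in its IP address and port, so it maps to the same generalized structure and is assigned to the existing cluster. An alert with an unseen structure returns None, which stands for the case that requires analyst intervention.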
We evaluate the framework by analyzing two security datasets from a production SaaS Cloud, which generates an average volume of 800 alerts/day. The datasets are named Unix and Windows throughout the rest of the paper: the former consists of 13,200 alerts generated by Unix nodes, while the latter contains 150,170 alerts generated by Windows nodes in the SaaS Cloud. Each dataset spans a timeframe of seven months. The key findings of our data-driven measurement study are summarized as follows:
- The filtering step of the framework reduces the volume of the Unix alerts by a factor of 4; in the case of Windows, the alerts are reduced by a factor of 2. The result indicates that the filter is valuable for reducing the volume of alerts; however, the effectiveness of the filter is related to the number of alert types in a dataset. Our classification of the root causes shows that the Windows dataset contains a larger number of alert types than the Unix dataset.
- The framework significantly reduces the effort it takes to identify the root causes of the alerts. Our evaluation shows that, while the number of alerts generated by the system is almost steady over different months, the number of clusters that must be manually handled by the security analysts decreases after a short bootstrap time of the framework. In total, 98.8% of the alerts are assigned to the corresponding root cause with no manual intervention.
- The framework is extremely valuable for inferring, from a security dataset, the classification of the root causes and the distribution of the alerts across the root causes. The root causes identified by means of the framework in the target SaaS Cloud range from credential management and authentication to application errors and misconfigurations. The root causes are distributed according to the trends observed by other studies in the context of large-scale computing organizations, such as [3], [13], and by reports on major security threats in the Cloud [14].
The paper is organized as follows. Section 2 discusses related work in the area. Section 3 introduces the datasets and their characterization. Section 4 describes the proposed framework. Section 5 discusses the selection of the term weighting scheme and the results of the filtering step. Section 6 presents the evaluation of the framework and the classification of the root causes. Finally, Section 7 concludes the work.
Related work
Our framework addresses textual security alerts. In the following, we discuss the novelty of our proposal along three directions, i.e., data sources, existing filtering approaches, and the nature of the adopted datasets.
Datasets
The datasets consist of a total of 163,370 alerts generated by a production SaaS Cloud from February 2013 to August 2013. On average, around 800 alerts/day were raised over the considered timeframe. Table 1 shows sample alert notifications. The Date and Time field reports the time the alert was generated, whereas Originating Address is the IP address of the node that caused the alert. The Message
The proposed framework
The framework consists of different data processing steps, which fill the gap between the unstructured textual information of the alerts and the formalization of the decision tree. Table 2 introduces the terminology used throughout the description of the framework.
Fig. 4 provides an overview of the framework. The term weighting step computes a weight for the alerts collected in a given day: the filtering module retains the alerts whose weight exceeds a filtering threshold. The threshold is
Filtering tuning and results
We compared the daily tf, tf.idf, and log.entropy scores of each node of the system. The scores are computed as described in Section 4.1. The analysis, which is based on our previous measurements [36], indicates that log.entropy is the most suitable term weighting scheme for characterizing the alerts available in this study. As a result, log.entropy has been used to implement the filtering step of the proposed framework. We estimated the number of nodes that generate the alerts retained by the
Framework evaluation
The filtering step of the framework has been trained with three months of data (February–April): the framework is run on a daily basis over the remaining four months of the datasets (May–August). It is worth noting that the results presented in this section can be generalized to other combinations of training- and test-set sizes because the size of the training set does not significantly impact the filtering results, as shown in Tables 3 and 4.
We present sample investigations that contributed
Conclusion
The analysis of security alerts plays a key role in protecting Cloud infrastructures and in supporting timely response against incidents. This paper investigated the use of three different term weighting schemes to filter security alerts in a SaaS Cloud. We proposed a log.entropy filter to retain relevant information from an average volume of 800 alerts/day. With respect to our datasets, the proposed framework was strongly effective at identifying the root causes of the alerts by means of a decision
Acknowledgments
This work has been partially supported by the TENACE PRIN Project (no. 20103P34XC) funded by the Italian Ministry of Education, University and Research, and by the COSMIC public–private laboratory, projects SVEVIA (PON02 00485 3487758) and MINIMINDS (PON02 00485 3164061), funded by the Italian Ministry of Education, University and Research.
References (39)
- et al., A data mining analysis of RTID alarms, Comput. Netw. (2000)
- et al., A multi-model approach to the detection of web-based attacks, Comput. Netw. (2005)
- et al., Reducing false positives in intrusion detection systems, Comput. Secur. (2010)
- A virtual honeypot framework
- et al., A statistical analysis of attack data to separate attacks
- A. Sharma, Z. Kalbarczyk, J. Barlow, R. Iyer, Analysis of security data from a large computing organization, in: The...
- et al., Identifying compromised users in shared computing infrastructures: A data-driven Bayesian network approach
- Clustering intrusion detection alarms to support root cause analysis, ACM Trans. Inf. Syst. Secur. (2003)
- et al., A framework for the application of association rule mining in large intrusion detection infrastructures
- Software as a service, worldwide, 2010–2015, Gartner, 2010....
- Learning more about the underground economy: A case-study of keyloggers and dropzones
- Shedding light on log correlation in network forensics analysis
- Cryptography and Network Security: Principles and Practice
- A simple and efficient hidden Markov model scheme for host-based anomaly intrusion detection, IEEE Netw.
Domenico Cotroneo received the M.Sc. degree in Computer Engineering from University of Naples in 1998, and he received his Ph.D. in 2001 from the Department of Computer Engineering and Systems at the University of Naples, Italy. He is currently an Associate Professor at the University of Naples. His main interests include software fault injection, dependability assessment techniques, and field-based measurements techniques. Dr. Cotroneo is serving/served as Program Committee member in several dependability conferences, including DSN, EDCC, ISSRE, SRDS, and LADC.
Andrea Paudice completed his B.S. and M.S. degrees cum laude in Computer Engineering at the Federico II University of Naples in 2011 and 2014, respectively. Since January 2015 he has been a research fellow at the Consorzio Interuniversitario Nazionale per l’Informatica (CINI) in Naples. His research interests include data analysis, applied machine learning, and computer and network security.
Antonio Pecchia, Ph.D., IEEE Member, received the B.S. and M.S. degrees cum laude in Computer Engineering from the Federico II University of Naples, Italy, in 2005 and 2008, respectively. He received the Ph.D. degree from the University of Naples in 2011. Currently, he is a post-doc researcher at the Department of Electrical Engineering and Information Technologies (DIETI) at the University of Naples. His research interests include dependable computing, log-based failure analysis, security, and defect analysis. He serves as reviewer, PC member, and chair in several dependability conferences and workshops. He is currently involved in national and European projects aiming to develop novel techniques for the analysis and validation of critical systems.