Automated root cause identification of security alerts: Evaluation in a SaaS Cloud

https://doi.org/10.1016/j.future.2015.09.009

Highlights

  • A framework to support the analysis of security alerts.

  • Adoption of term weighting and clustering techniques.

  • Classification of the root causes of the alerts.

  • Evaluation of real-world security datasets collected in a SaaS Cloud.

Abstract

The analysis of the security alerts collected during system operations is a crucial task to initiate effective responses against attacks and intentional system misuse. A variety of monitors is available today to generate security alerts, such as intrusion detection systems, network audits, vulnerability scans, and event logs. While the alerts generated by the security monitors represent a goldmine of information, the ever-increasing volume and heterogeneity of the collected alerts pose a major challenge to the timely security analysis and forensic activities conducted by the operations team.

This paper proposes a framework consisting of a filter and a decision tree to address large volumes of security alerts and to support the automated identification of the root causes of the alerts. The framework adopts both term weighting and conceptual clustering approaches to fill the gap between the unstructured textual alerts and the formalization of the decision tree. We evaluated the framework by analyzing two security datasets collected in a production SaaS Cloud, which generates an average volume of 800 alerts/day. With respect to the datasets available in this study, the framework significantly reduced the volume of alerts and inferred the root causes of around 98.8% of the alerts with no human intervention. More importantly, we leveraged the output of the framework to provide a classification of the root causes of the alerts in the target SaaS Cloud.

Introduction

Critical computing infrastructures are equipped with a variety of independent security monitors, such as intrusion detection systems, network audits, vulnerability scans, and event logs. The analysis of the security alerts generated by the monitors at node and network level provides a goldmine of information to detect attacks and to pinpoint potential system misuse [1], [2], [3], [4]. However, comprehensive system monitoring, which guarantees high coverage in detecting suspicious system activities, causes the generation of large volumes of alerts and false positives [5], [6], [3].

The analysis of security alerts is a process that consists of several phases and involves a number of professional roles in a security team, as shown in Fig. 1. The alerts generated by the security monitors are first analyzed by the security admins: when further investigation is needed, a ticket is created in a ticket management system. The ticket management team analyzes the ticket and identifies a response action. The response team implements the action and initiates a corrective task, such as the reboot of a critical system service, a software reconfiguration, or the cleanup of the file system of a given node. The alert management process is strongly human-intensive. As a result, the ever-increasing volume and heterogeneity of the collected alerts prevent timely security analysis and forensic activities conducted by the operations team.

Finance systems, corporate networks, datacenter facilities, and industrial systems are intertwined in many modern human activities. Recent attacks resulting in the sabotage of physical devices, e.g., Stuxnet (2010), and data breach episodes that leaked credentials and credit card numbers, e.g., LinkedIn (2012), Global Payments (2012), TripAdvisor (2014), emphasize the strong technical advances achieved by the attackers’ community over the past decade and pose new societal challenges. More importantly, the popularity gained by the Software as a Service (SaaS) paradigm to deliver business critical applications [7] has made the Cloud an increasingly sensitive security target. In the near future, the massive user bases, applications, and companies handled by Cloud services will represent a sensitive source of confidential data feeding the underground economy that trades sensitive information over the Internet [8].

This paper proposes a framework consisting of a filter and a decision tree to address large volumes of security alerts and to support the automated identification of the root causes of the alerts.

The proposed framework leverages a key observation about any security dataset: most of the alerts are generated by regular system operations rather than by actual incidents [5], [6], [9], [10], [11]. As a result, we use a term weighting approach to filter the alerts. Term weighting computes the relevance of a term, i.e., a sequence of characters separated by whitespace(s), across a given dataset [12]: the smaller the relevance, the higher the chance the term is generated by regular system operations. We investigate three term weighting schemes, i.e., term frequency (tf), term frequency-inverse document frequency (tf.idf), and logarithmic entropy (log.entropy), to discard non-relevant alerts.
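For concreteness, the sketch below shows how the three schemes could be computed over a batch of textual alerts and how the resulting weights could drive the filtering step. It is a minimal illustration in Python, assuming each alert is treated as a document, terms are whitespace-separated tokens, and an alert is scored as the mean weight of its terms; the exact aggregation rule and threshold selection used in the paper are not reproduced here.

```python
import math
from collections import Counter

def tokenize(alert):
    # A term is a sequence of characters separated by whitespace(s).
    return alert.split()

def term_weights(alerts):
    """Per-term tf, tf.idf and log.entropy weights over a batch of alerts.
    Each alert is treated as one document (an assumption of this sketch)."""
    docs = [Counter(tokenize(a)) for a in alerts]
    n_docs = max(len(docs), 2)                  # avoid log(1) = 0 below

    global_tf, doc_freq = Counter(), Counter()  # corpus-wide counts per term
    for doc in docs:
        global_tf.update(doc)
        doc_freq.update(doc.keys())

    tf = dict(global_tf)
    tf_idf = {t: f * math.log(len(docs) / doc_freq[t])
              for t, f in global_tf.items()}

    # log.entropy: terms spread evenly over the dataset (regular operations)
    # get a global weight close to 0; concentrated terms get a weight close to 1.
    log_entropy = {}
    for term, gf in global_tf.items():
        entropy = sum((doc[term] / gf) * math.log(doc[term] / gf)
                      for doc in docs if term in doc)
        log_entropy[term] = 1.0 + entropy / math.log(n_docs)
    return tf, tf_idf, log_entropy

def alert_score(alert, weights):
    """Score an alert as the mean weight of its terms (assumed aggregation)."""
    terms = tokenize(alert)
    return sum(weights.get(t, 0.0) for t in terms) / max(len(terms), 1)

def filter_alerts(alerts, weights, threshold):
    """Retain the alerts whose score exceeds the filtering threshold."""
    return [a for a in alerts if alert_score(a, weights) > threshold]
```

In the framework itself, the weights refer to the alerts collected in a given day and the choice of scheme and threshold is discussed with the framework description and the filtering results (Sections 4 and 5); the tokenization and scoring above are deliberate simplifications.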

The framework encompasses a decision tree to support the identification of the root causes of the alerts retained by the filter. A root cause is a human-understandable description of the events that caused the triggering of an alert; moreover, the root cause is contextualized with a list of actions needed to deal with the problem, the response, and the list and count of the alerts pertaining to that root cause. The identification of the root causes is accomplished through a conceptual clustering approach [5], which identifies the general structure of the alert, i.e., the cluster the alert belongs to. Each cluster is explained by one root cause. A cluster is manually added to the decision tree by the security analyst at the first occurrence of a given alert: future occurrences of the same alert are automatically assigned to its cluster, and the root cause is established by traversing the tree.
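The snippet below sketches this workflow under simplifying assumptions: the general structure of an alert is approximated by masking variable fields (IP addresses, hexadecimal strings, numbers), and the decision tree is collapsed into a flat lookup from structures to analyst-provided root causes. Names such as `RootCauseTree` and the masking rules are illustrative, not the paper's implementation.

```python
import re

# Variable fields masked to approximate the general structure of an alert
# (an assumption of this sketch; the paper's conceptual clustering is richer).
VARIABLE_FIELDS = [
    (re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'), '<IP>'),
    (re.compile(r'\b0x[0-9a-fA-F]+\b'), '<HEX>'),
    (re.compile(r'\b\d+\b'), '<NUM>'),
]

def structure(alert_text):
    """Map an alert to the cluster key describing its general structure."""
    for pattern, placeholder in VARIABLE_FIELDS:
        alert_text = pattern.sub(placeholder, alert_text)
    return alert_text

class RootCauseTree:
    """Lookup from clusters (general structures) to analyst-provided root
    causes and responses; it stands in for the decision tree of the paper."""
    def __init__(self):
        self.clusters = {}  # structure -> {'root_cause', 'response', 'count'}

    def classify(self, alert_text, analyst=None):
        key = structure(alert_text)
        if key not in self.clusters:
            if analyst is None:
                return None          # unknown cluster, no analyst available
            # First occurrence: the security analyst labels the new cluster.
            root_cause, response = analyst(alert_text)
            self.clusters[key] = {'root_cause': root_cause,
                                  'response': response, 'count': 0}
        self.clusters[key]['count'] += 1
        return self.clusters[key]['root_cause']

# First occurrence of an alert type requires the analyst; later ones do not.
tree = RootCauseTree()
tree.classify('Failed password for root from 10.0.0.7 port 2201',
              analyst=lambda a: ('authentication failure', 'verify credentials'))
tree.classify('Failed password for root from 10.0.0.9 port 4410')  # automatic
```

The per-cluster count mirrors the "list and count of the alerts pertaining to that root cause" kept by the framework; only the first alert of each type requires manual labeling.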

We evaluate the framework by analyzing two security datasets from a production SaaS Cloud, which generates an average volume of 800 alerts/day. The datasets are named Unix and Windows throughout the rest of the paper: the former consists of 13,200 alerts generated by Unix nodes, while the latter contains 150,170 alerts generated by Windows nodes in the SaaS Cloud. Each dataset spans a timeframe of seven months. The key findings of our data-driven measurement study are summarized in the following:

  • The filtering step of the framework reduces the volume of the Unix alerts by a factor of 4; in the case of Windows, the alerts are reduced by a factor of 2. The result indicates that the filter is valuable in reducing the volume of alerts; however, the effectiveness of the filter is related to the number of alert types in a dataset. Our classification of the root causes shows that the Windows dataset contains a larger number of alert types when compared to Unix.

  • The framework significantly reduces the effort it takes to identify the root causes of the alerts. Our evaluation shows that, while the number of alerts generated by the system is almost steady over different months, the number of clusters that must be manually handled by the security analysts decreases after a short bootstrap period of the framework. In total, 98.8% of the alerts are assigned to the corresponding root cause with no manual intervention.

  • The framework is extremely valuable to infer the classification of the root causes and the distribution of the alerts across the root causes from a security dataset. The root causes identified by means of the framework in the target SaaS Cloud include credential management, authentication, application errors, and misconfigurations. The distribution of the root causes is consistent with the trends observed by other studies in the context of large-scale computing organizations, such as [3], [13], and with reports on major security threats in the Cloud [14].

The paper is organized as follows. Section 2 discusses related work in the area. Section 3 introduces the datasets and their characterization. Section 4 describes the proposed framework. Section 5 discusses the selection of the term weighting scheme and the results of the filtering step. Section 6 presents the evaluation of the framework and the classification of the root causes. Finally, Section 7 concludes the work.

Section snippets

Related work

Our framework addresses textual security alerts. In the following, we discuss the novelty of our proposal along three directions, i.e., data sources, existing filtering approaches, and the nature of the adopted datasets.

Datasets

The datasets consist of a total of 163,370 alerts generated by a production SaaS Cloud from February 2013 to August 2013. On average, around 800 alerts/day have been raised over the considered timeframe. Table 1 shows sample alert notifications. The Date and Time field reports the time the alert was generated, whereas Originating Address is the IP address1 of the node that caused the alert. The Message

The proposed framework

The framework consists of different data processing steps, which fill the gap between the unstructured textual information of the alerts and the formalization of the decision tree. Table 2 introduces the terminology used throughout the description of the framework.

Fig. 4 provides an overview of the framework. The term weighting step computes a weight for the alerts collected in a given day: the filtering module retains the alerts whose weight exceeds a filtering threshold. The threshold is

Filtering tuning and results

We compared the daily tf, tf.idf, and log.entropy scores of each node of the system. The scores are computed as described in Section 4.1. The analysis, which is based on our previous measurements [36], indicates that log.entropy is the most suitable term weighting scheme for characterizing the alerts available in this study. As a result, log.entropy has been used to implement the filtering step of the proposed framework. We estimated the number of nodes that generate the alerts retained by the

Framework evaluation

The filtering step of the framework has been trained with three months of data (February–April): the framework is run on a daily basis over the remaining four months of the datasets (May–August). It is worth noting that the results presented in this section can be generalized to other combinations of training- and test-set sizes because the size of the training set does not significantly impact the filtering results, as shown in Table 3 and Table 4.
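As an illustration of this protocol, the sketch below splits a dataset chronologically, learns log.entropy weights on the training window, and then applies the filter one day at a time over the test window. It assumes the `term_weights` and `alert_score` helpers from the earlier sketch are in scope, and the threshold value shown in the usage comment is a placeholder rather than the setting used in the paper.

```python
def daily_evaluation(alerts_by_day, train_end, threshold):
    """Train the filter on all days up to train_end (e.g., February-April),
    then run it on a daily basis over the remaining days (e.g., May-August)."""
    training = [alert for day, alerts in alerts_by_day.items()
                if day <= train_end for alert in alerts]
    _, _, log_entropy = term_weights(training)   # helpers from earlier sketch

    retained_by_day = {}
    for day in sorted(d for d in alerts_by_day if d > train_end):
        retained_by_day[day] = [a for a in alerts_by_day[day]
                                if alert_score(a, log_entropy) > threshold]
    return retained_by_day

# Hypothetical usage: alerts_by_day maps datetime.date -> list of alert strings.
# retained = daily_evaluation(alerts_by_day,
#                             train_end=datetime.date(2013, 4, 30),
#                             threshold=0.5)
```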

We present sample investigations that contributed

Conclusion

The analysis of security alerts plays a key role in protecting Cloud infrastructures and in supporting timely responses against incidents. This paper investigated the use of three different term weighting schemes to filter security alerts in a SaaS Cloud. We proposed a log.entropy filter to retain relevant information from an average volume of 800 alerts/day. With respect to our datasets, the proposed framework was strongly effective at identifying the root causes of the alerts by means of a decision

Acknowledgments

This work has been partially supported by the TENACE PRIN Project (no. 20103P34XC) funded by the Italian Ministry of Education, University and Research, and by the COSMIC public–private laboratory, projects SVEVIA (PON02 00485 3487758) and MINIMINDS (PON02 00485 3164061), funded by the Italian Ministry of Education, University and Research.

References (39)

  • S. Manganaris et al.

    A data mining analysis of RTID alarms

    Comput. Netw.

    (2000)
  • C. Kruegel et al.

    A multi-model approach to the detection of web-based attacks

    Comput. Netw.

    (2005)
  • G.P. Spathoulas et al.

    Reducing false positives in intrusion detection systems

    Comput. Secur.

    (2010)
  • N. Provos

    A virtual honeypot framework

  • M. Cukier et al.

    A statistical analysis of attack data to separate attacks

  • A. Sharma, Z. Kalbarczyk, J. Barlow, R. Iyer, Analysis of security data from a large computing organization, in: The...
  • A. Pecchia et al.

    Identifying compromised users in shared computing infrastructures: A data-driven Bayesian network approach

  • K. Julisch

    Clustering intrusion detection alarms to support root cause analysis

    ACM Trans. Inf. Syst. Secur.

    (2003)
  • J.J. Treinen et al.

    A framework for the application of association rule mining in large intrusion detection infrastructures

  • Software as a service, worldwide, 2010–2015, Gartner, 2010....
  • T. Holz et al.

    Learning more about the underground economy: A case-study of keyloggers and dropzones

  • E. Bloedorn, B. Hill, A. Christiansen, C. Skorupka, L. Talboot, J. Tivel, Data mining for improving intrusion...
  • C. Clifton, G. Gengo, Developing custom intrusion detection filters using data mining, in: MILCOM 2000. 21st Century...
  • M.W. Berry et al.
  • E. Raftopoulos et al.

    Shedding light on log correlation in network forensics analysis

  • W. Stallings

    Cryptography and Network Security: Principles and Practice

    (2002)
  • J. Hu et al.

    A simple and efficient hidden Markov model scheme for host-based anomaly intrusion detection

    IEEE Netw.

    (2009)
  • S.S. Murtaza, W. Khreich, A. Hamou-Lhadj, M. Couture, A host-based anomaly detection approach by representing system...
  • L. Portnoy, E. Eskin, S. Stolfo, Intrusion detection with unlabeled data using clustering, in: Proceedings of ACM CSS...

Domenico Cotroneo received the M.Sc. degree in Computer Engineering from the University of Naples in 1998, and he received his Ph.D. in 2001 from the Department of Computer Engineering and Systems at the University of Naples, Italy. He is currently an Associate Professor at the University of Naples. His main interests include software fault injection, dependability assessment techniques, and field-based measurement techniques. Dr. Cotroneo is serving/served as Program Committee member in several dependability conferences, including DSN, EDCC, ISSRE, SRDS, and LADC.

Andrea Paudice completed his B.S. degree and his M.S. degree cum laude in Computer Engineering at the Federico II University of Naples in 2011 and 2014, respectively. Since January 2015, he has been a research fellow at the Consorzio Interuniversitario Nazionale per l’Informatica (CINI) in Naples. His research interests include data analysis, applied machine learning, and computer and network security.

Antonio Pecchia, Ph.D., IEEE Member, received the B.S. and M.S. degrees cum laude in Computer Engineering from the Federico II University of Naples, Italy, in 2005 and 2008, respectively. He received the Ph.D. degree from the University of Naples in 2011. Currently, he is a post-doc researcher at the Department of Electrical Engineering and Information Technologies (DIETI) at the University of Naples. His research interests include dependable computing, log-based failure analysis, security, and defect analysis. He serves as reviewer, PC member, and chair in several dependability conferences and workshops. He is currently involved in national and European projects aiming to develop novel techniques for the analysis and validation of critical systems.
